
eurlex's Issues

return title via SPARQL query

It should be possible to return document titles via SPARQL queries, but this requires moving from the WORK level to the EXPRESSION (language) level.
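A minimal sketch of what such a query might look like, built as a string in base R. The property names `cdm:expression_belongs_to_work`, `cdm:expression_uses_language` and `cdm:expression_title`, as well as the helper name `title_query`, are assumptions here and should be checked against the CDM ontology before use.

```r
# Sketch only: builds a SPARQL query that goes from a WORK to its EXPRESSION
# in one language and returns the title. Property names are assumptions.
title_query <- function(celex, language = "ENG") {
  paste(
    "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
    "SELECT DISTINCT ?work ?title WHERE {",
    "  ?work cdm:resource_legal_id_celex ?celex .",
    sprintf('  FILTER(str(?celex) = "%s")', celex),
    # The expression is the language-specific realisation of the work
    "  ?expr cdm:expression_belongs_to_work ?work .",
    sprintf(
      "  ?expr cdm:expression_uses_language <http://publications.europa.eu/resource/authority/language/%s> .",
      language
    ),
    "  ?expr cdm:expression_title ?title .",
    "}",
    sep = "\n"
  )
}

cat(title_query("32001E0555"))
```

The resulting string could then be passed to whatever function sends the query to the endpoint.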

suggestions for improving elx_download_xml and elx_make_query

Hi Michal,

Thanks for releasing v0.4.0. I updated R and eurlex and I am using it.
I recently used elx_download_xml and wanted to suggest some improvements:

  1. line 28 should likely be: "notice type must be correctly specified" = notice %in% c("tree", "branch", "object")) (this is more of an issue)
  2. file = basename(url) could be file = paste0(basename(url), ".xml")
  3. With the current settings, when object is passed to notice, the object expression notice is retrieved (p 44 of cellar); however, this does not contain metadata. I'd suggest dropping the language header and using ?language= at the end of the URL when object is passed (p 42 of cellar), so that the object notice with the object metadata is retrieved.
  4. elx_download_xml could encapsulate a function that returns the XML notice as a string. A user could then decide whether to download the XML notice directly, or to get the XML notice as a string and parse it to extract other fields and complement the make_query and run_query functions.
  5. About elx_make_query, do you remember the issue of the 10e6 limit? A workaround/improvement could be to group together multiple items of the same property of a work. E.g. if I pass include_authors = TRUE, it could help to use (group_concat(distinct ?author_;separator=", ") as ?author) in the SELECT statement and OPTIONAL{?work cdm:work_created_by_agent ?author_.} in the WHERE statement of the SPARQL query. The URI would still be inside, but I see cleaning it afterwards as less of an issue. This would help avoid duplicated works when running queries.
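The grouping idea in point 5 can be sketched in base R as follows. The helper `group_property` is hypothetical and not part of eurlex; the `cdm:work_has_resource-type` triple is only there so that the WHERE block has a non-optional pattern, and is an assumption.

```r
# Sketch only: build the SELECT and WHERE fragments for one grouped property.
group_property <- function(var, property, sep = ", ") {
  list(
    select = sprintf('(group_concat(distinct ?%s_;separator="%s") as ?%s)',
                     var, sep, var),
    where  = sprintf("OPTIONAL{?work %s ?%s_.}", property, var)
  )
}

author <- group_property("author", "cdm:work_created_by_agent")

# Aggregates such as group_concat need a GROUP BY on the remaining variables
query <- paste(
  "PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>",
  paste("SELECT ?work", author$select),
  "WHERE {",
  "  ?work cdm:work_has_resource-type ?type.",
  paste0("  ", author$where),
  "}",
  "GROUP BY ?work",
  sep = "\n"
)

cat(query)
```

With this shape each work comes back as a single row, and the concatenated authors can be split on the separator afterwards.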

What do you think about these?

All the best

SPARQL query by directory code (CC)

Like the EUR-Lex expert search, is it possible to add a directory code (CC) argument to the elx_make_query() function?

This would be incredibly useful for finding all legal acts in a larger policy area. For example, in tracking a country's EU defence policy, you would need to find all acts relating to Common Foreign and Security Policy (CC = 18).

The expert search on the EUR-Lex website already supports searching by directory code. I have attached a screenshot of this below.

EUR-Lex Expert Search: https://eur-lex.europa.eu/expert-search-form.html

[Screenshot: EUR-Lex Expert Search]
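Until such an argument exists, one possible workaround is to request directory codes with include_directory = TRUE and filter the results locally. The sketch below assumes the results contain a `directory` column holding the code as a string; the helper `filter_by_directory` and the toy data are made up for illustration and do not reflect real query output.

```r
# Sketch only: keep rows whose directory code starts with a given prefix.
filter_by_directory <- function(results, code) {
  codes <- as.character(results$directory)
  keep <- !is.na(codes) & startsWith(codes, code)
  results[keep, , drop = FALSE]
}

# Toy data with placeholder identifiers (not real CELEX numbers)
toy <- data.frame(
  celex = c("ACT_1", "ACT_2", "ACT_3", "ACT_4"),
  directory = c("18", "1810", "0350", NA),
  stringsAsFactors = FALSE
)

cfsp <- filter_by_directory(toy, "18")
print(cfsp$celex)
```

Filtering after the fact still downloads every act first, so a server-side filter in the query itself would scale better.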

Query results are limited to 10e6

Dear Michal,

Thank you for developing such a useful package and writing such clear documentation, and congratulations on the very interesting article published in Political Research Exchange.

I tried your package and noticed that when I run a large query, the results are limited to 10e6 rows. Is there a way to get around this limit?

A reproducible example is provided here:
"
# Load the packages used in the example
library(eurlex)
library(dplyr)
library(ggplot2)

# Sector 3 (legal acts) with all optional metadata columns
legal <- elx_make_query(resource_type = "any", sector = 3,
                        include_celex = TRUE, include_force = TRUE,
                        include_date = TRUE, include_date_force = TRUE,
                        include_date_endvalid = TRUE, include_eurovoc = TRUE,
                        include_directory = TRUE, include_citations = TRUE) %>%
  elx_run_query()

# Sector 5 (preparatory documents)
preparatory <- elx_make_query(resource_type = "any", sector = 5,
                              include_celex = TRUE, include_date = TRUE,
                              include_eurovoc = TRUE, include_directory = TRUE,
                              include_citations = TRUE) %>%
  elx_run_query()

# Build a 16-million-row tibble of zeros locally
dat <- as_tibble(data.frame(X = rep(0, 16000000), y = rep(0, 16000000), z = rep(0, 16000000)))
"

I am also including the sessionInfo output:
"
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C LC_TIME=Italian_Italy.1252

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     
  
  other attached packages:
  [1] ggplot2_3.3.3        dplyr_1.0.5          eurlex_0.3.5         RevoUtils_11.0.2     RevoUtilsMath_11.0.0
  
  loaded via a namespace (and not attached):
   [1] rstudioapi_0.11  xml2_1.3.2       magrittr_1.5     tidyselect_1.1.0 munsell_0.5.0    colorspace_2.0-0
   [7] R6_2.4.1         rlang_0.4.10     httr_1.4.1       tools_4.0.2      grid_4.0.2       gtable_0.3.0    
  [13] withr_2.2.0      ellipsis_0.3.1   digest_0.6.25    tibble_3.0.2     lifecycle_1.0.0  crayon_1.3.4    
  [19] farver_2.1.0     tidyr_1.1.3      purrr_0.3.4      vctrs_0.3.7      curl_4.3         glue_1.4.1      
  [25] compiler_4.0.2   pillar_1.4.6     generics_0.1.0   scales_1.1.1     pkgconfig_2.0.3 

"

Another very useful feature would be the possibility to define a start date and an end date for the query.
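A date range could be expressed as a SPARQL FILTER clause. The sketch below only builds the clause as a string; the helpers `date_filter` and `yearly_filters` are hypothetical, and the variable name `?date` is an assumption about how the date is bound in the query. Splitting a large query into yearly windows might also help keep each result set below the row limit.

```r
# Sketch only: build a FILTER clause restricting ?date to [start, end].
# The query using it must declare PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>.
date_filter <- function(start, end, var = "?date") {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(!is.na(start), !is.na(end), start <= end)
  sprintf('FILTER(%s >= "%s"^^xsd:date && %s <= "%s"^^xsd:date)',
          var, format(start), var, format(end))
}

# One filter per calendar year, e.g. to run a large query in smaller pieces
yearly_filters <- function(from_year, to_year) {
  vapply(seq(from_year, to_year), function(y) {
    date_filter(sprintf("%d-01-01", y), sprintf("%d-12-31", y))
  }, character(1))
}

print(date_filter("2019-01-01", "2019-12-31"))
print(yearly_filters(2018, 2020))
```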

Thank you once again.

Best regards.

"summary" support for elx_fetch_data

First of all, thanks for this great project.

Is there any way to get the summary of a regulation using the elx_fetch_data function?

event data

In the latest iterations of EUR-Lex there seems to be an increasing focus on event data. It would be useful to be able to retrieve these, but this is likely to require a completely new function and a new type of SPARQL query.

Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: 32001E0555

I am suddenly getting a weird problem after using the eurlex package for a few months. When I run the elx_fetch_data() function, I get the following error:

Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: 32001E0555

Reprex:

# Load package
library(eurlex)

# Run function
elx_fetch_data("32001E0555", "title")
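The error message suggests that the bare CELEX number is being treated as a host name, which is what would happen if the function expects a full URL rather than an identifier. A minimal sketch of a workaround under that assumption: the helper `celex_to_url` is hypothetical, and whether elx_fetch_data accepts the resulting URL depends on the installed package version.

```r
# Sketch only: turn a bare CELEX number into a resource URL, leaving
# values that already look like URLs untouched.
celex_to_url <- function(celex) {
  celex <- trimws(celex)
  is_url <- grepl("^https?://", celex)
  ifelse(is_url,
         celex,
         paste0("http://publications.europa.eu/resource/celex/", celex))
}

print(celex_to_url("32001E0555"))

# Possible usage (not run here, requires network access):
# elx_fetch_data(celex_to_url("32001E0555"), "title")
```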

Thanks for all your work. Amazing package.

On another note, is it possible to gather the CELEX IDs for all acts in a given directory code, e.g. CC = 18 (Common Foreign and Security Policy), via REST instead of SPARQL?

Proposal for enhancements of elx_fetch_data

Dear Michal,

I use the eurlex package on a regular basis to extract EU policy documents
in order to map terms related to the UN 2030 SDGs in these documents.
I use elx_fetch_data to batch download the raw text of the documents.

I would like to propose two enhancements for this function:

  1. Rather than returning just the 'out' for the requested resource type, the function could return both the 'out' and the HTTP status code of the request. It would return a named list whose first element is 'out' and whose second is the HTTP code.
    This would make it easy to check whether a resource was not retrieved, which is useful when dealing with a large number of documents.

  2. Add the document XML notice to the resource type options.
    This could be a useful and efficient way to get a plethora of information for each document. The XML could then be parsed locally to extract data of interest such as directory codes, the subject matter, the instruments cited, related documents, etc.
    In many cases it might be easier and faster to work with the XML notice than to develop and run a (complex) SPARQL query.
    For an easier implementation, this option could ignore the language parameters, so that one would get the document XML notice the same way it is obtained from EUR-Lex.
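The first proposal could look roughly like the sketch below. The function `fetch_with_status` is hypothetical and not how elx_fetch_data is implemented; it defaults to `httr::GET` and `httr::content`, but takes both as arguments so that it can be demonstrated offline with stubbed responses.

```r
# Sketch only: return both the content and the HTTP status code.
fetch_with_status <- function(url,
                              get = httr::GET,
                              parse = function(resp) {
                                httr::content(resp, as = "text", encoding = "UTF-8")
                              }) {
  resp <- get(url)
  status <- resp$status_code
  # Only parse the body when the request succeeded
  out <- if (identical(as.integer(status), 200L)) parse(resp) else NA_character_
  list(out = out, status = status)
}

# Offline demonstration with stubbed responses (no network needed)
fake_get <- function(url) {
  if (grepl("missing", url)) {
    list(status_code = 404L, body = NULL)
  } else {
    list(status_code = 200L, body = "<NOTICE/>")
  }
}
fake_parse <- function(resp) resp$body

ok <- fetch_with_status("http://example.org/doc", fake_get, fake_parse)
bad <- fetch_with_status("http://example.org/missing", fake_get, fake_parse)
str(ok)
str(bad)
```

When looping over many documents, the `status` element of each result can be collected to list the requests that need to be retried.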

What do you think about these enhancements? Would they be difficult to implement?

Once again, many thanks for developing and releasing such a useful and easy to use package.

Many thanks and have a nice day.

Best

alternative identifiers

Provide options for alternative identifiers, in particular the Official Journal number. Many documents are not CELEX indexed (especially preparatory documents, e.g. COM proposals, and sector 5 more generally).
