
pangaear's Introduction

pangaear


pangaear is a data retrieval interface for the World Data Center PANGAEA (https://www.pangaea.de/). PANGAEA archives published Earth & Environmental Science data under the following subjects: agriculture, atmosphere, biological classification, biosphere, chemistry, cryosphere, ecology, fisheries, geophysics, human dimensions, lakes & rivers, land surface, lithosphere, oceans, and paleontology.

This package offers tools to interact with the PANGAEA Database, including functions for searching for data, fetching datasets by dataset ID, and working with the PANGAEA OAI-PMH service.

Info

Package API

  • pg_data
  • pg_list_metadata_formats
  • pg_identify
  • pg_list_records
  • pg_list_sets
  • pg_list_identifiers
  • pg_search
  • pg_get_record
  • pg_cache
  • pg_search_es
  • pg_cache_list
  • pg_cache_clear

Installation

Stable version

install.packages("pangaear")

Dev version

remotes::install_github('ropensci/pangaear')
library('pangaear')

Search for data

This is a thin wrapper around the GUI search interface on the page https://www.pangaea.de/. Everything you can do there, you can do here.

pg_search(query = 'water', bbox = c(-124.2, 41.8, -116.8, 46.1), count = 3)
#> # A tibble: 3 × 6
#>   score doi                     size size_measure citation               suppl…¹
#>   <dbl> <chr>                  <dbl> <chr>        <chr>                  <chr>  
#> 1 13.3  10.1594/PANGAEA.812094     2 datasets     Simonyan, AV; Dultz, … Simony…
#> 2 12.8  10.1594/PANGAEA.774629     4 datasets     Krylova, EM; Sahling,… Krylov…
#> 3  9.05 10.1594/PANGAEA.406110   598 data points  WOCE Surface Velocity… <NA>   
#> # … with abbreviated variable name ¹​supplement_to

Get data

res <- pg_data(doi = '10.1594/PANGAEA.807580')
res[[1]]
#> <Pangaea data> 10.1594/PANGAEA.807580
#>   parent doi: 10.1594/PANGAEA.807580
#>   url:        https://doi.org/10.1594/PANGAEA.807580
#>   citation:   Schiebel, Ralf; Waniek, Joanna J; Bork, Matthias; Hemleben, Christoph (2001): Physical oceanography during METEOR cruise M36/6. PANGAEA, https://doi.org/10.1594/PANGAEA.807580, In supplement to: Schiebel, R et al. (2001): Planktic foraminiferal production stimulated by chlorophyll redistribution and entrainment of nutrients. Deep Sea Research Part I: Oceanographic Research Papers, 48(3), 721-740, https://doi.org/10.1016/S0967-0637(00)00065-0
#>   path:       /Users/sckott/Library/Caches/R/pangaear/10_1594_PANGAEA_807580.txt
#>   data:
#> # A tibble: 32,179 × 13
#>    Event   Date/…¹ Latit…² Longi…³ Eleva…⁴ Depth…⁵ Press…⁶ Temp …⁷   Sal Tpot …⁸
#>    <chr>   <chr>     <dbl>   <dbl>   <int>   <dbl>   <int>   <dbl> <dbl>   <dbl>
#>  1 M36/6-… 1996-1…    49.0   -16.5   -4802    0          0    15.7  35.7    15.7
#>  2 M36/6-… 1996-1…    49.0   -16.5   -4802    0.99       1    15.7  35.7    15.7
#>  3 M36/6-… 1996-1…    49.0   -16.5   -4802    1.98       2    15.7  35.7    15.7
#>  4 M36/6-… 1996-1…    49.0   -16.5   -4802    2.97       3    15.7  35.7    15.7
#>  5 M36/6-… 1996-1…    49.0   -16.5   -4802    3.96       4    15.7  35.7    15.7
#>  6 M36/6-… 1996-1…    49.0   -16.5   -4802    4.96       5    15.7  35.7    15.7
#>  7 M36/6-… 1996-1…    49.0   -16.5   -4802    5.95       6    15.7  35.7    15.7
#>  8 M36/6-… 1996-1…    49.0   -16.5   -4802    6.94       7    15.7  35.7    15.7
#>  9 M36/6-… 1996-1…    49.0   -16.5   -4802    7.93       8    15.7  35.7    15.7
#> 10 M36/6-… 1996-1…    49.0   -16.5   -4802    8.92       9    15.7  35.7    15.7
#> # … with 32,169 more rows, 3 more variables: `Sigma-theta [kg/m**3]` <dbl>,
#> #   `Sigma in situ [kg/m**3]` <dbl>, `Cond [mS/cm]` <dbl>, and abbreviated
#> #   variable names ¹​`Date/Time`, ²​Latitude, ³​Longitude, ⁴​`Elevation [m]`,
#> #   ⁵​`Depth water [m]`, ⁶​`Press [dbar]`, ⁷​`Temp [°C]`, ⁸​`Tpot [°C]`

Search for data then pass DOI to data function.

res <- pg_search(query = 'water', bbox = c(-124.2, 41.8, -116.8, 46.1), count = 3)
pg_data(res$doi[3])[1:3]
#> [[1]]
#> <Pangaea data> 10.1594/PANGAEA.406110
#>   parent doi: 10.1594/PANGAEA.406110
#>   url:        https://doi.org/10.1594/PANGAEA.406110
#>   citation:   WOCE Surface Velocity Program, SVP (2006): Water temperature and current velocity from surface drifter SVP_9616641. PANGAEA, https://doi.org/10.1594/PANGAEA.406110
#>   path:       /Users/sckott/Library/Caches/R/pangaear/10_1594_PANGAEA_406110.txt
#>   data:
#> # A tibble: 101 × 10
#>    Date/…¹ Latit…² Longi…³ Depth…⁴ Temp …⁵ Cur v…⁶ Cur v…⁷ Latit…⁸ Longi…⁹  Code
#>    <chr>     <dbl>   <dbl>   <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <int>
#>  1 1996-1…    41.6   -125.       0    12.7   NA      NA     0       0          1
#>  2 1996-1…    41.6   -125.       0    12.5   11.3     0.44  0.0001  0.0001     1
#>  3 1996-1…    41.6   -124.       0    12.4    2.91   13.4   0.0002  0.0002     1
#>  4 1996-1…    41.7   -124.       0    12.3    3.64   17.2   0.0005  0.0004     1
#>  5 1996-1…    41.7   -124.       0    11.9   23.4    11.8   0.0001  0.0001     1
#>  6 1996-1…    41.7   -124.       0    11.4   21.4    15.4   0.0002  0.0002     1
#>  7 1996-1…    41.8   -124.       0    11.1    0.21   24.7   0.0005  0.0004     1
#>  8 1996-1…    41.8   -124.       0    11.2   -0.86   20.5   0.0002  0.0002     1
#>  9 1996-1…    41.8   -124.       0    11.1    1.51    9.12  0.0001  0.0001     1
#> 10 1996-1…    41.9   -124.       0    11.0   -5.58   -1.96  0.0001  0.0001     1
#> # … with 91 more rows, and abbreviated variable names ¹​`Date/Time`, ²​Latitude,
#> #   ³​Longitude, ⁴​`Depth water [m]`, ⁵​`Temp [°C]`, ⁶​`Cur vel U [cm/s]`,
#> #   ⁷​`Cur vel V [cm/s]`, ⁸​`Latitude e`, ⁹​`Longitude e`
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL

OAI-PMH metadata

# Identify the service
pg_identify()

# List metadata formats
pg_list_metadata_formats()

# List identifiers
pg_list_identifiers(from = Sys.Date() - 2, until = Sys.Date())

# List sets
pg_list_sets()

# List records
pg_list_records(from = Sys.Date() - 1, until = Sys.Date())

# Get a record
pg_get_record(identifier = "oai:pangaea.de:doi:10.1594/PANGAEA.788382")

Contributors (reverse alphabetical)

  • Naupaka Zimmerman
  • Kara Woo
  • Gavin Simpson
  • Andrew MacDonald
  • Scott Chamberlain

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for pangaear in R by running citation(package = 'pangaear')
  • Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

pangaear's People

Contributors

aammd, gavinsimpson, karawoo, katieroserice, katrinleinweber, naupaka, richardjtelford, sckott

pangaear's Issues

set download directory

Hi, I was wondering if there is a way to set the download directory without resetting the environment variables?

-Kathe

Session Info
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] curl_3.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     rstudioapi_0.7   xml2_1.2.0       bindr_0.1.1      magrittr_1.5     rappdirs_0.3.1  
 [7] hms_0.4.2        tidyselect_0.2.4 R6_2.2.2         rlang_0.2.2      plyr_1.8.4       httr_1.3.1      
[13] stringr_1.3.1    SoilDataR_0.0.1  pangaear_0.6.0   dplyr_0.7.6      tools_3.5.0      utf8_1.1.3      
[19] cli_1.0.0        yaml_2.1.19      readxl_1.1.0     assertthat_0.2.0 httpcode_0.2.0   tibble_1.4.2    
[25] crayon_1.3.4     bindrcpp_0.2.2   purrr_0.2.5      readr_1.1.1      tidyr_0.8.1      triebeard_0.3.0 
[31] crul_0.6.0       glue_1.3.0       oai_0.2.2        stringi_1.2.4    compiler_3.5.0   pillar_1.2.2    
[37] cellranger_1.1.0 urltools_1.7.1   lubridate_1.7.4  pkgconfig_2.0.2 

consistent event handling

Hi

I would like to compare different datasets from pangaea in an automatic way using such an input list:

pangdois <- list()
if (T) {
    pangdois <- c(pangdois,
                  list("lorius_etal_1985"=
                       list(pdoi="10.1594/PANGAEA.860950",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]", 
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (T) {
    pangdois <- c(pangdois, 
                  list("masson-delmotte_etal_2011"=
                       list(pdoi="10.1594/PANGAEA.785228",
                            vars=list("d18op"=list(inputname="δ18O H2O [‰ SMOW]", 
                                                   dims=list("kyr_before_1950"="Age [ka BP]"))))))
}

if (length(pangdois) > 0) {
    for (pangi in seq_along(pangdois)) {
        if (pangi == 1) library(pangaear)
        message("run `pangaear::pg_data(", pangdois[[pangi]]$pdoi, ")` ...")
        tmp <- pangaear::pg_data(pangdois[[pangi]]$pdoi)
        for (eventi in seq_along(tmp)) { # search wanted variables in every event of current doi
            event <- NA # default
            # <non-consistent event-handling; see below>
            for (vi in seq_along(pangdois[[pangi]]$vars)) { # check if any wanted variable exists in current event of current doi
                if (any(names(tmp[[eventi]]$data) == pangdois[[pangi]]$vars[[vi]]$inputname)) {
                    # do further stuff
                } # if current variable exists in current event of current doi
            } # for vi in wanted vars
        } # for eventi in seq_along(tmp)
    } # for pangi in pangdois
} # if length(pangdois) > 0

However, I realized that event information is not stored consistently across datasets. So far I have found 3 different cases:

# case 1:
$ parent_doi: chr "10.1594/PANGAEA.785228"
$ metadata  :List of 7
..$ events    :List of 7
 .. ..$ Dome_Fuji (DF): chr NA
# --> if `metadata$events` is a list, use first entry that is NA to identify the data?

 # case 2:
$ parent_doi: chr "10.1594/PANGAEA.860950"
$ metadata  :List of 7
 ..$ events    : chr "Vostok * LATITUDE: -78.464420 * LONGITUDE: 106.837320 * DATE/TIME: 1980-01-01T00:00:00 * ELEVATION: 3488.0 m * Recovery: 2755 m * LOCATION: Antarctica * CAMPAIGN: Ice_core_diverse * BASIS: Sampling/drilling ice * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: annual pressure 624 mbar; mean annual temperature -55.5°C; snow accumulation between 2.2 and; 22.5 g/cm**2/yr, about 250 ka"
# --> if `metadata$events` is not a list and `data$Event` is null, find a way to reduce the long event string to identify the data?

# case 3:
..$ parent_doi: chr "10.1594/PANGAEA.863978"
..$ metadata  :List of 9
 ..$ events    : chr "177-1089A * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-19T16:15:00 * DATE/TIME END: 1997-12-21T13:15:00 * ELEVATION: -4619.3 m * Penetration: 216.3 m * Recovery: 149.64 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 23 cores; 216.3 m cored; 0 m drilled; 69.2 % recovery; 177-1089B * LATITUDE: -40.936400 * LONGITUDE: 9.894100 * DATE/TIME START: 1997-12-22T13:16:00 * DATE/TIME END: 1997-12-22T22:45:00 * ELEVATION: -4623.8 m * Penetration: 264.9 m * Recovery: 246.62 m * LOCATION: South Atlantic Ocean * CAMPAIGN: Leg177 (URI: https://doi.org/10.2973/odp.proc.ir.177.1999) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 29 cores; 264.9 m cored; 0 m drilled; 93.1 % recovery; 306-U1313B * LATITUDE: 41.000023 * LONGITUDE: -32.957300 * ELEVATION: -3413.5 m * Recovery: 306.54 m * CAMPAIGN: Exp306 (North Atlantic Climate 2) (URI: https://doi.org/10.2204/iodp.proc.303306.2006) * BASIS: Joides Resolution (URI: http://www-odp.tamu.edu/resolutn.html) * METHOD/DEVICE: Drilling/drill rig (DRILL) * COMMENT: 32 cores; 300.4 m cored; 102 % recovered; 2 m drilled; 302.4 m penetrated; GeoB1515-1 * LATITUDE: 4.238333 * LONGITUDE: -43.666667 * DATE/TIME: 1991-05-15T00:00:00 * ELEVATION: -3129.0 m * Recovery: 6.58 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); GeoB1523-1 * LATITUDE: 3.831667 * LONGITUDE: -41.621667 * DATE/TIME: 1991-05-17T00:00:00 * ELEVATION: -3292.0 m * Recovery: 6.65 m * LOCATION: Amazon Fan * CAMPAIGN: M16/2 (URI: https://doi.org/10.2312/cr_m16) * BASIS: Meteor (1986) (URI: 
https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL); KNR140-12JPC (KNR140-2-12JPC) * LATITUDE: 29.080000 * LONGITUDE: -72.900000 * ELEVATION: -4250.0 m * LOCATION: North Atlantic Ocean * CAMPAIGN: KNR140 * BASIS: Knorr * METHOD/DEVICE: Piston corer (PC); M35003-4 * LATITUDE: 12.090000 * LONGITUDE: -61.243333 * DATE/TIME: 1996-04-19T00:00:00 * ELEVATION: -1299.0 m * Recovery: 9.63 m * CAMPAIGN: M35/1 (URI: https://doi.org/10.2312/cr_m35) * BASIS: Meteor (1986) (URI: https://de.wikipedia.org/wiki/Meteor_(Schiff,_1986)) * METHOD/DEVICE: Gravity corer (Kiel type) (SL)"          
 $ data      : tibble [138 × 27] (S3: tbl_df/tbl/data.frame)
  ..$ Event                               : chr [1:138] "177-1089A" "177-1089A" "177-1089A" "177-1089A" ...
# --> if `metadata$events` is not a list and `data$Event` is not null, use maybe `unique(data$Event)` to identify the data?

I probably do not understand the correct usage of the event information. Is there a better way to identify each individual data set per DOI automatically?

Thanks a lot for any help,
Chris

Session Info
devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.3 (2020-10-10)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       X11                         
 language en_US #de_DE                
 collate  C                           
 ctype    en_US.UTF-8                 
 tz       Europe/Berlin               
 date     2020-10-22                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source                            
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)                    
 backports     1.1.10  2020-09-15 [1] CRAN (R 4.0.2)                    
 bookdown    * 0.21    2020-10-13 [1] CRAN (R 4.0.3)                    
 callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)                    
 cli           2.1.0   2020-10-12 [1] CRAN (R 4.0.3)                    
 colorout    * 1.2-2   2020-04-27 [1] Github (jalvesaq/colorout@726d681)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)                    
 crul          1.0.0   2020-07-30 [1] CRAN (R 4.0.2)                    
 curl          4.3     2019-12-02 [1] CRAN (R 4.0.0)                    
 desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)                    
 devtools    * 2.3.2   2020-09-18 [1] CRAN (R 4.0.2)                    
 digest        0.6.26  2020-10-17 [1] CRAN (R 4.0.3)                    
 dotCall64   * 1.0-0   2018-07-30 [1] CRAN (R 4.0.0)                    
 dplyr         1.0.2   2020-08-18 [1] CRAN (R 4.0.2)                    
 dtupdate    * 1.5     2020-04-27 [1] Github (hrbrmstr/dtupdate@58056ea)
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)                    
 extrafont   * 0.17    2014-12-08 [1] CRAN (R 4.0.0)                    
 extrafontdb   1.0     2012-06-11 [1] CRAN (R 4.0.0)                    
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)                    
 fields      * 11.6    2020-10-09 [1] CRAN (R 4.0.3)                    
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                    
 generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)                    
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                    
 gsw         * 1.0-5   2017-08-09 [1] CRAN (R 4.0.0)                    
 hoardr        0.5.2   2018-12-02 [1] CRAN (R 4.0.0)                    
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.0.0)                    
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                    
 knitr         1.30    2020-09-22 [1] CRAN (R 4.0.2)                    
 lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)                    
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)                    
 maps          3.3.0   2018-04-03 [1] CRAN (R 4.0.0)                    
 memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)                    
 ncdf4       * 1.17    2019-10-23 [1] CRAN (R 4.0.0)                    
 oai           0.3.0   2019-09-07 [1] CRAN (R 4.0.0)                    
 oce         * 1.2-0   2020-02-21 [1] CRAN (R 4.0.0)                    
 pangaear    * 1.0.0   2020-01-22 [1] CRAN (R 4.0.0)                    
 pbapply       1.4-3   2020-08-18 [1] CRAN (R 4.0.2)                    
 pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.2)                    
 pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)                    
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)                    
 pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)                    
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.0.0)                    
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)                    
 processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)                    
 ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.3)                    
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)                    
 R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)                    
 rappdirs      0.3.1   2016-03-28 [1] CRAN (R 4.0.0)                    
 Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.2)                    
 remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)                    
 rlang         0.4.8   2020-10-08 [1] CRAN (R 4.0.3)                    
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)                    
 Rttf2pt1      1.3.8   2020-01-10 [1] CRAN (R 4.0.0)                    
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)                    
 spam        * 2.5-1   2019-12-12 [1] CRAN (R 4.0.0)                    
 stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                    
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)                    
 testthat    * 2.3.2   2020-03-02 [1] CRAN (R 4.0.0)                    
 tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.3)                    
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)                    
 usethis     * 1.6.3   2020-09-17 [1] CRAN (R 4.0.2)                    
 vctrs         0.3.4   2020-08-29 [1] CRAN (R 4.0.2)                    
 withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)                    
 xfun          0.18    2020-09-29 [1] CRAN (R 4.0.2)                    
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.0)  
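
One possible way to reconcile the three cases described in this issue is a small helper that tries each source of event names in turn. This is only a minimal sketch and not part of pangaear: `event_labels` is a hypothetical name, and the input shape (a list with `metadata$events` and `data`) is assumed from the `str()` output quoted in the issue.

```r
# Derive event labels from one element of a pg_data() result.
# `x` is assumed to be a list with `metadata$events` and `data`;
# `event_labels` is a hypothetical helper, not a pangaear function.
event_labels <- function(x) {
  ev <- x$metadata$events

  # Case 3: the data table carries its own Event column
  if (!is.null(x$data$Event)) {
    return(unique(as.character(x$data$Event)))
  }

  # Case 1: events is a list; entries holding a single NA are
  # named after the event itself
  if (is.list(ev)) {
    is_label <- vapply(ev, function(e) length(e) == 1 && is.na(e), logical(1))
    return(names(ev)[is_label])
  }

  # Case 2: one long string; take the text before the first " * "
  if (is.character(ev) && length(ev) == 1 && !is.na(ev)) {
    return(trimws(sub(" \\* .*$", "", ev)))
  }

  NA_character_
}
```

Inside the loop from the issue, this could replace the placeholder comment with `event <- event_labels(tmp[[eventi]])`. Checking `data$Event` first means a dataset with several events per table yields all of them rather than only the first name in the metadata string.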

Verbs: GetRecord

This verb is used to retrieve an individual metadata record from a repository. Required arguments specify the identifier of the item from which the record is requested and the format of the metadata that should be included in the record. Depending on the level at which a repository tracks deletions, a header with a "deleted" value for the status attribute may be returned, in case the metadata format specified by the metadataPrefix is no longer available from the repository or from the specified item.

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages
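
As an illustration of the two required arguments, a GetRecord request can be assembled as a plain query string. The sketch below is not the package's implementation: `oai_get_record_url` is a hypothetical helper, and the default `base_url` is an assumption about PANGAEA's OAI-PMH endpoint. `oai_dc` is used as the default metadata format because the OAI-PMH specification requires every repository to support it.

```r
# Build an OAI-PMH GetRecord request URL.
# `oai_get_record_url` is a hypothetical helper; the default `base_url`
# is an assumed endpoint and may need to be changed.
oai_get_record_url <- function(identifier, metadata_prefix = "oai_dc",
                               base_url = "https://ws.pangaea.de/oai/provider") {
  stopifnot(is.character(identifier), length(identifier) == 1)

  # GetRecord requires both an identifier and a metadataPrefix
  query <- c(verb = "GetRecord",
             identifier = identifier,
             metadataPrefix = metadata_prefix)

  # Percent-encode values so ':' and '/' are safe inside a query string
  encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)

  paste0(base_url, "?",
         paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

url <- oai_get_record_url("oai:pangaea.de:doi:10.1594/PANGAEA.788382")
print(url)
```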

Change `pg_getrecord` to `pg_get_record`

All the other functions use _ to separate words in function names; the only exception is pg_getrecord(). I suggest we change this to pg_get_record().

Thoughts? +1/-1?
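
If the rename goes ahead, one way to avoid breaking existing scripts is to keep the old name for a while as a deprecated wrapper. This is a sketch only: the function body below is a stand-in, not the real request code.

```r
# New, consistently named function (body is a stand-in for the real request)
pg_get_record <- function(identifier, ...) {
  paste("record for", identifier)
}

# Old name kept as a thin wrapper that warns and forwards its arguments
pg_getrecord <- function(...) {
  .Deprecated("pg_get_record")
  pg_get_record(...)
}
```

Calling `pg_getrecord()` then still works but emits a deprecation warning pointing users to `pg_get_record()`.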

Avoid repeating the URL in every function

Each function thus far has hard-coded the URL for Pangaea. This is a pain to maintain if they ever change the URL. Instead, stick the URL somewhere in the package namespace such that we can get at it but it doesn't exist as an object outside of it.

We could just assign the URL to pg_url and refer to it as pangaear:::pg_url? Or perhaps stick it inside an environment or other object containing this and other options that are constant across the package?
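
The environment option could look roughly like the sketch below. The names `pg_env` and `pg_base_url` are hypothetical, and the URL value is an assumed placeholder for whatever endpoint the package actually uses.

```r
# Package-internal environment holding constants shared by all functions.
# In a package this would sit in an unexported file and never be exported.
pg_env <- new.env(parent = emptyenv())

# Assumed endpoint; this is the single place to edit if the URL changes
assign("base_url", "https://ws.pangaea.de/oai/provider", envir = pg_env)

# Accessor that every function calls instead of hard-coding the string
pg_base_url <- function() {
  get("base_url", envir = pg_env)
}

print(pg_base_url())
```

An environment has the advantage over a bare `pg_url` object that further package-wide options can be added later without introducing more top-level names.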

Verbs: ListRecords

This verb is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted. No metadata will be present for records with deleted status.

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages

Datasets with png files failing

e.g.,

aa <- pg_data(doi = "10.1594/PANGAEA.825428")
#> Downloading 5 datasets from 10.1594/PANGAEA.825428
#> Error in png::readPNG(x) : unable to initialize libpng
#> In addition: Warning message:
#> In png::readPNG(x) :
#>   libpng warning: Application built with libpng-1.5.18 but running with 1.6.29

httr::verbose() not found with just pangaear loaded

One of the examples in ?pg_search is

pg_search(query='citation:Archer', config=verbose())

This fails with error

Error in as.request(config) : could not find function "verbose"

Not sure what the intention was here; if you expect `httr::verbose()` to be available then you should re-export it from pangaear's namespace. Or the example could become

pg_search(query='citation:Archer', config=httr::verbose())

(Also, it would be handy to run all examples as tests when not on CRAN so that issues like this are caught quickly)

Verbs: ListIdentifiers

This verb is an abbreviated form of ListRecords, retrieving only headers rather than records. Optional arguments permit selective harvesting of headers based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted.

http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages

Function to download data

This isn't part of the OAI process.

We need to get links to the data files from other functions, then download the data, which should be in a zip file.

pg_search: searching for datasets that cross 180/-180

Someone raised the issue that some datasets are hard to find in search results, e.g., https://doi.pangaea.de/10.1594/PANGAEA.898389

I think it's because they cross 180/-180, but I'm not sure. For example:

pg_search(query = "pollen", bbox = c(51.8, 42.3, -171.7, 74.6))

With that bbox, the search should find the dataset above, but it does not.

If you remove the bbox and search by title instead, it does find the dataset:

pg_search(query = "standardized fossil pollen data from Siberia")
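
If the antimeridian is indeed the cause, one possible workaround is to split such a box into two boxes that each stay within -180 to 180 and search them separately. The helper below is a hypothetical sketch; it assumes the bbox order is west, south, east, north (as the README example suggests), and it has not been confirmed that this resolves the search problem.

```r
# Split a bounding box c(west, south, east, north) that crosses the
# antimeridian into two boxes that each stay within [-180, 180].
# `split_bbox` is a hypothetical helper, not a pangaear function.
split_bbox <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  west <- bbox[1]; south <- bbox[2]; east <- bbox[3]; north <- bbox[4]

  # west <= east means the box does not cross 180/-180
  if (west <= east) {
    return(list(bbox))
  }

  list(c(west, south, 180, north),   # western part, up to 180
       c(-180, south, east, north))  # eastern part, from -180
}

boxes <- split_bbox(c(51.8, 42.3, -171.7, 74.6))
print(boxes)
```

Each element of `boxes` could then be passed as the `bbox` argument of a separate `pg_search()` call and the results combined, dropping duplicate DOIs.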

OAI fxn work yet to do

Reworked to do OAI requests from scratch; still need to:

  • handle resumptionToken cases
  • add parameter to give back raw XML
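
For the resumptionToken item, the paging logic could be sketched independently of the HTTP layer. In the sketch below, `harvest_all` and `fetch_page` are hypothetical: `fetch_page` stands for whatever function performs one request and returns the parsed records together with the token from the response. In OAI-PMH, the last page of a list carries an empty resumptionToken, so the loop stops on a missing or empty token.

```r
# Collect all pages of an OAI-PMH list response.
# `fetch_page` is a hypothetical function: given a token (NULL for the
# first request) it returns list(records = <list>, token = <string or NULL>).
harvest_all <- function(fetch_page, max_pages = 1000) {
  records <- list()
  token <- NULL
  for (i in seq_len(max_pages)) {
    page <- fetch_page(token)
    records <- c(records, page$records)
    token <- page$token
    # An absent or empty resumptionToken marks the final page
    if (is.null(token) || !nzchar(token)) break
  }
  records
}

# Stand-in for real requests: two pages, the second with an empty token
fake_fetch <- function(token) {
  if (is.null(token)) {
    list(records = list("a", "b"), token = "t1")
  } else {
    list(records = list("c"), token = "")
  }
}

all_records <- harvest_all(fake_fetch)
print(length(all_records))
```

The `max_pages` cap is a safety limit so that a repository that keeps returning tokens cannot cause an endless loop.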

Need to declare a licence for use/distribution of the code

R CMD check warns about an unstated licence as one is not specified in DESCRIPTION. Also we need to explicitly state what licence the code base is under via an explicit LICENCE file in the top level.

To do:

  • Add Licence: field to DESCRIPTION
  • Add a LICENCE file to the top level
  • Add the created ./LICENCE to .Rbuildignore
