
comtradr

Badges: CRAN status Β· rOpenSci Software Peer Review Β· R-CMD-check Β· Codecov test coverage

Interface with and extract data from the United Nations Comtrade API (https://comtradeplus.un.org/). Comtrade provides country-level shipping data for a variety of commodities; these functions allow for easy API queries, with data returned as tidy data frames. More info can be found here. Full API documentation can be found here.

Please report issues, comments, or feature requests. We very much welcome feedback on the usability of the new functions.

Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

For citation information, run citation("comtradr").

Installation πŸ› οΈ

You can install the package with:

install.packages("comtradr")

To install the development version from GitHub, use:

# install.packages("devtools")
devtools::install_github("ropensci/comtradr@dev")

Usage

Authentication πŸ”

Do not be discouraged by the complicated access to the token - you can do it! πŸ’ͺ

As stated above, you need an API token; see the Comtrade FAQ for details on how to obtain one:

➑️ https://uncomtrade.org/docs/api-subscription-keys/

Follow the detailed explanations, which include screenshots, in the Comtrade Wiki to the letter. ☝️ They are not reproduced here because they may be updated regularly. Once you are signed up, select the comtrade - v1 product, which is the free API.

Storing the API key

If you are in an interactive session, you can call the following function to save your API token as an environment variable for the current session.

library(comtradr)

set_primary_comtrade_key()

If you are not in an interactive session, you can register the token for your session using the following base R function.

Sys.setenv('COMTRADE_PRIMARY' = 'xxxxxxxxxxxxxxxxx')

If you would like to set the Comtrade key permanently, we recommend editing the project .Renviron file, adding a line with COMTRADE_PRIMARY = xxxx-your-key-xxxx.

ℹ️ Do not forget the line break after the last entry. The easiest way to edit this file is with the great usethis package.

usethis::edit_r_environ(scope = 'project')
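For reference, the resulting .Renviron entry would look like the following (the key shown is a placeholder; note the trailing newline after the last line):

```
COMTRADE_PRIMARY=xxxx-your-key-xxxx
```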

Example 1 ⛴️

Now we can actually request some data. Let us query the total trade between China and two partners, Germany and Argentina, as reported by China.

# Countries passed to the API query function must be specified as ISO3 codes.
# For details see: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3

# You can request a maximum interval of twelve years from the API
example1 <- comtradr::ct_get_data(
  reporter = 'CHN',
  partner = c('ARG', 'DEU'),
  start_date = 2010,
  end_date = 2012
)

# Inspect the return data
str(example1)

Example 2 ⛴️

Return all wine-related exports from Argentina to all other countries, for the years 2007 through 2011.

library(comtradr)

# Fetch all wine-related commodity codes from the Comtrade commodities DB.
# This vector of codes will get passed to the API query.
wine_codes <- ct_commodity_lookup("wine", return_code = TRUE, return_char = TRUE)

# API query.
example2 <- ct_get_data(
  reporter = "ARG",
  flow_direction = "export",
  partner = "all_countries",
  start_date = 2007,
  end_date = 2011,
  commodity_code = wine_codes
)

# Inspect the output
str(example2)

Bulk Download Example πŸ“¦

To download bulk files, use the function ct_get_bulk. Usage is documented in the package vignettes; see here for an example.

Attention: this downloads large files (often more than one gigabyte) and requires a premium key.

hs0_all <- comtradr::ct_get_bulk(
  reporter = c("DEU"),  # only one reporter in this example
  commodity_classification = 'H0',
  frequency = 'A',
  verbose = TRUE,
  start_date = 2020,    # only one year here
  end_date = 2020)

Data availability

See here for an overview of available commodity classifications.

comtradr has recently relaunched 🚧

The Comtrade API has undergone extensive updates, and the legacy API has now been taken offline (see here).

To accommodate the new syntax and add new functionality, the comtradr package has undergone an extensive re-write. Additionally, it is no longer possible to query the API without an API token. There is still a free version, but unlike before you need to be a registered user to obtain a token. See the FAQ for details on how to obtain the new access tokens.


comtradr's People

Contributors: amannj, chrismuir, datapumpernickel, hgoers, pachadotdev, sckott


comtradr's Issues

In ct_search, setting `type = "services"` requires EB02 commodity scheme

Example, showing the current issue:

vals <- ct_search(reporters = "Canada",
                  partners = "Germany",
                  type = "services")
#> Error: API request failed, with status code [200]
#> Fail Reason: Invalid classification for trade type.

Performing a query with arg type = "services" requires that the commodity DB scheme currently used by comtradr be "EB02". It is easy to make this adjustment before calling ct_search by running comtradr::ct_update_databases("EB02"), but at the very least the error message in the example above should be more helpful (it should indicate what the issue is and how to fix it using ct_update_databases).
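The workaround can be sketched as follows (this targets the legacy API described in this issue, which has since been taken offline, so it is shown for historical context only):

```r
library(comtradr)

# Switch the locally cached commodity database to the EB02 (services) scheme
ct_update_databases(commodity_type = "EB02")

# The services query should now be accepted
vals <- ct_search(reporters = "Canada",
                  partners = "Germany",
                  type = "services")
```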

This issue first came up in #5.

"remaining hourly queries" is not resetting at "reset time"

I have more queries to run than the API permits in an hour, so I am breaking them up into batches and trying to run them on hourly (plus a few minutes cushion) intervals. When I try to run the second batch (after the "reset time"), the "reset time" resets but not "remaining hourly queries." So if I drove the queries down to zero in the previous hour, ct_get_remaining_hourly_queries() continues to return zero after the "reset time" (indefinitely, as far as I can tell) -- and the second batch of queries fails with "Error: over the hourly limit." As a workaround, I can detach/unload and then reattach comtradr between batches -- which I assume is clearing the cache values (but is also evading comtradr's intended controls on how many queries are sent to the API hourly.)
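The detach/reload workaround described above can be expressed as the following sketch (note that, as said, this circumvents comtradr's client-side rate tracking, so it should only be run once the server-side limit has actually reset):

```r
# Unload comtradr to clear its cached rate-limit state, then reattach it
detach("package:comtradr", unload = TRUE)
library(comtradr)
```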

ct_register_token does not take effect

I tried to register a token with the function ct_register_token. However, the download process still hits the 100-query limit. After a token is registered, the query limit should go up to 10,000.

goodpractice

excellent work!
so far gp shows this:

── GP comtradr ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  βœ– write unit tests for all functions, and all package code in general. 47% of code lines are covered by test cases.

    R/ct_commodity_lookup.R:91:NA
    R/ct_commodity_lookup.R:92:NA
    R/ct_commodity_lookup.R:93:NA
    R/ct_commodity_lookup.R:94:NA
    R/ct_search.R:145:NA
    ... and 256 more lines

  βœ– avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your
    lines shorter than 80 characters

    inst/doc/comtradr-vignette.R:2:81
    inst/doc/comtradr-vignette.R:16:81
    inst/doc/comtradr-vignette.R:44:81
    inst/doc/comtradr-vignette.R:78:81
    inst/doc/comtradr-vignette.R:121:81
    ... and 25 more lines

  βœ– fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. LaTeX errors found:
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
Error in .rs.normalizePath(marker$file, mustWork = TRUE) : 
  path[1]="./inst/doc/comtradr-vignette.R": No such file or directory
In addition: There were 17 warnings (use warnings() to see them)

Does the package work for other classifications?

@ChrisMuir , does the package work with other classifications? I tried using SITCrev4 and it failed. I could not find an option on ct_search related to the database type.

library(comtradr)
# Update and select SITCrev4
ct_update_databases(commodity_type = "SITCrev4")
#> Updates found. The following datasets have been downloaded: commodities DB
ct_commodity_db_type()
#> [1] "SITCrev4"
ex_1 <- ct_search(reporters = "China",
                  partners = c("Rep. of Korea", "USA", "Mexico"),
                  trade_direction = "exports")
#> Error: API request failed, with status code [200]
#> Fail Reason: SITCrev4 is an invalid trade classification

Planned re-launch of comtradr

Hi all!

the Comtrade API has been undergoing extensive changes. To accommodate these changes, comtradr will undergo a relaunch. Given that the most important functions are not working at the moment, this is a sensible moment to deprecate some of them and also update some of the packages used, e.g. moving from httr to httr2. Sadly, Chris will also step down as maintainer, meaning there will be a transition of maintainership in parallel.

To get an overview of the needed changes, I will try to make a summary and put out some ideas that I propose we discuss in separate issues, so we can distribute the work efficiently. Please let me know if I forgot something important (surely I did...).

Pinging everyone who wanted to help or is already a co-maintainer: @hgoers @dkolkin @pachadotdev

Authentication

Currently, the ct_register_token() function registers the token and does some extensive validation, depending on whether there is a token and whether it is premium. The new API always requires a token. I propose that for now we focus only on options for the free API, as I personally do not have access to the premium API and it is quite expensive. I assume most users will not have access anyway. If somebody has access to the premium features, we can perhaps extend the package for this later.

I think we can probably just use two simple functions, in the style proposed by the httr2 package, for setting and getting the token; these are relatively user friendly, e.g. like here: https://github.com/datapumpernickel/untrader/blob/main/R/utils.R

If we have time, we can think about secrets and safer ways to store the credentials, but I think this suffices for now. I do not think there is a way to get nicely formatted data about how many of the 250 free daily calls are left; we could think about implementing something for that, but I would keep that for a future extension as well.
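A minimal sketch of such a pair of helpers in base R (function and variable names here are illustrative, not the final API):

```r
# Store the key in an environment variable for the current session
set_primary_comtrade_key <- function(key) {
  stopifnot(is.character(key), length(key) == 1)
  Sys.setenv(COMTRADE_PRIMARY = key)
  invisible(key)
}

# Retrieve the key, failing loudly if it has not been set
get_primary_comtrade_key <- function() {
  key <- Sys.getenv("COMTRADE_PRIMARY")
  if (identical(key, "")) {
    stop("No Comtrade key set; call set_primary_comtrade_key() first.")
  }
  key
}
```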

Getting data

Currently, most things happen in the ct_search() function, which both checks the input parameters and builds the request. The execution of the API request is already sourced out to another function, but that function also does some of the processing of the response. My proposal would be to further modularize this, to make it easier to troubleshoot, easier to extend in specific parts (e.g. adding new parameters), and possible to work on in parallel without conflicts.

The structure I started in another package for testing this out is the following:

ct_get_data()
β”‚
└── ct_params_check()
β”‚   β”‚   Calls all subsequent check functions in the relevant hierarchy and
β”‚   β”‚   passes arguments between them (e.g. check the classification code and
β”‚   β”‚   URL parameters before the commodity code, then pass them on to
β”‚   β”‚   ct_cmd_code_check()).
β”‚   β”‚
β”‚   └── ct_class_code_check()
β”‚   β”‚   The following functions check the input parameters for validity and
β”‚   β”‚   translate them from human-readable form into the machine-readable
β”‚   β”‚   codes of the Comtrade API.
β”‚   └── ct_freq_check()
β”‚   └── ct_cmd_code_check()
β”‚   └── ct_flow_direction_check()
β”‚   └── ct_reporter_check()
β”‚   └── ct_partner_check()
β”‚   └── > extendable for all parameters that we want to include or that are added in the future
β”‚
└── ct_request_build()
β”‚   Builds the URL with the httr2 library.
β”‚
└── ct_request_perform()
β”‚   Performs the request and catches and returns possible HTTP errors; httr2
β”‚   also makes retries and throttling easy to implement.
β”‚
└── ct_request_process()
    Parses the response into an R data.frame; could also include a
    'pretty_cols = TRUE' argument to allow further processing.

You can see it partially implemented in this example here: https://github.com/datapumpernickel/untrader/blob/main/R/check_params.R

I think this would make it very easy for us to extend the API coverage in the future, and at the same time make it easy to pinpoint errors in specific parameters. Lastly, it would probably also make it easier to write testthat statements for these functions. I am in no way set on any of the function names; they are just examples.
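To illustrate why httr2 makes the retry/throttle part easy, a request can be composed declaratively (the endpoint URL, query parameters, and header name below are assumptions for this sketch, not the confirmed API surface):

```r
library(httr2)

req <- request("https://comtradeapi.un.org/data/v1/get/C/A/HS") |>
  req_url_query(reporterCode = "156", period = "2020") |>
  req_headers("Ocp-Apim-Subscription-Key" = Sys.getenv("COMTRADE_PRIMARY")) |>
  req_retry(max_tries = 3) |>     # retry transient failures
  req_throttle(rate = 10 / 60)    # at most ~10 requests per minute

# resp <- req_perform(req)
```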

The previous, then-unused functions we should keep in one file (e.g. legacy.R) and add "Deprecated" error messages that help users transition to the new functions. See #44 for some hints and ideas from MaΓ«lle on this.

Package Data

Getting and saving the data

When verifying the input parameters we will need to access the predefined reference tables of the UN. There are a few functions currently implemented for saving and querying these; I guess we should see how much of them we can salvage and what could benefit from a re-write.
The full list of reference tables can be found here: https://comtradeapi.un.org/files/v1/app/reference/ListofReferences.json
It would probably make sense to write a function that queries all of these in some smart fashion and saves them in extdata. I think it would be good to make these available to the user as well, rather than keeping them internal; at least I found myself frequently looking at them. I would need to familiarize myself with the best practices around sysdata/extdata and the other directories.

Updating the reference tables on the user side

Currently there is a package function, ct_update_databases, that updates the tables for the current session when users execute it. I am not sure what the specific guidelines of rOpenSci and CRAN are for saving data like this temporarily and permanently; maybe it would make sense to do some research into how this could best be solved and then adjust to that. I assume @ChrisMuir put some thought into this, and we might be able to recycle large parts of this function.
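One CRAN-friendly option (an assumption on my part, not a settled decision) is to write such updates to a per-user cache directory rather than into the installed package, e.g. via tools::R_user_dir() on R >= 4.0:

```r
# A per-user, CRAN-sanctioned location for cached reference tables
cache_dir <- tools::R_user_dir("comtradr", which = "cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical file name for an updated reference table
ref_file <- file.path(cache_dir, "commodity_table.rds")
# saveRDS(updated_commodity_df, ref_file)   # on update
# readRDS(ref_file)                         # on load, falling back to shipped data
```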

Look-up functions

The package currently provides two look-up functions, ct_country_lookup() and ct_commodity_lookup(), to make it easier to query the API in natural language.

  1. For country codes:

    • I would propose we abandon the writing out of country names, because it is very error prone. As there is no standardized way of writing country names, making this a constant annoyance (at least for me). Instead I propose we change to iso3 codes. These are standardized and would nicely play with a lot of other packages that specialize in transforming them back into plain english or other languages. Instead of having to lookup countries, e.g. one could just pass in a pre-existing list of EU-28 iso3c codes one got e.g. from giscoR for geodata.
  2. For commodity codes:

    • Although there is online resources for looking up HS codes that are much nicer than a function in R, this makes it more interactive and easy to not have to leave R to look up some Commodity Codes. I think we should keep these function and only update it where necessary.

Testing

We will have to write new tests for our new suite of functions. After reading some of the reviews of the package from last time, I suggest we write many testthat statements with few arguments each, to easily pinpoint what is wrong in each test.
Furthermore, MaΓ«lle suggested we implement testing for HTTP calls (see #44). I think this is a great idea, because it would allow us to check that the HTTP request is built properly without passing one of our API tokens to GitHub, and without having to figure out how to make the checks pass on CRAN submission without an API token.
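For the HTTP-call testing idea, a sketch using vcr (one of several rOpenSci options for HTTP mocking; the cassette name and expectation below are made up for illustration):

```r
library(testthat)
library(vcr)

test_that("ct_get_data builds a valid request", {
  # Records the HTTP interaction once, then replays it without a token
  vcr::use_cassette("ct_get_data_basic", {
    res <- ct_get_data(reporter = "ARG", start_date = 2010, end_date = 2010)
  })
  expect_s3_class(res, "data.frame")
})
```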

Badges

There are quite a few badges that are no longer working or are failing. I honestly have no clue about most of these.
My naive suggestion would be to archive them for now, together with their respective YAML files, and implement them again when we have gotten further with the new functions.

@ChrisMuir It would be great to get some guidance from you on this. Maybe we can make a slack-group with the new co-maintainers to discuss this internally.

Travis: I get an error message about Project not found or access denied when following the link: https://ci.appveyor.com/project/ropensci/comtradr

Codecov: Same here, something seems to have been archived, I am not sure what would be necessary to get it running again: https://app.codecov.io/gh/ropensci/comtradr

CRAN: obviously currently down, since we are off CRAN for now

Peer-reviewed: Although the package has been peer-reviewed, after the comprehensive re-launch it probably should not claim this badge until a re-review. Is there any policy on this @maelle?

Really looking forward to working with all of you on this!

Downloading bulk data with SITC commodity type using the comtradr package

Dear all,

I have been working on downloading bulk data using the comtradr package and I am facing several problems. My goal is to download commodity data for all commodities, all countries and all time periods. I work with a data download permission/IP that is supposed to let me download with all three variables set to "all". But I still get the error: between 'reporters', 'partners', and date range, only one of these may be 'all'.
Could someone tell me what mistake I am making here?

Another problem is that I need the SITC commodity types, which are not the default for downloading with the ct_search() command. The suggested solution is to convert, or in other words update, the database with the ct_update_databases() command and specify px = S2. Below is the code that I used:

q <- ct_search(reporters = "all", partners = "all", trade_direction = "all")

and to update the category:

ct_update_databases(commodity_type = "SITCrev2")

I would highly appreciate your help.
Best,
Shadi

Bulk/large download

Hey,
assuming I wanted to download the whole of, or a very large fraction of, the UN Comtrade data using comtradr, how would I go about that?

I guess I would have to write nested loops over reporters and partners so that each single request is not too big, right? And include a longer waiting period once the hourly request limit has been reached?

I am asking because I am wondering how one could use the etl package to create a local SQL Comtrade DB. I have started coding here, but the downloading ("extracting") part is still missing...

SSL issue

Hi all,

I am pretty new to an advanced use of R, so apologies if this is something very easy to fix.
When trying to use the command, I am immediately blocked by the following error message:

Error in curl::curl_fetch_memory(url, handle = handle) :
  SSL peer certificate or SSH remote key was not OK

Do you have any suggestion on how to solve this?

Best,
Emanuele

API will be decommissioned by 2023

Hi all,

thanks for the great package, has saved me tremendous amounts of time in the past.
I am writing because, to my understanding, the current API will be decommissioned by 2023.
See here: https://unstats.un.org/wiki/display/comtrade/New+Comtrade+FAQ+for+Advanced+Users#NewComtradeFAQforAdvancedUsers-WhatisthelegacyoftheUNComtrade?UntilwhencanIuseit?

I find this quite worrying; while tinkering around with the new API, I have found it to be quite unstable. Some reported codes are not in the references; others are in the references but do not work anymore (such as "all").

As I see it, this package might become obsolete, unless you are already a registered user.
I was wondering, whether:

  1. Has somebody tried adapting the package to the new Comtrade, or is somebody planning to? As far as I can see, quite a few things have changed, most importantly the authentication, and this would require a substantial investment of time, right?

  2. Would somebody be willing to let the UN know that maybe they could postpone the decommissioning until the new version is fully up and running? I am not sure whether the problems I am having are just because I am unable to query properly, or whether it really is a bit difficult to understand.

Thanks and all the best!

Convenience: Allow start/end to be given as year-integers

Hey,

I always use annual data (and it is also the default value). So it would be convenient to let the user specify only start = 2005 or end = 2005 if freq = "annual". start = 2005 would expand to 2005-01-01 and end = 2005 to 2005-12-31.
What do you think of that?
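A minimal sketch of the proposed expansion (the helper name is hypothetical):

```r
# Expand integer years to full-year date bounds when freq = "annual"
expand_year <- function(start, end = start) {
  c(start_date = sprintf("%d-01-01", start),
    end_date   = sprintf("%d-12-31", end))
}

expand_year(2005)
# returns c(start_date = "2005-01-01", end_date = "2005-12-31")
```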
What do you think of that?

COMODO CA cert causing problems for some

Saw this https://community.rstudio.com/t/ssl-certificate-problem-certificate-has-expired/68619 in the RSS feeds and suspected it was the recent COMODO CA cert issue biting them and it is.

https://comtrade.un.org/data/doc/api/ looks fine in-browser b/c browsers are lying to avoid users clamoring for them to "fix it". They can't. Each site has to regen certs.

x <- openssl::download_ssl_cert("comtrade.un.org")

lapply(x, `[[`, "validity")

will show (as of this timestamp) that the topmost cert is expired.

Not sure y'all can do anything abt it but I figured I'd toss an issue here in case more folks encounter the situation and go here to note it. They can just pile on to this one :-)

commodity_lookup() Returning Named Vector in Specific Situations

In function commodity_lookup(), if there is one (and only one) search result for each element of the input arg values, the return is a named vector, regardless of whether arg return_char is TRUE or FALSE. The intention is for the function to return an unnamed vector if return_char is TRUE, and a list if return_char is FALSE.

Here are two reproducible examples:

Ex. 1:

commoditydf <- comtradr::ct_commodities_table(type = "HS")
input_commod <- commoditydf$commodity[c(10, 20, 30, 40)]
commodity_lookup(input_commod, commoditydf, return_code = TRUE, return_char = TRUE)

returns

                                        010119 - Horses; live, other than pure-bred breeding animals 
                                                                                            "010119" 
                                                  010231 - Buffalo; live, pure-bred breeding animals 
                                                                                            "010231" 
0105 - Poultry; live, fowls of the species gallus domesticus, ducks, geese, turkeys and guinea fowls 
                                                                                              "0105" 
             010594 - Poultry; live, fowls of the species Gallus domesticus, weighing more than 185g 
                                                                                            "010594"

Ex. 2:

commoditydf <- comtradr::ct_commodities_table(type = "HS")
input_commod <- c("010119", "010231", "010594")
commodity_lookup(input_commod, commoditydf, return_code = FALSE, return_char = FALSE)

returns

                                                                                   010119 
                           "010119 - Horses; live, other than pure-bred breeding animals" 
                                                                                   010231 
                                     "010231 - Buffalo; live, pure-bred breeding animals" 
                                                                                   010594 
"010594 - Poultry; live, fowls of the species Gallus domesticus, weighing more than 185g"

The ct_get_remaining_hourly_queries() can not return the right query limit

Here is what I want to achieve with ct_get_remaining_hourly_queries():
The comtradr API has an hourly query limit of just 100 queries per hour for an unauthorized user, so I would like to pause the R script until the query limit is reset to 100 again.
Code Version 1:
for (Year in startYear) {            # loop over the start year
  for (country in reporter[, 2]) {   # loop over the reporting country
    if (ct_get_remaining_hourly_queries() == 0) {
      wait_time <- as.numeric((ct_get_reset_time() - Sys.time())) * 60 + 5
      pause(wait_time)
    }
    tryCatch({
      filename <- paste0("./rawData/plastic/plastic_", country, "_", Year, ".csv")
      if (!file.exists(filename)) {
        assign(paste0("plastic_", country, "_", Year),
               value = ct_search(reporters = country, partners = "All",
                                 trade_direction = "all", freq = "annual",
                                 start_date = as.character(Year),
                                 end_date = as.character(Year + 4),
                                 commod_codes = comm_code))
        write.csv(get(paste0("plastic_", country, "_", Year)), filename)
      }
    }, error = function(e) {
      cat("ERROR:", conditionMessage(e), "\n",
          file = paste0("./rawData/log/plastic_", country, "_", Year, ".log"),
          append = TRUE)
    })
    print(country)
  }
}
In the first version, I basically rely on this chunk of code: wait_time <- as.numeric((ct_get_reset_time() - Sys.time())) * 60 + 5; pause(wait_time). I assumed that comtradr would reset ct_get_remaining_hourly_queries() to 100 after the R session had been paused for a while, but I get a persistent error from the following tryCatch statement:
ERROR: over the hourly limit. hour resets at 2018-11-08 23:42:35
ERROR: over the hourly limit. hour resets at 2018-11-09 00:42:40
ERROR: over the hourly limit. hour resets at 2018-11-09 01:42:45
ERROR: over the hourly limit. hour resets at 2018-11-09 02:42:50
ERROR: over the hourly limit. hour resets at 2018-11-09 03:42:55
ERROR: over the hourly limit. hour resets at 2018-11-09 04:43:00
ERROR: over the hourly limit. hour resets at 2018-11-09 05:43:05
ERROR: over the hourly limit. hour resets at 2018-11-09 06:43:10
ERROR: over the hourly limit. hour resets at 2018-11-09 07:43:15
ERROR: over the hourly limit. hour resets at 2018-11-09 08:43:20
ERROR: over the hourly limit. hour resets at 2018-11-09 09:43:25
So obviously, the query limit has been wrongly estimated for the current session or IP address.
Thus I modified my code by introducing the hidden restart function from RStudio: .rs.restartR(). After I manually restart the R session, ct_get_remaining_hourly_queries() equals 100 again.
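Incidentally, one pitfall in the wait computation above is that subtracting two times yields a difftime whose units depend on the magnitude of the gap, so the hard-coded *60 can be wrong. Requesting seconds explicitly avoids the guess (a sketch, assuming the legacy ct_get_* helpers):

```r
# Wait until just past the reported reset time, computing seconds explicitly
if (ct_get_remaining_hourly_queries() == 0) {
  wait_secs <- as.numeric(difftime(ct_get_reset_time(), Sys.time(),
                                   units = "secs")) + 60
  Sys.sleep(max(wait_secs, 0))
}
```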

rebuild the vignettes

We need to update the old vignettes to reflect the current changes to the code. The structure seems fine as is to me.

Pkg data updates not reflected in new R sessions

The core functions of this package require access to two reference data files - one is a df of countries and country codes, the other is a df of commodities and commodity codes. The package ships with both as package data rda files in inst/extdata, but there's also a function for updating either/both of these data sets, ct_update_databases() (the API issues data updates periodically, and there are a number of different commodity data sets that the user can choose to use, and access to those is made available via the update function). The purpose of the update function is to save data updates/changes to comtradr/extdata within the local R library directory (over-writing the rda file(s) in process).

Something is going wrong though, currently the update function will indeed save updates down to file, however if R is restarted, the pkg data files will revert back to their state prior to updating.

Here's an example:

library(comtradr)

# Check which commodity DB exists currently on file.
ct_commodity_db_type()
#> [1] "HS"

# This should download a different commodity DB file and save it to file.
ct_update_databases(commodity_type = "HS1992")
#> Updates found. The following datasets have been downloaded: commodities DB

# Check that download was successful.
ct_commodity_db_type()
#> "HS1992"

Here I restart R.

library(comtradr)

# Check which commodity DB exists currently on file.
ct_commodity_db_type()
#> [1] "HS"

The commodities data set that gets loaded upon package load is the data set that existed on file prior to our updating, which is not what we want.

The weird part is I can navigate to my local R packages directory, and I can see that file commodity_table.rda was indeed last modified during the update function. I can also manually read it in and check its type:

file_test <- system.file("extdata", "commodity_table.rda", package = "comtradr")

# This loads df 'commodity_df' into the global env.
load(file_test)

attributes(commodity_df)$type
#> [1] "HS1992"

cat(file_test)
#> [1] "path/to/local/R/libraries/comtradr/extdata/commodity_table.rda"

I don't have the comtradr package installed in any old R installation directories, I removed the dev git repo (for the purpose of this test), I've searched through temp folders and don't see anything suspicious. I have no idea where the "old" data is even being loaded from.

edit to add: I've tested this on both PC and Mac.

Issue: Specifying `commod_codes` in `ct_search` returns error.

Hi,

I was trying to use comtradr to download some SITC Rev. 2 data on the 5-digit level. Here is what I did.

First, I updated the database to SITC Rev. 2:

comtradr::ct_update_databases(commodity_type = "SITCrev2")

Running the following chunk of code works as expected.

comtradr::ct_search(reporters = "Mexico",
                                 partners  = "China",
                                 trade_direction = "exports",
                                 commod_codes = 'TOTAL')

However, as soon as I want to specify commod_codes I get the following error.

comtradr::ct_search(reporters = "Mexico",
                                 partners  = "China",
                                 trade_direction = "exports",
                                 commod_codes = "AG5")

Error: API did not return json. Instead, text/html data was returned

I observe this behaviour for AG[X], X = 1,...,5. I'm running comtradr version 0.2.2.

Thanks a lot for your help.

ct_commodity_lookup Fails With All Caps Input

ct_commodity_lookup("halibut")
#> $halibut
#> [1] "030221 - Fish; halibut (reinhardtius hippoglossoides, hippoglossus hippoglossus, hippoglossus stenolepis), fresh or chilled (excluding fillets, livers, roes and other fish meat of heading no. 0304)"
#> [2] "030331 - Fish; halibut (reinhardtius hippoglossoides, hippoglossus hippoglossus, hippoglossus stenolepis), frozen (excluding fillets, livers, roes and other fish meat of heading no. 0304)"

ct_commodity_lookup("HALIBUT")
#> $HALIBUT
#> character(0)

#> Warning message:
#> There were no matching results found 

These two commodity lookup queries should return the same results.

This issue first came up in #5.
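A fix is to lower-case (or ignore case on) both sides before matching, which the following base R sketch illustrates (the commodity strings are abbreviated):

```r
values <- "HALIBUT"
commodities <- c("030221 - Fish; halibut, fresh or chilled",
                 "030331 - Fish; halibut, frozen",
                 "0304 - Fish fillets")

# Case-insensitive matching via grepl(ignore.case = TRUE)
commodities[grepl(values, commodities, ignore.case = TRUE)]
# returns the two halibut entries regardless of input capitalization
```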

reduce dependencies / magrittr pipe?

We might still be able to reduce dependencies in some parts of the package. We can probably do this towards the end, to minimize the footprint of the package.

Furthermore, I have worked with the 'new' native R pipe, but I am not sure that was smart in hindsight, because, if I understand correctly, it is not supported in R versions before 4.1.0. That was 'only' released in 2021, so some users/servers might still be on an older version.

`ct_register_token()` does not validate token

Hi Chris,

I think comtradr::ct_register_token() fails to verify whether the provided token is recognised by Comtrade's API:

> comtradr::ct_get_remaining_hourly_queries()
[1] 100
> comtradr::ct_register_token('badtoken')
> comtradr::ct_get_remaining_hourly_queries()
[1] 10000

An easy way to check whether a token is valid could be via the getUserInfo endpoint. Maybe something like:

mytoken <- "..."
rjson::fromJSON(
  file = paste0("https://comtrade.un.org/api/getUserInfo?token=", mytoken)
)
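Building on that, a validation helper inside ct_register_token() might look like this sketch (base R only; the success/failure detection is an assumption about the endpoint's response, not verified behaviour):

```r
# Sketch: check a token against the legacy getUserInfo endpoint before
# storing it; any request failure or unauthorised reply counts as invalid.
validate_comtrade_token <- function(token) {
  url <- paste0("https://comtrade.un.org/api/getUserInfo?token=", token)
  res <- tryCatch(
    paste(readLines(url, warn = FALSE), collapse = ""),
    error = function(e) NULL
  )
  # Assumed: a bad token yields an error or an "unauthorised" reply
  !is.null(res) && !grepl("unauthor", res, ignore.case = TRUE)
}
```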

RStudio - error Error in curl::handle_setopt(handle, .list = req$options) : Unknown options: curlopt_ssl_verifypeer, set.ssl.primary.verifypeer

Hello,

I am trying to download data from the UN Comtrade database. After updating RStudio I get the same problem whether I use the ct_search function in the comtradr package or the get.Comtrade function defined on this website: http://nathanlane.info/2017/04/19/uncomtrade.html. The error is:

Error in curl::handle_setopt(handle, .list = req$options) :
  Unknown options: curlopt_ssl_verifypeer, set.ssl.primary.verifypeer

Can anyone suggest what I should do to resolve this problem?

SSL Certificate Issue

As of 2017-05-28, the core functions of this package are no longer working properly on Windows machines; they fail with the error message:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Peer certificate cannot be authenticated with given CA certificates

This essentially means there's a CA certificate issue preventing curl from completing the connection. I believe the UN site hosting the DB is the problem, as there are issues with its SSL certificate per ssldecoder.

The help docs for the valid parameter combinations are not accurate

When I try to download data with a registered token, the valid parameter combination of reporters, partners, and date range actually accepts at most two "All"s. Check the code below:

ct_search(reporters = "China",
          partners = "All",
          trade_direction = "all",
          freq = "annual",
          start_date = "all",
          end_date = "all",
          commod_codes = "01")

However, as stated in the docs:

Between params "reporters", "partners", and the query date range (as dictated by the two params "start_date" and "end_date"), only one of these three may use the catch-all input "All".

I think there might be some inconsistency.

New Maintainer Needed

Hello all, I am the author and current maintainer of the comtradr package. Recently the UN introduced breaking changes to their data API, and this package will need extensive updates to get it working again.

I have lost the appetite to continue maintaining this package. I am working with rOpenSci to try to find one or more individuals that would like to take ownership of the package, get it working again, and stay on top of maintenance. I'm happy to help with the transition and hand-off.

@maelle @ropensci/admin

In ct_search, Unexpected Results when Passing "total" to arg `commod_codes`

Example, showing the current issue:

comtradr::ct_update_databases(commodity_type = "EB02")
vals <- ct_search(reporters = "USA", 
                  partners = "World", 
                  freq = "annual", 
                  start_date = "2015-01-01", 
                  end_date = "2015-12-31", 
                  commod_codes = "TOTAL", 
                  type = "services")
nrow(vals)
#> 0

This query should return a data frame with two rows, not 0.

In ct_search, the default value for the commodity codes arg is "TOTAL", which is meant to mean "all commodities". In the guts of ct_search, the literal string "TOTAL" ends up being used within the query URL. This works for all of the different "HS" commodity schemes; however, for EB02, "TOTAL - all services" has an actual commodity code of 200.

To fix this within ct_search, when "all" or "total" is passed as input to arg commod_codes, I'll add a step that looks up the actual code for it in the current commodity codes db on file.

This issue first came up in #5.
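That lookup step could look roughly like this (toy reference table; column names are illustrative, not the package's internal schema):

```r
# Toy commodity reference table: in HS the catch-all code is the literal
# string "TOTAL", while in EB02 "TOTAL - all services" has code 200.
commodity_db <- data.frame(
  scheme = c("HS", "EB02"),
  code = c("TOTAL", "200"),
  description = c("TOTAL - all commodities", "TOTAL - all services"),
  stringsAsFactors = FALSE
)

resolve_total_code <- function(input, scheme, db = commodity_db) {
  if (tolower(input) %in% c("total", "all")) {
    # look up the scheme-specific catch-all code instead of passing the
    # literal string the user typed into the query URL
    return(db$code[db$scheme == scheme])
  }
  input
}

resolve_total_code("TOTAL", "EB02")  # "200"
resolve_total_code("TOTAL", "HS")    # "TOTAL"
```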

ct_get_remaining_hourly_queries() does not get updated after validating token

I have tried the potential solution suggested in #34, but I face a slightly different issue. My token is authenticated (verified at https://comtrade.un.org/ws/CheckRights.aspx). The remaining hourly query count does not get updated in ct_get_remaining_hourly_queries(), but it is updated in getOption("comtradr"). Unfortunately, I have been limited to standard access despite having the token.

> ct_register_token(goodtoken)
> getOption("comtradr")
$comtrade
$comtrade$token
[1] goodtoken

$comtrade$account_type
[1] "premium"

$comtrade$per_hour_limit
[1] 10000

$comtrade$per_second_limit
[1] 1


attr(,"class")
[1] "comtradr_credentials"
> ct_get_remaining_hourly_queries()
[1] 100

Error when searching for EU

Thanks for this great package! I have an issue when looking up EU trade statistics. Every time I query the EU I get the same error message. I suspect this may be related to the special character in the string.

df <- ct_search(reporters = "USA",
                partners = c("China",
                             "EU-28",
                             "Rep. of Korea"),
                trade_direction = "exports")

Error: Result 2 must be a single string, not a character vector of length 0

updating reference tables

Reference tables for validation of the commodity_classification parameter need to be updated every once in a while.

  1. we need to include some kind of semi-automated task that updates these and pushes changes to github and possibly CRAN
  2. we need some kind of user function to update for the current session

This also includes:

refactoring the DATASET.R file to include cleaning of the partner/reporter and other ref tables

Appveyor Failures

Appveyor is currently always failing for the R-devel build. I'm making this issue just to track the changes I'm going to try making to the appveyor.yml file in an attempt to fix it.

Currently, the log includes this line, per this issue over at r-appveyor:

== 10/04/2018 21:22:35: Skipping download of Rtools because src/ directory is missing.

Which seems to be causing issues like this when installing source packages:

In R CMD INSTALL
* installing *source* package 'backports' ...
** package 'backports' successfully unpacked and MD5 sums checked
** libs
*** arch - i386
c:/Rtools/mingw_32/bin/gcc  -I"c:/R/include" -DNDEBUG          -O3 -Wall  -std=gnu99 -mtune=generic -c dotsElt.c -o dotsElt.o
/bin/sh: c:/Rtools/mingw_32/bin/gcc: No such file or directory
make: *** [dotsElt.o] Error 127
ERROR: compilation failed for package 'backports'

Error after registration of API token

When I use the ct_search function after registering an API token (via ct_register_token()), I get the error message "Error: Comtrade API request failed, with status code [500]". The same query works fine if either
a) no token is registered, or
b) the registered token is invalid.

Maybe it's the same problem mentioned in #34 as "issue 2". I spent several hours on it, but could not fix it or even understand it.

BTW: I also tried to use the API directly via httr. So far I have not run into any restrictions (max. number of elements, more than 100 queries/h), whether I used a token in the URL or not. But the ct_search function stops with an error once there are more than 100 queries...

It would be nice if someone could help me and fix this problem, so that I don't have to write a new query function myself. Thanks.

commodity_lookup() throwing error, related to param return_char

When a single lookup value is passed, the function works great:

commodity_lookup("trout", commoditydf, return_char = FALSE)

However when multiple lookup values are passed, AND return_char is set to FALSE, the function throws an error:

commodity_lookup(c("tomato", "trout"), commoditydf, return_char = FALSE)

results in:

Error in function_list[[k]](value) : 
  'names' attribute [2] must be the same length as the vector [1]

The desired output is a named list of length two.

I'm working on patching this bug now.
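The fix amounts to naming the output over the full input vector rather than over a single result; a minimal sketch of the intended behaviour (toy lookup table, not the package's actual code):

```r
# Sketch: return one named list element per search term, even when a
# term has zero or multiple matches (toy commodity table below).
commoditydf <- data.frame(
  commodity = c("Tomatoes, fresh or chilled",
                "Trout, fresh or chilled",
                "Trout, frozen"),
  stringsAsFactors = FALSE
)

lookup_all <- function(values, df) {
  out <- lapply(values, function(v) {
    df$commodity[grepl(v, df$commodity, ignore.case = TRUE)]
  })
  names(out) <- values  # names length always matches input length
  out
}

res <- lookup_all(c("tomato", "trout"), commoditydf)
names(res)    # "tomato" "trout"
lengths(res)  # tomato has 1 match, trout has 2
```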

Eswatini/Swaziland bug

I think the name change of Swaziland to Eswatini has not yet been properly implemented in either comtradr or the underlying API

This code gives an error

ct_search(reporters = "Ghana", 
                   partners = "Eswatini ",
                   commod_codes = "all",
                   trade_direction = "exports",
                   start_date = "2018",
                   end_date =  "2018")
  

but this code does work

ct_search(reporters = "Ghana", 
                   partners = "Swaziland ",
                   commod_codes = "all",
                   trade_direction = "exports",
                   start_date = "2018",
                   end_date =  "2018")
  

However, in the output Eswatini is listed as the partner country.

The first line of output looks like this:

   classification year period period_desc aggregate_level is_leaf_code trade_flow_code trade_flow reporter_code reporter reporter_iso partner_code  partner partner_iso second_partner_code second_partner
1              H5 2018   2018        2018               2            0               2     Export           288    Ghana          GHA          748 Eswatini         SWZ                  NA           <NA>

Can't get data for 2017

I'm not able to get data from 2017 onward, even though I have already updated the database. For example:

example1 <- ct_search(reporters = "Canada",
                      partners = "Brazil",
                      trade_direction = "exports",
                      start_date = "2017-01-01",
                      end_date = "2018-01-01")

This returns a data frame with no observations.

Issue with selection of classification system (px)

Is it possible to select the classification like in the API? I have a list of codes, some are H3 and some are H2.

In the source code of ct_search I found the following line, that is used for "px", but I don't get what that does.

code_type <- ct_commodity_db_type() %>% commodity_type_switch

comtradr is returning an empty data frame

Hi thanks very much for making the package! I have been trying to just go through the examples that you list in your readme file, but unfortunately I've only been getting empty data frames (see an example below). Do you have any idea what might be going on? If it makes any difference, I'm accessing Comtrade from Germany. Thanks very much in advance!!

example1 <- ct_search(reporters = "China",
                      partners = c("Rep. of Korea", "USA", "Mexico"),
                      trade_direction = "exports")
str(example1)
'data.frame': 0 obs. of 35 variables:
$ classification : logi
$ year : logi
$ period : logi
$ period_desc : logi
$ aggregate_level : logi
$ is_leaf_code : logi
$ trade_flow_code : logi
$ trade_flow : logi
$ reporter_code : logi
$ reporter : logi
$ reporter_iso : logi
$ partner_code : logi
$ partner : logi
$ partner_iso : logi
$ second_partner_code : logi
$ second_partner : logi
$ second_partner_iso : logi
$ customs_proc_code : logi
$ customs : logi
$ mode_of_transport_code: logi
$ mode_of_transport : logi
$ commodity_code : logi
$ commodity : logi
$ qty_unit_code : logi
$ qty_unit : logi
$ alt_qty_unit_code : logi
$ alt_qty_unit : logi
$ qty : logi
$ alt_qty : logi
$ netweight_kg : logi
$ gross_weight_kg : logi
$ trade_value_usd : logi
$ cif_trade_value_usd : logi
$ fob_trade_value_usd : logi
$ flag : logi

`ct_search()` unable to loop over list of product codes

I need to pull import and export data between the US, China, and the World over a span of 5 years for a list of approx. 1,500 HS6 codes. I have an authentication token that I've successfully passed via ct_register_token() so theoretically I should be able to make 1,500 separate pulls from the API without hitting the rate limit. What I am attempting, then, is this:

trade <- lapply(hs1, function(i){
                                Sys.sleep(1)
                                tmp <- ct_search(reporters = "USA", partners = c("China", "World"),
                                                 trade_direction = c("imports", "exports"), start_date = 2000,
                                                 end_date = 2004, commod_codes = i)
                                df <- as.data.frame(do.call(rbind, tmp))
                                return(df)
})

where hs1 is a list ~1,500 elements long. However, every single time I run this, I get this error: Error: Comtrade API request failed, with status code [500]. Is this a bug with the package, or a problem with the API itself? Am I just trying to pull too much/too often, despite the token? Should I be looping these pulls in a different manner? Or perhaps the authentication token is not being passed to the API correctly, so it thinks I've exceeded my rate limit? (When I check ct_get_remaining_queries(), it is still well into the 9,000s after running the lapply above several times.)

An option that works for what I want is using the get.Comtrade function here: chunking the list of HS codes into lists of 100, looping the lapply over those chunks, and waiting 1 hour in between (since I can't figure out how to pass the authentication token within the function above). This works, but it takes 15 hours to get the data I want, and I'd like to see if there's a quicker way that makes use of the fact that I have a token. Thanks in advance for any assistance.
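One way to cut the number of requests (a sketch; the legacy API accepted multiple commodity codes per query, commonly up to 20) is to chunk the code list and pass each chunk to a single ct_search() call:

```r
# Sketch: split ~1,500 HS6 codes into chunks of 20 so each API call
# covers 20 codes instead of 1 (75 calls instead of 1,500).
hs1 <- sprintf("%06d", seq_len(1500))  # stand-in for the real code list

chunks <- split(hs1, ceiling(seq_along(hs1) / 20))
length(chunks)  # 75

# Hypothetical usage against the legacy API (network call, not run here):
# trade <- lapply(chunks, function(codes) {
#   Sys.sleep(2)  # stay under the per-second rate limit
#   ct_search(reporters = "USA", partners = c("China", "World"),
#             trade_direction = c("imports", "exports"), start_date = 2000,
#             end_date = 2004, commod_codes = codes)
# })
```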

Pkg Edits Related to rOpenSci Review

I submitted this package to rOpenSci, to be considered as part of their collection of open science packages. It is currently under review. As part of the review process, there are a number of changes and package edits to be made. This issue is meant to serve as a central collection of those changes.

Changes/Updates/Edits

Boxes will be checked when the change has been made and pushed to this repo.

  • Create package level docs page.
  • Remove dontrun tags from the function examples, allow them to be run.
  • Move the example plots from the README to the vignette (this required adding dplyr and ggplot2 to Suggests within the DESCRIPTION file).
  • Split tests up among more test_that statements.
  • Expand CI. Currently using Travis to test on Linux only (I think?).
  • For funcs country_lookup and commodity_lookup, change function names to include prefix "ct_".
  • Reorder func args to move required args ahead of optional args.
  • Use snake_case for all variables throughout the pkg.
  • Eliminate spaces and use snake_case for all col headers within output data.
  • Add citation details to the README.
  • Add #' @noRd to all internal functions.
  • Return a data frame, as opposed to a list, from func ct_search.
  • Convert to Native R Messages and Warnings, in place of nesting api status codes in the output list.
  • Cache country and commodity reference tables.
  • Throttle API requests.
  • Document the use of an authorization token.
  • Be explicit about the use of regex (mention it, list resources explaining its use).
  • Set user agent for guest users, as a way of identifying the user to the api.
  • Eliminate fmt arg in func ct_search, always return json data from api.
  • Fix two typos in vignette ("ajust" and "tomatos").
  • Store metadata of each api call (timestamp, url, system.time() timings).
  • Remove dplyr from imports, add magrittr to imports.
  • Import purrr, use map functions in place of sapply and vapply.
  • Setup repo on codecov, then add codecov badge to README.
  • Update NEWS to reflect all changes made during the review.
  • Submit updated pkg to CRAN.

Getting data

Update the function to access data from the new comtrade-v1 API. We can use this issue to discuss how we want to structure this function.
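As a starting point for that discussion, here is a minimal sketch of what a getter could look like (the endpoint path, parameter names, and header are assumptions based on the public comtrade-v1 docs, not a finished design):

```r
# Sketch of a minimal getter for the new comtrade-v1 API; endpoint path
# and parameter names are assumptions, not the final comtradr code.
ct_get_data_sketch <- function(reporter_code, partner_code, period,
                               key = Sys.getenv("COMTRADE_PRIMARY")) {
  resp <- httr::GET(
    "https://comtradeapi.un.org/data/v1/get/C/A/HS",  # goods / annual / HS
    query = list(reporterCode = reporter_code,
                 partnerCode = partner_code,
                 period = period),
    httr::add_headers("Ocp-Apim-Subscription-Key" = key)
  )
  httr::stop_for_status(resp)
  httr::content(resp, as = "parsed")
}
```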

Service Data Download leads to API request fail

Hey,

nice package!
I am having trouble downloading service trade data (on a freshly started R instance):

> library(comtradr)
> this.year <- 2015
> tmp <- ct_search(reporter = "Austria",
+                  partners = "World", freq = "annual",
+                  start_date = as.Date(paste0(this.year, "/01/01")),
+                  end_date = as.Date(paste0(this.year, "/12/31")),
+                  type = "services")
Error: API request failed, with status code [200]
Fail Reason: Invalid classification for trade type.
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] comtradr_0.1.0      ormisc_0.1          openxlsx_4.0.17    
[4] devtools_1.13.4     data.table_1.10.4-3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     digest_0.6.12    withr_2.1.0.9000 R6_2.2.2        
 [5] jsonlite_1.5     magrittr_1.5     httr_1.3.1       rlang_0.1.4     
 [9] curl_3.0         tools_3.4.2      purrr_0.2.4      compiler_3.4.2  
[13] memoise_1.1.0 

Any idea what's wrong?

Modify API error msg to be more clear

Here is an example:

vals <- ct_search(reporters = "Canada",
                  partners = "Germany",
                  type = "services")
#> Error: API request failed, with status code [200]
#> Fail Reason: Invalid classification for trade type.

This error message is related to an invalid input for arg type, and has nothing to do with request status codes. I think it makes sense to remove the references to status codes in error messages when the failure had nothing to do with the status code, and the status code is 200.

This issue first came up in #5

API planned updates/upgrades for 2018

Per the United Nations Comtrade website, they are planning to make a bunch of updates and upgrades to their API service, which will coincide with a version change (from v.1 to v.2). Looks like they expect v.2 to be made available sometime mid-year 2018, and for v.1 to be discontinued at the end of 2018. So as soon as v.2 is publicly available, I will be working on updating this package to reflect the new features.

Below is a list of planned changes from their Upgrade Plan website. This will be a fair amount of work 😸

EDIT: As of 2018-11-10, the Upgrade Plan website indicates that there have been delays in rolling this out... it looks like this may not happen until 2019.

| Previous System | Upgrades made with new system | Result for Comtrade Users |
| --- | --- | --- |
| Three separate processing systems for annual and monthly Merchandise Trade Statistics and Trade in Services | One integrated processing system for all trade data | Improved consistency and timeliness of dissemination of trade data. Monthly Merchandise Trade Statistics will be disseminated the same way as annual Merchandise Trade Statistics data (e.g., will be converted to other classifications and missing quantities will be estimated) |
| Data items for Merchandise Trade Statistics included trade by commodity and by partner | In addition, new data items now include (when reported): mode of transport; customs procedure codes; 2nd partner | More information on the nature of trade flows and partner country attribution, allowing for better analysis of bilateral trade asymmetries |
| Single valuation for Merchandise Trade Statistics imports (at CIF) and exports (at FOB) | In addition, FOB valuation for imports (when reported) | Symmetrical valuation of imports and exports and more information on insurance and freight costs, allowing for better analysis of bilateral trade asymmetries |
| Trade flows for Merchandise Trade Statistics consisted of Total Exports; Re-exports; Total Imports; and Re-Imports | Expanded breakdown of trade flows for Merchandise Trade Statistics (when reported), including Domestic Exports; Export/Import of goods after/for inward processing; Export/Import of goods for/after outward processing; and Export/Import on intra-firm trade | More information on the nature of trade flows, especially re-exports, re-imports, goods for processing, and intra-firm trade |
| Merchandise Trade Statistics quantity units standardised to 12 World Customs Organization (WCO) recommended units | In addition, standardization to more quantity units (43+ units), when reported | Additional measures of quantity |
| Single weight (net weight in kg) for Merchandise Trade Statistics | In addition, gross weight (net weight plus the weight of the shipping or cargo container), when reported | More information on weight and shipping |
| Limited quantity estimations for Merchandise Trade Statistics (not all commodities estimated) | Estimation of quantity for all commodities, when applicable (some may not be shown for specific commodities) | More complete quantity data, especially for aggregated data |
| Conversion for Merchandise Trade Statistics at 6-digit HS level only | Conversion at any HS level and improved conversion of residuals | More complete converted datasets |
| Only EBOPS2002 data disseminated for Trade in Services (EBOPS2010 data converted to EBOPS2002) | Trade in Services data also disseminated in EBOPS2010 (when reported in EBOPS2010) | Trade in Services data in both EBOPS2002 and EBOPS2010, when reported |

Update ct_search() to allow for "All of yyyy" queries

This was brought up by GH user pedromein in issue #12. The Comtrade API has a (new?) feature that allows for queries that return monthly data (vs annual data) for an entire individual year. Previously, if searching for monthly data, there were two options:

  • Pass all to start_date and end_date to get all monthly data for all years.
  • Pass a date range, in months, to start_date and end_date. Doing this, the most you could get from a single query was five consecutive months worth of data.

Now they have an option to get all monthly data for a single year. So need to update ct_search() to give users this option.
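A sketch of the date handling this implies (the ps/freq names mirror the legacy API's query parameters; build_ps itself is a hypothetical helper, not package code):

```r
# Sketch: when freq is "monthly" and the user gives a bare year, send
# the year itself as `ps`, which the API expands to all 12 months.
build_ps <- function(start_date, end_date, freq = c("annual", "monthly")) {
  freq <- match.arg(freq)
  if (freq == "monthly" && identical(start_date, end_date) &&
      grepl("^\\d{4}$", start_date)) {
    return(start_date)  # "all of yyyy" query
  }
  paste(start_date, end_date, sep = ",")  # fall back to an explicit range
}

build_ps("2017", "2017", freq = "monthly")  # "2017"
```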

Error Message Maximum resultset is: 10000

I am attempting to pull UN Comtrade data using the comtradr R package, with a nested for loop over multiple trading partners and years. The code iterates several times and then stops with the following error:

Error: API request failed. Err msg from Comtrade:
Maximum resultset is: 10000

Does this error message suggest there is something wrong with my query, or that I have hit some sort of limit? I know that I have not hit the 100-query limit for the hour, so this is something else.

Error code [400] when downloading dataset from comtrade

I have used the example code from

https://rpubs.com/matt_smith/ITNr

library(comtradr)

#Find all the codes associated with "motor vehicle"
auto_codes <- ct_commodity_lookup("motor vehicle",
                                  return_code = FALSE,
                                  return_char = TRUE)

##Download the data for trade in these codes (motor vehicles)
auto_trade<- ct_search(reporters = "All",
                       partners = "All",
                       trade_direction = "imports",
                       start_date = "2016-01-01",
                       end_date = "2016-12-31",
                       commod_codes = auto_codes)

When I run this I get the error code

Error: Comtrade API request failed, with status code [400]

Any idea why this doesn't work?

Add a "deprecated" error to legacy functions

We need to provide informative deprecation errors for the existing functions. I propose that we use lifecycle to do this. In the deprecation error we should point the user to the new, relevant function and its documentation.
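If we'd rather not add lifecycle as a hard dependency, base R's .Defunct() achieves a similar effect; a sketch (the replacement function name and message are illustrative):

```r
# Sketch: legacy function that errors with a pointer to its replacement.
ct_search <- function(...) {
  .Defunct(
    new = "ct_get_data",
    package = "comtradr",
    msg = paste("ct_search() no longer works: the legacy Comtrade API",
                "was retired. Use ct_get_data() instead; see",
                "?ct_get_data for the new arguments.")
  )
}
```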
