
sbtools's Introduction

ScienceBase R Tools

Tools for interfacing R with ScienceBase data services.

Package Description

This package provides a rich interface to USGS’s ScienceBase, a data cataloging and collaborative data management platform. For further information, see the sbtools manuscript in The R Journal (USGS IP-075498). See citation('sbtools') for how to cite the package.

Recommended Citation:

  Winslow, LA, S Chamberlain, AP Appling, and JS Read. 2016. sbtools: 
  A package connecting R to cloud-based data for collaborative online 
  research. The R Journal 8:387-398.
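
That citation entry can also be printed from an R session once the package is installed:

citation("sbtools")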

Package source code DOI: https://doi.org/10.5066/P912NGFV


Package Installation

To install the sbtools package, you need R 3.0 or greater. Run the following command:

install.packages("sbtools")

To get cutting-edge changes, install from GitHub using the remotes package:

remotes::install_github("DOI-USGS/sbtools")

Reporting bugs

Please consider reporting bugs and asking questions on the Issues page:

https://github.com/DOI-USGS/sbtools/issues

Release Procedure

For release of the sbtools package, a number of steps are required.

  1. Ensure all checks pass and code coverage is adequate.
  2. Ensure NEWS.md reflects updates in version.
  3. Update DESCRIPTION to reflect release version.
  4. Convert DISCLAIMER.md to approved language and rebuild README.Rmd.
  5. Create release candidate branch and commit release candidate.
  6. Build source package and upload to CRAN.
  7. Once accepted to CRAN, tag the release candidate branch and push to repositories.
  8. Change DISCLAIMER.md back to development mode and increment the DESCRIPTION version.
  9. Merge release candidate and commit.
  10. Open PR/MR in development state.

Disclaimer

This software is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The software has not received final approval by the U.S. Geological Survey (USGS). No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software.

License: CC0

sbtools's People

Contributors

aappling-usgs, amart90, dblodgett-usgs, hcorson-dosch-usgs, jesse-ross, jiwalker-usgs, katrinleinweber, lawinslow, ldecicco-usgs, olivroy, sckott, wdwatkins


sbtools's Issues

query_item_identifier fails to return an existing item

I think query_item_identifier is supplying a new, incorrect session when the session arg is not explicitly set. This may be because the check below passes when session is simply left at its default, even though the default for session is current_session():

if (missing(session) || is.null(session)) {
    session = handle(pkg.env$url_base)
}

So there might be two problems: (1) the session gets reset to handle(pkg.env$url_base) even when you think you're choosing the default arg value, and (2) the value that session gets set to is different from the value returned by current_session(). See:

> httr::handle(sbtools:::pkg.env$url_base)
# Host: https://www.sciencebase.gov/catalog/ <0x0000000011f31b10>
> current_session()
# Host: https://www.sciencebase.gov/catalog/ <0x0000000011cdbc00>

Not sure why that different session w/ same URL is tripping things up, but this turns into a problem for me in continental stream metabolism, where I created this item:

# find the project root ("Continental Stream Metabolism" folder)
true_sites_root <- sbtools::query_item_identifier(scheme="mda_streams", type="project_root", key="uber")$id
project_root <- sbtools::item_get_parent(true_sites_root)

# create a sandbox sites folder to work with from here on
sites_root <- sbtools::item_create(parent_id = project_root, title="Sites_dev")
sbtools::item_update_identifier(id=sites_root, scheme="mda_streams_dev", type="sites_root", key="uber") # true sites root currently has type="project_root". i find that confusing.
sites_root_saved <- sites_root
> sites_root_saved
# [1] "55568a6fe4b0a92fa7e9cf2d"

Then I tried to find it again, but got an empty data.frame:

sites_root <- sbtools::query_item_identifier(scheme="mda_streams_dev", type="sites_root", key="uber")
> sites_root
# data frame with 0 columns and 0 rows

But if I explicitly declare session=current_session(), it works.

sites_root <- sbtools::query_item_identifier(scheme="mda_streams_dev", type="sites_root", key="uber", session=current_session())
> sites_root
#       title                       id
#1 Sites_dev 55568a6fe4b0a92fa7e9cf2d
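
A minimal sketch of the fix this seems to point to, assuming the intent is for the default to match current_session(): fall back to the live session instead of building a fresh handle when the caller omits the argument.

if (missing(session) || is.null(session)) {
  # reuse the authenticated session rather than constructing a new handle
  session <- current_session()
}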

add a get_parent function?

Does this fit in sbtools?

A quick way to do it is this:

get_parent <- function(item_id) {
  # look up the item's parentId (requires the jsonlite package)
  url <- sprintf("https://www.sciencebase.gov/catalog/item/%s?format=json&fields=parentId", item_id)
  parent_id <- jsonlite::fromJSON(txt = url)$parentId

  # then fetch the parent item's title
  url <- sprintf("https://www.sciencebase.gov/catalog/item/%s?format=json&fields=title", parent_id)
  parent_title <- jsonlite::fromJSON(txt = url)$title
  return(parent_title)
}

But we would request the item JSON twice per id. Any better way to do this?
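
There may not be a way to avoid the second request, since the parent's title lives on the parent item, but one small refinement (a sketch with illustrative helper names, assuming jsonlite is available) is to factor the repeated URL building into a single helper:

# sb_field() and get_parent_title() are illustrative names, not existing functions
sb_field <- function(item_id, field) {
  url <- sprintf("https://www.sciencebase.gov/catalog/item/%s?format=json&fields=%s", item_id, field)
  jsonlite::fromJSON(txt = url)[[field]]
}

get_parent_title <- function(item_id) {
  # still two requests per id: one for parentId, one for the parent's title
  sb_field(sb_field(item_id, "parentId"), "title")
}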

get smarter about using httr::content()

From httr::content:
When using content() in a package, DO NOT use as = "parsed". Instead, check that the mime type is what you expect, and then parse it yourself. This is safer, as you will fail informatively if the API changes, and you will protect yourself against changes to httr.

See r-lib/httr#246

Currently, as = "parsed" usage is speckled throughout the package.
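
A minimal sketch of the pattern the httr documentation recommends (the URL below is just an illustrative ScienceBase request, not a specific call from the package):

resp <- httr::GET("https://www.sciencebase.gov/catalog/item/5060b03ae4b00fc20c4f3c8b?format=json")
# fail informatively if the API stops returning JSON
if (httr::http_type(resp) != "application/json") {
  stop("ScienceBase did not return JSON", call. = FALSE)
}
# parse the raw text ourselves instead of relying on as = "parsed"
item <- jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))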

ncdf

I saw mention of ncdf in the proposal and had a comment (I don't work with NetCDF formats much, so I'm kinda clueless here):

I hear from Roy Mendelssohn at NOAA that the ncdf package http://cran.r-project.org/web/packages/ncdf/index.html only works with the older NetCDF format. True? But this pkg is nice b/c it installs on all OSes.

Roy said to instead use ncdf4 http://cran.r-project.org/web/packages/ncdf4/index.html - but there are no Windows binaries, and it sounds as though there never will be.

Thoughts on this? I ask b/c if we need ncdf4 functionality, that could lead to a problem for Windows users who aren't super savvy (i.e., who couldn't install from source, etc.).

doc on query_item_id wrong for no match return

The documentation says it returns NULL when no matching item is found, but it actually returns an empty data frame:

query_item_identifier(scheme = 'mda_streams', type= NULL, key = sites, session, limit = 10000)
data frame with 0 columns and 0 rows

example on item_list_children returns d.f of NAs

item_list_children('5060b03ae4b00fc20c4f3c8b')
   id
1  NA
2  NA
3  NA
4  NA
5  NA
6  NA
7  NA
8  NA
9  NA
10 NA
11 NA
12 NA
13 NA
14 NA
15 NA
16 NA
17 NA
18 NA
19 NA
20 NA
There were 40 warnings (use warnings() to see them)

File Download feedback

from @dblodgett-usgs

It would be nice if your file download function returned the paths to the files, or at least the file names it downloads, instead of TRUE, but that's minor.

We could return a list of all files downloaded with their paths, regardless of whether the user supplied file names or the names were generated. This would be more useful than TRUE.
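
A rough sketch of that idea, built on httr (the function and argument names here are illustrative, not the package's API):

download_files <- function(urls, destinations) {
  stopifnot(length(urls) == length(destinations))
  for (i in seq_along(urls)) {
    # write each file to disk as it is fetched
    httr::GET(urls[i], httr::write_disk(destinations[i], overwrite = TRUE))
  }
  # return the local paths instead of TRUE
  destinations
}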

improve auth usability

Might take a look at what we are doing with hazardItems auth.

After auth works, we set pkg.env token and username, and do a token check when session is needed.
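
A minimal sketch of that pattern, with illustrative names (the real package internals may differ):

# package-level environment holding auth state
pkg_auth <- new.env()

set_auth <- function(username, token) {
  pkg_auth$username <- username
  pkg_auth$token <- token
  invisible(TRUE)
}

check_auth <- function() {
  # token check performed whenever a session is needed
  if (is.null(pkg_auth$token)) {
    stop("Not authenticated; please log in first.", call. = FALSE)
  }
  list(username = pkg_auth$username, token = pkg_auth$token)
}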

Some initial feedback

  • curl options - I'd suggest using ... or similar to allow curl options to be passed
    into GET/POST/etc. (a rough sketch follows after this list).
  • authentication: I haven't dug into this, but I imagine passing the session
    to each function call makes more sense if there are multiple potential
    accounts a user could have; if a user will only ever have one, then perhaps
    the auth function could be run once by the user and then used internally
    within function calls so the user doesn't have to worry about it.

I'm looking through this more, but these are the first two things...
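
Regarding the first bullet, a rough sketch of passing curl options through dots (sb_get is an illustrative wrapper, not an existing function):

sb_get <- function(url, ...) {
  # forward ... so callers can supply httr/curl config such as timeouts or verbose()
  httr::GET(url, ...)
}

# e.g. sb_get("https://www.sciencebase.gov/catalog/item/5060b03ae4b00fc20c4f3c8b?format=json", httr::verbose())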

wrap RCurl error for outdated token

Make it clearer to users that their session is out of date when this error appears:

Error in RCurl::curlPerform(curl = handle$handle, .opts = curl_opts$values) : 
  Stale CURL handle being passed to libcurl 
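
One way to do that, sketched with a simple tryCatch wrapper (the function name and message are illustrative):

with_session_error <- function(expr) {
  tryCatch(expr, error = function(e) {
    # translate the low-level stale-handle error into a friendlier message
    if (grepl("Stale CURL handle", conditionMessage(e), fixed = TRUE)) {
      stop("Your ScienceBase session appears to be out of date; please re-authenticate.",
           call. = FALSE)
    }
    stop(e)
  })
}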

... applies to multiple internal GET/PUT calls in item_update_identifier

As I'm adding dots arguments today to pass to curl, this is the first case I've seen where the same dots have to be passed to multiple functions: within item_update_identifier, the dots go to both query_item_identifier and to sbtools_PUT. I'm introducing this oddity because I don't know how else to handle the dots. Should we accept two different well-formed, single-item config lists rather than allowing the user to pass in the info in standard curl dots format?
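
For illustration only, a stripped-down sketch of the situation described; item_url() and identifier_body() are placeholder helpers, and the bodies below are not the actual package internals:

item_update_identifier <- function(sb_id, scheme, type, key, ..., session = current_session()) {
  # the same dots are forwarded to both internal calls
  existing <- query_item_identifier(scheme = scheme, type = type, key = key, ..., session = session)
  # ... existing identifiers might be inspected here ...
  sbtools_PUT(item_url(sb_id), identifier_body(scheme, type, key), ..., session = session)
}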

New disclaimer

We need to adjust the language of the disclaimer on GRAN to:

.onAttach <- function(libname, pkgname) {
  packageStartupMessage("This information is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The information has not received final approval by the U.S. Geological Survey (USGS) and is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the information.")
}

item_append_files wipes out identifier

library(sbtools)

myFolder <- "550057b9e4b02419550fa5f7"
session <- authenticate_sb(username = "xxx")

folderID <- item_create(myFolder, 
                        title="Test Workflow",
                        session=session)

fileStuff <- item_append_files(folderID,
                               files = "fluxBiasMulti.pdf",
                               session = session
)

x <- item_update_identifier(folderID, 'test', 'workflow', "Unique thing", session )
# This wipes out the identifier:
fileStuff <- item_append_files(folderID, files = "multiPlotDataOverview.pdf", session = session)

I can add:

x <- item_update_identifier(folderID, 'test', 'workflow', "Unique thing", session )

after each item_append_files call, but sometimes I get an error that there already is an identifier; presumably adding a lag might prevent that.
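
A sketch of that workaround (the two-second pause is an arbitrary guess, not a tested value):

fileStuff <- item_append_files(folderID, files = "multiPlotDataOverview.pdf", session = session)
Sys.sleep(2)  # brief lag before touching identifiers again
x <- item_update_identifier(folderID, 'test', 'workflow', "Unique thing", session)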
