
geoknife's Introduction

geoknife package version 1.6.10


Tools for geo-web processing of gridded data via the Geo Data Portal. geoknife slices up gridded data according to overlap with irregular features, such as watersheds, lakes, and points. The result is subsetted data in plain text, NetCDF, GeoTIFF, or other formats.


Installing geoknife

To install geoknife from CRAN:

install.packages("geoknife")

Or to install the current development version of the package:

install.packages("remotes")
remotes::install_github('DOI-USGS/geoknife')

Reporting bugs

Please consider reporting bugs and asking questions on the Issues page: https://github.com/DOI-USGS/geoknife/issues

Code of Conduct

We want to encourage a warm, welcoming, and safe environment for contributing to this project. See the code of conduct for more information.

Package Support

The Water Mission Area of the USGS has supported the development and maintenance of geoknife through September 2018 and is expected to continue that support into the future. Resources are available primarily for maintenance and for responding to user questions. Priorities for the development of new features are determined by the geoknife development team.

USGS

geoknife overview

The geoknife package was created to support web-based geoprocessing of large gridded datasets according to their overlap with landscape (or aquatic/ocean) features that are often irregularly shaped. geoknife creates data access and subsequent geoprocessing requests for the USGS’s Geo Data Portal to carry out on a web server. The results of these requests are available for download after the processes have been completed. This type of workflow has three main advantages: 1) it allows the user to avoid downloading large datasets, 2) it avoids reinventing the wheel for the creation and optimization of complex geoprocessing algorithms, and 3) computing resources are dedicated elsewhere, so geoknife operations do not have much of an impact on a local computer.

geoknife interacts with a remote server to discover what types of processing capabilities are available, as well as what geospatial features are already available for use as an area of interest (commonly, these are user-uploaded shapefiles). Because communication with web resources is central to geoknife operations, users must have an active internet connection.
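As a sketch of that discovery step, the query function (listed in the function table later in this README) can be used to ask the remote server what a dataset offers. This requires the geoknife package and an internet connection; the specific query keys ('variables', 'times') follow the webdata methods described elsewhere in this document.

```r
library(geoknife)

# Ask the Geo Data Portal what the 'prism' dataset offers before processing:
fabric <- webdata('prism')
query(fabric, 'variables')  # variables available in the dataset
query(fabric, 'times')      # start and stop times of the dataset
```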

The main elements of setting up and carrying out a geoknife ‘job’ (geojob) are the feature of interest (the stencil argument to the geoknife function), the gridded web dataset to be processed (the fabric argument), and the processing algorithm and its parameters (the knife argument). The status of a geojob can be checked with check, and its output can be loaded into a data.frame with result.

What can geoknife do?

Define a stencil that represents the geographic region to slice out of the data
library(geoknife)
# from a single point
stencil <- simplegeom(c(-89, 46.23))
   # -- or --
# from a collection of named points
stencil <- simplegeom(data.frame(
              'point1' = c(-89, 46), 
              'point2' = c(-88.6, 45.2)))
   # -- or --
# for a state from a web-available dataset
stencil <- webgeom('state::New Hampshire')
stencil <- webgeom('state::New Hampshire,Wisconsin,Alabama')
   # -- or --
# for HUC8s from a web-available dataset
stencil <- webgeom('HUC8::09020306,14060009')
Define a fabric that represents the underlying data
# from the prism dataset:
fabric <- webdata('prism')
   # -- or --
# explicitly define webdata from a list:
fabric <- webdata(list(
            times = as.POSIXct(c('1895-01-01','1899-01-01')),
            url = 'https://cida.usgs.gov/thredds/dodsC/prism_v2',
            variables = 'ppt'))
# modify the times field:
times(fabric) <- as.POSIXct(c('2003-01-01','2005-01-01'))
Create the processing job that will carry out the subsetting/summarization task
job <- geoknife(stencil, fabric, wait = TRUE)

# use existing convenience functions to check on the job:
check(job)
## $status
## [1] "Process successful"
## 
## $URL
## [1] "https://labs.waterdata.usgs.gov:443/gdp-process-wps/RetrieveResultServlet?id=3362d55f-6d35-4e21-baf6-51a2583bc8bdOUTPUT"
## 
## $statusType
## [1] "ProcessSucceeded"
## 
## $percentComplete
## [1] "100"

see also:

running(job)
error(job)
successful(job)
Plot the results
data <- result(job)
plot(data[,1:2], ylab = variables(fabric))

Use an email to listen for process completion
job <- geoknife(webgeom('state::New Hampshire'), fabric = 'prism', email = '[email protected]')

geoknife Functions (as of v1.1.5)

Function     Title
geoknife     slice up gridded data according to overlap with feature(s)
gconfig      set or query package settings for geoknife processing defaults
algorithm    the algorithm of a webprocess
attribute    the attribute of a webgeom
check        check the status of a geojob
download     download the results of a geojob
error        convenience function for the state of a geojob
running      convenience function for the state of a geojob
successful   convenience function for the state of a geojob
start        start a geojob
cancel       cancel a geojob
geom         the geom of a webgeom
inputs       the inputs of a webprocess
id           the process id of a geojob
values       the values of a webgeom
result       load the output of a completed geojob into a data.frame
variables    the variables of a webdata object
wait         wait for a geojob to complete processing
times        the times of a webdata object
url          the url of a webdata, webgeom, geojob, or webprocess
version      the version of a webgeom or webdata
xml          the xml of a geojob
query        query datasets or variables

geoknife classes (as of v0.12.0)

Class        Title
simplegeom   a simple geometric class; extends sp::SpatialPolygons
webgeom      a web feature service geometry
webprocess   a web processing service
webdata      web data
geojob       a Geo Data Portal processing job
datagroup    a simple class containing data lists that can become webdata

What libraries does geoknife need?

This version requires httr, sp, and XML. All of these packages are available on CRAN, and will be installed automatically when using the install.packages() instructions above.

Check Notes:

In addition to typical R package checking, a Dockerfile is included in this repository. Once the image is built, it can be run with the following commands.

docker build -t geoknife_test .

docker run --rm -it -v %cd%:/src geoknife_test /bin/bash -c "cp -r /src/* /check/ && cp /src/.Rbuildignore /check/ && cd /check && Rscript -e 'devtools::build()' && R CMD check --as-cran ../geoknife_*"
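The docker run command above uses the Windows cmd.exe variable %cd% for the current directory; on Linux or macOS the equivalent is $(pwd), with everything else unchanged:

```shell
docker run --rm -it -v "$(pwd)":/src geoknife_test /bin/bash -c \
  "cp -r /src/* /check/ && cp /src/.Rbuildignore /check/ && cd /check && Rscript -e 'devtools::build()' && R CMD check --as-cran ../geoknife_*"
```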

Release Procedure

For release of the geoknife package, a number of steps are required.

  1. Ensure all checks pass and code coverage is adequate.
  2. Ensure NEWS.md reflects updates in version.
  3. Update DESCRIPTION to reflect release version.
  4. Convert DISCLAIMER.md to approved language and rebuild README.Rmd.
  5. Create release candidate branch and commit release candidate.
  6. Build source package and upload to CRAN.
  7. Once accepted to CRAN, tag the release candidate branch and push to repositories.
  8. Change DISCLAIMER.md back to development mode and increment the version in DESCRIPTION.
  9. Merge release candidate and commit.
  10. Open PR/MR in development state.

Disclaimer

This software is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The software has not received final approval by the U.S. Geological Survey (USGS). No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software.

CC0

geoknife's People

Contributors

aappling-usgs, dblodgett-usgs, jiwalker-usgs, ldecicco-usgs, sckott, wdwatkins


geoknife's Issues

getDataIDs failing

XPath error : Undefined namespace prefix
XPath error : Invalid expression
 Show Traceback

 Rerun with Debug
 Error in xpathApply.XMLInternalDocument(ndoc, path, fun, ..., namespaces = namespaces,  : 
  error evaluating xpath expression //ns:LiteralData 
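The "Undefined namespace prefix" error typically means the ns: prefix in the XPath expression was never bound to a namespace URI. With the XML package (which this version of geoknife depends on), a prefix-to-URI mapping can be passed explicitly via the namespaces argument. The document and WPS namespace URI below are illustrative stand-ins for a real GDP response, not output captured from geoknife:

```r
library(XML)

# A minimal stand-in for a WPS response document:
xml_text <- '<wps:Output xmlns:wps="http://www.opengis.net/wps/1.0.0">
  <wps:LiteralData>prism</wps:LiteralData>
</wps:Output>'
doc <- xmlParse(xml_text)

# Binding the 'ns' prefix explicitly lets //ns:LiteralData evaluate cleanly:
vals <- xpathSApply(doc, '//ns:LiteralData', xmlValue,
                    namespaces = c(ns = 'http://www.opengis.net/wps/1.0.0'))
vals
```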

New disclaimer

We need to adjust the language of the disclaimer on CRAN to:

.onAttach <- function(libname, pkgname) {
  packageStartupMessage("This information is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The information has not received final approval by the U.S. Geological Survey (USGS) and is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the information.")
}

webdata options

This is a documentation issue. webdata('prism') looks awfully nice. i want to do it for some other dataset, e.g. the one at http://cida.usgs.gov/thredds/dodsC/mows/sr.html. How do I reference that one? webdata('mows/sr'), webdata('sr'), and webdata('mows') all fail.

Jordan already knows about this general issue. Jordan writes: "One huge gap in the package right now is the user's ability to know what gridded datasets they can process. I simply won't have that in place before next week, but I know what I need to do to get that in. So, all of the examples are for PRISM. "

broken job can't be checked

I thought I'd try a different data source. I did this:

> fabric <- webdata(url="http://cida.usgs.gov/thredds/dodsC/mows/sr")
> stencil <- webgeom("state::CO")
> job <- geoknife(stencil, fabric) #ignore the checkAttrNamespaces warning for now - it's a known issue
Warning message:
In checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning) :
  missing namespace definitions for prefix(es) xlink

When I went to check the job, I couldn't see the status and got this error instead:

> check(job)
Error in match.arg(state, c("none", "ProcessStarted", "Process successful",  : 
  'arg' should be one of "none", "ProcessStarted", "Process successful", "ProcessFailed", "unknown"
> traceback()
5: stop(gettextf("'arg' should be one of %s", paste(dQuote(choices), 
       collapse = ", ")), domain = NA)
4: match.arg(state, c("none", "ProcessStarted", "Process successful", 
       "ProcessFailed", "unknown"))
3: setJobState(process$status)
2: check(job)
1: check(job)

after this, I can't start a new job because I always get this error, no matter how many times I've called check(job) or error(job):

Error in start(geojob) : 
  Cannot start a new geojob until a previous one is completed or is error. See "check(geojob)"

job status not changing until error() call?

I've passed in a bad stencil and/or fabric. The job starts with a warning:

> job <- geoknife(stencil, fabric)
Warning message:
In checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning) :
  missing namespace definitions for prefix(es) xlink

I think, "shoot. better fix that stencil and try again." so i try but can't start a new process:

> job <- geoknife(stencil, fabric)
Error in start(geojob) : 
  Cannot start a new geojob until a previous one is completed or is error. See "check(geojob)"
In addition: Warning message:
In checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning) :
  missing namespace definitions for prefix(es) xlink

I run the above line repeatedly, each time getting the error message.

I check the job status. It's at 'error'.

> error(job)
[1] TRUE

And now, magically, I can attempt to run again:

> job <- geoknife(stencil, fabric)
Warning message:
In checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning) :
  missing namespace definitions for prefix(es) xlink

Something about calling error(), check(), etc. seems to actually change the state of the job.

Fresh-eyes Review

I tried not to cheat by looking at source code (few users will).

  1. Add this type of example to webdata
    webdata('prism', times=as.POSIXct(c('1990-01-01', '1995-01-01')))
  2. Can you change the webdata link in geoknife so it goes to method instead of class?
  3. Maybe call out "slots" as in webdata so it is clear those can be passed as "..."?
  4. Make URL an exposed field in webgeom (not clear what URL is set).

...in progress
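Item 1 of the review can be approximated with functions this README already documents (the webdata() constructor and the times() setter); whether webdata() accepts a times argument directly, as the reviewer requested, is not confirmed here.

```r
library(geoknife)

# Equivalent of the requested webdata('prism', times = ...) one-liner,
# built from the constructor and setter shown earlier in this README:
fabric <- webdata('prism')
times(fabric) <- as.POSIXct(c('1990-01-01', '1995-01-01'))
```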

check mimetype on wcs subset

"org.n52.wps.server.ExceptionReport: Could not determine input format because none of the supported formats match the given schema ("null") and encoding ("null"). (A mimetype was not specified)"

for experimental NED service

error with getShapefiles() when SB WFS is used

setWFS(gk) <- 'https://www.sciencebase.gov/catalogMaps/mapping/ows/50d35261e4b062c7914ebd14'
getShapefiles(gk)

Error: failed to load external entity "https://www.sciencebase.gov/catalogMaps/mapping/ows/50d35261e4b062c7914ebd14?service=WFS&version=1.1.0&request=GetCapabilities"

new classes and workflow for v1.0

# geoknife v1.0

# -- new geoknife classes -- 
geojob
webgeom
webdata
webprocess

# -creation methods for webdata
data <- webdata(list) # create and populate slots
data <- webdata(algorithm) # empty webdata w/ slots based on algorithm
data <- webdata(geojob) # get dataset object from previous job

# -webdata methods
query(webdata,'variables')
query(webdata) # get all metadata from dataset
query(webdata,'times') # get start and stop times of dataset
times(webdata)<- list #start_time = POSIXct,...





...
# -creation methods for webgeom
webgeom <- webgeom(WFS = "cida...")
webgeom <- webgeom(geojob)

# -webgeom methods
query(webgeom,'shapes')
query(webgeom,'attributes')
query(webgeom,'IDs')
IDs(webgeom) <- 'character'
IDs(webgeom)
attribute(webgeom) <- 'character'
attribute(webgeom)
shape(webgeom) <- 'character'
shape(webgeom)


# geoknife would be the verb for the process starting ("slice this!"), creates geojob object
geojob <- geoknife(geom = "numeric", data = "character", execute = TRUE, 
    knife = "Area Grid Statistics (weighted)", ...) # as a quickstart
geojob <- geoknife(stencil = "sp::SpatialPolygonsDataFrame", fabric = dataset, execute = TRUE, ...) # as a quickstart
geojob <- geoknife(stencil = "webgeom", fabric = "webdata", execute = TRUE, ...) # as a quickstart
geojob <- geoknife(geom = "geojob", fabric = "character", execute = TRUE, ...) # as a quickstart
geojob <- geoknife(stencil = "geojob", fabric = "character", knife = "webprocess", execute = TRUE, ...) # as a quickstart
geojob <- geoknife(geojob, execute = TRUE) # re-runs an old job, or a job that was built with execute = FALSE

# -- supported input types that are handled w/ appropriate dispatch --
geom <- XML::XML
geom <- sp::SpatialPointsDataFrame
geom <- sp::SpatialPolygonsDataFrame
geom <- webgeom
geom <- numeric # linear ring of lat/lon
geom <- numeric # single lat/lon point
geom <- geojob # get geom from previous job
data <- "character" # for shortname quickstart for datasets lists that are built into the package (e.g., prism)



# -- methods for geojob -- 
algorithm(geojob)
algorithm(geojob)<-
geom(geojob)
geom(geojob)<-
webdata(geojob)
webdata(geojob)<-
summarize(geojob)
print(geojob)
check(geojob)
isRunning(geojob)
isError(geojob)
data.frame(geojob) # was loadOutput()
execute(geojob)

# -- functions in the geoknife package
getAlgorithms() # can specify WPS = "cida..."
getAlgorithms(webprocess)

do unique call on feature attr list

I mentioned this earlier: shapefiles often have repeated attribute values (STATE_NAME, HUC*, etc.), with one copy per feature that shares them. Doing a unique keeps these lists shorter and more useful.
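A base-R sketch of the suggestion, using a made-up attribute vector (real values would come from the feature service):

```r
# Attribute values as they might come back from a shapefile's feature list,
# repeated once per feature that shares them:
state_name <- c('Wisconsin', 'Wisconsin', 'Wisconsin', 'Minnesota', 'Iowa')

# De-duplicate before presenting the list to the user:
unique(state_name)  # "Wisconsin" "Minnesota" "Iowa"
```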

add query data function

to get hosted and referenced resources:

POST to http://cida.usgs.gov/gdp/proxy/http://cida.usgs.gov/gdp/geonetwork/srv/en/csw

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" service="CSW" version="2.0.2" resultType="results" outputSchema="http://www.isotc211.org/2005/gmd" maxRecords="1000">
<csw:Query typeNames="csw:Record">
<csw:ElementSetName>full</csw:ElementSetName>
<csw:Constraint version="1.1.0">
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
<ogc:And>
<ogc:Or>
<ogc:PropertyIsLike matchCase="false" wildCard="*" singleChar="." escapeChar="!">
<ogc:PropertyName>Anytext</ogc:PropertyName>
<ogc:Literal>gov.usgs.cida.gdp.wps.algorithm.FeatureCategoricalGridCoverageAlgorithm</ogc:Literal>
</ogc:PropertyIsLike>
<ogc:PropertyIsLike matchCase="false" wildCard="*" singleChar="." escapeChar="!">
<ogc:PropertyName>Anytext</ogc:PropertyName>
<ogc:Literal>gov.usgs.cida.gdp.wps.algorithm.FeatureCoverageOPeNDAPIntersectionAlgorithm</ogc:Literal>
</ogc:PropertyIsLike>
<ogc:PropertyIsLike matchCase="false" wildCard="*" singleChar="." escapeChar="!">
<ogc:PropertyName>Anytext</ogc:PropertyName>
<ogc:Literal>gov.usgs.cida.gdp.wps.algorithm.FeatureGridStatisticsAlgorithm</ogc:Literal>
</ogc:PropertyIsLike>
<ogc:PropertyIsLike matchCase="false" wildCard="*" singleChar="." escapeChar="!">
<ogc:PropertyName>Anytext</ogc:PropertyName>
<ogc:Literal>gov.usgs.cida.gdp.wps.algorithm.FeatureWeightedGridStatisticsAlgorithm</ogc:Literal>
</ogc:PropertyIsLike>
</ogc:Or>
</ogc:And>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
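A sketch of issuing that request from R. csw_body() is a hypothetical helper (not part of geoknife) that wraps a single algorithm class name in the PropertyIsLike filter shown above; the full payload ORs several algorithms together. The POST itself is shown commented out because it is a live network call.

```r
# Hypothetical helper: builds a minimal CSW GetRecords payload filtering on
# one GDP algorithm class name.
csw_body <- function(algorithm) {
  paste0(
    '<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" ',
    'service="CSW" version="2.0.2" resultType="results" ',
    'outputSchema="http://www.isotc211.org/2005/gmd" maxRecords="1000">',
    '<csw:Query typeNames="csw:Record">',
    '<csw:ElementSetName>full</csw:ElementSetName>',
    '<csw:Constraint version="1.1.0">',
    '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">',
    '<ogc:PropertyIsLike matchCase="false" wildCard="*" singleChar="." escapeChar="!">',
    '<ogc:PropertyName>Anytext</ogc:PropertyName>',
    '<ogc:Literal>', algorithm, '</ogc:Literal>',
    '</ogc:PropertyIsLike>',
    '</ogc:Filter>',
    '</csw:Constraint>',
    '</csw:Query>',
    '</csw:GetRecords>')
}

body <- csw_body('gov.usgs.cida.gdp.wps.algorithm.FeatureWeightedGridStatisticsAlgorithm')

# To actually send the request (requires httr and a network connection):
# library(httr)
# resp <- POST('http://cida.usgs.gov/gdp/proxy/http://cida.usgs.gov/gdp/geonetwork/srv/en/csw',
#              body = body, content_type('text/xml'))
```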
