getRemoteData

getRemoteData is an R package that offers a common grammar to query and import remote data (i.e. data stored in the cloud) from heterogeneous sources. Overall, this package attempts to facilitate and speed up the painful and time-consuming data import / download process for some well-known and widely used environmental / climatic data (e.g. MODIS, GPM, etc.) as well as other sources (e.g. VIIRS DNB, etc.). You will get the most out of getRemoteData if you work at local to regional spatial scales, i.e. typically from a few tenths of a square degree to about ten square degrees. For larger areas, other packages might be more relevant (see section Other relevant packages).

Why such a package ?

Modeling an ecological phenomenon (e.g. species distribution) using environmental data (e.g. temperature, rainfall) is quite a common task in ecology. The data analysis workflow generally consists of:

  • importing, tidying and summarizing various environmental data at geographical locations and dates of interest ;
  • creating explanatory / predictive models of the phenomenon using the environmental data.

Data of interest for a specific study are usually heterogeneous (various sources, formats, etc.). Downloading long time series of several environmental datasets “manually” (e.g. through user-friendly web portals) is time-consuming, not reproducible and prone to errors. In addition, when downloaded manually, spatial datasets might cover quite large areas, or include many dimensions (e.g. the multiple bands of a MODIS product). If your area of interest is smaller or if you do not need all the dimensions, why download the whole dataset? Whenever possible (i.e. when the data provider makes it possible - check section Behind the scene… how it works), getRemoteData lets you download the data strictly for your region and dimensions of interest.

When should you use getRemoteData ?

getRemoteData can hopefully help if one or more of the following points apply to you :

  • you work at a local to regional spatial scale ;
  • you need to import data from various sources (e.g. MODIS, GPM, etc.) ;
  • you are interested in importing long climatic / environmental time series ;
  • you have a slow internet connection ;
  • you care about the digital environmental impact of your work.

getRemoteData is developed as part of a PhD project, and the data sources implemented in the package are hence those that I use in my work. They are mostly environmental / climatic data, but not exclusively. Have a look at the function getAvailableDataSources to check which sources are already implemented!

Installation

You can install the development version of getRemoteData from GitHub with:

# install.packages("devtools")
devtools::install_github("ptaconet/getRemoteData")

Get the data sources implemented in getRemoteData

You can get the data sources/collections downloadable with getRemoteData and details about each of them with :

getRemoteData::getAvailableDataSources()
#> Warning: replacing previous import 'dplyr::intersect' by
#> 'lubridate::intersect' when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::union' by 'lubridate::union'
#> when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::setdiff' by 'lubridate::setdiff'
#> when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::select' by 'raster::select' when
#> loading 'getRemoteData'
#> Warning: replacing previous import 'lubridate::origin' by 'raster::origin'
#> when loading 'getRemoteData'

| source | covariate | collection | getRemoteData_import_func | getRemoteData_prepare_func | url_metadata | is_timeSeries | provider | long_name | version | DOI | spatial_resolution_m | temporal_resolution | temporal_resolution_unit | spatial_coverage | url_programmatic_access | url_manual_access | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MODIS | Temperature | MOD11A1.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/mod11a1v006/ | TRUE | NASA | MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MOD11A1.006 | 1000 | 1 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD11A1&ok=MOD11A1 | Implemented |
| MODIS | Temperature | MYD11A1.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/myd11a1v006/ | TRUE | NASA | MODIS/Aqua Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MYD11A1.006 | 1000 | 1 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD11A1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD11A1&ok=MYD11A1 | Implemented |
| MODIS | Temperature | MOD11A2.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/mod11a2v006/ | TRUE | NASA | MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1 km SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MOD11A2.006 | 1000 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD11A2&ok=MOD11A2 | Implemented |
| MODIS | Temperature | MYD11A2.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/myd11a2v006/ | TRUE | NASA | MODIS/Aqua Land Surface Temperature/Emissivity 8-Day L3 Global 1 km SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MYD11A2.006 | 1000 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD11A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD11A2&ok=MYD11A2 | Implemented |
| MODIS | Vegetation indices | MOD13Q1.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/mod13q1v006/ | TRUE | NASA | MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MOD13Q1.006 | 250 | 16 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD13Q1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD13Q1&ok=MOD13Q1 | Implemented |
| MODIS | Vegetation indices | MYD13Q1.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/myd13q1v006/ | TRUE | NASA | MODIS/Aqua Vegetation Indices 16-Day L3 Global 250m SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MYD13Q1.006 | 250 | 16 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD13Q1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD13Q1&ok=MYD13Q1 | Implemented |
| MODIS | Evapotranspiration | MOD16A2.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/mod16a2v006/ | TRUE | NASA | MODIS/Terra Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MOD16A2.006 | 500 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD16A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD16A2&ok=MOD16A2 | Implemented |
| MODIS | Evapotranspiration | MYD16A2.v006 | getData_modis() | prepareData_modis() | https://lpdaac.usgs.gov/products/myd16a2v006/ | TRUE | NASA | MODIS/Aqua Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V006 | 6 | https://dx.doi.org/10.5067/MODIS/MYD16A2.006 | 500 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD16A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD16A2&ok=MYD16A2 | Implemented |
| GPM | Rainfall | GPM_3IMERGDF | getData_gpm() | prepareData_gpm() | https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_06/summary | TRUE | NASA | GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V06 | 6 | https://doi.org/10.5067/GPM/IMERGDF/DAY/06 | 10000 | 1 | day | Global | https://gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGDF.06/ | https://search.earthdata.nasa.gov/search?q=GPM_3IMERGDF_06 | Implemented |
| GPM | Rainfall - Night of catch | GPM_3IMERGHH | getData_gpm() | prepareData_gpm() | https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_06/summary | TRUE | NASA | GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V06 | 6 | https://doi.org/10.5067/GPM/IMERG/3B-HH/06 | 10000 | 30 | minute | Global | https://gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/ |  | Implemented |
| SMAP | Soil Moisture Active Passive | SPL3SMP_E | getData_smap() | prepareData_smap() | https://nsidc.org/data/spl3smp_e#sm | TRUE | NASA | SMAP Enhanced L3 Radiometer Global Daily 9 km EASE-Grid Soil Moisture, Version 3 | 3 | https://doi.org/10.5067/T90W6VRLCBHI | 9000 | 1 | day | Global | https://n5eil02u.ecs.nsidc.org/opendap/SMAP/SPL3SMP_E.002/contents.html | https://nsidc.org/data/spl3smp_e?qt-data_set_tabs=1#qt-data_set_tabs |  |
| TAMSAT | Rainfall | TAMSAT | getData_tamsat() | prepareData_tamsat() | https://www.tamsat.org.uk/about | TRUE | TAMSAT | Tropical Applications of Meteorology using SATellite data and ground-based observations | 3 | http://doi.org/10.1038/sdata.2017.63 | 4000 | 1 | day | Africa | https://www.tamsat.org.uk/data/archive | https://www.tamsat.org.uk/data/rfe/index.cgi | Implemented |
| ERA5 | Wind - Night of catch | ERA5 | getData_era5() | prepareData_era5() | https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview | TRUE | Copenicus | ERA5 | 5 |  | 27000 | 1 | hour | Global | https://dominicroye.github.io/en/2018/access-to-climate-reanalysis-data-from-r/ | https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form | Implemented |
| MIRIADE | Apparent magnitude of the Moon - Night of catch | MIRIADE | getData_miriade() |  | http://vo.imcce.fr/webservices/miriade/ | TRUE | IMCCE | The Virtual Observatory Solar System Object Ephemeris Generator | 2 |  | NA | NA | NA | Global | http://vo.imcce.fr/webservices/miriade/?ephemcc | http://vo.imcce.fr/webservices/miriade/?forms | Implemented |
| VIIRS DNB | Nighttime lights - Night of catch | VIIRS DNB | getData_viirsdnb() |  | https://eogdata.mines.edu/download_dnb_composites.html | TRUE | NOAA | Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) | NA |  | 450 | 1 | month | Global | https://gis.ngdc.noaa.gov/arcgis/rest/services/NPP_VIIRS_DNB/Monthly_AvgRadiance/ImageServer/ | https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html | Implemented |
| SRTMGL1_v004 | Elevation and elevation derivatives | SRTMGL1_v003 | getData_srtm() |  | https://lpdaac.usgs.gov/products/srtmgl1v003/ | FALSE | NASA | Digital Elevation Model from the NASA Shuttle Radar Topography Mission Global 1 arc second | 3 | https://dx.doi.org/10.5067/MEASURES/SRTM/SRTMGL1.003 | 30 | NA | NA | Global | https://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/ | https://search.earthdata.nasa.gov/search/collection-details?p=C1000000240-LPDAAC_ECS&q=SRTM&ok=SRTM | Implemented |
| CGLS-LC101 | Land cover | CGLS-LC100 | getData_cgls() |  | https://land.copernicus.eu/global/products/lc | FALSE | Copernicus Global Land Operations | Moderate dynamic land cover 100m 2015 | 2 |  | 100 | 1 | year | Global | NA | https://lcviewer.vito.be/download | Not implemented |
| CCI_LC_S2_AFRICA | Land cover | CCI-LS |  |  | http://2016africalandcover20m.esrin.esa.int/ | FALSE | Climate Change Initiative Land Cover (ESA) | S2 prototype Land Cover 20m map of Africa 2016 | NA |  | 20 | NA | NA | Africa | NA | http://2016africalandcover20m.esrin.esa.int/download.php | Not implemented |
| HRSL | Built-up, Population | HRSL | getData_hrsl() |  | https://ciesin.columbia.edu/repository/hrsl/#over | FALSE | Facebook Connectivity Lab and Center for International Earth Science Information Network | High Resolution Settlement Layer | 1 |  | 30 | NA | NA | Available for some countries : see list here : https://ciesin.columbia.edu/repository/hrsl/#data | NA | https://ciesin.columbia.edu/repository/hrsl/#data | Not implemented |
| WorldPop_100m_Population | Population | WorldPop_100m_Population | getData_worldpop() |  | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107042 | FALSE | WorldPop | Alpha version 2014 estimates of numbers of people per grid square, with national totals adjusted to match UN population division estimates | 3 | 10.5258/SOTON/WP00033 10.5258/SOTON/WP00065 | 100 | NA | NA | Africa, Asia, South America, Oceania | NA | https://www.worldpop.org/geodata/listing?id=16 | Not implemented |
| OpenSteetMap | Roads | OpenSteetMap |  |  | https://www.openstreetmap.org/about | FALSE | OpenSteetMap | OpenSteetMap | NA | NA | NA | NA | NA | Global | https://www.openstreetmap.org/ |  | Not implemented |
| Global_Surface_Water | Open waters / Wetlands | Global_Surface_Water | getData_gsw() |  | https://storage.cloud.google.com/global-surface-water/downloads_ancillary/DataUsersGuidev2018.pdf | FALSE | JRC | Global Surface Water | 1 | 10.1038/nature20584 | 30 | 1 | year | Global | NA | https://global-surface-water.appspot.com/download | Not implemented |
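
Since getAvailableDataSources() returns one row per collection, the result can be filtered like any data.frame. The sketch below is only an illustration: to stay runnable without the package, it uses a small hand-made stand-in (`sources`, a hypothetical object holding four rows and four of the columns listed above) instead of the real function output.

```r
# Hypothetical stand-in for the output of getRemoteData::getAvailableDataSources(),
# restricted to a few rows and columns copied from the listing above.
sources <- data.frame(
  source        = c("MODIS", "GPM", "TAMSAT", "HRSL"),
  collection    = c("MOD11A1.v006", "GPM_3IMERGDF", "TAMSAT", "HRSL"),
  is_timeSeries = c(TRUE, TRUE, TRUE, FALSE),
  status        = c("Implemented", "Implemented", "Implemented", "Not implemented"),
  stringsAsFactors = FALSE
)

# Keep only the time-series collections that are already implemented
implemented_ts <- subset(
  sources,
  status == "Implemented" & is_timeSeries,
  select = c(source, collection)
)

print(implemented_ts)
```

With the package installed, replacing `sources` by the result of getRemoteData::getAvailableDataSources() should give the same kind of selection on the full list.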

Example

Say you want to download MODIS land surface temperature and GPM rainfall data over a 3500 km2 region of interest:

library(getRemoteData)
library(sf)
library(purrr)
# Read the region of interest as an sf object. Here: the Korhogo area in Côte d'Ivoire
roi<-sf::st_read(system.file("extdata/ROI_example.kml", package = "getRemoteData"),quiet=TRUE)
# Set up your time frame of interest.
time_frame<-c("2017-05-01","2017-06-10")
# Set up your credentials to EarthData
username_EarthData<-"my.earthdata.username"
password_EarthData<-"my.earthdata.password"
# Download the MODIS LST TERRA daily products in the current working directory
# Setting the argument 'download' to FALSE will return the URLs of the products, without downloading them
dl_modis<-getRemoteData::getData_modis(timeRange = time_frame,
                                     roi = roi,
                                     collection="MOD11A1.006",
                                     dimensions=c("LST_Day_1km","LST_Night_1km"),
                                     download = TRUE,
                                     destFolder=getwd(),
                                     username=username_EarthData,
                                     password=password_EarthData,
                                     parallelDL=TRUE # setting to FALSE will download the files one after the other
                                     )
head(dl_modis)
# Download the GPM daily products in the current working directory
dl_gpm<-getRemoteData::getData_gpm(timeRange = time_frame,
                                     roi = roi,
                                     collection="GPM_3IMERGDF.06",
                                     dimensions=c("precipitationCal"),
                                     download = TRUE,
                                     destFolder=getwd(),
                                     username=username_EarthData,
                                     password=password_EarthData,
                                     parallelDL=TRUE
                                     )
head(dl_gpm)

# Get the data downloaded as a list of rasters
rasts_modis<-dl_modis$destfile %>%
  purrr::map(~getRemoteData::prepareData_modis(.,"LST_Day_1km")) %>%
  set_names(dl_modis$name)

rasts_gpm<-dl_gpm$destfile %>%
  purrr::map(~getRemoteData::prepareData_gpm(.,"precipitationCal")) %>%
  set_names(dl_gpm$name)

The functions of getRemoteData all work the same way :

  • timeRange is your date / time frame of interest (optionally including hours for data with a sub-daily resolution) ;
  • roi is your area of interest (as an sf object, either point or polygon) ;
  • destFolder is the data destination folder ;
  • by default, the function does not download the dataset. It returns a data.frame with the URL(s) to download the dataset(s) of interest given the input arguments. To download the data, set the download argument to TRUE ;
  • other arguments are specific to each data product (e.g. collection, dimensions, username, password).

Absence of the timeRange (resp. roi) arguments in a function means that the data of interest do not have any time (resp. spatial) dimension.

Have a look at the vignette Automatic extraction of spatial-temporal environmental data within buffers around sampling points to get an example of what you can do with getRemoteData !

Current limitations

The package is at a very early stage of development. Here are some of the current limitations and ideas for future development :

  • MODIS data cannot be downloaded if your area of interest covers multiple MODIS tiles (for an overview of MODIS tiles go here).
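
To check beforehand whether an area of interest falls within a single MODIS tile, the tile indices can be computed from the published constants of the MODIS sinusoidal grid (36 x 18 tiles). The following is a minimal base R sketch and is not part of getRemoteData: the function `modis_tile` and the bounding box (rough coordinates around Korhogo) are hypothetical, and testing only the four corners of the box is an approximation that is reasonable for small areas.

```r
# Constants of the MODIS sinusoidal grid
earth_radius_m <- 6371007.181    # sphere radius used by the sinusoidal projection
tile_size_m    <- 1111950.5197   # width and height of one tile (10 degrees at the equator)
x_min_m        <- -20015109.354  # western edge of the grid
y_max_m        <- 10007554.677   # northern edge of the grid

# Hypothetical helper: name of the MODIS tile containing each lon/lat point (WGS84 degrees)
modis_tile <- function(lon, lat) {
  lon_rad <- lon * pi / 180
  lat_rad <- lat * pi / 180
  x <- earth_radius_m * lon_rad * cos(lat_rad)  # sinusoidal easting
  y <- earth_radius_m * lat_rad                 # sinusoidal northing
  h <- floor((x - x_min_m) / tile_size_m)       # horizontal index, counted from the west
  v <- floor((y_max_m - y) / tile_size_m)       # vertical index, counted from the north
  sprintf("h%02dv%02d", as.integer(h), as.integer(v))
}

# Hypothetical bounding box, roughly around Korhogo (Côte d'Ivoire)
bbox <- c(xmin = -5.9, ymin = 9.1, xmax = -5.3, ymax = 9.8)
corners <- expand.grid(
  lon = unname(bbox[c("xmin", "xmax")]),
  lat = unname(bbox[c("ymin", "ymax")])
)

tiles <- unique(modis_tile(corners$lon, corners$lat))
spans_multiple_tiles <- length(tiles) > 1
print(tiles)
print(spans_multiple_tiles)
```

If `spans_multiple_tiles` is TRUE, the area would have to be split into one sub-area per tile before querying MODIS data.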

Behind the scene… how it works

Whenever data providers offer them, getRemoteData uses web services or APIs to download the data. In short, web services are standard web protocols that make it possible to filter the data directly at download time. Filters can be spatial, temporal, dimensional, etc. Examples of widely used web services / data transfer protocols for geospatial time series are OGC WFS and OPeNDAP. If long time series are queried, getRemoteData speeds up the download by parallelizing it.
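
As an illustration of such filtering, an OPeNDAP (DAP2) request appends a constraint expression to the dataset URL: each variable name is followed by one [start:stride:stop] index range per dimension, so that only the requested subset is transferred. The sketch below only builds such a URL and does not reflect the internal code of getRemoteData: the helper `opendap_subset_url`, the dataset URL and the index values are all assumptions made for the example.

```r
# Hypothetical helper: build an OPeNDAP (DAP2) subsetting URL.
# index_ranges is a list with one c(start, stop) pair per dimension (zero-based indices).
opendap_subset_url <- function(base_url, variables, index_ranges) {
  ranges <- vapply(
    index_ranges,
    function(r) sprintf("%d:1:%d", r[1], r[2]),  # stride fixed to 1
    character(1)
  )
  hyperslab  <- paste0("[", ranges, "]", collapse = "")
  constraint <- paste0(variables, hyperslab, collapse = ",")
  paste0(base_url, "?", constraint)
}

# Assumed dataset URL and index ranges, for illustration only
url <- opendap_subset_url(
  base_url     = "https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.006/h17v08.ncml.nc4",
  variables    = c("LST_Day_1km", "LST_Night_1km"),
  index_ranges = list(time = c(0L, 40L), y = c(100L, 180L), x = c(300L, 390L))
)
print(url)
```

The point of the design is that the spatial, temporal and dimensional filters are expressed in the request itself, so the server returns only the cells that are needed rather than the whole tile.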

Other relevant packages
