getRemoteData
is an R package that offers a common grammar to query
and import remote data (i.e. data stored in the cloud) from
heterogeneous sources. Overall, this package aims to simplify
and speed up the painful and time-consuming data import /
download process for some well-known and widely used environmental /
climatic data (e.g. MODIS, GPM, etc.) as
well as other sources (e.g. VIIRS DNB, etc.). You will get the most out of getRemoteData
if you work at local
to regional spatial scales, i.e. typically from a few tenths of a
square degree to about ten square degrees. For larger areas, other packages might be more
relevant (see section Other relevant packages).
Why such a package ?
Modeling an ecological phenomenon (e.g. species distribution) using environmental data (e.g. temperature, rainfall) is quite a common task in ecology. The data analysis workflow generally consists of :
- importing, tidying and summarizing various environmental data at geographical locations and dates of interest ;
- creating explanatory / predictive models of the phenomenon using the environmental data.
Data of interest for a specific study are usually heterogeneous (various
sources, formats, etc.). Downloading long time series of several
environmental datasets “manually” (e.g. through user-friendly web portals)
is time consuming, not reproducible and prone to errors. In addition,
when downloaded manually, spatial datasets might cover quite large
areas, or include many dimensions (e.g. the multiple bands of a MODIS
product). If your area of interest is smaller or if you do not need all
the dimensions, why download the whole dataset ? Whenever possible
(i.e. when the data provider makes it possible - check section Behind the
scene… how it works),
getRemoteData
lets you download the data strictly for your region
and dimensions of interest.
When should you use getRemoteData
?
getRemoteData
can hopefully help if you recognize yourself in one or
more of the following points :
- work at a local to regional spatial scale ;
- need to import data from various sources (e.g. MODIS, GPM, etc.) ;
- are interested in importing long climatic / environmental time-series ;
- have a slow internet connection ;
- care about the digital environmental impact of your work.
getRemoteData
is developed as part of a PhD project, and the
sources of data implemented in the package are hence those that I use in
my work. Sources of data are mostly environmental / climatic data, but
not exclusively. Have a look at the function getAvailableDataSources
to check which sources are already implemented !
You can install the development version of getRemoteData
from
GitHub with:
# install.packages("devtools")
devtools::install_github("ptaconet/getRemoteData")
You can get the data sources/collections downloadable with
getRemoteData
and details about each of them with :
getRemoteData::getAvailableDataSources()
#> Warning: replacing previous import 'dplyr::intersect' by
#> 'lubridate::intersect' when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::union' by 'lubridate::union'
#> when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::setdiff' by 'lubridate::setdiff'
#> when loading 'getRemoteData'
#> Warning: replacing previous import 'dplyr::select' by 'raster::select' when
#> loading 'getRemoteData'
#> Warning: replacing previous import 'lubridate::origin' by 'raster::origin'
#> when loading 'getRemoteData'
| source | covariate | collection | getRemoteData_import_func | getRemoteData_prepare_func | url_metadata | is_timeSeries | provider | long_name | version | DOI | spatial_resolution_m | temporal_resolution | temporal_resolution_unit | spatial_coverage | url_programmatic_access | url_manual_access | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MODIS | Temperature | MOD11A1.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V006 | 6 | | 1000 | 1 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD11A1&ok=MOD11A1 | Implemented |
| MODIS | Temperature | MYD11A1.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Aqua Land Surface Temperature/Emissivity Daily L3 Global 1km SIN Grid V006 | 6 | | 1000 | 1 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD11A1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD11A1&ok=MYD11A1 | Implemented |
| MODIS | Temperature | MOD11A2.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1 km SIN Grid V006 | 6 | | 1000 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD11A2&ok=MOD11A2 | Implemented |
| MODIS | Temperature | MYD11A2.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Aqua Land Surface Temperature/Emissivity 8-Day L3 Global 1 km SIN Grid V006 | 6 | | 1000 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD11A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD11A2&ok=MYD11A2 | Implemented |
| MODIS | Vegetation indices | MOD13Q1.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006 | 6 | | 250 | 16 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD13Q1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD13Q1&ok=MOD13Q1 | Implemented |
| MODIS | Vegetation indices | MYD13Q1.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Aqua Vegetation Indices 16-Day L3 Global 250m SIN Grid V006 | 6 | | 250 | 16 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD13Q1.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD13Q1&ok=MYD13Q1 | Implemented |
| MODIS | Evapotranspiration | MOD16A2.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Terra Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V006 | 6 | | 500 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MOD16A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MOD16A2&ok=MOD16A2 | Implemented |
| MODIS | Evapotranspiration | MYD16A2.v006 | getData_modis() | prepareData_modis() | | TRUE | NASA | MODIS/Aqua Net Evapotranspiration 8-Day L4 Global 500m SIN Grid V006 | 6 | | 500 | 8 | day | Global | https://opendap.cr.usgs.gov/opendap/hyrax/MYD16A2.006/contents.html | https://search.earthdata.nasa.gov/search?q=MYD16A2&ok=MYD16A2 | Implemented |
| GPM | Rainfall | GPM_3IMERGDF | getData_gpm() | prepareData_gpm() | | TRUE | NASA | GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V06 | 6 | | 10000 | 1 | day | Global | https://gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGDF.06/ | | Implemented |
| GPM | Rainfall - Night of catch | GPM_3IMERGHH | getData_gpm() | prepareData_gpm() | | TRUE | NASA | GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V06 | 6 | | 10000 | 30 | minute | Global | https://gpm1.gesdisc.eosdis.nasa.gov/opendap/GPM_L3/GPM_3IMERGHH.06/ | | Implemented |
| SMAP | Soil Moisture Active Passive | SPL3SMP_E | getData_smap() | prepareData_smap() | | TRUE | NASA | SMAP Enhanced L3 Radiometer Global Daily 9 km EASE-Grid Soil Moisture, Version 3 | 3 | | 9000 | 1 | day | Global | https://n5eil02u.ecs.nsidc.org/opendap/SMAP/SPL3SMP_E.002/contents.html | https://nsidc.org/data/spl3smp_e?qt-data_set_tabs=1#qt-data_set_tabs | |
| TAMSAT | Rainfall | TAMSAT | getData_tamsat() | prepareData_tamsat() | | TRUE | TAMSAT | Tropical Applications of Meteorology using SATellite data and ground-based observations | 3 | | 4000 | 1 | day | Africa | | | Implemented |
| ERA5 | Wind - Night of catch | ERA5 | getData_era5() | prepareData_era5() | https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview | TRUE | Copenicus | ERA5 | 5 | | 27000 | 1 | hour | Global | https://dominicroye.github.io/en/2018/access-to-climate-reanalysis-data-from-r/ | https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form | Implemented |
| MIRIADE | Apparent magnitude of the Moon - Night of catch | MIRIADE | getData_miriade() | | | TRUE | IMCCE | The Virtual Observatory Solar System Object Ephemeris Generator | 2 | | NA | NA | NA | Global | | | Implemented |
| VIIRS DNB | Nighttime lights - Night of catch | VIIRS DNB | getData_viirsdnb() | | | TRUE | NOAA | Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) | NA | | 450 | 1 | month | Global | https://gis.ngdc.noaa.gov/arcgis/rest/services/NPP_VIIRS_DNB/Monthly_AvgRadiance/ImageServer/ | https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html | Implemented |
| SRTMGL1_v004 | Elevation and elevation derivatives | SRTMGL1_v003 | getData_srtm() | | | FALSE | NASA | Digital Elevation Model from the NASA Shuttle Radar Topography Mission Global 1 arc second | 3 | | 30 | NA | NA | Global | https://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/ | https://search.earthdata.nasa.gov/search/collection-details?p=C1000000240-LPDAAC_ECS&q=SRTM&ok=SRTM | Implemented |
| CGLS-LC101 | Land cover | CGLS-LC100 | getData_cgls() | | | FALSE | Copernicus Global Land Operations | Moderate dynamic land cover 100m 2015 | 2 | | 100 | 1 | year | Global | NA | | Not implemented |
| CCI_LC_S2_AFRICA | Land cover | CCI-LS | | | | FALSE | Climate Change Initiative Land Cover (ESA) | S2 prototype Land Cover 20m map of Africa 2016 | NA | | 20 | NA | NA | Africa | NA | | Not implemented |
| HRSL | Built-up, Population | HRSL | getData_hrsl() | | | FALSE | Facebook Connectivity Lab and Center for International Earth Science Information Network | High Resolution Settlement Layer | 1 | | 30 | NA | NA | Available for some countries : see list here : https://ciesin.columbia.edu/repository/hrsl/#data | NA | | Not implemented |
| WorldPop_100m_Population | Population | WorldPop_100m_Population | getData_worldpop() | | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107042 | FALSE | WorldPop | Alpha version 2014 estimates of numbers of people per grid square, with national totals adjusted to match UN population division estimates | 3 | 10.5258/SOTON/WP00033 10.5258/SOTON/WP00065 | 100 | NA | NA | Africa, Asia, South America, Oceania | NA | | Not implemented |
| OpenSteetMap | Roads | OpenSteetMap | | | | FALSE | OpenSteetMap | OpenSteetMap | NA | NA | NA | NA | NA | Global | | | Not implemented |
| Global_Surface_Water | Open waters / Wetlands | Global_Surface_Water | getData_gsw() | | https://storage.cloud.google.com/global-surface-water/downloads_ancillary/DataUsersGuidev2018.pdf | FALSE | JRC | Global Surface Water | 1 | 10.1038/nature20584 | 30 | 1 | year | Global | NA | | Not implemented |
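The table above can be filtered like any other table in R. Here is a minimal sketch, assuming that getAvailableDataSources() returns a data.frame with the columns shown above. To keep the example runnable without the package installed, it uses a small hand-copied excerpt of the table instead of the real call.

```r
# Hand-copied excerpt of the table above. With the package installed,
# `sources <- getRemoteData::getAvailableDataSources()` would replace it
# (assuming the function returns a data.frame).
sources <- data.frame(
  source        = c("MODIS", "GPM", "SRTMGL1_v004", "HRSL"),
  collection    = c("MOD11A1.v006", "GPM_3IMERGDF", "SRTMGL1_v003", "HRSL"),
  is_timeSeries = c(TRUE, TRUE, FALSE, FALSE),
  status        = c("Implemented", "Implemented", "Implemented", "Not implemented"),
  stringsAsFactors = FALSE
)

# Keep only the time series collections that are already implemented
implemented_ts <- subset(sources, status == "Implemented" & is_timeSeries)
implemented_ts$collection
```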
Say you want to download, over a 3500 km² region of interest :
- a 40-day time series of MODIS Terra Land Surface Temperature (LST) (daily time resolution) ;
- the same 40-day time series of Global Precipitation Measurement (GPM) data (daily time resolution) :
library(getRemoteData)
library(sf)
library(purrr)
# Read the region of interest as an sf object. Here : the Korhogo area in Côte d'Ivoire
roi <- sf::st_read(system.file("extdata/ROI_example.kml", package = "getRemoteData"), quiet = TRUE)
# Set up your time frame of interest.
time_frame <- c("2017-05-01", "2017-06-10")
# Set up your credentials to EarthData
username_EarthData <- "my.earthdata.username"
password_EarthData <- "my.earthdata.password"
# Download the MODIS LST TERRA daily products in the current working directory
# Setting the argument 'download' to FALSE will return the URLs of the products, without downloading them
dl_modis <- getRemoteData::getData_modis(
  timeRange = time_frame,
  roi = roi,
  collection = "MOD11A1.006",
  dimensions = c("LST_Day_1km", "LST_Night_1km"),
  download = TRUE,
  destFolder = getwd(),
  username = username_EarthData,
  password = password_EarthData,
  parallelDL = TRUE # set to FALSE to download the files one after the other
)
head(dl_modis)
# Download the GPM daily products in the current working directory
dl_gpm <- getRemoteData::getData_gpm(
  timeRange = time_frame,
  roi = roi,
  collection = "GPM_3IMERGDF.06",
  dimensions = c("precipitationCal"),
  download = TRUE,
  destFolder = getwd(),
  username = username_EarthData,
  password = password_EarthData,
  parallelDL = TRUE
)
head(dl_gpm)
# Get the downloaded data as named lists of rasters (one element per product)
rasts_modis <- dl_modis$destfile %>%
  purrr::map(~getRemoteData::prepareData_modis(., "LST_Day_1km")) %>%
  set_names(dl_modis$name)
rasts_gpm <- dl_gpm$destfile %>%
  purrr::map(~getRemoteData::prepareData_gpm(., "precipitationCal")) %>%
  set_names(dl_gpm$name)
The functions of getRemoteData
all work the same way :
- timeRange is your date / time frame of interest (possibly including hours for data with a finer than daily resolution) ;
- roi is your area of interest (as an sf object, either point or polygon) ;
- destFolder is the data destination folder ;
- by default, the function does not download the dataset. It returns a data.frame with the URL(s) to download the dataset(s) of interest given the input arguments. To download the data, set the download argument to TRUE ;
- other arguments are specific to each data product (e.g. collection, dimensions, username, password).
Absence of the timeRange
(resp. roi
) arguments in a function means
that the data of interest do not have any time (resp. spatial)
dimension.
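The two-step pattern described above (first get the URLs, then download) can be sketched in plain R. This is only an illustration under stated assumptions: the helper download_missing() and the url column are hypothetical names that are not part of getRemoteData, while the name and destfile columns are the ones used in the example above.

```r
# Hypothetical helper: download every file listed in `urls_df` that is not
# already on disk. `downloader` defaults to utils::download.file and can be
# replaced by another function with the same first arguments.
download_missing <- function(urls_df, downloader = utils::download.file) {
  to_get <- urls_df[!file.exists(urls_df$destfile), , drop = FALSE]
  for (i in seq_len(nrow(to_get))) {
    downloader(to_get$url[i], to_get$destfile[i], mode = "wb")
  }
  # Return (invisibly) the names of the products that were requested
  invisible(to_get$name)
}

# Stand-in for what a call with download = FALSE might return
# (the URLs below are placeholders, not real products)
urls_df <- data.frame(
  name     = c("product_2017-05-01", "product_2017-05-02"),
  url      = c("https://example.org/a.nc4", "https://example.org/b.nc4"),
  destfile = file.path(tempdir(), c("a.nc4", "b.nc4")),
  stringsAsFactors = FALSE
)

# download_missing(urls_df)  # would fetch only the files not yet on disk
```

Skipping files that already exist makes it cheap to re-run a script after an interrupted download.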
Have a look at the vignette Automatic extraction of spatial-temporal
environmental data within buffers around sampling
points
to get an example of what you can do with getRemoteData
!
The package is at a very early stage of development. Here are some of the current limitations and ideas for future developments :
- MODIS data cannot be downloaded if your area of interest covers multiple MODIS tiles (for an overview of MODIS tiles go here);
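Whether an area of interest is affected by this limitation can be estimated beforehand. The sketch below computes the MODIS sinusoidal tile indices (h, v) of points given in longitude / latitude, using the commonly published constants of the MODIS sinusoidal grid, and checks whether the corners of a bounding box fall in more than one tile. The functions modis_tile() and spans_multiple_tiles() are hypothetical helpers written for this illustration and are not part of getRemoteData. Testing only the four corners is an approximation, since a longitude / latitude box is not a rectangle in the sinusoidal projection.

```r
# Constants of the MODIS sinusoidal grid (36 x 18 tiles)
earth_radius <- 6371007.181     # radius of the sphere, in metres
tile_size    <- 1111950.5197    # width and height of one tile, in metres
x_min        <- -20015109.354   # western edge of the grid, in metres
y_max        <-  10007554.677   # northern edge of the grid, in metres

# Horizontal (h) and vertical (v) tile indices for coordinates in degrees
modis_tile <- function(lon, lat) {
  lon_rad <- lon * pi / 180
  lat_rad <- lat * pi / 180
  # Forward sinusoidal projection
  x <- earth_radius * lon_rad * cos(lat_rad)
  y <- earth_radius * lat_rad
  data.frame(h = floor((x - x_min) / tile_size),
             v = floor((y_max - y) / tile_size))
}

# TRUE if the corners of a lon/lat bounding box fall in more than one tile
spans_multiple_tiles <- function(xmin, ymin, xmax, ymax) {
  corners <- modis_tile(c(xmin, xmin, xmax, xmax), c(ymin, ymax, ymin, ymax))
  nrow(unique(corners)) > 1
}

modis_tile(-5.63, 9.45)                     # a point near Korhogo
spans_multiple_tiles(-5.9, 9.2, -5.4, 9.7)  # a small box around it
```

With these constants, the point near Korhogo falls in tile h17v08 and the small box stays within that single tile, whereas a box extending north of 10°N would also touch h17v07.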
Whenever the data providers make it possible,
getRemoteData
uses web services or APIs to download the data. In a few words, web
services are standard web protocols that make it possible to filter
the data directly at the download stage. Filters can be spatial,
temporal, dimensional, etc. Examples of widely used web services / data
transfer protocols for geospatial time series are OGC
WFS or
OPeNDAP. If long time series
are queried, getRemoteData
speeds up the download by
parallelizing it.
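To illustrate how such filtering happens at download time, here is a minimal sketch of an OPeNDAP request. In OPeNDAP, a constraint expression appended after ? lists the variables to return, each followed by one [start:stop] index range per dimension, so that only this subset is transferred. The helper opendap_subset_url(), the dataset URL, the dimension order (time, y, x) and the index values are assumptions made for this illustration; this is not the internal code of getRemoteData.

```r
# Hypothetical helper: build an OPeNDAP URL that requests only some
# variables and some index ranges of a remote dataset
opendap_subset_url <- function(dataset_url, variables, time_idx, y_idx, x_idx) {
  # One "[start:stop]" range per dimension (indices are zero-based)
  slab <- function(idx) sprintf("[%d:%d]", as.integer(idx[1]), as.integer(idx[2]))
  hyperslab <- paste0(slab(time_idx), slab(y_idx), slab(x_idx))
  # One clause per variable, separated by commas
  paste0(dataset_url, "?", paste0(variables, hyperslab, collapse = ","))
}

url <- opendap_subset_url(
  dataset_url = "https://example.org/opendap/some_dataset.nc4",  # placeholder
  variables   = c("LST_Day_1km", "LST_Night_1km"),
  time_idx    = c(6330, 6370),
  y_idx       = c(10, 70),
  x_idx       = c(500, 560)
)
url
```

The server applies the spatial, temporal and dimensional filters before sending anything, which is why a small region of interest translates into a small download.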
- getSpatialData
- MODIS, MODISTools and MODIStsp
- GPM ?