e-sensing / sits

Satellite image time series in R

Home Page: https://e-sensing.github.io/sitsbook/

License: GNU General Public License v2.0

R 94.74% C++ 5.25% C 0.01%

remote-sensing big-earth-data geospatial cbers image-time-series land-cover-classification landsat sentinel-2 eo-datacubes earth-observation


SITS - Satellite Image Time Series Analysis for Earth Observation Data Cubes

Overview

sits is an open source R package for satellite image time series analysis. It enables users to apply machine learning techniques for classifying image time series obtained from earth observation data cubes. The basic workflow in sits is:

Select an image collection available on cloud providers AWS, Brazil Data Cube, Digital Earth Africa, Copernicus Data Space, Digital Earth Australia, Microsoft Planetary Computer, NASA Harmonized Landsat/Sentinel, and Swiss Data Cube.
Build a regular data cube from analysis-ready image collections.
Extract labelled time series from data cubes to be used as training samples.
Perform samples quality control using self-organised maps.
Train machine learning and deep learning models.
Tune deep learning models for improved accuracy.
Classify data cubes using machine learning and deep learning models.
Run spatial-temporal segmentation methods for object-based time series classification.
Post-process classified images with Bayesian smoothing to remove outliers.
Estimate uncertainty values of classified images.
Evaluate classification accuracy using best practices.
Improve results with active learning and self-supervised learning methods.

Conceptual view of data cubes (source: authors)

Documentation

Detailed documentation on how to use sits is available in the e-book “Satellite Image Time Series Analysis on Earth Observation Data Cubes”.

`sits` on Kaggle

Those who want to evaluate the sits package before installing it are invited to run the examples available on Kaggle. If you are new to Kaggle, please follow the instructions to set up your account. These examples provide a fast-track introduction to the package. We recommend running them in the following order:

Installation

Pre-Requisites

The sits package relies on the geospatial packages sf, stars, gdalcubes and terra, which depend on the external libraries GDAL and PROJ. Please follow the instructions for installing sits from the Setup chapter of the on-line sits book.

Obtaining `sits`

sits can be installed from CRAN:

install.packages("sits")

The latest supported version is available on GitHub. It may include fixes that are not yet in the CRAN version.

devtools::install_github("e-sensing/sits", dependencies = TRUE)

# load the sits library
library(sits)
#> SITS - satellite image time series analysis.
#> Loaded sits v1.5.1.
#>         See ?sits for help, citation("sits") for use in publication.
#>         Documentation avaliable in https://e-sensing.github.io/sitsbook/.

Support for GPU

Classification using torch-based deep learning models in sits uses CUDA-compatible NVIDIA GPUs if available, which provides up to a 10-fold speed-up compared to using CPUs only. Please see the installation instructions for more information on how to install the required drivers.

Building Earth Observation Data Cubes

Image Collections Accessible by `sits`

Users create data cubes from analysis-ready data (ARD) image collections available in cloud services. The collections accessible in sits 1.5.1 are:

Brazil Data Cube - BDC: Open data collections of Sentinel-2, Landsat-8 and CBERS-4 images.
Copernicus Data Space Environment CDSE: Open data collections from the EU Copernicus programme.
Earth on AWS - AWS: Sentinel-2/2A level 2A collections.
Digital Earth Africa - DEAFRICA: Open data collection of Sentinel-2/2A and Landsat-8 for Africa.
Digital Earth Australia - DEAUSTRALIA: Open data collections for the Australian subcontinent.
Microsoft Planetary Computer - MPC: Open data collection of Sentinel-2/2A and Landsat-8.
NASA Harmonized Landsat/Sentinel Collection HLS.
Swiss Data Cube (SDC): Open data collection of Sentinel-2/2A and Landsat-8.
USGS: Landsat-4/5/7/8 collections, which are not open data.

Open data collections do not require payment of access fees. Except for those in the Brazil Data Cube, these collections are not regular. Irregular collections require further processing before they can be used for classification using machine learning models.

Building a Data Cube from an ARD Image Collection

The following code defines an irregular data cube of Sentinel-2/2A images available in the Microsoft Planetary Computer, using the open data collection "SENTINEL-2-L2A". The geographical area of the data cube is defined by the tiles "20LKP" and "20LLP", and the temporal extent by a start and end date. Access to other cloud services works in similar ways.

s2_cube <- sits_cube(
  source = "MPC",
  collection = "SENTINEL-2-L2A",
  tiles = c("20LKP", "20LLP"),
  bands = c("B03", "B08", "B11", "SCL"),
  start_date = as.Date("2018-07-01"),
  end_date = as.Date("2019-06-30"),
  progress = FALSE
)

This cube is irregular. The timelines of tiles "20LKP" and "20LLP" differ, and so do the resolutions of the bands. Sentinel-2 bands "B03" and "B08" have 10-meter resolution, while band "B11" and the cloud band "SCL" have 20-meter resolution. Irregular collections need an additional processing step to be converted to regular data cubes, as described below.


After defining an irregular ARD image collection from a cloud service using sits_cube(), users should run sits_regularize() to build a regular data cube. This function uses the gdalcubes R package, described in Appel and Pebesma, 2019.

gc_cube <- sits_regularize(
  cube          = s2_cube,
  output_dir    = tempdir(),
  period        = "P15D",
  res           = 60,
  multicores    = 4
)

The above command builds a regular data cube with all bands interpolated to 60 m spatial resolution and 15-days temporal resolution. Regular data cubes are the input to the sits functions for time series retrieval, building machine learning models, and classification of raster images and time series.
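To make the effect of `period = "P15D"` more concrete, the base R sketch below generates a 15-day timeline over the date range used in the example. It assumes that the intervals are anchored at the start date, which is an illustrative simplification; sits_regularize() may align its intervals differently.

```r
# Illustrative only: a regular 15-day timeline over the cube's date range.
# Assumption: intervals are anchored at start_date.
start_date <- as.Date("2018-07-01")
end_date <- as.Date("2019-06-30")

timeline <- seq(start_date, end_date, by = "15 days")

# number of time steps and first/last dates of the regular timeline
length(timeline)
range(timeline)

# in a regular cube, every step between consecutive dates is the same
unique(diff(timeline))
```

In a regular cube every tile and band shares one such timeline, which is what allows the time series of different pixels to be compared directly.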

The cube can be shown in a leaflet using sits_view().

# View a color composite on a leaflet
sits_view(s2_cube[1, ], green = "B08", blue = "B03", red = "B11")

Working with Time Series in `sits`

Accessing Time Series in Data Cubes

sits has been designed to use satellite image time series to derive machine learning models. After the data cube has been created, time series can be retrieved individually or by using CSV or SHP files, as in the following example. The example below uses a data cube in a local directory, whose images have been obtained from the "MOD13Q1-6" collection of the Brazil Data Cube.

library(sits)
# this data cube uses images from the Brazil Data Cube that have
# been downloaded to a local directory
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
# create a cube from downloaded files
raster_cube <- sits_cube(
  source = "BDC",
  collection = "MOD13Q1-6",
  data_dir = data_dir,
  delim = "_",
  parse_info = c("X1", "X2", "tile", "band", "date"),
  progress = FALSE
)
# obtain a set of samples defined by a CSV file
csv_file <- system.file("extdata/samples/samples_sinop_crop.csv",
  package = "sits"
)
# retrieve the time series associated with the samples from the data cube
points <- sits_get_data(raster_cube, samples = csv_file)
# show the time series
points[1:3, ]
#> # A tibble: 3 × 7
#>   longitude latitude start_date end_date   label    cube      time_series      
#>       <dbl>    <dbl> <date>     <date>     <chr>    <chr>     <list>           
#> 1     -55.8    -11.7 2013-09-14 2014-08-29 Cerrado  MOD13Q1-6 <tibble [12 × 2]>
#> 2     -55.8    -11.7 2013-09-14 2014-08-29 Cerrado  MOD13Q1-6 <tibble [12 × 2]>
#> 3     -55.7    -11.7 2013-09-14 2014-08-29 Soy_Corn MOD13Q1-6 <tibble [12 × 2]>

After a time series has been obtained, it is loaded in a tibble. The first six columns contain the metadata: spatial and temporal location, label assigned to the sample, and the data cube from which the data has been extracted. The spatial location is given in longitude and latitude coordinates. The seventh column, time_series, holds the series itself as a nested tibble. In the output above, the first sample is labelled “Cerrado”, is located at approximately (-55.8, -11.7) in (longitude, latitude), and is considered valid for the period (2013-09-14, 2014-08-29).
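The nested layout can be mimicked in base R with a list column. The sketch below is only an illustration of the structure: it uses a plain data.frame rather than a tibble, the helper `make_series()` is hypothetical, and the NDVI values and monthly dates are made up rather than taken from the sits sample data.

```r
# Hypothetical helper: builds one time series with a date index and one band.
# The values passed in below are invented for illustration.
make_series <- function(ndvi) {
  data.frame(
    Index = seq(as.Date("2013-09-14"), by = "month", length.out = length(ndvi)),
    NDVI = ndvi
  )
}

# Six metadata columns, one row per sample
samples <- data.frame(
  longitude = c(-55.8, -55.7),
  latitude = c(-11.7, -11.7),
  start_date = as.Date(c("2013-09-14", "2013-09-14")),
  end_date = as.Date(c("2014-08-29", "2014-08-29")),
  label = c("Cerrado", "Soy_Corn"),
  cube = "MOD13Q1-6"
)

# Seventh column: a list with one nested table per sample
samples$time_series <- list(
  make_series(c(0.45, 0.52, 0.61, 0.70, 0.74, 0.72,
                0.68, 0.60, 0.55, 0.50, 0.47, 0.44)),
  make_series(c(0.30, 0.35, 0.62, 0.85, 0.80, 0.45,
                0.70, 0.82, 0.60, 0.35, 0.28, 0.25))
)

# Metadata is accessed like any other column
samples[, c("longitude", "latitude", "label")]

# The series of the first sample is a table of its own
first_series <- samples$time_series[[1]]
dim(first_series)
```

Keeping the series nested means that each sample stays on a single row, so filtering by label or location never separates a series from its metadata.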

Time Series Classification

Training Machine Learning Models

sits provides support for the classification of both individual time series and data cubes. The following machine learning methods are available in sits:

Support vector machines (sits_svm())
Random forests (sits_rfor())
Extreme gradient boosting (sits_xgboost())
Multi-layer perceptrons (sits_mlp())
1D convolutional neural networks (sits_tempcnn())
Temporal self-attention encoder (sits_tae())
Lightweight temporal attention encoder (sits_lighttae())

The following example illustrates how to train a model and classify an individual time series. First we use the sits_train() function with two parameters: the training dataset (described above) and the chosen machine learning model (in this case, TempCNN). The trained model is then used to classify a time series from the Brazilian state of Mato Grosso, using sits_classify(). The results can be shown in text format using the function sits_show_prediction() or graphically using plot().

# training data set
data("samples_modis_ndvi")
# point to be classified
data("point_mt_6bands")
# Train a deep learning model
tempcnn_model <- sits_train(
  samples = samples_modis_ndvi,
  ml_method = sits_tempcnn()
)
# Select NDVI band of the  point to be classified
# Classify using TempCNN model
# Plot the result
point_mt_6bands |>
  sits_select(bands = "NDVI") |>
  sits_classify(tempcnn_model) |>
  plot()

Classification of NDVI time series using TempCNN

The following example shows how to classify a data cube organized as a set of raster images. The result can also be visualized interactively using sits_view().

# Create a data cube to be classified
# Cube is composed of MOD13Q1 images from the Sinop region in Mato Grosso (Brazil)
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
sinop <- sits_cube(
  source = "BDC",
  collection = "MOD13Q1-6",
  data_dir = data_dir,
  delim = "_",
  parse_info = c("X1", "X2", "tile", "band", "date"),
  progress = FALSE
)
# Classify the raster cube, generating a probability file
# Filter the pixels in the cube to remove noise
probs_cube <- sits_classify(
  data = sinop,
  ml_model = tempcnn_model,
  output_dir = tempdir()
)
# apply a bayesian smoothing to remove outliers
bayes_cube <- sits_smooth(
  cube = probs_cube,
  output_dir = tempdir()
)
# generate a thematic map
label_cube <- sits_label_classification(
  cube = bayes_cube,
  output_dir = tempdir()
)
# plot the labelled cube
plot(label_cube,
  title = "Land use and Land cover in Sinop, MT, Brazil in 2018"
)

Land use and Land cover in Sinop, MT, Brazil in 2018

References

Citable papers for sits

If you use sits, please cite the following paper:

Rolf Simoes, Gilberto Camara, Gilberto Queiroz, Felipe Souza, Pedro R. Andrade, Lorena Santos, Alexandre Carvalho, and Karine Ferreira. “Satellite Image Time Series Analysis for Big Earth Observation Data”. Remote Sensing, 13: 2428, 2021. doi:10.3390/rs13132428.

Additionally, the sample quality control methods that use self-organized maps are described in the following reference:

Lorena Santos, Karine Ferreira, Gilberto Camara, Michelle Picoli, Rolf Simoes, “Quality control and class noise reduction of satellite image time series”. ISPRS Journal of Photogrammetry and Remote Sensing, 177:75-88, 2021. doi:10.1016/j.isprsjprs.2021.04.014.

Papers that use sits to produce LUCC maps

Rolf Simoes, Michelle Picoli, et al., “Land use and cover maps for Mato Grosso State in Brazil from 2001 to 2017”. Sci Data 7(34), 2020. doi:10.1038/s41597-020-0371-4.
Michelle Picoli, Gilberto Camara, et al., “Big Earth Observation Time Series Analysis for Monitoring Brazilian Agriculture”. ISPRS Journal of Photogrammetry and Remote Sensing, 2018. doi:10.1016/j.isprsjprs.2018.08.007.
Karine Ferreira, Gilberto Queiroz et al., “Earth Observation Data Cubes for Brazil: Requirements, Methodology and Products”. Remote Sens. 12:4033, 2020. doi:10.3390/rs12244033.
Hadi, Firman, Laode Muhammad Sabri, Yudo Prasetyo, and Bambang Sudarsono. Leveraging Time-Series Imageries and Open Source Tools for Enhanced Land Cover Classification. In IOP Conference Series: Earth and Environmental Science, 1276:012035. IOP Publishing, 2023.
Bruno Adorno, Thales Körting, and Silvana Amaral, Contribution of time-series data cubes to classify urban vegetation types by remote sensing. Urban Forestry & Urban Greening, 79, 127817, 2023.
Giuliani, Gregory. Time-First Approach for Land Cover Mapping Using Big Earth Observation Data Time-Series in a Data Cube – a Case Study from the Lake Geneva Region (Switzerland). Big Earth Data, 2024.
Werner, João, Mariana Belgiu et al., Mapping Integrated Crop–Livestock Systems Using Fused Sentinel-2 and PlanetScope Time Series and Deep Learning. Remote Sensing 16, no. 8 (January 2024): 1421.

Papers that describe software used by the sits package

We thank the authors of these papers for making their code available to be used in connection with sits.

Marius Appel and Edzer Pebesma, “On-Demand Processing of Data Cubes from Satellite Image Collections with the Gdalcubes Library.” Data 4 (3): 1–16, 2019. doi:10.3390/data4030092.
Ron Wehrens and Johannes Kruisselbrink, “Flexible Self-Organising Maps in kohonen 3.0”. Journal of Statistical Software, 87(7), 2018. doi:10.18637/jss.v087.i07.
Charlotte Pelletier, Geoffrey I. Webb, and Francois Petitjean. “Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series.” Remote Sensing 11 (5), 2019. doi:10.3390/rs11050523.
Vivien Garnot, Loic Landrieu, Sebastien Giordano, and Nesrine Chehata, “Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention”, Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01234.
Vivien Garnot, Loic Landrieu, “Lightweight Temporal Self-Attention for Classifying Satellite Images Time Series”, 2020. arXiv:2007.00586.
Maja Schneider, Marco Körner, “[Re] Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention.” ReScience C 7 (2), 2021. doi:10.5281/zenodo.4835356.
Jakub Nowosad, Tomasz Stepinski, “Extended SLIC superpixels algorithm for applications to non-imagery geospatial rasters”. International Journal of Applied Earth Observation and Geoinformation, 112, 102935, 2022.
Martin Tennekes, “tmap: Thematic Maps in R.” Journal of Statistical Software, 84(6), 1–39, 2018.

Acknowledgements for community support

The authors are thankful for the contributions of Edzer Pebesma, Jakub Nowosad, Marius Appel, Martin Tennekes, Robert Hijmans, Ron Wehrens, and Tim Appelhans, respectively chief developers of the packages sf/stars, supercells, gdalcubes, tmap, terra, kohonen, and leafem. The sits package recognises the great work of the RStudio team, including the tidyverse. Many thanks to Daniel Falbel for his great work in the torch and luz packages. Charlotte Pelletier shared the Python code that has been reused for the TempCNN machine learning model. We would like to thank Maja Schneider for sharing the Python code that helped the implementation of the sits_lighttae() and sits_tae() models. We recognise the importance of the work by Chris Holmes and Matthias Mohr on the STAC specification and API.

Acknowledgements for Financial and Material Support

We acknowledge and thank the project funders that provided financial and material support:

Amazon Fund, established by the Brazilian government with financial contribution from Norway, through the project contract between the Brazilian Development Bank (BNDES) and the Foundation for Science, Technology and Space Applications (FUNCATE), for the establishment of the Brazil Data Cube, process 17.2.0536.1.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES) and from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), for providing MSc and PhD scholarships.
Sao Paulo Research Foundation (FAPESP) under eScience Program grant 2014/08398-6, for providing MSc, PhD and post-doc scholarships, equipment, and travel support.
International Climate Initiative of the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety (IKI) under grant 17-III-084- Global-A-RESTORE+ (“RESTORE+: Addressing Landscape Restoration on Degraded Land in Indonesia and Brazil”).
Microsoft Planetary Computer under the GEO-Microsoft Cloud Computer Grants Programme.
The Open-Earth-Monitor Cyberinfrastructure project, which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101059548.
FAO-EOSTAT initiative, which uses next generation Earth observation tools to produce land cover and land use statistics.

How to contribute

The sits project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.


sits's Issues

tests based on sits vignette

Implement tests based on the code written in sits vignette in order to ensure their results are valid.

Check if .sits_convert_resolution is necessary

.sits_convert_resolution() does not work. Check if it is really necessary. The first two lines of this function are:

    res <- vector()
    names[res] <- c("xres", "yres")

They should be as follows to work:

    res <- vector(length = 2)
    names(res) <- c("xres", "yres")

This function is only called by .sits_fromSHP(), in an if that is never true in the tests:

    if (coverage$xres > 1) {
        res <- .sits_convert_resolution(coverage)
        xres <- res["xres"]
        yres <- res["yres"]
    }

Misleading argument

The function sits::sits_classify_raster has the argument file, which is actually a prefix instead of a path to a file. This may mislead users. I suggest renaming this argument as file_prefix

About sits_select and consistency with other sits functions

Some sits functions work as interfaces to dplyr functions. For example: sits_transmute calls dplyr::transmute, sits_mutate calls dplyr::mutate, sits_sample calls dplyr::sample_n and dplyr::sample_frac. In the transmute and mutate cases, they act upon $time_series, which cannot be easily handled with dplyr directly. However, sits_select calls dplyr::filter (and not dplyr::select), and in this case dplyr could be used directly:

data.tb <- sits_select(cerrado_2classes, label == "Cerrado")

is the same as

data.tb <- dplyr::filter(cerrado_2classes, label == "Cerrado")

It will for sure confuse the user that knows dplyr. The other objective of sits_select is to call sits_select_bands, which is very useful and calls dplyr::select over $time_series. My suggestions to keep consistency are (1) rename sits_select_bands to sits_select (as the other functions already work with $time_series), OR (2) remove sits_select and rename sits_mutate to sits_mutate_bands, sits_transmute to sits_transmute_bands, and so on.

Roll back GBM tests

GBM tests needed to be removed in order to allow the Continuous Integration server to work properly with the updated version of R (3.5.1) and its packages. The lines removed will be shown below.

Warning: replacing previous import ‘dtwclust::predict’ by ‘stats::predict’ when loading ‘sits’

checking whether package ‘sits’ can be installed ... WARNING
Found the following significant warnings:
Warning: replacing previous import ‘dtwclust::predict’ by ‘stats::predict’ when loading ‘sits’

Error in sits_dendrogram

I am trying to build the sits vignette. The vignette builder is failing in the call

dendro <- sits_dendrogram(cerrado_2classes)
Error in dtwclust::tsclust(values.tb, type = "hierarchical", k = NROW(data.tb), :
Cannot have more clusters than series in the dataset

Could you check what is the problem?

code coverage

Check code coverage of the tests.

confusion in the names of functions

Some functions/data of sits do not follow a convention in their names. Usually, sits uses only lowercase (even for acronyms - and sits is an acronym written in lowercase) with words separated by underscores:

sits_bayes_postprocess()
sits_csv_error_file()
sits_info_services()
sits_ndvi_arima_filter()
point_ndvi

However, some functions/data do not follow this convention:

sits_TWDTW_classify()
sits_data_toCSV()
.kalmanfilter()
samples_MT_ndvi
sits_shp_toCSV()

Shouldn't they be named as below?

sits_TWDTW_classify() -> sits_twdtw_classify()
sits_data_toCSV()     -> sits_data_to_csv()
.kalmanfilter()       -> .kalman_filter()
samples_MT_ndvi       -> samples_mt_ndvi
sits_shp_toCSV()      -> sits_shp_to_csv()
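The renaming rule proposed above can be sketched in base R as follows. The helper `to_snake_case()` is hypothetical and not part of sits; it inserts an underscore wherever a lowercase letter or digit is followed by an uppercase letter, then lowercases the result.

```r
# Hypothetical helper, not part of sits: converts mixed-case names to
# lowercase words separated by underscores.
to_snake_case <- function(x) {
  # insert "_" at each lowercase-or-digit to uppercase boundary
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  tolower(x)
}

old_names <- c(
  "sits_TWDTW_classify", "sits_data_toCSV",
  "samples_MT_ndvi", "sits_shp_toCSV"
)
new_names <- to_snake_case(old_names)
data.frame(old = old_names, new = new_names)
```

A name such as `.kalmanfilter` has no case boundary, so a rule like this cannot split it into words; that case still needs a manual rename.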

Raster Projection Problem

Hi! Great package!

I am trying the command sits_TWDTW_classify to map cropland by using Sentinel-2 tile.

I found that the package fails to handle other projections, such as UTM. When I use the command sits_getdata to extract time series of points from a coverage tibble, it always reports the error "Error in spTransform(y, px) : error in pj_transform: latitude or longitude exceeded limits".

I was wondering how these functions work, and whether there is a solution when the projection of the input data is UTM.

Best regards

Jie

Output of tests

The tests of sits packages print several messages and also create some files. Remove such files along the tests and avoid showing messages to have a clean output.
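One way to approach this in base R is sketched below: the test writes to a temporary file, removes it on exit, and silences messages from the function under test. Both `write_report()` and `test_write_report()` are hypothetical stand-ins, not functions from sits or from its test suite.

```r
# Hypothetical function under test: prints a message and creates a file.
write_report <- function(path) {
  message("writing report to ", path)
  writeLines("label,count", path)
  invisible(path)
}

# Hypothetical test: keeps the console clean and leaves no file behind.
test_write_report <- function() {
  out_file <- tempfile(fileext = ".csv")
  # remove the file when the test finishes, even if an assertion fails
  on.exit(unlink(out_file), add = TRUE)
  suppressMessages(write_report(out_file))
  stopifnot(file.exists(out_file))
  out_file
}

path <- test_write_report()
# the file created during the test no longer exists
file.exists(path)
```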

Change `sits_labels` output

In sits_labels() output, change freq column to prop.

improve sits_classify error when bands do not match

data(point_MT_6bands)
data(samples_MT_9classes)

samples.tb <- sits_select(samples_MT_9classes,
                          bands = c("ndvi"))
model <- sits_train(samples.tb, sits_svm())
point.tb <- sits_select(point_MT_6bands, bands = c("ndvi", "evi"))

sits_classify(point.tb, model)

shows

bands in the data do not match bands in the model

improve to

bands in the data (ndvi, evi) do not match bands in the model (ndvi)
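A minimal sketch of how such a message could be assembled is shown below. The function `check_bands()` is hypothetical and does not reflect how sits_classify() is implemented; it only shows the message construction and takes the band names as plain character vectors.

```r
# Hypothetical check, not the sits implementation: stops with a message
# that names the bands found in the data and the bands in the model.
check_bands <- function(data_bands, model_bands) {
  if (!setequal(data_bands, model_bands)) {
    stop(
      "bands in the data (", paste(data_bands, collapse = ", "),
      ") do not match bands in the model (",
      paste(model_bands, collapse = ", "), ")",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# capture the message for the case reported in this issue
msg <- tryCatch(
  check_bands(c("ndvi", "evi"), "ndvi"),
  error = function(e) conditionMessage(e)
)
msg
```

Because the comparison uses `setequal()`, the same check also reports the opposite situation, where the model has more bands than the data.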

sits_getdata fail if shapefile has no EPSG code defined

sits_getdata can receive a shapefile as a parameter to retrieve points from a coverage. If the shapefile has no EPSG code (only a Proj.4 string), the function fails.

Create a README.Rmd

The current README.md file uses R markdown, and not GitHub markdown. Create a README.Rmd in order to allow syntax highlighting in the main page of the sits package, in the same way as dtwSat.

.sits_factory_function2 and .set_fun_args

Remove functions .sits_factory_function2() and .set_fun_args() as they are not called by any other function within the package.

Evaluate snow package

Evaluate snow package to check if it can replace parallel.

sits_plot samples with different labels and bands

sits_plot currently plots all combinations of labels and bands in separate plots. Investigate if it would be interesting to plot them in a single plot, using one axis for labels and another for bands.

data <- samples_MT_9classes %>%
    sits_select_bands(evi, ndvi) %>%
    dplyr::filter(label %in% c("Pasture", "Forest"))
par(mfcol = c(2, 2)) # sits ignores this
sits_plot(data)

Build fails

A fresh download from github fails on "R CMD build sits" by throwing :

Quitting from lines 481-515 (sits.Rmd)
Error: processing vignette 'sits.Rmd' failed with diagnostics:
conditions failed for call 'tools::buildVignett .. ".", tangle = TRUE)':
* length(.) == nrow(dist1_DT)
Description: sits_classify_raster - number of classified pixels is different
from number of input pixels
Execution halted

'Rtools is not available' for R version 3.5.1

When I try to install sits (devtools::install_github("e-sensing/sits")) it shows "'Rtools' package is not available" for R version 3.5.1 (the latest release as of 2018-08).

sits_svm example

Good morning

the example code provided for sits::sits_svm runs but it doesn't work. It seems the classifier is unable to classify the given point. See the code below

data(samples_MT_ndvi)
data(point_ndvi)
# label before classification
(label_before <- point_ndvi$label)
# NoClass is not in the label set
(point_ndvi$label %in% unique(samples_MT_ndvi$label))
class.tb <- sits_classify (point_ndvi, samples_MT_ndvi, ml_method = sits_svm(kernel = "radial", cost = 10))
# label after classification. Neither the label on the point_ndvi nor the class.tb changed!
(label_after1 <- point_ndvi$label)
(label_after2 <- class.tb$label)

sits_services x sits_info_services

What is the difference between sits_services and sits_info_services?

Coordinate EOCubes with SITS

There is a need to coordinate the EOCubes and SITS packages. SITS should rely on EOCubes to provide information about organized collections of satellite data (cubes). There should be a way for SITS to ask EOCubes to provide information about a requested coverage. This should work in a similar way as WTSS and Google Earth Engine:

The user asks EOCube about what collections are available.
The user defines a coverage as a partition of a collection in space and time.
EOCubes supplies to SITS all information about this coverage, without the need for entering any list of files. This should be transparent to the user.

Call sits_coverage failed

Hello

I am getting an error when I use the function sits_coverage. I use the command raster_cov <- sits_coverage(service = "RASTER", name = "Sinop-crop", timeline = timeline, bands = c("ndvi"), scale_factors = 1e-04, files = filename).

The error is "Error: conditions failed for call 'sits_coverage(service = "RASTER", name = "Sinop-crop", timeline = timeline, ': * !(purrr::is_null(.))
Description: Not able to obtain scale factors for raster data"

However, I had set scale_factors in the command above.
Any idea how to solve this?

Best,
Jie

Compilation fail

Hello @vwmaus!

I am getting an error when compiling sits from scratch:

installing to library ‘/.../R_libs’
installing source package ‘sits’ ...
g++ -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
** libs
RcppExports.cpp:4:18: fatal error: Rcpp.h: No such file or directory

Any idea how to solve this?

Missing RASTER_minimum_value

The property RASTER_minimum_value is missing values for the bands:

swir1
swir2

This throws errors during Landsat classifications.

sits_classify error when model has more bands than data

When the model has more bands than the data in sits_classify, it stops with an uninformative error.

data(point_MT_6bands)
data(samples_MT_9classes)

samples.tb <- sits_select(samples_MT_9classes,
                          bands = c("ndvi", "evi"))
model <- sits_train(samples.tb, sits_svm())
point.tb <- sits_select(point_MT_6bands, bands = c("ndvi"))

sits_classify(point.tb, model)

shows the following error:

Item 26 of j is 428 which is outside the column number range [1,ncol=414]

It should be

bands in the data do not match bands in the model

in the same way as when the model has fewer bands than the data.

sits_coverage is ignoring some arguments

sits_coverage, when used with some service, is ignoring some arguments. One possible action is to compare the passed (and currently ignored) argument to the available information about the service and use if it is allowed (a subset of the available values, a value within the available ranges, etc.), or stop with an error otherwise. Current output:

coverage_wtss <- 
    sits_coverage(service = "WTSS-INPE",
                  name    = "MOD13Q1",
                  bands = c("ndvi", "evi"))

coverage_wtss$bands # [1] "mir"  "blue" "nir"  "red"  "evi"  "ndvi"

coverage_wtss <- 
    sits_coverage(service = "WTSS-INPE",
                  name    = "MOD13Q1",
                  bands = c("ndvi", "eviee"))

coverage_wtss$bands # [1] "mir"  "blue" "nir"  "red"  "evi"  "ndvi"
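The validation suggested above could look like the following base R sketch. The function `validate_bands()` is hypothetical and is not the sits_coverage() implementation; it assumes the service's band list is already known as a character vector.

```r
# Hypothetical validation, not the sits implementation: accept a request
# only if every requested band is offered by the service.
validate_bands <- function(requested, available) {
  unknown <- setdiff(requested, available)
  if (length(unknown) > 0) {
    stop("unknown band(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  # keep only the requested bands, in the order used by the service
  available[available %in% requested]
}

available <- c("mir", "blue", "nir", "red", "evi", "ndvi")

# a valid subset is honoured instead of being ignored
selected <- validate_bands(c("ndvi", "evi"), available)
selected

# a misspelled band stops with an error instead of passing silently
err <- tryCatch(
  validate_bands(c("ndvi", "eviee"), available),
  error = function(e) conditionMessage(e)
)
err
```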

sits_kohonen parameters

The first two parameters of "sits_kohonen.R" are a tibble and the time series extracted from the same tibble. Do we need two parameters or could we use only one?

Check if .sits_binary_search() is necessary

Function .sits_binary_search() is not called by any other function in sits package (except by itself). Verify if this function is really necessary.

Check if probabilities are normalized

Check if classification probabilities are normalized in the same way for the different algorithms. If not, normalize them using a single range.
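As a point of reference, one simple normalization is to rescale the class scores of each sample so that they sum to 1. The sketch below assumes non-negative scores stored in a matrix with one row per sample and one column per class; `normalize_probs()` is a hypothetical helper and the scores are invented, so this is not necessarily how any of the sits algorithms compute their outputs.

```r
# Hypothetical helper: rescales each row of a score matrix to sum to 1.
# Assumes non-negative scores and at least one positive score per row.
normalize_probs <- function(scores) {
  stopifnot(is.matrix(scores), all(scores >= 0))
  totals <- rowSums(scores)
  stopifnot(all(totals > 0))
  # dividing a matrix by a vector of length nrow() divides row i by totals[i]
  scores / totals
}

# invented scores for two samples and three classes
scores <- matrix(
  c(2.0, 1.0, 1.0,
    0.2, 0.2, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("Cerrado", "Forest", "Pasture"))
)

probs <- normalize_probs(scores)
probs
rowSums(probs)
```

Applying one such function to the output of every algorithm would make the probabilities comparable across models.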

distance in the output of sits_classify

In the output of sits_classify, there is an attribute called distance that comes with only zeros. Is it necessary? Could it be removed?

data(samples_MT_ndvi)
svm_model <- sits_train(samples_MT_ndvi, sits_svm(kernel = "radial", cost = 10))
data(point_ndvi)
class.tb <- sits_classify(point_ndvi, svm_model)

class.tb$predicted[[1]]$distance

sits_services() is not printing products nor coverages

The documentation of sits_services() says that it "Uses the configuration file to print information about the services, products and coverages." However, its output is currently only the services:

> sits_services()
Service - WTSS-INPE
------------------
Service - SATVEG
------------------

Fix it.

Bad error message

An error message produced by sits_select_bands wrongly references the sits_select function

library(sits)
data(samples_MT_9classes)
bs <- c(sits_bands(samples_MT_9classes), "fake_bands")
sits_select_bands(samples_MT_9classes, bs)

Error: conditions failed for call 'sits_select_bands(samples_MT_9classes, bs)':
* all(bands %in% sits_bands(.))
Description: sits_select: some band(s) not found in input data
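
One way to avoid stale function names in messages is to derive the name from the call stack rather than hard-coding it. The sketch below is an assumption about how that could be done in base R; `check_bands_exist` and `select_bands_demo` are hypothetical stand-ins working on a plain data frame, not sits functions.

```r
# Hypothetical shared check: builds the message from the calling function's
# name instead of a hard-coded one.
check_bands_exist <- function(bands, available) {
    parent_call <- sys.call(-1)
    caller <- if (is.null(parent_call)) "<top level>" else deparse(parent_call[[1]])
    missing_bands <- setdiff(bands, available)
    if (length(missing_bands) > 0) {
        stop(caller, ": band(s) not found in input data: ",
             paste(missing_bands, collapse = ", "), call. = FALSE)
    }
    invisible(TRUE)
}

# Stand-in for a band selector, operating on a plain data frame
select_bands_demo <- function(data, bands) {
    check_bands_exist(bands, names(data))
    data[, bands, drop = FALSE]
}

series <- data.frame(ndvi = c(0.2, 0.5), evi = c(0.1, 0.3))
msg <- tryCatch(select_bands_demo(series, c("ndvi", "fake_band")),
                error = function(e) conditionMessage(e))
msg
```

The message then starts with the name of whichever function called the check, so renaming or reusing the check cannot leave a wrong name behind.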

Bug in sits_coverage function

The sits_coverage() function does not work properly with WTSS coverages other than "MOD13Q1". Example:

sits_services()
# shows 'MOD13Q1' and 'MOD13Q1_M' (a new coverage for MOD13Q1 quality bands)
sits_coverage("WTSS-INPE", "MOD13Q1")
# it's working!
sits_coverage("WTSS-INPE", "MOD13Q1_M")
# fail!

Some possible related issues: #109

Review "sits table"

Review the term "sits table" in the documentation, as it does not exist any more. It should be replaced by "tibble" or "data.table" accordingly.

Remove some sits dependencies

Try to remove some dependencies of sits. Some candidates are:

(1) "mclust" - the clustering functions of this package take too much time to process
(2) "signal" - implements the Savitzky-Golay filter, which could be implemented directly in sits
(3) "tidyr" - was only needed to run fastApply in the past and is no longer used
(4) "openxlsx" - creates an Excel spreadsheet with the confusion matrix
(5) "pryr" - finds out the free memory available
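
To give an idea of the effort involved in candidate (2), here is a minimal base R sketch of Savitzky-Golay smoothing. It derives the weights by least squares and applies them with `stats::filter`; the function names are made up, and edge handling (the first and last points) is left out, so it is not a drop-in replacement for the filter in "signal".

```r
# Savitzky-Golay smoothing weights for a window of 2 * half_width + 1 points
# and a local polynomial of the given order, via ordinary least squares.
sg_weights <- function(half_width, order) {
    offsets <- -half_width:half_width
    design  <- outer(offsets, 0:order, "^")   # Vandermonde matrix
    # Row 1 of the pseudo-inverse gives the fitted value at the window centre
    solve(crossprod(design), t(design))[1, ]
}

sg_smooth <- function(x, half_width = 2, order = 2) {
    w <- sg_weights(half_width, order)
    # Centred moving weighted sum; the first and last half_width values are NA
    as.numeric(stats::filter(x, w, sides = 2))
}

# Five-point quadratic weights, scaled by 35 for readability
sg_weights(2, 2) * 35

# A quadratic series passes through unchanged at interior points
sg_smooth((1:10)^2)
```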

sits_services() does not query WTSS service to show coverages

The sits_services function does not query the WTSS service to show coverage details. Instead, it reads the sits config to get bands and coverages.

sits_plot output of sits_classify with more than one band

When calling sits_plot with the output of sits_classify using more than one band, it only draws the first band by default. Shouldn't it draw all of them? In the example below, it shows only ndvi.

data(point_MT_6bands)
data(samples_MT_9classes)

samples.tb <- sits_select(samples_MT_9classes,
                          bands = c("ndvi", "evi"))
model <- sits_train(samples.tb, sits_svm())
point.tb <- sits_select(point_MT_6bands, bands = c("ndvi", "evi"))

result <- sits_classify(point.tb, model)

sits_plot(result)

sits_conf_matrix error

sits_conf_matrix fails when there are fewer than two classes in a predicted and reference tibble.

The code below throws the error:

data and reference should be factors with the same levels.

library(sits)
data(cerrado_2classes)
pred_ref.tb <-  sits_kfold_validate(cerrado_2classes, folds = 2)
conf.mx <- sits_conf_matrix(pred_ref.tb[1:10,])
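
The error message suggests that the predicted and reference columns end up with different factor levels when a class is absent from the slice. A possible remedy, sketched in base R under that assumption, is to build both factors from one shared label set before cross-tabulating. `conf_table` is a hypothetical helper and the labels are illustrative.

```r
# Cross-tabulate predictions against references using a shared set of levels,
# so classes absent from one column still appear in the matrix.
conf_table <- function(predicted, reference, labels = NULL) {
    if (is.null(labels)) {
        labels <- sort(unique(c(as.character(predicted),
                                as.character(reference))))
    }
    predicted <- factor(predicted, levels = labels)
    reference <- factor(reference, levels = labels)
    table(Predicted = predicted, Reference = reference)
}

# A slice in which only one class happens to be present
pred <- c("Cerrado", "Cerrado", "Cerrado")
ref  <- c("Cerrado", "Cerrado", "Cerrado")
conf_table(pred, ref, labels = c("Cerrado", "Pasture"))
```

Passing the full label set explicitly keeps the matrix square even when a subset such as `pred_ref.tb[1:10,]` contains a single class.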

R version

> R.Version()
$platform
[1] "x86_64-pc-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "4.4"

$year
[1] "2018"

$month
[1] "03"

$day
[1] "15"

$`svn rev`
[1] "74408"

$language
[1] "R"

$version.string
[1] "R version 3.4.4 (2018-03-15)"

$nickname
[1] "Someone to Lean On"

Overwrite results

The function sits::sits_classify_raster overwrites previous results and does not provide a way to control this behavior. This could lead to undesired results, especially when classifying large data sets.

The overwrite parameter is hard-coded in sits:::.sits_classify_multicores.
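
A sketch of the kind of guard that exposing the parameter would allow. `write_result` is a hypothetical function that writes an R object to disk; it only illustrates the overwrite check, not raster output.

```r
# Hypothetical writer: refuses to replace an existing file unless asked to.
write_result <- function(values, path, overwrite = FALSE) {
    if (file.exists(path) && !overwrite) {
        stop("file already exists: ", path,
             " (set overwrite = TRUE to replace it)", call. = FALSE)
    }
    saveRDS(values, path)
    invisible(path)
}

out_file <- tempfile(fileext = ".rds")
write_result(1:3, out_file)                                  # first write succeeds
second <- try(write_result(4:6, out_file), silent = TRUE)    # blocked
write_result(4:6, out_file, overwrite = TRUE)                # explicit replacement
```

Defaulting to `overwrite = FALSE` is a design choice made here so that a long classification run cannot be lost by accident.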

documentation of sits_convnets

Fix the documentation of sits_convnets:

* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'sits_convnets'
  ‘filters’ ‘kernels’
Documented arguments not in \usage in documentation object 'sits_convnets':
  ‘filter’ ‘kernel’

Please do not fix this issue yet. I'll use it to test the continuous integration service.

Remove temp folders 'README_*' and include it in .gitignore

The temporary folders 'README_cache' and 'README_files' appeared in the root directory. They are probably produced during README compilation.

Release version conflict

The release announcement [1] and DESCRIPTION file have different version numbers (1.12.5 and 1.12.0 respectively)

[1] https://github.com/e-sensing/sits/releases/tag/1.12.5

Review sits_patterns

Review sits_patterns:

Argument timeline is useless
Arguments start_date and end_date are not month-day. They should be a Date, or a string that could be converted to a Date. The code below does not work:

data(cerrado_2classes)
patterns.tb <- sits_patterns(cerrado_2classes, start_date ="09-13", end_date = "08-29")

If start_date is set as argument but not end_date, start_date will be overwritten, and vice-versa.

        if (purrr::is_null(start_date) || purrr::is_null(end_date)) {
            start_date <- lubridate::as_date(utils::head(sample_dates, n = 1))
            end_date   <- lubridate::as_date(utils::tail(sample_dates, n = 1))
        }
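
A possible correction is to fill in only the bound that is missing. The sketch below uses base R instead of purrr and lubridate; `resolve_date_range` is a hypothetical helper, and `sample_dates` is assumed to be a sorted vector of dates.

```r
# Fill in only the missing bound, leaving a user-supplied one untouched.
# `sample_dates` is assumed to be a sorted vector of dates.
resolve_date_range <- function(sample_dates, start_date = NULL, end_date = NULL) {
    if (is.null(start_date)) {
        start_date <- sample_dates[1]
    }
    if (is.null(end_date)) {
        end_date <- sample_dates[length(sample_dates)]
    }
    list(start_date = as.Date(start_date), end_date = as.Date(end_date))
}

dates <- as.Date(c("2000-09-13", "2000-12-02", "2001-08-29"))
# The supplied start date is kept; only the end date comes from the samples
resolve_date_range(dates, start_date = "2000-10-01")
```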

Review sits_classify_raster memory and multicores

sits_classify_raster uses 4GB of memory and the maximum number of cores minus one as default. Review these values, as the user might not have 4GB of memory available and might not want to use almost all processors. For example, sits_classify has multicores = 1 as default.

Missing configuration values

config.yaml is missing some values. The property "RASTER_missing_value" lacks an entry for the blue band for LANDSAT, as well as for other indexes such as savi and msavi.

review sits_select_bands()

What about revising sits_select_bands() to work in the same way as dplyr::select()? Currently it works as follows:

sits_select_bands(samples_MT_9classes,  c("ndvi", "evi"))

The code above could be updated to:

sits_select_bands(samples_MT_9classes, ndvi, evi)
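
Accepting bare names requires capturing the arguments unevaluated. The sketch below shows one base R way to do that with `substitute`; `select_bands_nse` is hypothetical, and a plain data frame with one column per band stands in for a sits tibble, whose bands are stored differently.

```r
# Hypothetical selector that accepts bare band names, dplyr::select()-style.
select_bands_nse <- function(data, ...) {
    # Capture the unevaluated arguments and turn each symbol into a string
    bands <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
    unknown <- setdiff(bands, names(data))
    if (length(unknown) > 0) {
        stop("band(s) not found: ", paste(unknown, collapse = ", "),
             call. = FALSE)
    }
    data[, bands, drop = FALSE]
}

series <- data.frame(ndvi = c(0.2, 0.5), evi = c(0.1, 0.3), nir = c(0.4, 0.6))
names(select_bands_nse(series, ndvi, evi))
```

One trade-off of this interface is that passing band names held in a variable becomes less direct, which is why dplyr offers helpers for that case.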

'sits tibble' class

Good morning,

The datasets in the package have different classes. Some are "sits" and others are "sits_tibble". Is there any difference between them?

Kind regards,

library(sits)
data(point_MT_6bands)
data(samples_MT_ndvi)
data(point_ndvi)
data(prodes_226_064)
data(cerrado_2classes)
data(samples_MT_9classes)

# class sits
class(sits::sits_tibble()) # template function!
class(point_MT_6bands)
class(samples_MT_9classes)
# class sits_tibble
class(samples_MT_ndvi)
class(cerrado_2classes)
class(point_ndvi)
class(prodes_226_064)

Check if .sits_split_block_size() is necessary

The function .sits_split_block_size() is not used by any other function in sits. Check whether it is really necessary.