bfast2 / bfast Goto Github PK

View Code? Open in Web Editor NEW

39.0 39.0 15.0 282 KB

Breaks For Additive Season and Trend

Home Page: https://bfast2.github.io

License: GNU General Public License v2.0

R 96.11% C++ 1.95% TeX 1.94%

bfast's People

Contributors

Stargazers

Watchers

Forkers

kongdd yangxhcaf jane09gis lilixu-ccnu rral0 alequech allansp84 c11 dongyi1996 hhuliujia niceynine ayelealmaw blackviking98921 kmkamilkhel tkmurakami

bfast's Issues

start github organisation for bfast activities

plot.bfast: Better plotting

The current plot method in plot.bfast shows the components one by one, but there is no way to plot everything in one plot. It would be very convenient to have that.

Here's a small snippet of my custom code that produces pretty nice results:

MyTS = bfastts(ChangedEVITS[9,], dates, "10-day") # Some time series
MyTS = na.approx(MyTS) # This step is optional, just for break-free visualisation
MyBF = bfast(MyTS, GetBreakNumber(dates), season="harmonic", max.iter=3)
MyIter = length(MyBF$output)

MyFit = MyBF$output[[MyIter]]$St + MyBF$output[[MyIter]]$Tt
MyTrend = MyBF$output[[MyIter]]$Tt
MyTimes = as.numeric(time(MyBF$output[[MyIter]]$Tt))
MyTimes = MyTimes[!is.na(MyBF$output[[MyIter]]$Tt)]
MyBreaks = MyTimes[MyBF$output[[MyIter]]$bp.Vt$breakpoints]
plot(MyTS); lines(MyFit, col="blue"); lines(MyTrend, col="green"); abline(v=MyBreaks, col="red")

Sample output:

What's missing is a legend. Suggestions are also welcome!

bfastpp: document external regressors better

There is lack of documentation in ?bfastpp about how external regressors are handled. It is described, but an example of how to use them is missing.

Enabling fast options by default

It seems that there has been enough extensive testing of Marius' code that the fast options should be enabled by default, with the ability to disable them if needed.

This depends on fixing the upstream issue appelmar/bfast#2, however.

Update the description of the repo

The description at the top is still from when it moved to GitHub and is no longer relevant. Suggestions welcome!

Add an example tutorial from the latest developments

Make an overview of the latest pending changes as a tutorial, so then we don't need a big list of changes but we have it more visual. Useful also for AEO as a tutorial.

bfastpp: why is season a factor?

bfastpp produces output where season is a factor. I think the original idea is that seasonality should be repetitive, and thus every year you have the same factors, right? But that breaks down if you have a non-integer frequency of the ts object, because then the season each year is different, therefore each point becomes unique, which is not useful at all...

But why not just leave it as a number? A season of 1.6 is valid as far as I can tell (the season is in between 1 and 2 of the first year). Or just use DOY altogether?

Bfastmonitor/Spatial: rmse extra band

Is there an easy way to get the RMSE (of the residuals) out, as an extra band (returnLayers) in Bfastmonitor? Many thanks!!!

Error in if (is.nan(p0) || p0 < a2 || p0 > (1 - a2)) {: missing value where TRUE/FALSE needed

Equivalent of bfast2/strucchangeRcpp#18 in bfast01, with a trivial reproducer:

library(bfast)
bfast01(data = Nile)
#> Error in if (is.nan(p0) || p0 < a2 || p0 > (1 - a2)) {: missing value where TRUE/FALSE needed

^{Created on 2020-03-27 by the reprex package (v0.3.0)}

Blame:

bfast/R/bfast01.R

Line 462 in f58ba97

if(is.nan(p0) || p0 < a2 || p0 > (1-a2)) {

But really should just depend on strucchange to not duplicate code.

check out rOpenSci guidlines

goodpractice package, pkgdown (documentation) en rhub (for testing on different systems)

Add devel/master branches

The master branch should be stable and aimed at CRAN submission, and other branches should be implementing new features (potentially breaking CRAN requirements).

Add a fitted.bfast() method

Right now, running fitted() on a bfast object returns NULL (that's returned for any unknown class). It would be better if it returned x$output[[iteration]]$St + x$output[[iteration]]$Tt instead.

bfastts: allow a vector for type

By default bfastts will fail with a cryptic message if no type is supplied. It should either give a reasonable message or assume irregular (or regular if #2 is implemented).

Add external unit tests

We'd want to use e.g. testthat to test "bad examples" or very basic things that are not well-suited to go into the examples section of function documentation.

Nevertheless, the useful tests should all go into the examples section. Even simple use cases are useful for people to see immediately, so they could run it and see the output. The examples section gets tested by Travis automatically as part of R CMD check.

bfast01: error: inadmissable change points: 'from' is larger than 'to'

This call to Fstat in bfast01() causes an error for some inputs:
https://github.com/GreatEmerald/bfast/blob/6e86571601510ea33af4daa16b0e36564995bd54/R/bfast01.R#L156
The error is inadmissable change points: 'from' is larger than 'to'. Setting trim to a low number (~7) makes it not give this error any more.

I'd need some input data for reproducing it easier, but overall I'm not sure why that call to Fstat is there? In the regular bfast, this functionality is handled by bfastts and bfastpp instead.

problems with NAs in regressor of bfaston

here is the reprex.html of this problem:
http://localhost:31875/session/reprex63e869fb6762/reprex_reprex.html

Improved documentation

Right now the docs on all the bfast-related issues are scattered all across, generating quite a bit of confusion (case in point: https://doi.org/10.1016%2Fj.isprsjprs.2017.06.013 considers bfastMonitor to be renamed bfast, which is not true). We could use the wiki here or such to make an overview of the docs and capabilities of each function.

bfastpp: inconsistent behaviour between formula and full methods

A lot of code is the same between .bfastpp.full() and .bfastpp.formula(), but unfortunately not everything. That leads to inconsistent behaviour, e.g. season is calculated as factor(cycle(y)) in the former and as 1 + seq(as.numeric(cycle(y[1]))-1 in the latter, which leads to incorrect type (numeric instead of the expected factor).

Following DRY principles, the code should be unified.

As an aside but related: bfastpp gives a radically different object when a formula is given, vs when only a ts is given. It's extremely confusing, as the user normally assumes that giving a formula would simply filter out the undesired columns from the result.

Add example for sbins

At the moment the sbins parameter is documented but is lacking an example.

time < start: comparison (3) is possible only for atomic and list types

Perhaps it's a strucchange issue instead, but bfastmonitor returns the error:

time < start: comparison (3) is possible only for atomic and list types

When given input with a frequency of 23 and monitoring of (I believe) 29 points:

3, 5, 6, 8, 10, 13, 24, 32, 29, 29, 27, 27, 28, 21, 21, 18, 14, 11, 9, 6, 4, 5, 5, 4, 5, 6, 5, 9, 20, 24, 33, 36, 31, 36, 40, 35, 43, 33, 33, 21, 15, 11, 9, 8, 5, 5, NA, 3, 5, 6, 7, 13, 19, 28, 36, 36, 35, 29, 30, 26, 25, 22, 17, 14, 9, 9, 6, 6, 6, 8, NA, 7, 6, 11, 14, 21, 33, 39, 38, 32, 26, 27, 27, 20, 20, 18, 15, 14, 8, 7, 7, 6, 8, NA, 5, 6, 8, 13, 25, 36, 37, 36, 38, 41, 38, 32, 28, 25, 21, 16, 12, 11, 6, 4, 5, NA, NA, 6, 5, 5, 8, 11, 30, 42, 42, 40, 35, 33, 31, 29, 28, 24, 20, 17, 12, NA, 7, 6, 5, 6, 5, 8, 12, 10, 24, 33, 42, 42, 44, 48, 41, 43, 37, 31, 23, 20, 15, NA, 9, 7, 7, 7, 7, 7, 14, 14, 16, 30, 39, 43, 45, 39, 33, 29, 32, 31, 23, 20, 15, 11, 8, NA, 6, 5, NA, NA, NA, 9, 9, 12, 17, 33, 41, 42, 54, 49, 46, 34, 27, 25, 17, 14, 11, 7, 7, 6, 6, 6, 6, 7, 9, 10, 16, 28, 37, 46, 40, 37, 31, 35, 33, 29, 21, 19, 13, 10, 8, NA, 6, 4, NA, NA, NA, 10, 7, 10, 24, 39, 39, 50, 41, 42, 44, 40, 37, 35, 23, 17, 15, 13, 10, 7, 6, 10, NA, 4, 9, 11, 18, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA

bfast01: stl="both" results in an empty formula

Running bfast01(stl="both") always results in an error:

library(bfast)
mts = ts(1:10, 2000, 2011, frequency=20)
bfast01(mts, stl="both")
#> Error in parse(text = x, keep.source = FALSE): <text>:2:0: unexpected end of input
#> 1: response ~ 
#>    ^

Created on 2019-12-04 by the reprex package (v0.2.0).

This is because the corresponding component is removed from the formula, and "both" removes both of them, resulting in an empty formula.

I don't know what the fix should be: either not allow passing "both", or perhaps not filter the components out.

Roxygen2 comment inconsistencies

Some of the documentation generated by Roxygen2 is not consistent with the original documentation in terms of formatting. All docs should be compared against the originals.

Remove dependency on raster

The dependency on the raster package is spurious at the moment; it is only used for man pages and to load the example dataset. It should be moved to optional requires.

Though the dependency may come back if we do #3 and integrate bfastSpatial. Or maybe by that time we'll be using stars.

Add license headers

GPL licenses suggest adding a license notice at the top of each source file, like this:

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year>  <name of author>

This file is part of Foobar.

Foobar is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.

Foobar is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Foobar.  If not, see <http://www.gnu.org/licenses/>.

This is where it is explicitly allowed to use later versions of GPL (the main LICENSE text doesn't specify it). So it's a good idea to add these headers to each file to make sure it's really GPLv2+ and not GPLv2 only.

Relates to #4 as the header can be added along the Roxygen comments.

Submit an updated version of bfast to CRAN

To update the package on CRAN, there are a few options. The first way, the simplest one, would be:

Push the current updates on CRAN for the bfast package only.

This is doable because none of the updates specifically depend on an update to strucchange. The downside is that for much of the new functionality (e.g. increase in speed and most of the new features, plus bug fixes), the users are very interested in the updates done to strucchange.

If we want to use the new functionality, we also want to have an update to strucchange, on which bfast could depend. Here the roadmap would be one of the two:

Split out the C++ part of bfast2/strucchange into its own package, e.g. strucchangeArmadillo
Submit strucchangeArmadillo to CRAN
Submit all the changes in bfast2/bfast to upstream strucchange
Add checks in bfast for whether strucchangeArmadillo is loaded and then if so, use it, otherwise warn users that a slow codepath is being used
Submit bfast to CRAN

This plan has a downside in that splitting the whole codebase of bfast2/strucchange would take a lot of time that we currently most likely do not have (volunteers welcome). Alternatively:

Rename bfast2/strucchange to strucchangeArmadillo and mention that it is a fork of strucchange at a certain revision
Submit strucchangeArmadillo to CRAN
Switch dependency in bfast from strucchange to strucchangeArmadillo
Submit bfast to CRAN

It's useful to discuss which way we should choose to proceed.

Setup test env with continuous integration

Update links to the repository

Since we moved the repository, the README needs to be updated with new instructions about how to install it. Speaking of which, do we need devtools, or is there a more lightweight package that provides install_github()?

Check for the case of zero variance

When the time series consists of a straight line, p.Vt <- sctest(efp(Vt ~ ti, h = h, type = type)) returns NaN, which then fails down the line. bfast should throw an error in case sd(Vt) == 0, and possibly catch any further errors when p.Vt is invalid.

bfastmonitor doesn't validate inputs

bfastmonitor silently fails when inputs such as date, or startdate are not logically valid.

bfastlite: add sctest

Currently the bfast0n function doesn't include the sctest functionality, but it would be useful. Things to think about: what to output if there is no change according to the sctest (a breakpoints object with breaks=0? NA? -9999?).

Marketing names for functions

Come up with easier-to-understand names of the existing functions for users (suggested by Andrei). Needs some creative thinking. Of course, the old names should become aliases in that case.

bfastpp: more flexible handling of season

At the moment, bfastpp assumes that the user wants to estimate one parameter per time step of the input time series for a year. However, that makes it impossible to use season with h below one year. In addition, bfastpp also assumes that the frequency is an integer, which is not necessarily the case.

Low h parameter may result in fewer breaks

In some cases, lowering h might actually result in fewer breakpoints being detected. That is quite curious and may be good to check why that happens. An example was provided:

library(bfast)
set.seed(1234567)
opar = par(mfrow=c(1,2))
x <- rnorm(168,sd=1) # a vector without any inserted abrupt change
ts.plot(x)
bfast(ts(x),max.iter=1,season="none")
#> [1] "No seasonal model will be fitted!"
#> 
#>   TREND BREAKPOINTS:  None
#> 
#>   SEASONAL BREAKPOINTS:  None
y <- c(x[1:20],x[21:168]+1.5) # we insert an abrupt change from layer 20 onward
ts.plot(y)

h <- seq(.05,.2,by=0.05) # a sequance of h to use in bfast function
R <- lapply(X=1:length(h), function(i){
    bf = bfast(ts(y),h = h[i],max.iter=1,season="none")
    if (!is.na(bf$output[[1]]$bp.Vt))
        plot(bf$output[[1]]$bp.Vt)
    bf
})
#> [1] "No seasonal model will be fitted!"
#> [1] "No seasonal model will be fitted!"
#> Warning in if (!is.na(bf$output[[1]]$bp.Vt)) plot(bf$output[[1]]$bp.Vt):
#> the condition has length > 1 and only the first element will be used
#> [1] "No seasonal model will be fitted!"
#> [1] "No seasonal model will be fitted!"
R
#> [[1]]
#> 
#>   TREND BREAKPOINTS:  None
#> 
#>   SEASONAL BREAKPOINTS:  None
#> 
#> 
#> [[2]]
#> 
#>   TREND BREAKPOINTS
#>   Confidence intervals for breakpoints
#>   of optimal 2-segment partition: 
#> 
#> Call:
#> confint.breakpointsfull(object = bp.Vt, het.err = FALSE)
#> 
#> Breakpoints at observation number:
#>   2.5 % breakpoints 97.5 %
#> 1    19          20     26
#> 
#> Corresponding to breakdates:
#>   2.5 % breakpoints 97.5 %
#> 1    19          20     26
#> 
#>   SEASONAL BREAKPOINTS:  None
#> 
#> 
#> [[3]]
#> 
#>   TREND BREAKPOINTS:  None
#> 
#>   SEASONAL BREAKPOINTS:  None
#> 
#> 
#> [[4]]
#> 
#>   TREND BREAKPOINTS:  None
#> 
#>   SEASONAL BREAKPOINTS:  None
Rt = sapply(R, function(x)as.numeric(x$Time))
par(opar)

data.frame(h=h,changepoint=Rt)
#>      h changepoint
#> 1 0.05          NA
#> 2 0.10          20
#> 3 0.15          NA
#> 4 0.20          NA

^{Created on 2019-10-16 by the reprex package (v0.3.0)}

Best approach to handling NA values

NA values are a problem in time series analysis for seasonality and trend calculation. What is worse is that they tend to be temporally biased towards the rainy period, and we might have no observations whatsoever over the entire time series during particular times. This means that if we just omit the time steps with NAs, the trend estimation will also be biased towards values in the dry season.

There was also the matter of the stl package not handling data with NA values altogether.

One workaround (suggested by Achim) was to use bfastpp instead of stl to get simplified covariates, and then calculate the trend manually. Another is to use stlplus, but that imputes values. Discussion about the pros and cons and how to communicate that would be welcome.

adding slides from @dainius on BFAST overview - functionality and planning

Handle NAs in plot.bfast()

At the moment, attempting to plot a bfast() object that has NAs in the response results in an error:

Error in plot.window(xlim, ylim, log, ...) : need finite 'ylim' values

Which is actually right because the plot method doesn't expect range() to return NA.

bfastts: handle leap day

If we have the leap day (Feb 29) and also the day after (March 1) in the time series, bfastts fails with an error:

Error in merge.zoo(zoo(coredata(x), tt), zoo(, tt2)) : 
  series cannot be merged with non-unique index entries in a series
In addition: Warning messages:
1: In zoo(data, 1900 + as.POSIXlt(dates)$year + (yday365(dates) - 1)/365,  :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

MWE:
Fails:

a = c(as.Date("2016-02-29"), as.Date("2016-03-01"))
b = 1:2
bfastts(b, a, type="irregular")

Works if we set it to be one day later:

a = c(as.Date("2016-02-29"), as.Date("2016-03-01"))
b = 1:2
bfastts(b, a+1, type="irregular")

Fork of strucchange?

So far we didn't have a big need to change the code of strucchange, but I'm slowly but surely finding more and more bugs there as well, so a fork may be inevitable. In that case, we'll also need to do #35 also for the new fork, so should we fork it sooner rather than later?

Also, should we do the same for strucchange as we have for bfast with the contributor guidelines etc.? And what about new CRAN releases for strucchange?

bfastpp: add read.zoo() into examples

The function read.zoo() is very convenient for plotting the results of bfastpp, and the examples section is already using the zoo package, so I suggesting adding this to it:

plot(zoo::read.zoo(d1))

This is very easy and gives a very nice plot for the bfastpp result:

library(bfast)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
ndvi <- as.ts(zoo(cbind(a = som$NDVI.a, b = som$NDVI.b), som$Time))
ndvi <- window(ndvi, start = c(2006, 1), end = c(2009, 23))

## parametric season-trend model
d1 <- bfastpp(ndvi, order = 2)
plot(zoo::read.zoo(d1))

^{Created on 2020-02-24 by the reprex package (v0.3.0)}

Add landing page

Can be based on http://bfast.r-forge.r-project.org/

Add continuous integration

Travis CI can be used (at least as a starting point). It should be enabled for the repo by making a travis.yml file.

bfastts with type="regular"

zoo is able to convert regular time series zoo objects into ts objects, so it is possible to automate making a bfastts from a zoo object that way. The problem is that frequency(bfastts) is then < 1, so all code using that function needs to use a separate parameter instead. That requires some code refactoring.

update the readme - install instructions

Add contribution guidelines

update README.md file
add CONTRIBUTING.md file

bfastts: 16-day type returns no regularity

If I use type="16-day", even if the underlying time series is exactly 16-day, the resulting ts has a frequency of 1, which is not what we need at all.

# Unexpected
bfast::bfastts(1:240, seq.Date(as.Date("2009-01-01"), by=16, length.out=240), type="16-day")
#> Warning in as.ts.zoo(zz): 'x' does not have an underlying regularity
#> Time Series:
#> Start = 1 
#> End = 240 
#> Frequency = 1 
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
#>  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
#>  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
#>  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
#>  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
#>  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
#> [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
#> [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
#> [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
#> [154] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
#> [171] 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187
#> [188] 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
#> [205] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
#> [222] 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
#> [239] 239 240
# Expected
ts(1:240, 2009, frequency=356.25/16)
#> Time Series:
#> Start = 2009 
#> End = 2019.73403508772 
#> Frequency = 22.265625 
#>   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
#>  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
#>  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
#>  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
#>  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
#>  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
#> [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
#> [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
#> [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
#> [154] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
#> [171] 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187
#> [188] 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
#> [205] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
#> [222] 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
#> [239] 239 240

^{Created on 2019-10-19 by the reprex package (v0.3.0)}

Add LICENSE file

Since most of R libs are licensed under GNU GPL v2/v3, we should use (one of) these before development.

bfastts: error if time series contains leap day

If we have the leap day (Feb 29) and also the day after (March 1) in the time series, bfastts fails with an error:

Error in merge.zoo(zoo(coredata(x), tt), zoo(, tt2)) : 
  series cannot be merged with non-unique index entries in a series
In addition: Warning messages:
1: In zoo(data, 1900 + as.POSIXlt(dates)$year + (yday365(dates) - 1)/365,  :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

MWE:
Fails:

a = c(as.Date("2016-02-29"), as.Date("2016-03-01"))
b = 1:2
bfastts(b, a, type="irregular")

Works if we set it to be one day later:

a = c(as.Date("2016-02-29"), as.Date("2016-03-01"))
b = 1:2
bfastts(b, a+1, type="irregular")

Add a bfast0n() method (breakpoints()-based) - multiple breakpoints

The method suggested by Achim to deal with NAs (breakpoints() over bfastpp()) should be added as a function.

Spatial bfast output / relation to bfastSpatial

For ease of applicability, having bfast generate spatial maps rather than only pixel time series would be good to have. bfastSpatial has some of that functionality, specifically with bfmSpatial, but it only deals with bfastMonitor output. Also, bfastSpatial is useful as a generic preprocessing framework, the output does not necessarily have to be used with bfast.

So an idea would be to add something like bfSpatial for the core bfast function. Perhaps with coordination with Loïc.

bfast package consolidation

Thanks to pull #1, we have an opportunity to consolidate all the changes across the different forks of bfast, with a view of eventually updating the package on CRAN.

For ease of reviewing and consolidating code, it would be nice to have collaboration across the different code authors on a single codebase. GitHub supports this somewhat, using protected branches so that the code does not conflict/does not get lost.

Suggestions about how best to go ahead with this are welcome!