steffenmoritz / imputets

CRAN R Package: Time Series Missing Value Imputation

Home Page: http://steffenmoritz.github.io/imputeTS/

License: GNU General Public License v3.0

R 97.96% C++ 2.04%

missing-data imputation data-visualization imputation-algorithm imputets cran time-series

imputets's Introduction

imputeTS: Time Series Missing Value Imputation

The imputeTS package specializes in (univariate) time series imputation. It offers implementations of several different imputation algorithms. Beyond the imputation algorithms, the package also provides functions for plotting and printing missing data statistics of time series. Additionally, three time series datasets for imputation experiments are included.

Installation

The imputeTS package can be found on CRAN. For installation execute in R:

 install.packages("imputeTS")

If you want to install the latest version from GitHub (can be unstable) run:

library(devtools)
install_github("SteffenMoritz/imputeTS")

Usage

Imputation

To impute (fill in all missing values of) a time series x, run the following command:
```
 na_interpolation(x)
```
The output is the time series x with all NAs replaced by reasonable values.

This is just one example of an imputation algorithm. In this case, interpolation was the algorithm of choice for calculating the NA replacements. There are several other algorithms (see the section "Imputation Algorithms"). All imputation functions are named alike, starting with na_ followed by an algorithm label, e.g. na_mean, na_kalman, ...

Plotting

To plot missing data statistics for a time series x, run the following command:
```
 ggplot_na_distribution(x)
```

This is also just one example of a plot. Overall, there are four different types of missing data plots (see the section "Missing Data Plots").

Printing

To print statistics about the missing data in a time series x, run the following command:
```
 statsNA(x)
```

Datasets

To load the 'heating' time series (with missing values) into a variable y and the 'heating' time series (without missing values) into a variable z, run:
```
 y <- tsHeating
 z <- tsHeatingComplete
```
Three datasets are provided with the package: the 'tsHeating', 'tsAirgap' and 'tsNH4' time series (see the section "Datasets").

Imputation Algorithms

Here is a table with available algorithms to choose from:

| Function | Description |
|---|---|
| na_interpolation | Missing Value Imputation by Interpolation |
| na_kalman | Missing Value Imputation by Kalman Smoothing |
| na_locf | Missing Value Imputation by Last Observation Carried Forward |
| na_ma | Missing Value Imputation by Weighted Moving Average |
| na_mean | Missing Value Imputation by Mean Value |
| na_random | Missing Value Imputation by Random Sample |
| na_remove | Remove Missing Values |
| na_replace | Replace Missing Values by a Defined Value |
| na_seadec | Seasonally Decomposed Missing Value Imputation |
| na_seasplit | Seasonally Splitted Missing Value Imputation |

This is a rather broad overview. Most of the functions themselves offer more than one algorithm. For example, na_interpolation can be set to linear or spline interpolation.
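As a rough illustration of switching between such variants, the sketch below runs na_interpolation on the bundled tsAirgap series once per value of its option argument ("linear", "spline" and "stine"). The variable names are only illustrative, and the printed comparison is just one way to look at the differences.

```r
library(imputeTS)

# Impute the same series with each interpolation variant
imp_linear <- na_interpolation(tsAirgap, option = "linear")
imp_spline <- na_interpolation(tsAirgap, option = "spline")
imp_stine  <- na_interpolation(tsAirgap, option = "stine")

# Compare the imputed values at the originally missing positions
missing <- is.na(tsAirgap)
comparison <- cbind(
  linear = imp_linear[missing],
  spline = imp_spline[missing],
  stine  = imp_stine[missing]
)
print(round(comparison, 1))
```

The variants only differ at the positions that were NA in the input; observed values are passed through unchanged.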

More detailed information about the algorithms and their options can be found in the imputeTS reference manual.

Missing Data Plots

Here is a table with available plots to choose from:

| Function | Description |
|---|---|
| ggplot_na_distribution | Visualize Distribution of Missing Values |
| ggplot_na_distribution2 | Missing Values Summarized in Intervals |
| ggplot_na_gapsize | Visualize Distribution of NA Gapsizes |
| ggplot_na_imputations | Visualize Imputed Values |

More detailed information about the plots can be found in the imputeTS reference manual.

Datasets

There are three datasets (each in two versions) available:

| Dataset | Description |
|---|---|
| tsAirgap | Time series of monthly airline passengers (with NAs) |
| tsAirgapComplete | Time series of monthly airline passengers (complete) |
| tsHeating | Time series of a heating systems supply temperature (with NAs) |
| tsHeatingComplete | Time series of a heating systems supply temperature (complete) |
| tsNH4 | Time series of NH4 concentration in a wastewater system (with NAs) |
| tsNH4Complete | Time series of NH4 concentration in a wastewater system (complete) |

The tsAirgap, tsHeating and tsNH4 time series contain NAs; their complete versions do not. Apart from the missing values, the two versions of each series are identical. The NAs were artificially inserted by simulating the missing data pattern observed in similar incomplete time series from the same domain. Having both a complete and an incomplete version of the same dataset is useful for running experiments with imputation functions.
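A minimal sketch of such an experiment, assuming root mean squared error at the originally missing positions is an acceptable quality measure: impute tsAirgap with a few functions and compare each result against tsAirgapComplete. The helper rmse and the selection of functions are illustrative choices, not a recommended benchmark.

```r
library(imputeTS)

# Positions that are missing in the incomplete series
missing <- is.na(tsAirgap)

# Root mean squared error, evaluated only where values were originally missing
rmse <- function(imputed) {
  sqrt(mean((imputed[missing] - tsAirgapComplete[missing])^2))
}

errors <- c(
  mean          = rmse(na_mean(tsAirgap)),
  locf          = rmse(na_locf(tsAirgap)),
  interpolation = rmse(na_interpolation(tsAirgap))
)

# Smaller values mean the imputations are closer to the true values
print(sort(errors))
```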

More detailed information about the datasets can be found in the imputeTS reference manual.

Reference

You can cite imputeTS as follows:

Moritz, Steffen, and Bartz-Beielstein, Thomas. "imputeTS: Time Series Missing Value Imputation in R." R Journal 9.1 (2017). doi: 10.32614/RJ-2017-009.

Need Help?

If you have general programming problems or need help using the package, please ask your question on StackOverflow. That way, other users will be able to benefit from your question in the future.

Don't forget to tag your question with the imputets tag on StackOverflow so that I get notified.

Support

If you found a bug or have suggestions, feel free to get in contact via steffen.moritz10 at gmail.com.

All feedback is welcome

Version

3.3

License

GPL-3

imputets's Issues

Conflict with spacetime/raster

This may not be an issue, but when using imputeTS and spacetime together, I run into a conflict caused by one of the packages imputeTS depends on.

I am building a RasterBrick and cleaning it with imputeTS. When I then convert it to a spacetime object, the following message appears:

Found more than one class "xts" in cache; using the first, from namespace 'spacetime' Also defined by ‘quantmod’

Are you aware of this issue?

multiple imputations

This is a very useful package, and I appreciate the contribution. I would like to know whether it is possible to do multiple imputations using this package.
Any guidance/directions about the possibilities of doing that would be greatly appreciated.

Also, I would like to know ideas about checking for the missing mechanism (MCAR, MAR, MNAR), particularly for a univariate time series.

Thanks in advance!

Faceting

Hi @SteffenMoritz. Thank you for the nice package. Is it possible to facet using ggplot2 facet functions and the plotting functions from imputeTS package? Something like.

```
ggplot_na_distribution() +
  facet_wrap()/facet_grid()
```

zoo, xts compatibility for plotting functions

Currently the plotting functions do not work for multivariate input.
(this is intentional)

The plotting functions will also not work for certain univariate zoo, xts series inputs
(this could possibly be improved)

At least an error message could be provided, if the input series is of the wrong type.

How to choose the best algorithm ?

Dear @SteffenMoritz ,
I am an AgroParisTech student and with 4 of my classmates we are working on a project involving time series with many missing values.
We were thinking of using the package you developed to impute the missing values (it is really very useful and easy to handle :) ), but we don't know which method to use. Do you know whether a comparative study of the different imputation algorithms offered by the package has already been conducted?

Kind regards,
Morgane Philipp

na.kalman function modified the original numeric vector

Dear Mr. Moritz,

I am testing the na.kalman function from the imputeTS package. It seems like after running the na.kalman and saving the outcome to a new variable, it also changes the original numeric vector. I described this behavior in this post (http://stackoverflow.com/questions/43478244/strange-behavior-of-the-na-kalman-function-from-the-r-imputets-package) with more details. Could you check if there is a way to ask na.kalman not to change the original vector?

Thank you for your time and consideration.

All the best

Yu-Chen

Add examples with pipes

Add some examples with pipes to the documentation.
This should help users who are not yet familiar with pipes.

Example:
tsAirgap %>% na.mean %>% plotNA.imputations(x.withNA = tsAirgap, x.withImputations = . )
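For comparison, here is a sketch of the same idea written with the current function names from the README (na_mean, ggplot_na_imputations) and the native pipe. It assumes R 4.1 or later for `|>`; the variable name p is only illustrative.

```r
library(imputeTS)

# The piped value is inserted as the first unnamed argument. Because
# x_with_na is supplied by name, it is matched to x_with_imputations.
p <- tsAirgap |>
  na_mean() |>
  ggplot_na_imputations(x_with_na = tsAirgap)

print(p)
```

The native pipe avoids the `.` placeholder used above, at the cost of relying on R's argument matching rules.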

Able to install but not load

Looks like imputeTS relies on some exportable ggplot2 function that is no longer available?

Error: package or namespace load failed for ‘imputeTS’:
object ‘after_stat’ is not exported by 'namespace:ggplot2'

na_kalman: possible convergence problem: 'optim' gave code = 52 and message 'ERROR: ABNORMAL_TERMINATION_IN_LNSRCH'

Sometimes this warning appears:

possible convergence problem: 'optim' gave code = 52 and message 'ERROR: ABNORMAL_TERMINATION_IN_LNSRCH'

e.g. for:

```
x <- tsAirgap
x[1:2] <- NA
na_kalman(x)
```

This issue only arises for very specific inputs. It is a warning that fitting the structural time series model did not work out ideally.

As it is only a warning and imputations are still returned, ignoring this message is a possibility
(in this case you should check very closely that the resulting imputations are reasonable).

The warning itself does not come from imputeTS directly, but from a function that is called within imputeTS
(the function StructTS from the stats package, which calls optim, where the warning is raised).

This optim function is used to fit a structural time series model, as fitting the best model is an optimization task.
For some inputs this optimization does not converge (sometimes also just not fast enough).

The best workaround (in case the imputation results are poor) is probably to just use another imputation function.

An advanced fix would be to set parameters for optim, using the parameter pass-through from imputeTS to the underlying functions.
StructTS has an optim.control parameter, which can be used to supply a list of control parameters to optim. It can be set in the imputeTS call to adjust the optimization process (see the example below).

For example, one can check whether the optimization converges with more iterations
(many more optimization control parameters can be specified; see Details in the optim documentation https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/optim ):

Example of advanced parameter pass-through:

```
# Create a list with optim.control parameters
new_param <- list(maxit = 3000)

# Call na_kalman, additionally supplying the list as the optim.control argument
na_kalman(your_data, optim.control = new_param)
```

This parameter pass-through will not always make the message disappear or solve the convergence issue, but sometimes it helps.
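The simpler workaround (switching to another imputation function when na_kalman complains) could be sketched as follows. The wrapper name impute_with_fallback and the choice of na_interpolation as the fallback are assumptions made for illustration; any other imputation function could be substituted.

```r
library(imputeTS)

# Hypothetical helper: try na_kalman first and fall back to linear
# interpolation if it raises a warning or an error.
impute_with_fallback <- function(x) {
  use_fallback <- function(cond) {
    message("na_kalman reported: ", conditionMessage(cond),
            " (falling back to na_interpolation)")
    na_interpolation(x)
  }
  tryCatch(na_kalman(x), warning = use_fallback, error = use_fallback)
}

x <- tsAirgap
x[1:2] <- NA
y <- impute_with_fallback(x)
```

Note that this discards the Kalman result as soon as any warning occurs, so it is a conservative choice. The imputations should still be checked for plausibility.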

Need libcurl development package to install

The first installation attempt failed because the installation of dependencies failed. It turned out that you need a development package of libcurl to build. Although the messages that scroll by on the screen do mention

------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libcurl was not found. Try installing:
 * deb: libcurl4-openssl-dev (Debian, Ubuntu, etc)
 * rpm: libcurl-devel (Fedora, CentOS, RHEL)
 * csw: libcurl_dev (Solaris)
If libcurl is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a libcurl.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------

the summary at the end of the high-speed scrolling does not. Would it be possible to mention this in the installation notes?

Feature: Allow bounded time series interpolation

Hello,

Thanks for this great package! I am currently using the na_kalman() function to impute missing glucose readings for diabetes patients. Since there is no option to fix the upper and lower bounds for the imputed values, the imputed data sometimes takes values (e.g., <0) that are physiologically not possible. At the moment, I am replacing such values with reasonable numbers once I am done with the interpolation, but I think it will be useful to add arguments to na_kalman and friends to provide bounds for the imputed values.

Thanks!
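The post-processing step described above could look roughly like this in base R. The function clamp_imputations is hypothetical (it is not part of imputeTS), and the numbers are made-up stand-ins for real glucose readings and for the output of an imputation function.

```r
# Hypothetical helper: restrict imputed values to [lower, upper] while
# leaving the originally observed values untouched.
clamp_imputations <- function(x_with_na, x_imputed, lower = -Inf, upper = Inf) {
  stopifnot(length(x_with_na) == length(x_imputed))
  was_missing <- is.na(x_with_na)
  x_imputed[was_missing] <- pmin(pmax(x_imputed[was_missing], lower), upper)
  x_imputed
}

glucose <- c(5.1, NA, 4.8, NA, 6.0)      # series with missing readings
imputed <- c(5.1, -0.4, 4.8, 31.2, 6.0)  # stand-in for an imputation result
bounded <- clamp_imputations(glucose, imputed, lower = 0, upper = 30)
bounded
```

Clamping only the imputed positions keeps real measurements intact even if they fall outside the chosen bounds.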

Add white noise to series

For many algorithms the imputed values look completely smoothed out, and because of that they do not respect the statistics of the rest of the signal.

This will, for example, influence the standard deviation of the series.

An interesting solution could be an option for adding "white noise" with roughly the same standard deviation as the rest of the time series.

Offering multiple imputations could be another option.
The downside is that the MI concept is often misunderstood by users, and this solution would harm the simplicity of use.
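A minimal base R sketch of the white noise idea, assuming Gaussian noise added only at the imputed positions. The function add_noise_to_imputations is hypothetical, and its default noise_sd (the standard deviation of the observed values) is only one possible reading of "roughly the same standard deviation"; for series with trend or seasonality a smaller value is probably more sensible.

```r
# Hypothetical helper: add Gaussian noise to imputed positions only
add_noise_to_imputations <- function(x_with_na, x_imputed,
                                     noise_sd = sd(x_with_na, na.rm = TRUE)) {
  stopifnot(length(x_with_na) == length(x_imputed))
  was_missing <- is.na(x_with_na)
  x_imputed[was_missing] <- x_imputed[was_missing] +
    rnorm(sum(was_missing), mean = 0, sd = noise_sd)
  x_imputed
}

set.seed(42)
x      <- c(10, 12, NA, NA, 11, 13, NA, 12)
smooth <- c(10, 12, 11.7, 11.3, 11, 13, 12.5, 12)  # stand-in for an imputation result
noisy  <- add_noise_to_imputations(x, smooth, noise_sd = 0.5)
```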

plotNA.distributionBar x-axis labels

Replace x-axis labels with time units (month/year or only year, depending on the timescale).
(this seems more intuitive)

Only possible if time information is given within the time series object (ts, zoo, timeSeries).

speed up na.interpolation

Linear and spline interpolation can probably be made faster
(slightly different interpolation implementations from zoo package seem to be a little bit quicker)

na_replace doesn't allow replacement full NA vector

Dear Steffen,

Thanks for this nice package. Is there a reason the na_replace function doesn't allow for a vector filled with only NA's to be replaced with values?

If so, the error message
if (all(missindx)) { stop("Input data has only NAs. Input data needs at least 1 non-NA data point for applying na_mean") }
seems to contain a typo. Shouldn't it be: ... for applying _na_replace_ ?

If not, could the check # 1.3 Check for algorithm specific minimum amount of non-NA values be removed? (If you prefer, I will happily do a PR to adjust it myself).

Cheers,

Jan

Add na.stl algorithm

Provide an algorithm na.stl / na.loess / na.decomposition
that decomposes the series in 3 components and user is able to choose a different imputation algorithm for each component.

C++ implementation of na.locf

My tests showed that this will speed up na.locf by factor 10-100

Test all Datasets

Test whether all algorithms terminate (within 2 days at most) on each provided time series.
(also on the very large one)

Spelling mistakes

Minor, but spelling mistakes in the error messages below. Change "saisonal" to "seasonal".

https://github.com/SteffenMoritz/imputeTS/blob/master/R/na.seadec.R#L101
https://github.com/SteffenMoritz/imputeTS/blob/master/R/na.seasplit.R#L105

Package install error

Hi
I'm trying to run install_github("SteffenMoritz/imputeTS"),
but an error occurred.
How can I fix it?

Thanks
Sakda

2018-06-07 14:31:47 (1.35 MB/s) - ‘/tmp/RtmprcTMQj/forecast_8.3.tar.gz’ saved [601570/601570]

Installing forecast
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore
--quiet CMD INSTALL '/tmp/RtmprcTMQj/devtools21d72e52dd39/forecast'
--library='/home/---/R/x86_64-pc-linux-gnu-library/3.2'
--install-tests

ERROR: dependency ‘RcppArmadillo’ is not available for package ‘forecast’

removing ‘/home/---/R/x86_64-pc-linux-gnu-library/3.2/forecast’
Installation failed: Command failed (1)
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore
--quiet CMD INSTALL
'/tmp/RtmprcTMQj/devtools21d766fca16e/SteffenMoritz-imputeTS-d6a032e'
--library='/home/--/R/x86_64-pc-linux-gnu-library/3.2'
--install-tests

ERROR: dependency ‘forecast’ is not available for package ‘imputeTS’

removing ‘/home/--/R/x86_64-pc-linux-gnu-library/3.2/imputeTS’
Installation failed: Command failed (1)

Automated Testing: testthat / coveralls

covr package
testthat package

Multiple seasonality handling msts

Hi, is it possible to do a seasonal decomposition imputation for complex time series (msts class)? For example, a time series that has hourly, weekly and yearly seasonality. Thanks!

plotNA.distributionBar default value 'breaks' parameter

Why do you have a hard-coded default value of breaks = 10 in plotNA.distributionBar? Especially since you could use an existing function to compute an "optimal" number of cells, like nclass.Sturges(x), which is the default for the number of classes in hist(x, breaks="Sturges",...).
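For reference, the suggested default can be computed in base R as shown below. How it would be wired into the plotting function is left open here; the variable names are illustrative.

```r
# Sturges' rule: ceiling(log2(n) + 1) classes for n observations
set.seed(1)
x <- rnorm(100)
n_breaks <- grDevices::nclass.Sturges(x)
n_breaks  # 8 for n = 100
```

The result depends only on the length of x, so longer series get more intervals automatically.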

Auto detect seasonality

Internal function for detecting the seasonality of a time series if not given.
Provided as option for na.seadec and na.seasplit.
(forecast package has similar function - reuse?)

model0 or model In file na_kalman.R?

Line 200 of file na_kalman.R:
mod <- stats::StructTS(data, ...)$model0
where model0 is the initial state of the filter, not the final state fitted by maximum likelihood.
Should we change model0 to model?

Warning for osx CRAN check

Does not seem to affect users.

Check Details

Version: 2.3
Check: re-building of vignette outputs
Result: WARN
Error in re-building vignettes:
...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'imputeTS-Time-Series-Missing-Value-Imputation-in-R.tex' failed.
LaTeX errors:
! LaTeX Error: File `titlesec.sty' not found.

Possible fix is already added to github development version.

Hopefully this warning will disappear with the next update.

na.locf seems to be editing the original object

```
bar <- foo <- cumsum(rnorm(100))
bar[sample(1:100, 20)] <- NA
baz <- bar
na.locf(bar)
identical(bar, baz)
sum(is.na(bar))
sum(is.na(baz))
```

Expected result: bar and baz are identical and have the same number of NAs. Actual result: the opposite of that in both cases. I've never seen a bug like that, and I don't think I could even do that on purpose without at least a substitute statement someplace. It took forever to figure out why my variables were getting permanently changed.
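One way to test for this kind of side effect independently of R's copy-on-modify sharing is to compare the input against a deep copy taken before the call. The helper modifies_input below is a hypothetical base R sketch; it is demonstrated with an ordinary R function, and the commented line shows how it could be pointed at the imputation function in question.

```r
# Hypothetical helper: TRUE if calling f(x) changes x in place
modifies_input <- function(f, x) {
  # serialize/unserialize yields a deep copy that shares no memory with x
  snapshot <- unserialize(serialize(x, NULL))
  f(x)
  !identical(x, snapshot)
}

set.seed(1)
bar <- cumsum(rnorm(100))
bar[sample(1:100, 20)] <- NA

# An ordinary R function works on a copy, so this returns FALSE
well_behaved <- function(v) {
  v[is.na(v)] <- 0
  v
}
modifies_input(well_behaved, bar)

# With imputeTS installed, the same check could be applied like this:
# modifies_input(imputeTS::na_locf, bar)
```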

Stineman interpolation throwing error

I'm getting the following error when I try to use Stineman interpolation. I switched to a spline without affecting my data, but I wanted to document that this isn't working for me.

```
imputeTS::na.interpolation(site$t_hr_avg, option = "stein")
Error in imputeTS::na.interpolation(site$t_hr_avg, option = "stein") : 
Wrong parameter 'option' given. Value must be either 'linear' or 'spline'.
```

na.ma performance issue with trailing NA's

I am currently facing a special case for the na.ma algorithm, where all NA values are located at the end of the vector.

This causes the algorithm to perform much slower, since it mostly spends the time in this while loop to increase ktemp and then ends up filling the next ~1000 trailing NA's with the same value and then with 0.

Here is a small example of that problem:

```
data <- c(sample(1:3000, 200, replace = TRUE), rep(NA, 2000))
res <- imputeTS::na.ma(data)
plot(res)
```

Could/Should we maybe check if all remaining values are NA and then just assign them the weighted mean of the k last non-NA values?

Or do you maybe have a better idea how this problem can be bypassed?
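The proposed shortcut could be sketched in base R as follows. The function fill_trailing_na is hypothetical; here k is taken to mean the number of last observed values to average, and the linear weights (most recent value weighted highest) are an assumption made for illustration rather than the weighting used by the package.

```r
# Hypothetical helper: fill all trailing NAs with a weighted mean of the
# last k observed values.
fill_trailing_na <- function(x, k = 4) {
  observed <- which(!is.na(x))
  # Nothing to do if there are no observations or no trailing NAs
  if (length(observed) == 0 || max(observed) == length(x)) {
    return(x)
  }
  last_k  <- tail(observed, k)
  weights <- seq_along(last_k)  # assumed linear weights, newest weighted highest
  x[(max(observed) + 1):length(x)] <- weighted.mean(x[last_k], weights)
  x
}

data <- c(1, 2, 3, 4, NA, NA)
res <- fill_trailing_na(data, k = 2)
res  # trailing NAs become (3 * 1 + 4 * 2) / 3
```

Gaps in the middle of the series are left untouched, so this would only serve as a pre-check before the regular moving average loop.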

Detailed Model Summary in na_kalman()

The interpolation result by na_kalman() is pretty good when the option model='auto.arima' is used. Is it possible to show the searched parameter results of auto.arima()?

Did the package use the default search parameter settings of auto.arima() in the forecast package?

Thanks in advance.

Add pattern based imputation algorithm

Converting from ee.Image data to Numeric Vector (vector) or Time Series (ts) object

Hello everyone,
I have 'ee.Image' data from the MODIS satellite. I'm writing my code with the 'rgee' package. I want to impute the missing values in the image using imputeTS, but first I have to convert the ee.Image data to a vector or ts object.
How can I do this?
Thank you:)

Suggestion: Applying the na_mean function considering only values from the same periods.

When using the na_mean function for a time series, I found that I needed to calculate the replacement for each NA from the previous values of the same period only. For example, take a quarterly time series with three years of data in which all values of the last year are NAs. To calculate the value of each quarter, only the values of the same quarter in previous years are considered. Below is the example code:

```
data_ts <- ts(data = c(2, 4, 6, 8, 4, 12, 14, 16, NA, NA, NA, NA), frequency = 4)
dtmissing <- is.na(data_ts)
for (i in 1:frequency(data_ts)) {
  # Select all observations that belong to period i (here: quarter i)
  slcMonth <- cycle(data_ts) == i
  data_ts[slcMonth & dtmissing] <- median(data_ts[slcMonth & !dtmissing], na.rm = TRUE)
}
```

plotNA.imputation etc. not working with par()/layout()

Dear Steffen,

Thank you for the package.

Minor thing: it seems that plotNA.imputations() does not work with par()/layout(). When plotting together with a non-imputed series (plot()), R seems to show either plotNA.imputations() or the standard plot(), depending on which one is used as the last line of code.

Best Regards,
Bjarke Ahm

na.forecast

Provide an imputation algorithm that works based on forecast / backcast combinations
(using forecast functions from forecast package)

Support imputing around a circle (e.g. wind direction)

Feature request

Thanks for making imputeTS, it is really useful and easy to use.

Please consider adding a function to impute around a circle. I'm thinking specifically of wind direction data in degrees (ranging from 0 to 360), with a simple linear imputation that automatically selects the shortest distance between two points on the circle. It seems pretty simple, but I haven't been able to track down a simple solution in R. There is a relevant discussion here (https://stackoverflow.com/questions/9505862/shortest-distance-between-two-degree-marks-on-a-circle ).
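One possible approach, sketched here in base R under the assumption that linear interpolation along the shorter arc is what is wanted: unwrap the observed angles so that consecutive observations never differ by more than 180 degrees, interpolate linearly, then wrap the result back into [0, 360). The function name interpolate_circular is hypothetical and not part of imputeTS.

```r
# Hypothetical helper: linear interpolation of angles (in degrees)
# along the shortest arc between neighbouring observations.
interpolate_circular <- function(deg) {
  obs <- which(!is.na(deg))
  stopifnot(length(obs) >= 2)
  # Signed shortest step between consecutive observations, in [-180, 180)
  steps <- ((diff(deg[obs]) + 180) %% 360) - 180
  # Unwrapped angles: no jump at the 0/360 boundary
  unwrapped <- deg[obs[1]] + c(0, cumsum(steps))
  # rule = 2 repeats the nearest observation for leading/trailing NAs
  filled <- approx(x = obs, y = unwrapped, xout = seq_along(deg), rule = 2)$y
  filled %% 360
}

wind <- c(350, NA, 10, NA, NA, 40)
wind_filled <- interpolate_circular(wind)
wind_filled  # 350 0 10 20 30 40
```

A plain linear interpolation between 350 and 10 would pass through 180 instead of 0, which is the case this sketch is meant to avoid.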

possible convergence problem: 'optim' gave code = 1 and message 'NEW_X'

Sometimes this warning appears:

possible convergence problem: 'optim' gave code = 1 and message 'NEW_X'

Only appears with very specific input data. Very likely related to this issue with na_kalman (#60). Root cause is no convergence in optim function used by an underlying package. Hard to fix this in a reasonable way.

Furthermore, as can be seen here (https://stat.ethz.ch/pipermail/r-help/2013-June/354614.html) only appears with particular compilations of both 'R' and 'Rblas.dll'.

As it is only a warning and imputations are still returned, you might want to ignore this message
(but you should check more closely than usual that the resulting imputations are reasonable).

The best workaround (in case the results are poor) is probably to just use another imputation function.

As the warning comes from a function call inside an underlying package (base R / stats package), there won't be an update in the imputeTS package (since this would also be only a workaround).

Unit tests Fail for systems without long double

CRAN check gives an error in the unit tests for systems without long double.

Problem does not influence the user.
No false results are given.

Probably there are differences in the decimals for systems without long double.
Might be easy to fix, by lowering the requested precision for the decimals in the unit checks.

Since I have no system to test this, it might take a little bit longer to fix this.

Test cases for advanced time series objects

Planning on test cases to see if all:

ts
mts
zoo
timeSeries
...

do work with imputeTS

Documentation needs updating

I believe the documentation needs to be updated. Currently the doc for the na_ functions reads:

"Both options have the issue, that NAs at the beginning (or for nocb at the end) of the time series cannot be imputed (since there is no last value to be carried forward present yet). In this case there are remaining NAs in the imputed time series. Since this only concerns very few values at the beginning of the series, na_remaining offers some quick solutions to get a series without NAs back."

This no longer appears to be true. It seems NA values at the beginning or end of a time series are now replaced.

I would personally love to see an option to replace or not replace NAs at the beginning or end of a time series (i.e. only replace NAs once non-NA values commence).

Was this handled differently under a previous version that I can still access?
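In current versions, na_locf has a na_remaining argument that appears to cover the requested behaviour. The sketch below shows the "keep" and "rev" settings on a small vector; it is worth checking ?na_locf for the installed version, since the available values may differ between releases.

```r
library(imputeTS)

x <- c(NA, NA, 3, NA, 5)

# "keep" leaves NAs that have no earlier observation to carry forward
kept <- na_locf(x, option = "locf", na_remaining = "keep")

# "rev" fills them by running the method in the opposite direction
filled <- na_locf(x, option = "locf", na_remaining = "rev")

print(kept)
print(filled)
```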

na_kalman is slow for long time series

Hi @SteffenMoritz

Thanks for the amazing package ImputeTS. However, I found it to be slow when imputing long time series (~3000 daily data) with na_kalman(x, model = "StructTS")

Below is the reproducible example:
```
series <- ts(rnorm(3000), start = c(2000, 1), frequency = 365.25)
sample <- sample(1:3000, 900)
series[sample] <- NA
na_kalman(series)
```

Is there any planned development to solve these performance issues when encountering long time series?

Add error messages for minimum required non-NA Values

It would be nice if the package directly gave an error message when the chosen method cannot work properly because the time series is too short.

na.interpolation: "need at least two non-NA values to interpolate"
na.kalman: around 3 values + possible infinite loop
na.mean: 1 non NA value
na.locf: 1 non NA value
na.ma: unclear / possible errors for very small inputs (time series of length smaller parameter k)
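A guard of this kind could be sketched as a small shared helper. The name check_min_non_na and the wording of the message are hypothetical; the minimum counts passed in would be the ones listed above.

```r
# Hypothetical helper: stop with a clear message if x has too few
# non-NA values for the chosen imputation function.
check_min_non_na <- function(x, min_required, fn_name) {
  n_ok <- sum(!is.na(x))
  if (n_ok < min_required) {
    stop(sprintf(
      "%s needs at least %d non-NA value(s), but the input has only %d.",
      fn_name, min_required, n_ok
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Passes silently: two non-NA values are available
check_min_non_na(c(1, NA, 3), min_required = 2, fn_name = "na.interpolation")

# Would stop with an error, since only one non-NA value is available:
# check_min_non_na(c(NA, NA, 3), min_required = 2, fn_name = "na.interpolation")
```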

Getting Error on part of my time series

Hi Steffen and Bartz,

I got an error after running the na_kalman from imputeTS library. This function works on part of the xts time series object I was testing, although it returns an error if applied on all of it.

The error is:

Error in if (frequency > 1 && 0 < (d <- abs(frequency - round(frequency))) && :
missing value where TRUE/FALSE needed

This is the structure of the xts object:

str(SCADA_xts)

An ‘xts’ object on 2017-04-18 20:00:00/2019-07-31 20:00:00 containing:
Data: num [1:64120, 1] 198348 198618 195981 188490 186822 ...
Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Toronto
xts Attributes:
NULL

Do you have any idea what might be wrong with my data?

Thanks and best regards,

Mehrdad

'libRblas.so: No such file or directory' during package installation

Hi,

first of all thanks for maintaining this great package!

On Ubuntu 22.04 LTS with a freshly installed R using the regular sudo apt install r-base-core I tried to install.packages('imputeTS') and received this error:

Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/home/jay/R/x86_64-pc-linux-gnu-library/4.2/fracdiff/libs/fracdiff.so':
  libRblas.so: cannot open shared object file: No such file or directory
Calls: <Anonymous> ... asNamespace -> loadNamespace -> library.dynam -> dyn.load
Execution halted

The sessionInfo(), though, shows that BLAS and LAPACK actually do resolve to OpenBLAS, which I think is what libRblas.so refers to.

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

I elaborate on this a little further in this Stack Overflow question. There is an answer that suggests a package bug which is why I'm opening a ticket here.

Thanks for any help!

All the best

Add confidence intervals for imputations

Something similar to the forecast package would be nice.
(forecast has also nice plots with confidence intervals)

na.seadec replaces zero values

An error report from StackOverflow:
Unfortunately, I have found that na.seadec replaces zero values as well as NA values. – CameronNemo

could not find function "na_interpolation"

Until a few days ago I could use na.interpolation function just fine... Today it stopped working so I looked online and saw it was changed to na_interpolation. I tried this and it gave me the error:

Error in na.interpolation(comparSpline[v, ], option = "spline") :
could not find function "na.interpolation"

Is there a newer version of this function I am not aware of? As far as I can tell, the imputeTS package was successfully installed. I'm a bit lost. Thanks in advance for your help.

Return fitting statistics and/or residuals

Hi Steffen,

Thanks for this great package which greatly facilitates imputations.

This is a feature request. Would it be possible to return the fitting statistics and/or residuals from the subroutines? For example, na_kalman() uses stats::StructTS() and forecast::auto.arima(), both of which return the log-likelihood and the residuals. These can be useful to compare fitting methods.

Thank you very much.

Best,
Hung

Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0, : L-BFGS-B needs finite values of 'fn'

> data_tm1
# A tibble: 1,151,978 x 15
   index               BidOpen BidHigh BidLow BidClose AskOpen AskHigh AskLow AskClose  year  week
   <dttm>                <dbl>   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>  <dbl>    <dbl> <dbl> <dbl>
 1 2014-12-29 00:01:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 2 2014-12-29 00:02:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 3 2014-12-29 00:03:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 4 2014-12-29 00:04:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 5 2014-12-29 00:05:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 6 2014-12-29 00:06:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 7 2014-12-29 00:07:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 8 2014-12-29 00:08:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
 9 2014-12-29 00:09:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
10 2014-12-29 00:10:00    120.    120.   120.     120.    120.    120.   120.     120.  2015    53
# ... with 1,151,968 more rows, and 4 more variables: bias.open <dbl>, bias.high <dbl>,
#   bias.low <dbl>, bias.close <dbl>
> data_tm1_NA <- data_tm1 %>% 
+   dplyr::select(BidOpen, BidHigh, BidLow, BidClose, 
+                 AskOpen, AskHigh, AskLow,  AskClose) %>% 
+   prodNA(noNA = 0.01) %>% 
+   cbind(data_tm1[1], .) %>% tbl_df
> 
> data_tm1_1_tidyr <- data_tm1_NA %>% 
+   fill(BidOpen, BidHigh, BidLow, BidClose, 
+        AskOpen, AskHigh, AskLow, AskClose) %>% #default direction down
+   fill(BidOpen, BidHigh, BidLow, BidClose, 
+        AskOpen, AskHigh, AskLow, AskClose, .direction = 'up')
> 
> data_tm1_1_tidyr %>% anyNA
[1] FALSE
> 
> data_tm1_1_tidyr %<>% mutate(
+   bias.open = if_else(AskOpen>AskHigh|AskOpen<AskLow, 1, 0), 
+   bias.high = if_else(AskHigh<AskOpen|AskHigh<AskLow|AskHigh<AskClose, 1, 0), 
+   bias.low = if_else(AskLow>AskOpen|AskLow>AskHigh|AskLow>AskClose, 1, 0), 
+   bias.close = if_else(AskClose>AskHigh|AskClose<AskLow, 1, 0))
> 
> data_tm1_1_tidyr %>% 
+   dplyr::filter(bias.open==1|bias.high==1|bias.low==1|bias.close==1)
> 
> data_tm1_1_tidyr %<>% 
+   summarise(
+     AskOpen = mean((AskOpen - data_m1$AskOpen)^2), 
+     AskHigh = mean((AskHigh - data_m1$AskHigh)^2), 
+     AskLow = mean((AskLow - data_m1$AskLow)^2), 
+     AskClose = mean((AskClose - data_m1$AskClose)^2), 
+     Mean.HLC = (AskHigh + AskLow + AskClose)/3, 
+     Mean.OHLC = (AskOpen + AskHigh + AskLow + AskClose)/4, 
+     bias.open = sum(bias.open)/length(bias.open), 
+     bias.high = sum(bias.high)/length(bias.high), 
+     bias.low = sum(bias.low)/length(bias.low), 
+     bias.close = sum(bias.close)/length(bias.close)) %>% tbl_df
> 
> data_tm1_1_tidyr %>% 
+   kable(caption = 'MSE') %>% 
+   kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
+   scroll_box(width = '100%')#, height = '400px')
> data_m1_NA <- data_m1 %>% prodNA(noNA = 0.1)
> data_m1_10_impTS <- llply(algo, function(x) {
+   data_m1_NA %>% 
+     dplyr::select(starts_with('Ask'), starts_with('Bid')) %>% 
+     map(na.seadec, algorithm = x) %>% as.tibble
+   })
Error in optim(init[mask], getLike, method = "L-BFGS-B", lower = rep(0,  : 
  L-BFGS-B needs finite values of 'fn'

I noticed that I sometimes get this error when using na.seadec(x, algorithm = x).

Some gap sizes are not shown in plotNA.gapsize()?

The plotNA.gapsize() plots 10 gap sizes, but the full length of gaps is 23 instead? The range between 6 and 26 seems to be ignored.

library(imputeTS)
#> Registered S3 method overwritten by 'xts':
#>   method     from
#>   as.zoo.xts zoo
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> Registered S3 methods overwritten by 'forecast':
#>   method             from    
#>   fitted.fracdiff    fracdiff
#>   residuals.fracdiff fracdiff
plotNA.gapsize(tsNH4)

rle_na <- rle(is.na(tsNH4))
table(rle_na$lengths[rle_na$values])
#> 
#>   1   2   3   4   5   6   7   8   9  10  11  12  14  16  17  21  25  26 
#>  68  26  16  10   8   4   2   3   2   1   1   2   1   1   1   1   1   1 
#>  27  32  42  91 157 
#>   1   1   2   1   1

Created on 2019-07-08 by the reprex package (v0.3.0)
