Hello @MorganePhilipp,
choosing the right algorithm depends heavily on the data you have (as with other machine learning tasks). In general, na_kalman is quite a good choice, but it is also the most computationally intensive and thus will not work for all datasets.
If you go through the list of papers citing imputeTS (https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16876364094503492919) you will find some works that compare these algorithms. Unfortunately, none of them is a general algorithm comparison across different kinds of scenarios and data. Most of these comparisons come from authors introducing and benchmarking their own algorithms. Thus, these papers will (understandably) mostly highlight the specific kind of data or scenario where their new algorithm might be an improvement.
In general, you have to benchmark different algorithms on your own data to find the one that suits it best.
The missing data are ultimately lost, so you will never have a ground truth for them. You therefore have to find out from your existing data which imputation method is best for your dataset.
The following procedure can be applied:
- Artificially create NAs in your existing data (for which you then know the ground truth). Here it is important to simulate the occurrence of NAs so that it resembles their real occurrence. If you always have long NA gaps with multiple consecutive missing values, simulate long gaps.
- Apply different imputation methods to the time series with the artificially missing values. Since you have the ground truth for your artificially introduced missing values, you can calculate an error/performance metric such as RMSE or MAE.
- Do multiple simulation runs, so that the artificially introduced missing values are placed at several different locations. Each time, calculate the error metrics for the imputation algorithms you want to compare.
- Create an overall results table.
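The procedure above could be sketched roughly as follows. This is only an illustration under several assumptions: it is written in plain Python rather than R, the series is synthetic, and the three imputers (`impute_mean`, `impute_locf`, `impute_linear`) are simple stand-ins written for this example, not imputeTS functions. All function names and parameters (`insert_gaps`, `benchmark`, `n_gaps`, `gap_len`, `n_runs`) are hypothetical. With imputeTS you would plug in functions such as na_kalman or na_interpolation instead and choose gap settings that mimic your real missingness.

```python
import math
import random
from statistics import mean


def insert_gaps(series, n_gaps, gap_len, rng):
    """Hide n_gaps runs of gap_len consecutive values; return (masked copy, hidden indices)."""
    masked = list(series)
    hidden = set()
    for _ in range(n_gaps):
        # The first and last points always stay observed, so every gap has two neighbours.
        start = rng.randrange(1, len(series) - gap_len)
        hidden.update(range(start, start + gap_len))
    for i in hidden:
        masked[i] = None
    return masked, sorted(hidden)


def impute_mean(x):
    """Replace every missing value with the mean of the observed values."""
    m = mean(v for v in x if v is not None)
    return [m if v is None else v for v in x]


def impute_locf(x):
    """Last observation carried forward (assumes the first value is observed)."""
    out, last = [], None
    for v in x:
        last = v if v is not None else last
        out.append(last)
    return out


def impute_linear(x):
    """Linear interpolation between the observed neighbours of each gap."""
    out = list(x)
    known = [i for i, v in enumerate(x) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = x[a] + (x[b] - x[a]) * (i - a) / (b - a)
    return out


def rmse(truth, est, idx):
    return math.sqrt(mean((truth[i] - est[i]) ** 2 for i in idx))


def mae(truth, est, idx):
    return mean(abs(truth[i] - est[i]) for i in idx)


def benchmark(series, imputers, n_runs=50, n_gaps=3, gap_len=5, seed=1):
    """Repeat the mask-and-impute experiment and average the errors per method."""
    rng = random.Random(seed)
    scores = {name: {"rmse": [], "mae": []} for name in imputers}
    for _ in range(n_runs):
        # Each run places the artificial gaps at different locations.
        masked, idx = insert_gaps(series, n_gaps, gap_len, rng)
        for name, fn in imputers.items():
            est = fn(masked)
            scores[name]["rmse"].append(rmse(series, est, idx))
            scores[name]["mae"].append(mae(series, est, idx))
    return {name: {k: mean(v) for k, v in s.items()} for name, s in scores.items()}


# Synthetic, fully observed series: a 24-step seasonal pattern plus a slow trend.
series = [math.sin(2 * math.pi * t / 24) + 0.01 * t for t in range(200)]
imputers = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}
results = benchmark(series, imputers)

# Overall results table, best method (lowest RMSE) first.
print(f"{'method':<8}{'RMSE':>8}{'MAE':>8}")
for name, s in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:<8}{s['rmse']:>8.3f}{s['mae']:>8.3f}")
```

The ranking this prints only reflects the synthetic series and the chosen gap pattern. The point of the exercise is to rerun it on your own data, with gap lengths and frequencies that resemble your real NAs.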
Dear @SteffenMoritz,
Thank you very much for your detailed answer, which confirms our initial intuition, and for the link to the list of publications.
We were indeed thinking of using a method of this type. Thank you very much for giving us all the steps. We will implement this with our data. :)
Kind regards,
Morgane PHILIPP