How to choose the best algorithm ? about imputets HOT 2 CLOSED

steffenmoritz commented on May 25, 2024

How to choose the best algorithm?

from imputets.

Comments (2)

SteffenMoritz commented on May 25, 2024

Hello @MorganePhilipp,

Choosing the right algorithm depends heavily on the data you have, as with other machine learning tasks. In general, na_kalman is quite a good choice, but it is also the most computationally intensive and thus will not work for all datasets.

If you go through the list of papers citing imputeTS (https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16876364094503492919), you will find some works that compare these algorithms. Unfortunately, none is a general algorithm comparison across different kinds of scenarios and data. Most of these comparisons come from authors introducing and benchmarking their own algorithms. Thus, the studies in these papers will (understandably) mostly highlight the specific kind of data or scenario where their new algorithm might be an improvement.

In general, you have to benchmark different algorithms on your own data to find the one that suits it best.
The missing values are ultimately lost, so you will never have a ground truth for them. You therefore have to find out from your existing data which imputation method is best for your dataset.

The following procedure can be applied:

1. Artificially create NAs in your existing data (for which you then know the ground truth).
   - Here it is important to simulate the occurrence of NAs so that it resembles their real occurrence.
   - If you always have long NA gaps with multiple consecutive missing values, simulate long gaps.
2. Apply different imputation methods to the time series with the artificially missing values.
3. Since you have the ground truth for your artificially introduced missing values, you can calculate an error/performance metric such as RMSE or MAE.
4. Do multiple simulation runs, so that the artificially introduced missing values are placed at several different locations. Each time, calculate the error metrics for the imputation algorithms you want to compare.
5. Create an overall results table.
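
The procedure above can be sketched roughly as follows. This is only an illustration in plain Python (standard library only), not imputeTS code: the three imputers (mean, last observation carried forward, linear interpolation) are simple stand-ins written for this example, and all function names and parameters (`make_gap_indices`, `benchmark`, `n_gaps`, `gap_len`, ...) are hypothetical. It assumes a complete, gap-free stretch of your series is available as ground truth. In R, the same loop would call the imputeTS functions you want to compare, such as na_kalman.

```python
import math
import random
import statistics


def make_gap_indices(n, n_gaps, gap_len, rng):
    """Return sorted indices of n_gaps artificial gaps, each gap_len points long.

    The first and last points are never removed, so every gap has an
    observed neighbour on both sides. Gaps may overlap or touch.
    """
    missing = set()
    for _ in range(n_gaps):
        start = rng.randrange(1, n - gap_len)
        missing.update(range(start, start + gap_len))
    return sorted(missing)


def impute_mean(series):
    """Replace every missing value (None) with the mean of the observed values."""
    mean = statistics.fmean([v for v in series if v is not None])
    return [mean if v is None else v for v in series]


def impute_locf(series):
    """Last observation carried forward; assumes the first value is observed."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(series):
    """Linear interpolation across each gap; assumes observed endpoints."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while out[j] is None:  # find the first observed value after the gap
                j += 1
            left, right, span = out[i - 1], out[j], j - (i - 1)
            for k in range(i, j):
                out[k] = left + (right - left) * (k - (i - 1)) / span
            i = j
        else:
            i += 1
    return out


# Stand-in methods to compare; swap in the algorithms you actually care about.
METHODS = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}


def benchmark(series, methods, n_runs=20, n_gaps=3, gap_len=5, seed=0):
    """Average RMSE and MAE per method over several random gap placements."""
    rng = random.Random(seed)
    scores = {name: {"rmse": [], "mae": []} for name in methods}
    for _ in range(n_runs):
        idx = make_gap_indices(len(series), n_gaps, gap_len, rng)
        masked = list(series)
        for i in idx:
            masked[i] = None  # step 1: artificial NAs with known ground truth
        for name, impute in methods.items():
            imputed = impute(masked)  # step 2: apply each method
            errors = [series[i] - imputed[i] for i in idx]  # step 3: score it
            scores[name]["rmse"].append(math.sqrt(statistics.fmean([e * e for e in errors])))
            scores[name]["mae"].append(statistics.fmean([abs(e) for e in errors]))
    # steps 4 and 5: aggregate all runs into one results table
    return {name: {m: statistics.fmean(v) for m, v in s.items()} for name, s in scores.items()}


if __name__ == "__main__":
    # Synthetic complete series (trend plus a 24-step cycle), for demonstration only.
    series = [10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 24) for t in range(240)]
    for name, result in benchmark(series, METHODS).items():
        print(f"{name:8s} RMSE={result['rmse']:.3f}  MAE={result['mae']:.3f}")
```

The values of `n_gaps` and `gap_len` should be chosen to mimic how gaps really occur in your data; if your real series has long runs of consecutive NAs, use a correspondingly large `gap_len`.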

MorganePhilipp commented on May 25, 2024

Dear @SteffenMoritz,

Thank you very much for your detailed answer, which confirms our initial intuition, and for the link to the list of publications.

We were indeed thinking of using a method of this type. Thank you very much for giving us all the steps. We will implement this with our data. :)

Kind regards,
Morgane PHILIPP
