
Comments (2)

SteffenMoritz avatar SteffenMoritz commented on May 25, 2024

Hello @MorganePhilipp ,

Choosing the right algorithm depends heavily on the data you have, as with other machine learning tasks. In general, na_kalman is quite a good choice, but it is also the most computationally intensive, so it will not work for all datasets.

If you go through the list of papers citing imputeTS (https://scholar.google.com/scholar?um=1&ie=UTF-8&lr&cites=16876364094503492919) you will find some works that compare these algorithms. Unfortunately, none of them is a general algorithm comparison across different kinds of scenarios and data. Most of these comparisons come from authors introducing and benchmarking their own algorithms. Thus, these papers will (understandably) mostly highlight the specific kind of data or scenario where their new algorithm might be an improvement.

In general, you have to benchmark different algorithms on your own data to find the one that suits it best.
The missing data are ultimately lost, so you will never have a ground truth for them. You therefore have to find out from your existing data which imputation method is best for your dataset.

The following procedure can be applied:

  1. Artificially create NAs in your existing data (for which you then know the ground truth)
    Here it is important to simulate the occurrence of NAs so that it resembles their real occurrence.
    If you always have long NA gaps with multiple consecutive missing values, simulate long gaps.

  2. Apply different imputation methods to the time series with the artificially missing values.
    Since you have the ground truth for your artificially introduced missing values, you can calculate an error/performance metric such as RMSE or MAE.

  3. Do multiple simulation runs, so that the artificially introduced missing values are placed at several different locations. Each time calculate error metrics for the imputation algorithms you want to compare.

  4. Create an overall results table
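
The four steps above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not imputeTS code: the helper names (`make_gaps`, `benchmark`, and so on), the synthetic series, and the three simple imputers (mean, last observation carried forward, linear interpolation) are all assumptions made for the example. In practice you would swap in the methods you actually want to compare, for example na_kalman, and use your own complete data segment.

```python
import math
import random

def make_gaps(series, n_gaps, gap_len, rng):
    """Step 1: mask n_gaps runs of gap_len values (first/last point are kept)."""
    masked = set()
    for _ in range(n_gaps):
        start = rng.randrange(1, len(series) - gap_len)
        masked.update(range(start, start + gap_len))
    holed = [None if i in masked else v for i, v in enumerate(series)]
    return holed, sorted(masked)

# Stand-in imputers (hypothetical helpers, not the imputeTS implementations).
def impute_mean(x):
    observed = [v for v in x if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in x]

def impute_locf(x):
    out, last = [], None
    for v in x:
        if v is not None:
            last = v
        out.append(last)  # assumes the first value is observed
    return out

def impute_linear(x):
    out = list(x)
    known = [i for i, v in enumerate(x) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            out[i] = x[left] + w * (x[right] - x[left])
    return out

def rmse(truth, imputed, idx):
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))

def mae(truth, imputed, idx):
    return sum(abs(truth[i] - imputed[i]) for i in idx) / len(idx)

def benchmark(series, methods, n_runs=50, n_gaps=3, gap_len=5, seed=0):
    """Steps 2-4: impute, score on the masked points, average over many runs."""
    rng = random.Random(seed)
    totals = {name: [0.0, 0.0] for name in methods}
    for _ in range(n_runs):
        holed, idx = make_gaps(series, n_gaps, gap_len, rng)
        for name, impute in methods.items():
            filled = impute(holed)
            totals[name][0] += rmse(series, filled, idx)
            totals[name][1] += mae(series, filled, idx)
    return {name: (r / n_runs, m / n_runs) for name, (r, m) in totals.items()}

# Synthetic complete series (trend plus seasonality), used only for the demo.
series = [10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 24) for t in range(200)]
methods = {"mean": impute_mean, "locf": impute_locf, "linear": impute_linear}
results = benchmark(series, methods)

# Overall results table, best (lowest mean RMSE) first.
for name, (r, m) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<8} RMSE={r:.3f}  MAE={m:.3f}")
```

The `n_gaps` and `gap_len` parameters are where the simulated missingness should be matched to the real one: if your actual data has long consecutive gaps, increase `gap_len` accordingly.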

from imputets.

MorganePhilipp avatar MorganePhilipp commented on May 25, 2024

Dear @SteffenMoritz,

Thank you very much for your detailed answer which confirms our initial intuition and the link to the list of different publications.

We were indeed thinking of using a method of this type. Thank you very much for giving us all the steps. We will implement this with our data. :)

Kind regards,
Morgane PHILIPP

