
Comments (3)

fwetdb commented on August 22, 2024

Thanks for sharing your input! Since this is not a typical situation, I would have to think about it more thoroughly. However, I am currently busy with other work. I might revisit this in the coming weeks, but I don't want to promise anything at this point.

from ppscore.

fwetdb commented on August 22, 2024

I would love to help out here, but I am not sure how I can do that.
The answer depends heavily on your exact data and on what you are trying to achieve with ppscore.
Maybe you can provide more detail?

The short answer is:
Don't use ppscore on time series data at all, unless you have a deep technical understanding of how ppscore works and how to interpret the results.

The long answer is:

  • The cross-validation used within ppscore might "leak information" between folds that should not be leaked because of the time series structure. For example, in 1990 you did not yet know about the crashes of the financial industry in the 2000s.
  • If this is a problem in your dataset, then ppscore might not be applicable at all, or might at least lead to wrong or confusing statements. Adjusting the sample size or the number of cross-validation folds won't help with this.
  • The solution might involve passing a new, suitable CV object, as proposed in #10. However, this is currently not a priority for us, and if I remember correctly, the proposed solution would require too much of our attention to make sure it is done correctly. Maybe work on this will continue in the future.
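
To make the leakage point concrete, here is a minimal sketch in plain Python. It is not ppscore's internal code, and the helper names (`shuffled_kfold`, `expanding_window`, `leaks_future`) are made up for illustration. It assumes rows are sorted by timestamp, oldest first, and compares a generic shuffled k-fold split with a time-ordered split in which the training rows always precede the test rows.

```python
import random


def shuffled_kfold(n_samples, n_folds, seed=0):
    """Yield (train, test) index lists from a generic shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        test = sorted(indices[k::n_folds])
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def expanding_window(n_samples, n_folds):
    """Yield (train, test) index lists where every train row precedes the test rows."""
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = list(range(0, k * block))
        stop = (k + 1) * block if k < n_folds else n_samples
        test = list(range(k * block, stop))
        yield train, test


def leaks_future(train, test):
    """True if any training row is later in time than the earliest test row."""
    return max(train) > min(test)


n_rows = 20  # assumption: rows are sorted by timestamp, oldest first
shuffled_leaks = [leaks_future(tr, te) for tr, te in shuffled_kfold(n_rows, 4)]
ordered_leaks = [leaks_future(tr, te) for tr, te in expanding_window(n_rows, 4)]

print("shuffled k-fold leaks per fold:  ", shuffled_leaks)
print("expanding window leaks per fold: ", ordered_leaks)
```

With the time-ordered split, no fold trains on rows from the future of its test rows, by construction. With the shuffled split, almost every fold does, and changing the number of folds or the sample size does not remove that.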

from ppscore.

lastmeta commented on August 22, 2024

I'm not in a typical data-modeling situation here. I'm not using ppscore to build a model; I'm building a system that builds models. This means I can't answer your question about what kind of data I have, because I don't know ahead of time.

I have hundreds of thousands of datastreams of literally all kinds of data. The only thing the datastreams have in common is that every observation comes with a timestamp. The observations can be float, integer, binary, string, you name it.

I am looking for a metric (like ppscore) that I can use as a preliminary filter for the models built to predict these datastreams. I want something better than correlation; indeed, I want something that reveals non-linear correlations. Non-spurious, non-linear correlations are hard to detect between two datasets, unless perhaps the intermittent correlation is cyclical. Therefore, I believe the ideal solution is to take multiple datastreams into account at a time, but I don't know how to approach that even naively, let alone efficiently.

So if the system wants to build a model targeting one datastream, it needs to find variables to use as input to the model. I have been scoring random datastreams against the target variable using ppscore and then building automated models with the streams that score in the top 10%. Those models then compete until only one remains.
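
The screening step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual system: `top_fraction`, `screen_candidates`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for whatever scoring function is plugged in (a ppscore-based score, for example).

```python
def top_fraction(scores, fraction=0.10):
    """Return the names whose score ranks in the top `fraction` (at least one)."""
    if not scores:
        return []
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def screen_candidates(target, candidates, score_fn, fraction=0.10):
    """Score every candidate stream against the target and keep the best ones."""
    scores = {name: score_fn(series, target) for name, series in candidates.items()}
    return top_fraction(scores, fraction)


def toy_score(series, target):
    """Hypothetical placeholder score: higher when the series is closer to the target."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(series, target)) / len(target)
    return 1.0 / (1.0 + mean_abs_diff)


target = [1, 2, 3, 4, 5]
# Ten made-up candidate streams; stream_i is the target shifted by i.
candidates = {f"stream_{i}": [v + i for v in target] for i in range(10)}

selected = screen_candidates(target, candidates, toy_score, fraction=0.10)
print(selected)
```

Keeping the scoring function as a parameter means the same filter can be run with a linear correlation score or a predictive score and the selected streams compared.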

So given that my use case is not intricately intertwined with ppscore, and that I'm using it only as a preliminary filter, would you say it's a good enough fit? From my vantage point, it's my only option aside from calculating some linear correlation score. I'm hoping ppscore will, in some cases, notice predictive power between two datasets that a linear correlation would miss.
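
As a small illustration of what a linear correlation can miss, the sketch below computes the Pearson coefficient by hand for a made-up pair where y is completely determined by x (y = x squared on a range symmetric around zero) and for a plain linear pair. The data and the `pearson` helper are illustrative assumptions, not taken from the datastreams discussed here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = list(range(-5, 6))              # -5, -4, ..., 5
ys_quadratic = [x * x for x in xs]   # fully determined by x, but not linearly
ys_linear = [2 * x + 1 for x in xs]  # linear in x

r_quadratic = pearson(xs, ys_quadratic)
r_linear = pearson(xs, ys_linear)

print("Pearson r, y = x^2:    ", r_quadratic)
print("Pearson r, y = 2x + 1: ", r_linear)
```

The quadratic pair gets a coefficient of zero even though knowing x pins down y exactly, which is the kind of relationship a model-based score is meant to catch.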

from ppscore.
