
Comments (6)

alexandersmedley commented on July 21, 2024

Hi Florian,

I'm happy to learn the data helped you identify the problems :)

I suspected the categorical breakpoint might not work, but I couldn't be sure because the for loop was behaving strangely. I didn't anticipate the x = y exception!

Thanks again for providing this package and taking the time to update and support it.

Cheers,

Alexander

from ppscore.

8080labs commented on July 21, 2024

Yes, the data was very helpful - thank you for that!


8080labs commented on July 21, 2024

Hi Alex,

It would be great if you could share your data. You can choose any method that works for you, e.g. a Google Drive link, another file upload, or a pointer to the original source.

There is definitely some work to be done from our side to further clarify or catch the errors that are currently raised from sklearn.

The error "'DataFrame' object has no attribute 'dtype'" seems strange to me. It looks like .dtype is being called internally on a DataFrame instead of a Series. Is it possible that there are columns with the same name?

By the way, why are you trying to predict all columns with YearBuilt (x is YearBuilt)? Don't you want to do the opposite and predict YearBuilt with the other columns (y is YearBuilt)?

Cheers,
Florian


alexandersmedley commented on July 21, 2024

Hi Florian!

Thanks for your answer. I've uploaded the CSV and Jupyter notebook to Google Drive. Here is the link: https://drive.google.com/open?id=127BwkUTcKF18_Kh599jHlsMVIFnpwU3S

The original data is available on Kaggle here:
https://www.kaggle.com/city-of-seattle/sea-building-energy-benchmarking#2015-building-energy-benchmarking.csv

The data I am using was obtained from the original by combining the 2015 and 2016 tables and cleaning them.

Yes, I also found it strange that .dtype appears to be called on a DataFrame. It's particularly odd because that happens only when task is specified, not when task = None. The columns all have unique names.

Using 'YearBuilt' to predict was just a test! I wanted to understand which variables were being treated as categorical. My initial objective was to calculate the whole pps matrix and compare it to the correlation matrix to see if it could provide more insight, like you did in your article.

Cheers,

Alexander


8080labs commented on July 21, 2024

Hi Alexander,

Thank you for sending over the data.

The first error appeared because the logic for overriding the categorical breakpoint does not seem to work. I will have to look into this again. It failed for the following code:
pps.score(df, x = 'YearBuilt', y = "NumberofBuildings")
And it worked when the task was passed explicitly:
pps.score(df, x = 'YearBuilt', y = "NumberofBuildings", task="regression")

The second error occurred because the for loop ended up making the following call:
pps.score(df, x = 'YearBuilt', y = "YearBuilt", task="regression")
Internally, this resulted in a DataFrame with two identical columns, hence the error. This needs to be fixed.
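
The effect of two identical columns can be reproduced in plain pandas. The sketch below is not ppscore's internal code; it only assumes that selecting a duplicated column label is what leads to the same kind of error message. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample data; only the column name comes from the thread.
df = pd.DataFrame({"YearBuilt": [1990, 2005, 2012]})

# Selecting the same column twice yields a frame with two identically named columns.
both = df[["YearBuilt", "YearBuilt"]]

# With duplicate labels, label-based selection returns a DataFrame, not a Series.
selected = both["YearBuilt"]
print(type(selected).__name__)

message = None
try:
    selected.dtype  # a DataFrame has .dtypes (plural) but no .dtype
except AttributeError as exc:
    message = str(exc)
    print(message)
```

A Series exposes .dtype, while a DataFrame only exposes .dtypes, which is why the attribute lookup fails once the selection returns two columns.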

Cheers,
Florian


FlorianWetschoreck commented on July 21, 2024

There were two issues here:

  • inferring the task, which in the future will be based on the dtype and not on numeric_breakpoints
  • allowing pps.score(df, x = 'YearBuilt', y = "YearBuilt")
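
On the caller's side, a loop over column pairs can sidestep the second point by skipping the x == y case. This is a minimal sketch under assumptions: score_all_pairs and fake_score are hypothetical names, and score_fn stands in for a scoring call that accepts df, x and y as in the calls shown in this thread.

```python
from itertools import product


def score_all_pairs(df, columns, score_fn):
    """Score every (x, y) column pair, skipping the x == y case."""
    results = {}
    for x, y in product(columns, repeat=2):
        if x == y:
            # Skip the self-pair: per the thread, it produced two identical columns.
            continue
        results[(x, y)] = score_fn(df, x=x, y=y)
    return results


def fake_score(df, x, y):
    # Hypothetical stand-in for a real scoring call; returns a placeholder value.
    return 0.0


# No real data is needed for the stand-in, so None is passed as the frame.
pairs = score_all_pairs(None, ["YearBuilt", "NumberofBuildings"], fake_score)
print(sorted(pairs))
```

In real use, score_fn would presumably be pps.score and df the actual DataFrame.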
