Comments (6)
Hi Florian,
I'm happy to learn the data helped you identify the problems :)
I suspected the categorical breakpoint might not work, but I couldn't be sure because the for loop was behaving strangely. I didn't anticipate the x = y exception!
Thanks again for providing this package and taking the time to update and support it.
Cheers,
Alexander
from ppscore.
Yes, the data was very helpful - thank you for that!
Hi Alex,
It would be great if you could share your data. You can choose any method that works for you, e.g. a Google Drive link, another file upload, or a pointer to the original source.
There is definitely some work to be done on our side to clarify or catch the errors that are currently raised by sklearn.
The error "'DataFrame' object has no attribute 'dtype'" seems strange to me. It looks like .dtype is being called internally on a DataFrame instead of a Series. Is it possible that there are columns with the same name?
By the way, why are you trying to predict all columns from YearBuilt (x is YearBuilt)? Don't you want to do the opposite and predict YearBuilt from the other columns (y is YearBuilt)?
Cheers,
Florian
Hi Florian!
Thanks for your answer. I've uploaded the CSV and Jupyter notebook to Google Drive. Here is the link: https://drive.google.com/open?id=127BwkUTcKF18_Kh599jHlsMVIFnpwU3S
The original data is available on Kaggle here:
https://www.kaggle.com/city-of-seattle/sea-building-energy-benchmarking#2015-building-energy-benchmarking.csv
The data I am using was obtained from the original by combining the 2015 and 2016 tables and cleaning them.
Yes, I also found it strange that .dtype appears to be called on a DataFrame. It's particularly odd because it happens only when task is specified, not when task = None. The columns all have unique names.
Using 'YearBuilt' as the predictor was just a test! I wanted to understand which variables were being treated as categorical. My initial objective was to calculate the whole PPS matrix and compare it to the correlation matrix to see if it could provide more insight, like you did in your article.
Cheers,
Alexander
Hi Alexander,
Thank you for sending over the data.
The first error appeared because the logic for overriding the categorical breakpoint does not seem to work. I will have to look into this again. It failed for the following code:
pps.score(df, x = 'YearBuilt', y = "NumberofBuildings")
It worked when the task was passed explicitly:
pps.score(df, x = 'YearBuilt', y = "NumberofBuildings", task="regression")
The second error is due to the fact that the for-loop resulted in the following:
pps.score(df, x = 'YearBuilt', y = "YearBuilt", task="regression")
Internally, this resulted in a DataFrame with two identical columns, hence the error. This needs to be fixed.
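The duplicate-column effect can be reproduced with pandas alone. This is a minimal sketch with made-up toy data, not ppscore's actual internals: it only assumes that both columns get selected by name at some point, which is enough to trigger the same AttributeError.

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
df = pd.DataFrame({
    "YearBuilt": [1990, 2005, 2012],
    "NumberofBuildings": [1, 2, 1],
})

x, y = "YearBuilt", "YearBuilt"

# Selecting the same name twice yields two columns with an identical label.
subset = df[[x, y]]

# With duplicate labels, indexing by name returns a DataFrame, not a Series.
selected = subset[y]
print(type(selected).__name__)     # DataFrame
print(hasattr(selected, "dtype"))  # False: a DataFrame has .dtypes, not .dtype
```

Any later access such as `selected.dtype` then raises "'DataFrame' object has no attribute 'dtype'", which matches the message reported above.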
Cheers,
Florian
There have been two issues here:
- adjusting the task, which in the future will be inferred from the dtype and not from numeric_breakpoints
- allowing x and y to be the same column, as in
pps.score(df, x = 'YearBuilt', y = "YearBuilt")
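Until the x = y case is handled, a loop over all columns can simply skip that pair. A minimal sketch, assuming a hypothetical helper named feature_target_pairs (it is not part of ppscore) and unique column names:

```python
def feature_target_pairs(columns):
    """Return every ordered (x, y) pair of column names, skipping x == y.

    Hypothetical helper for illustration; not part of ppscore.
    """
    return [(x, y) for x in columns for y in columns if x != y]


columns = ["YearBuilt", "NumberofBuildings", "PropertyGFATotal"]  # example names
pairs = feature_target_pairs(columns)

for x, y in pairs:
    # Each pair could be passed on as pps.score(df, x=x, y=y).
    print(x, "->", y)
```

Passing task explicitly, as in the working call earlier in the thread, may still be needed for some columns until the task detection is changed.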