Comments (6)
This has come up before, notably here: #21
It was then implemented in the two encoders where I could figure out how, in commit 056a483.
For something like binary encoding the output columns don't map 1-to-1 with categories, so it doesn't quite work out.
Does that help?
from category_encoders.
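To illustrate the point about binary encoding, here is a minimal sketch in plain pandas. It does not use category_encoders itself; the ordinal codes starting at 1, the `to_bits` helper, and the `color_<i>` column names are assumptions made up for this example. It shows that several categories share each output column, so no column can be named after a single category.

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue", "yellow", "purple"], name="color")

# Assumption for illustration: give each category an ordinal code starting at 1,
# in order of first appearance.
categories = list(dict.fromkeys(colors))
codes = {cat: i + 1 for i, cat in enumerate(categories)}

# Number of bits needed to represent the largest code.
n_bits = max(codes.values()).bit_length()


def to_bits(code, width):
    """Hypothetical helper: return the bits of `code`, most significant first."""
    return [(code >> shift) & 1 for shift in range(width - 1, -1, -1)]


binary = pd.DataFrame(
    [to_bits(codes[value], n_bits) for value in colors],
    columns=[f"color_{i}" for i in range(n_bits)],
)

# Five categories end up in three columns, and the last column is set for
# red, blue and purple at once, so it cannot carry one category's name.
print(binary.assign(color=colors))
```

With one-hot style encodings each column corresponds to exactly one category, which is why value-based names are feasible there.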
Sorry, I missed that issue when looking.
I'd still like to have an option to use variable values instead of indices in the column names, and leave the responsibility to the user. (Though any string value is supported as a name in pandas, correct?) In my case, the values are nearly always 'nice'. This is the standard behaviour of other things I've used (e.g. R or pandas.get_dummies). Anyway, because this isn't supported, I'm rolling my own mixin (in part because I need a custom one anyway for other things).
Happy to close this issue if you disagree.
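For reference, a small sketch of the pandas.get_dummies behaviour mentioned above, next to index-based names. The index-based frame is built by hand with a rename purely for comparison; it is an assumption about what index naming looks like, not the output of any category_encoders encoder.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# pandas names each indicator column after the category value it represents.
by_value = pd.get_dummies(df, columns=["color"])
print(list(by_value.columns))  # ['color_blue', 'color_green', 'color_red']

# Hand-built comparison (assumption): the same columns named by position.
categories = sorted(df["color"].unique())
by_index = by_value.rename(
    columns={f"color_{cat}": f"color_{i}" for i, cat in enumerate(categories)}
)
print(list(by_index.columns))  # ['color_0', 'color_1', 'color_2']
```

The value-based names are readable without a lookup table, but they depend on the category values being usable as column names, which is the responsibility being left to the user here.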
No worries. If it's left as an option that defaults to the current behavior, I wouldn't be opposed to including that kind of thing for the encoders where it makes sense, if you're interested in working on a PR.
Sorry, I won't be able to offer a PR (at least not in the next few months).
Ok, I can work on one if I get a chance, but am marking this as seeking a contributor in the meantime. If anyone would like to work on this, chime in here.
Merged PR #65