Comments (6)
Should we stick with
transform_
for the attribute, or do you think we should go withtransformer_
?
Ooo, maybe transformer_
is better to pair with the class names (CCATransformer
, MahalanobisTransformer
, etc.). Maybe we've gone back and forth on this one 😉
from sknnr.
@aazuspan, I like it! I especially like the prevention of instantiation by code rather than by convention and, as you say, it gets rid of some duplicated code as well. A couple of quick questions/comments just to make sure I get it:
GNN
andMSN
would be, for example,GNNRegressor(YFitMixin, TransformedKNeighborsRegressor)
, correct? Just wanted to verify thatYFitMixin.fit
gets called beforeTransformedKNeighborsRegressor.fit
...- I don't know if this was intentional, but
_set_fit_transform
could be confused with the more genericfit_transform
method defined in each transformer, which fits/transforms in a single step. But I think the intention of_set_fit_transform
is really just to set the transform? I actually wonder if it's even necessary to have that method - perhaps this instead? Is there an advantage to this that I'm overlooking?
class TransformedKNeighborsRegressor(RawKNNRegressor, ABC):
...
@abstractmethod
def _get_transform(self) -> TransformerMixin:
"""Return the transformer to use for fitting. Must be implemented by subclasses."""
...
def fit(self, X, y):
"""Fit using transformed feature data."""
self._validate_data(X, y, force_all_finite=True, multi_output=True)
self._set_dataframe_index_in(X)
self.transform_ = self._get_transform().fit(X, y)
X_transformed = self.transform_.transform(X)
return super().fit(X_transformed, y)
from sknnr.
GNN and MSN would be, for example, GNNRegressor(YFitMixin, TransformedKNeighborsRegressor), correct?
Bingo!
I don't know if this was intentional, but _set_fit_transform could be confused with the more generic fit_transform method defined in each transformer, which fits/transforms in a single step. But I think the intention of _set_fit_transform is really just to set the transform?
Yes, great point! That was unintentional - we should definitely choose a better name. Is _set_fitted_transform
still too similar? Or _set_fitted_transformer
(probably combined with _get_transformer
)?
I actually wonder if it's even necessary to have that method - perhaps this instead? Is there an advantage to this that I'm overlooking?
Separating out the transformer fitting is a workaround to an inheritance problem that arises when YFitMixin.fit
calls super().fit
, which calls TransformedKNeighborsRegressor.fit
and refits the transformer with the wrong y data. I also played around with 1) adding a reset=True
argument to TransformedKNeighborsRegressor.fit
to avoid resetting the transformer if called intentionally, and 2) having TransformedKNeighborsRegressor.fit
accept **kwargs
and fit the transformer with y_fit
if present. Those generally worked, but I wanted to keep as much of the y_fit
logic out of TransformedKNeighborsRegressor
as possible, and having a separate fitting method allowed that.
It's 100% possible there's a better workaround that I didn't think of though!
from sknnr.
Yes, great point! That was unintentional - we should definitely choose a better name. Is
_set_fitted_transform
still too similar? Or_set_fitted_transformer
(probably combined with_get_transformer
)?
I guess I have a slight preference for the latter, but either is fine by me.
Separating out the transformer fitting is a workaround to an inheritance problem that arises when
YFitMixin.fit
callssuper().fit
, which callsTransformedKNeighborsRegressor.fit
and refits the transformer with the wrong y data. I also played around with 1) adding areset=True
argument toTransformedKNeighborsRegressor.fit
to avoid resetting the transformer if called intentionally, and 2) havingTransformedKNeighborsRegressor.fit
accept**kwargs
and fit the transformer withy_fit
if present. Those generally worked, but I wanted to keep as much of they_fit
logic out ofTransformedKNeighborsRegressor
as possible, and having a separate fitting method allowed that.
Oof, let's avoid that complexity! Having a separate _set_fitted_transform
or _set_fitted_transformer
method sounds like a preferable solution.
from sknnr.
I guess I have a slight preference for the latter, but either is fine by me.
Agreed! Should we stick with transform_
for the attribute, or do you think we should go with transformer_
?
Oof, let's avoid that complexity!
Sounds like a plan! I already worked most of this up to make sure it was possible, so I'll make a PR shortly.
from sknnr.
Resolved by #52
from sknnr.
Related Issues (20)
- Get all `sklearn` estimator checks passing HOT 1
- Unwanted double transformation using TransformedKNeighborsMixin.predict HOT 8
- Design and implementation of random forest nearest neighbors (RF-NN) HOT 2
- Test accuracy of `predict` methods HOT 5
- Licensing HOT 2
- Simplify fit parameters for `CCATransformer` and `CCorATransformer`? HOT 4
- Support use of reduced number of axes in CCorATransformer, CCATransformer HOT 1
- Warning when fitting transformed estimator with dataframe with no feature names HOT 5
- Possibility of using built-in Mahalanobis distance HOT 4
- Add ruff to pre-commit hooks HOT 9
- Replace existing port tests with automated regression tests HOT 6
- Independent predict and score methods for KNeighborsRegressor derived estimators HOT 10
- Using properties as accessors in CCA/CCorA leads to slow run times HOT 9
- Refactor `CCA` and `CCorA` classes to reduce repeated operations
- Test estimators with `y_fit` data HOT 5
- `list.index` accidentally stored as `dataframe_index_in_` HOT 3
- Change installed name to `sknnr` HOT 9
- Test all Python versions with Hatch 1.8.0 HOT 3
- Expose dataset loading functions as public API HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sknnr.