Comments (8)
Oops, the getting started docs were not up to date! The scorer should be a simple function, like
def scorer(y_true, y_pred):
    # do some arithmetic
    return scalar
So if you define the scorer (f1) as
def f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='micro')
you should be fine (works on my machine). I'll update the documentation when I get a chance!
Let me know if it works and I'll close the issue.
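For reference, a minimal sketch of the scorer contract described above: a plain function that takes `(y_true, y_pred)` and returns a scalar. The `accuracy` function here is a hypothetical stand-in written in pure Python, not something shipped with mlens or scikit-learn.

```python
def accuracy(y_true, y_pred):
    """Hypothetical scorer: fraction of predictions equal to the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count positions where the prediction matches the label.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Return a single float, which is all the scorer contract asks for.
    return hits / float(len(y_true))


score = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
print(score)
```

Any function with this shape should be usable wherever the thread passes `scorer=f1`.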
from mlens.
Thanks for the quick response! Maybe I'm missing a crucial point, but this throws an AttributeError. Including just the key lines:
def f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='micro')
ensemble = SuperLearner(scorer=f1, random_state=seed)
...
ensemble.fit(X[:75], y[:75])
Process PoolWorker-6:
Traceback (most recent call last):
  File "C:\Anaconda2\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Anaconda2\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Anaconda2\lib\multiprocessing\pool.py", line 102, in worker
    task = get()
  File "C:\Anaconda2\lib\site-packages\mlens\externals\joblib\pool.py", line 362, in get
    return recv()
AttributeError: 'module' object has no attribute 'f1'
from mlens.
Hm that makes no sense...
The following works fine for me:
import numpy as np
from pandas import DataFrame
from sklearn.metrics import f1_score
from sklearn.datasets import load_iris
from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
seed = 2017
np.random.seed(seed)
def f1(y, p): return f1_score(y, p, average='micro')
data = load_iris()
idx = np.random.permutation(150)
X = data.data[idx]
y = data.target[idx]
ensemble = SuperLearner(scorer=f1, random_state=seed)
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])
ensemble.add_meta(LogisticRegression())
ensemble.fit(X[:75], y[:75])
Using a scoring function is unit-tested on Mac, Linux and Windows (in Python 3), so if you still have an issue, the problem is more fundamental. If the above code doesn't work for you, could you send over a minimal code example that reproduces the error?
It looks like you are running on Windows. Which Python version are you using?
from mlens.
You're correct, I'm on Windows and the above output was from Python 2.7. I just installed Python 3.6 and reran the script, and it gives a very similar error:
Process SpawnPoolWorker-9:
Traceback (most recent call last):
  File "C:\Anaconda2\envs\py36\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Anaconda2\envs\py36\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Anaconda2\envs\py36\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Anaconda2\envs\py36\lib\site-packages\mlens\externals\joblib\pool.py", line 362, in get
    return recv()
  File "C:\Anaconda2\envs\py36\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'f1' on <module '__main__' (built-in)>
Narrowing this down, setting n_jobs=1 does not throw the error! So this seems to be a problem with the multiprocessing; see Mlensdebug.zip below for the notebook. This looks like it might be relevant: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
I'll try and dig further and see if I can engineer a fix.
from mlens.
@Mikesev thanks for debugging! I'm not getting this error on OSX, so I suspect this is Windows-specific. Could I ask you to try two things?
- Set backend='threading' in the ensemble. When n_jobs != 1 with multiprocessing, the parent process is copied to sub-processes. By using threading instead, all computations are handled by the parent process, so the scoring function should be accessible.
- If this doesn't work, could you try installing the developer version (to be version 0.1.4)? There are some bug fixes there that would be good to rule out. To do this, run:
  git clone https://github.com/flennerhag/mlens.git; cd mlens;
  python setup.py install
Thanks!
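To illustrate why threading sidesteps the problem, here is a rough sketch using the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for a threading backend (it is not mlens's own backend). The helper `make_scorer` is hypothetical. It builds a scorer that cannot be pickled, yet threads can still call it because they share the parent's memory.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def make_scorer(weight):
    """Hypothetical factory returning a locally defined scorer (a closure)."""
    def scorer(y_true, y_pred):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        return weight * hits / float(len(y_true))
    return scorer


scorer = make_scorer(1.0)

# A locally defined function cannot be pickled, so it could not be
# shipped to a separate worker process.
try:
    pickle.dumps(scorer)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    picklable = False

# Threads run inside the parent process and call the function object
# directly, so no pickling is involved.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(scorer, [0, 1, 1, 0], [0, 1, 0, 0]) for _ in range(2)]
    results = [f.result() for f in futures]

print(picklable, results)
```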
from mlens.
No worries! Yep, I will do so.
It definitely seems to be related to the Pool issue that's raised on the SO post and elaborated on here: https://bugs.python.org/issue25053.
I just imported the f1 function from a separate file and it worked straight away with n_jobs = -1.
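A sketch of why moving the function to a separate file helps, assuming the cause is the one described in the linked posts: pickle stores a function as a module name plus a function name, and the receiving side must be able to look that name up. The module name `scorer_demo` below is a hypothetical stand-in created in memory so the example is self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a separate file that defines the scorer.
demo = types.ModuleType("scorer_demo")
sys.modules["scorer_demo"] = demo


def f1(y_true, y_pred):
    # Placeholder body; only the function's name matters for this demo.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / float(len(y_true))


# Make the function look as if it had been defined in scorer_demo.
f1.__module__ = "scorer_demo"
f1.__qualname__ = "f1"
demo.f1 = f1

# The payload contains the names, not the function's code.
payload = pickle.dumps(f1)
names_only = b"scorer_demo" in payload and b"f1" in payload

# Lookup succeeds while the module exposes f1.
same = pickle.loads(payload) is f1

# Simulate a worker whose copy of the module never defined f1.
del demo.f1
try:
    pickle.loads(payload)
    error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    error = str(exc)

print(names_only, same, error)
```

When the scorer lives in a real importable file, every worker process can import that module and resolve the name, which matches the behaviour reported above with n_jobs = -1.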
from mlens.
- set backend='threading' in the ensemble
This works!
from mlens.
Great! Threading will be the default from 0.1.4, so I will close this issue. Thanks for the debugging :)
from mlens.