Comments (8)

flennerhag avatar flennerhag commented on June 15, 2024

Oops, the getting started docs were not up to date! The scorer should be a simple function that takes the true and predicted labels and returns a scalar, like

def scorer(y_true, y_pred):
    # do some arithmetic
    return scalar

So if you define the scorer (f1) as

def f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='micro')

You should be fine (works on my machine). I'll update the documentation when I get a chance!
Let me know if it works and I'll close the issue.

from mlens.

Mikesev avatar Mikesev commented on June 15, 2024

Thanks for the quick response! Maybe I'm missing a crucial point, but this throws an AttributeError. Including just the key lines:

def f1(y_true, y_pred):
    return f1_score(y_true, y_pred, average='micro')

ensemble = SuperLearner(scorer=f1, random_state=seed)
...
ensemble.fit(X[:75], y[:75])

Process PoolWorker-6:
Traceback (most recent call last):
File "C:\Anaconda2\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Anaconda2\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "C:\Anaconda2\lib\multiprocessing\pool.py", line 102, in worker
task = get()
File "C:\Anaconda2\lib\site-packages\mlens\externals\joblib\pool.py", line 362, in get
return recv()
AttributeError: 'module' object has no attribute 'f1'

flennerhag avatar flennerhag commented on June 15, 2024

Hm that makes no sense...

The following works fine for me:

import numpy as np
from pandas import DataFrame
from sklearn.metrics import f1_score
from sklearn.datasets import load_iris
from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

seed = 2017
np.random.seed(seed)

def f1(y, p): return f1_score(y, p, average='micro')

data = load_iris()
idx = np.random.permutation(150)
X = data.data[idx]
y = data.target[idx]

ensemble = SuperLearner(scorer=f1, random_state=seed)
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])
ensemble.add_meta(LogisticRegression())
ensemble.fit(X[:75], y[:75])

Using a scoring function is unit-tested on Mac, Linux and Windows (in Python 3), so if you still have an issue, the problem is more fundamental. If the above code doesn't work for you, could you send over a minimal code example that throws the error?

It looks like you are running on Windows. Which Python version are you using?

Mikesev avatar Mikesev commented on June 15, 2024

You're correct, I'm on Windows, and the above output was from Python 2.7. I just installed Python 3.6 and reran the script; it gives a very similar error:

Process SpawnPoolWorker-9:
Traceback (most recent call last):
File "C:\Anaconda2\envs\py36\lib\multiprocessing\process.py", line 249, in _bootstrap
self.run()
File "C:\Anaconda2\envs\py36\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\Anaconda2\envs\py36\lib\multiprocessing\pool.py", line 108, in worker
task = get()
File "C:\Anaconda2\envs\py36\lib\site-packages\mlens\externals\joblib\pool.py", line 362, in get
return recv()
File "C:\Anaconda2\envs\py36\lib\multiprocessing\connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can't get attribute 'f1' on <module '__main__' (built-in)>

Narrowing this down, setting n_jobs=1 does not throw the error! So this seems to be a problem with the multiprocessing backend rather than with the scorer itself; see MLens debug.zip below for the notebook. This looks relevant: https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

I'll try and dig further and see if I can engineer a fix.
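
For anyone following along, the lookup failure in the traceback can probably be reproduced without mlens at all. The sketch below uses only the standard library and assumes that the scorer reaches the workers via pickle, which is what the `_ForkingPickler.loads` frame suggests. The module name `demo_main` and the trivial scorer body are made up for illustration; it stands in for the parent's `__main__`.

```python
import pickle
import sys
import types

# A throwaway module standing in for the parent's __main__ (hypothetical name).
parent_main = types.ModuleType("demo_main")
exec("def f1(y_true, y_pred):\n    return 1.0\n", parent_main.__dict__)
sys.modules["demo_main"] = parent_main

# Pickling a function records only its module and name, not its body.
payload = pickle.dumps(parent_main.f1)
stored_by_reference = b"demo_main" in payload and b"f1" in payload

# Simulate a spawned worker whose copy of the module never defined f1.
del parent_main.f1
try:
    pickle.loads(payload)
    lookup_error = None
except AttributeError as exc:
    lookup_error = str(exc)

print(stored_by_reference)
print(lookup_error)
```

If this matches what happens inside the pool, the worker is not missing the data, only the name: it re-imports the module and cannot find `f1` there.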

MLens debug.zip

flennerhag avatar flennerhag commented on June 15, 2024

@Mikesev thanks for debugging! I'm not getting this error on OSX, so I suspect this is Windows specific. Could I ask you to try two things?

  1. set backend='threading' in the ensemble. When n_jobs!=1 with multiprocessing, the parent process is copied to sub-processes. By using threading instead, all computations are handled by the parent process so the scoring function should be accessible.

  2. If this doesn't work, could you try installing the developer version (to be version 0.1.4)? There are some bug fixes there that would be good to rule out. To do this, run:

git clone https://github.com/flennerhag/mlens.git; cd mlens;
python setup.py install

Thanks!
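
As a rough illustration of why suggestion 1 might help, here is a standard-library analogue (not mlens internals): a thread pool calls the function object directly, so nothing has to be pickled or re-imported. The names `make_scorer` and `folds` and the accuracy-style scorer are assumptions for the sketch, not part of mlens or scikit-learn.

```python
from multiprocessing.pool import ThreadPool


def make_scorer():
    # A nested function cannot be pickled by name, so it is a reasonable
    # stand-in for a scorer that worker processes would fail to import.
    def scorer(y_true, y_pred):
        matches = sum(a == b for a, b in zip(y_true, y_pred))
        return matches / len(y_true)
    return scorer


scorer = make_scorer()
folds = [([0, 1, 1], [0, 1, 0]), ([1, 1], [1, 1])]

# Threads share the parent's memory, so the function object is used as-is.
with ThreadPool(2) as pool:
    scores = pool.starmap(scorer, folds)

print(scores)
```

The trade-off is the usual one for threads in Python: pure-Python work is serialized by the GIL, so any speed-up depends on the estimators releasing it.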

Mikesev avatar Mikesev commented on June 15, 2024

No worries! Yep, I will do so.

It definitely seems to be related to the Pool issue that's raised on the SO post and elaborated on here: https://bugs.python.org/issue25053.

I just imported the f1 function from a separate file and it worked straight away with n_jobs = -1.
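
A minimal sketch of that workaround, assuming the scorer lives in a hypothetical module called `my_scorers` (the module name, the temporary directory and the simplified scorer body are all illustrative). A fresh interpreter plays the role of the worker and resolves the pickled name by importing the module itself:

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Write the scorer to its own module; "my_scorers" is a hypothetical name.
workdir = Path(tempfile.mkdtemp())
(workdir / "my_scorers.py").write_text(
    "def f1(y_true, y_pred):\n"
    "    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)\n"
)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
f1 = importlib.import_module("my_scorers").f1

# The pickle now refers to my_scorers.f1 rather than __main__.f1.
payload_file = workdir / "scorer.pkl"
payload_file.write_bytes(pickle.dumps(f1))

# A fresh interpreter stands in for a spawned worker process.
child_code = (
    "import pickle, sys\n"
    "with open(sys.argv[1], 'rb') as fh:\n"
    "    scorer = pickle.load(fh)\n"
    "print(scorer([0, 1], [0, 1]))\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code, str(payload_file)],
    env=dict(os.environ, PYTHONPATH=str(workdir)),
    capture_output=True,
    text=True,
)
child_output = result.stdout.strip()
print(child_output)
```

The child only succeeds because `my_scorers` is importable from its own path, which is presumably why moving `f1` out of the notebook made n_jobs = -1 work.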

Mikesev avatar Mikesev commented on June 15, 2024
  1. set backend='threading' in the ensemble

This works!

flennerhag avatar flennerhag commented on June 15, 2024

Great! Threading will be default from 0.1.4 so I will close this issue. Thanks for the debugging : )
