Comments (6)
I can't recreate the issue in Colab, which is running Python 3.10.12.
I got an email with a traceback, but it's not here; some wires crossed in GitHub, or you deleted it, or...?
That sounded like an issue with the parallelization, and maybe not enough memory or space for it, but I'm no expert on that.
from category_encoders.
I also couldn't reproduce it on my local Linux machine using category-encoders 2.6.0 and Python 3.10 in a fresh conda environment.
As Ben pointed out, the hashing encoder behaves differently on Windows when it comes to multiprocessing. Are you using Windows or Linux/Mac?
Thanks.
I am using a MacBook Air with an M2 chip.
Here is the error I got:
`
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 269, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/test/test_dict.py", line 8, in <module>
dd = he.fit_transform(dataset)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/base.py", line 848, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/utils.py", line 315, in fit
X_transformed = self.transform(X, override_return_df=True)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/utils.py", line 488, in transform
X = self._transform(X)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/hashing.py", line 174, in _transform
data_lock = multiprocessing.Manager().Lock()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/managers.py", line 562, in start
self._process.start()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "/test/test_dict.py", line 8, in <module>
dd = he.fit_transform(dataset)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/base.py", line 848, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/utils.py", line 315, in fit
X_transformed = self.transform(X, override_return_df=True)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/utils.py", line 488, in transform
X = self._transform(X)
File "/Users/sss/virtualenvs/functions/lib/python3.10/site-packages/category_encoders/hashing.py", line 174, in _transform
data_lock = multiprocessing.Manager().Lock()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/managers.py", line 566, in start
self._address = reader.recv()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/connection.py", line 255, in recv
buf = self._recv_bytes()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
raise EOFError
EOFError`
Thanks. Notice that the first traceback ends with an error from multiprocessing; the EOF is at the end of a second (identical?) traceback.
You might try a newer version of this package: #428 updated the hashing encoder significantly.
The same error shows up on Stack Overflow, but I'm not sure how much it helps: https://stackoverflow.com/q/61931669/10495893
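For what it's worth, the RuntimeError text itself suggests the usual workaround: keep anything that starts a process behind an `if __name__ == '__main__':` guard. Below is a minimal standard-library sketch of that idiom, not a confirmed fix for this issue. The names `bucket` and `main` are hypothetical stand-ins; in the script from the traceback, the `he.fit_transform(dataset)` call would presumably sit where the body of `main()` is.

```python
import multiprocessing


def bucket(value, n_buckets=8):
    # Hypothetical stand-in for the real per-row work; this is NOT the
    # library's hashing logic, just something cheap to run.
    return len(value) % n_buckets


def main():
    # Everything that may start a process lives inside this function.
    # multiprocessing.Manager() starts a helper process, which is the step
    # that fails in the traceback (hashing.py, line 174).
    with multiprocessing.Manager() as manager:
        lock = manager.Lock()
        with lock:
            print([bucket(v) for v in ["a", "bb", "ccc"]])


if __name__ == "__main__":
    # With the "spawn" start method (the macOS default since Python 3.8),
    # each child re-imports this file under a different __name__, so the
    # guard keeps main() from being run again inside the child.
    main()
```

Without the guard, the top-level code runs again while the child process is still bootstrapping, which, as far as I can tell, is what the "before the current process has finished its bootstrapping phase" message is complaining about.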
I was able to get access to an old MacBook (still with an Intel chip), but I could not reproduce the issue on that machine either (using a fresh conda installation). Can you try version 2.6.3, as Ben suggests, and see if that solves the issue?