- Twitter: @rushter
- Blog: https://rushter.com/blog/
rushter / heamy
A set of useful tools for competitive data science.
Home Page: http://heamy.readthedocs.io/en/latest/
License: MIT License
It is advised to use different feature sub-sets across the models for diversity.
Is it possible using heamy?
Heamy does not seem to support sparse matrices at the moment.
When I create a dataset where X_train and X_test are scipy sparse matrices, I get the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-37-cc350d1da8a6> in <module>()
1 pipeline = ModelsPipeline(*classifiers)
----> 2 pipeline.stack()
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/pipeline.pyc in stack(self, k, stratify, shuffle, seed, full_test, add_diff)
131
132 for model in self.models:
--> 133 result = model.stack(k=k, stratify=stratify, shuffle=shuffle, seed=seed, full_test=full_test)
134 train_df = pd.DataFrame(result.X_train, columns=generate_columns(result.X_train, model.name))
135 test_df = pd.DataFrame(result.X_test, columns=generate_columns(result.X_test, model.name))
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/estimator.pyc in stack(self, k, stratify, shuffle, seed, full_test)
245 if self.use_cache:
246 pdict = {'k': k, 'stratify': stratify, 'shuffle': shuffle, 'seed': seed, 'full_test': full_test}
--> 247 dhash = self._dhash(pdict)
248 c = Cache(dhash, prefix='s')
249 if c.available:
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/estimator.pyc in _dhash(self, params)
132 """Get hash of the dictionary object."""
133 m = hashlib.new('md5')
--> 134 m.update(self.hash.encode('utf-8'))
135 for key in sorted(params.keys()):
136 h_string = ('%s-%s' % (key, params[key])).encode('utf-8')
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/estimator.pyc in hash(self)
78 m.update(h_string)
79 m.update(self.estimator_name.encode('utf-8'))
---> 80 m.update(self.dataset.hash.encode('utf-8'))
81
82 if not self._is_class:
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/dataset.pyc in hash(self)
235 m = hashlib.new('md5')
236 if self._preprocessor is None:
--> 237 m.update(numpy_buffer(self._X_train))
238 m.update(numpy_buffer(self._y_train))
239 if self._X_test is not None:
/home/agrigorev/anaconda2/lib/python2.7/site-packages/heamy/cache.pyc in numpy_buffer(ndarray)
55 ndarray = ndarray.values
56
---> 57 if ndarray.flags.c_contiguous:
58 obj_c_contiguous = ndarray
59 elif ndarray.flags.f_contiguous:
/home/agrigorev/anaconda2/lib/python2.7/site-packages/scipy/sparse/base.pyc in __getattr__(self, attr)
523 return self.getnnz()
524 else:
--> 525 raise AttributeError(attr + " not found")
526
527 def transpose(self):
AttributeError: flags not found
The matrices are obtained via DictVectorizer from sklearn.
As a temporary solution, I use X.toarray().
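The workaround mentioned above can be sketched roughly as follows. This is a minimal illustration and not part of heamy: `to_dense_if_sparse` is a hypothetical helper name, and the small matrix only stands in for real DictVectorizer output.

```python
import numpy as np
from scipy import sparse


def to_dense_if_sparse(X):
    """Return a dense ndarray for scipy sparse input; pass other inputs through.

    Hypothetical helper for illustration only, not a heamy function.
    """
    if sparse.issparse(X):
        # toarray() materializes every zero, so memory grows to rows * columns.
        return X.toarray()
    return X


# A small sparse matrix standing in for DictVectorizer output.
X_sparse = sparse.csr_matrix(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]))
X_dense = to_dense_if_sparse(X_sparse)

# The dense copy exposes the `flags` attribute that the traceback shows being accessed.
print(type(X_dense).__name__, X_dense.flags.c_contiguous)
```

Converting to dense is only practical when the full matrix fits in memory. DictVectorizer also accepts `sparse=False`, which returns a dense array directly.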
@rushter, thanks for your library!
Could you please add references (articles, etc.) describing exactly which algorithms heamy implements?
Thanks.
Hi,
Does this really work for one-hot encoding features so that they always map to the same labels? I am thinking of deploying my model as an API, where new labels may show up.
train[column] = train[column].astype('category', categories=categories)
test[column] = test[column].astype('category', categories=categories)
# from: https://github.com/rushter/heamy/blob/master/heamy/feature.py
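To illustrate what a fixed category list does when an unseen label arrives, here is a small sketch using plain pandas rather than heamy itself. As far as I know, recent pandas releases no longer accept the `categories=` keyword in `astype`, so the sketch uses `pd.CategoricalDtype` instead. The column name and the colour values are made up for the example.

```python
import pandas as pd

# Hypothetical category list fixed at training time.
categories = ["red", "green", "blue"]
dtype = pd.CategoricalDtype(categories=categories)

train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "purple"]})  # "purple" was never seen

# Values outside the fixed category list become NaN after the cast.
train["color"] = train["color"].astype(dtype)
test["color"] = test["color"].astype(dtype)

# get_dummies emits one column per declared category, in the declared order.
train_encoded = pd.get_dummies(train["color"])
test_encoded = pd.get_dummies(test["color"])

print(test_encoded)
```

Under these assumptions the encoded columns stay identical between train and test, and a label outside the list yields a row with no active column rather than a new column. Whether that is acceptable for a deployed API depends on how the model should treat unknown labels.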
How is it possible to set a custom path for caching objects?
I tried the "heamy" module as shown in this example and it works as expected.
https://github.com/rushter/heamy/blob/master/examples/walkthrough.ipynb
I expected line 24 to fail, as the Sequential class is not imported anywhere in the script. I would guess that the mlp_model function should not complete without an error. What am I missing?
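One general Python behavior that may be relevant here (it is not confirmed as the explanation for the notebook): names inside a function body are looked up when the function is called, not when it is defined. The sketch below uses a hypothetical `Widget` class in place of `Sequential`.

```python
def build_model():
    # `Widget` is a hypothetical name that does not exist yet when this
    # function is defined; Python only looks it up when the body runs.
    return Widget()


# Defining build_model raised nothing. Calling it now fails with NameError.
try:
    build_model()
    failed = False
except NameError:
    failed = True


class Widget:
    pass


# Once the name exists in the module's globals, the same function works.
model = build_model()
print(failed, type(model).__name__)
```

So a missing import only surfaces if the function is actually called while the name is still undefined. If the name was imported in an earlier notebook cell, or the function is never invoked, no error appears.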
Hi,
First, thanks for your work. I'm pretty excited to test it and play with it!
I searched the documentation and examples but couldn't find how to do this: I would like to use my stacking process to predict results (like .predict() in scikit-learn).
I made a notebook to illustrate my problem.
I'm pretty sure I'm missing something...
A.