Comments (15)
Very nice - I can reproduce the StackOverflowError using your example files. Will investigate and fix it in the upcoming JPMML-SkLearn version that will be released either later today or tomorrow.
I suspect that Scikit-Learn has changed something about the encoding of random forest models. I've tested with Scikit-Learn versions 0.16.0 through 0.17.1. What's your Scikit-Learn version?
import sklearn
print(sklearn.__version__)
from jpmml-converter.
Thank you very much. The version is 0.17.
from jpmml-converter.
This looks like a legitimate StackOverflowError, because the first member tree model in your random forest model is over 2000 levels deep. That's highly unusual.
How was your sklearn.ensemble.RandomForestRegressor instance parametrized? You should set the value of the max_depth parameter to some sensible value such as 100.
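To check how deep the member trees actually are before attempting a conversion, something like the following sketch could be run on the Python side. It assumes a fitted forest that exposes `estimators_`, with each member exposing `tree_.max_depth`, as Scikit-Learn tree ensembles do. The helper names `tree_depths` and `too_deep` are hypothetical and only for illustration.

```python
def tree_depths(forest):
    """Return the depth of every member tree of a fitted forest."""
    return [est.tree_.max_depth for est in forest.estimators_]


def too_deep(forest, limit=100):
    """Return (index, depth) pairs for member trees deeper than `limit`."""
    return [(i, d) for i, d in enumerate(tree_depths(forest)) if d > limit]
```

Calling `too_deep(forest)` on the unpickled random forest would list any member tree that exceeds the suggested limit of 100 levels.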
from jpmml-converter.
There's a related issue, where a StackOverflowError happens when converting a random forest model that has been trained using the Iris dataset. It should be impossible to train a 2000-level deep tree model using a dataset that contains only 150 training instances.
from jpmml-converter.
Thank you very much for your prompt response. I have set max_depth to 100 and I am still getting the error. My Java version is 1.7.0_79.
from jpmml-converter.
I have also tested it with Oracle Java 1.8.0_40.
from jpmml-converter.
The error however has changed to:
Exception in thread "main" java.lang.StackOverflowError
at sun.misc.FDBigInteger.leftShift(FDBigInteger.java:511)
at sun.misc.FDBigInteger.valueOfMulPow52(FDBigInteger.java:324)
at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(FloatingDecimal.java:714)
at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.access$100(FloatingDecimal.java:259)
at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1785)
at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1738)
at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:70)
at java.lang.Double.toString(Double.java:204)
at org.jpmml.converter.ValueUtil.formatValue(ValueUtil.java:118)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:81)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
which is the same as https://github.com/jpmml/sklearn2pmml/issues/4
Which Java version should I use?
from jpmml-converter.
You probably can't solve the issue simply by using a different Java version.
The problem is more fundamental, and appears to be an unpickling error (one that manifests on some Java versions but not on others) or something similar. As a result, the unpickled Scikit-Learn data contains invalid cross-references, which make TreeModelUtil#encodeNode jump back and forth between two nodes until the JVM dies with a StackOverflowError.
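The kind of cross-reference described here can be illustrated with a small check over the parallel child arrays that Scikit-Learn trees expose as `tree_.children_left` and `tree_.children_right` (where -1 marks a missing child). This is only a sketch of the idea in Python, not the converter's own code; the function name `find_revisited_node` is hypothetical.

```python
TREE_LEAF = -1  # Scikit-Learn uses -1 to mark "no child"


def find_revisited_node(children_left, children_right, root=0):
    """Return the first node index reached twice, or None for a proper tree.

    The walk is iterative, so a corrupted structure cannot overflow the stack.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            return node  # reached a second time: invalid cross-reference
        seen.add(node)
        for child in (children_left[node], children_right[node]):
            if child != TREE_LEAF:
                stack.append(child)
    return None
```

A recursive encoder that follows such arrays without a visited set would bounce between the affected nodes indefinitely, which matches the repeated encodeNode frames in the stack trace.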
from jpmml-converter.
How were the example pickle files in the Model1.zip file generated? I am unable to unpickle them for closer inspection using either the sklearn.externals.joblib or pickle modules:
>>> from sklearn.externals import joblib
>>> forest = joblib.load("pp_model_1_forest.pkl")
Traceback (most recent call last):
File "load_joblib.py", line 3, in <module>
forest = joblib.load("pp_model_1_forest.pkl")
File "/usr/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 459, in load
obj = unpickler.load()
File "/usr/lib64/python3.4/pickle.py", line 1038, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.4/pickle.py", line 1384, in load_reduce
value = func(*args)
File "sklearn/tree/_tree.pyx", line 579, in sklearn.tree._tree.Tree.__cinit__ (sklearn/tree/_tree.c:6774)
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int'
and
>>> import pickle
>>> forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))
Traceback (most recent call last):
File "load_pickle.py", line 3, in <module>
forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))
_pickle.UnpicklingError: invalid load key, 'Z'.
from jpmml-converter.
test.zip
I receive the same error when loading the pickle, even for the Iris example provided (see test.zip). I have also included the compiled JAR file. So maybe the problem is in the joblib dump of the random forest, not in the converter?
def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress = 9)
from jpmml-converter.
The JPMML-SkLearn library should be able to consume dumps produced by any of the following modules:
1. sklearn.externals.joblib
2. joblib
3. pickle
Option 1 is recommended by the Scikit-Learn documentation (e.g. see http://scikit-learn.org/stable/modules/model_persistence.html). However, this bundled module may be outdated and/or out of sync with other modules.
You could try dumping the RF object manually using options 2 and 3, and then use the JPMML-SkLearn command-line application to do the conversion.
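A minimal sketch of dumping the same object with options 2 and 3 might look like this. It assumes the standalone joblib package may or may not be installed, and falls back to plain pickle only when it is not. The helper name `dump_variants` and the file suffixes are hypothetical.

```python
import os
import pickle

try:
    import joblib  # standalone joblib package (option 2), if installed
except ImportError:
    joblib = None


def dump_variants(obj, directory, name):
    """Dump obj with plain pickle and, when available, standalone joblib.

    Returns the list of files written.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    pickle_path = os.path.join(directory, name + ".pickle.pkl")
    with open(pickle_path, "wb") as f:
        pickle.dump(obj, f, protocol=2)  # option 3: plain pickle
    paths.append(pickle_path)
    if joblib is not None:
        joblib_path = os.path.join(directory, name + ".joblib.pkl")
        joblib.dump(obj, joblib_path)  # option 2: standalone joblib, no compression
        paths.append(joblib_path)
    return paths
```

Each resulting file can then be passed to the converter separately, which helps tell a dump problem apart from a conversion problem.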
from jpmml-converter.
I have tested all three methods for dumping the .pkl files. I still get the StackOverflowError, even with the Iris data. The log is provided in the attached file.
test.zip
I use Python 2.7 32bit (Anaconda).
This is the code for model 1.zip:
model 1.zip
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress = 9)

# pp_mapper, pp_X and pp_y are prepared earlier (not shown here)
pp_model_regression = LinearRegression()
pp_model_regression.fit(pp_X, pp_y)
pp_model_forest = RandomForestRegressor(max_depth = 100, min_samples_leaf = 5)
pp_model_forest.fit(pp_X, pp_y)
store_pkl(pp_mapper, "pp_mapper_1.pkl")
store_pkl(pp_model_regression, "pp_model_1_regression.pkl")
store_pkl(pp_model_forest, "pp_model_1_forest.pkl")
You should be able to load them with joblib. Could you please try again? I tried different Java versions as well, so I am really confused.
from jpmml-converter.
I use Python 2.7 32bit (Anaconda)
This could be a 32-bit vs. 64-bit compatibility issue.
I'm running a 64-bit OS, and the JPMML-SkLearn project has been tested against 64-bit versions of Python2(.7) and Python3(.4).
My unpickling error message (ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int') fits perfectly into this picture, because for me SIZE_t is long, not int.
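Whether a given Python interpreter is a 32-bit or 64-bit build can be checked with the standard library alone. The sketch below is one common way to do it; the function name `pointer_bits` is hypothetical.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Pointer width: %d bits" % pointer_bits())
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
```

Running this in both the environment that produced the pickle files and the one that tries to load them would show whether the two differ in bitness.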
from jpmml-converter.
Fixed! Thank you very much for all your help. The problem was a compatibility issue between 32-bit Python and 64-bit Java.
from jpmml-converter.
Closing this issue in favour of the following one: jpmml/jpmml-sklearn#6
from jpmml-converter.