I am trying to convert a random forest model from pkl to pmml, and I get a stack overflow

StackOverflowError about jpmml-converter HOT 15 CLOSED

commented on August 15, 2024

StackOverflowError

from jpmml-converter.

Comments (15)

vruusmann commented on August 15, 2024

Very nice - I can reproduce the StackOverflowError using your example files. Will investigate and fix it in the upcoming JPMML-SkLearn version that will be released either later today or tomorrow.

I suspect that Scikit-Learn has changed something about the encoding of random forest models. I've tested with Scikit-Learn versions 0.16.0 through 0.17.1. What's your Scikit-Learn version?

import sklearn
print(sklearn.__version__)

commented on August 15, 2024

Thank you very much. The version is 0.17.

vruusmann commented on August 15, 2024

This looks like a legitimate StackOverflowError, because the first member tree model in your random forest model is over 2000 levels deep. That's highly unusual.

How was your sklearn.ensemble.RandomForestRegressor instance parametrized? You should set the max_depth parameter to some sensible value, such as 100.
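To check how deep a member tree really is before converting, the depth can be measured directly from the tree's child-index arrays. The following is a minimal sketch, not part of JPMML-SkLearn: it assumes the arrays follow the layout of Scikit-Learn's tree_.children_left and tree_.children_right, where -1 marks a leaf, and that they describe a valid tree without cycles. The function name and the sample arrays are made up for illustration.

```python
def tree_depth(children_left, children_right, root=0):
    """Return the depth of a binary tree stored as parallel child-index arrays.

    A child index of -1 marks a leaf. An explicit stack is used instead of
    recursion, so very deep trees do not exhaust the call stack.
    Assumes the arrays describe a valid tree (no cycles).
    """
    max_depth = 0
    stack = [(root, 0)]  # (node index, depth of that node)
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in (children_left[node], children_right[node]):
            if child != -1:
                stack.append((child, depth + 1))
    return max_depth


# Hypothetical 5-node tree: node 0 splits into 1 and 2, node 2 splits into 3 and 4.
left = [1, -1, 3, -1, -1]
right = [2, -1, 4, -1, -1]
print(tree_depth(left, right))  # 2
```

With a fitted forest, the same function could be applied to each member of forest.estimators_ by passing est.tree_.children_left and est.tree_.children_right.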

vruusmann commented on August 15, 2024

There's a related issue, where a StackOverflowError happens when converting a random forest model that has been trained using the Iris dataset. It should be impossible to train a 2000-level deep tree model using a dataset that contains only 150 training instances.

jpmml/sklearn2pmml#4
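The "impossible" claim can be backed by a rough bound. This is a sketch under one assumption: every leaf holds at least min_samples_leaf training samples and the leaves partition the training set. The function name is hypothetical.

```python
def max_possible_depth(n_samples, min_samples_leaf=1):
    """Upper bound on the depth of a binary decision tree.

    There are at most n_samples // min_samples_leaf leaves. A binary tree
    with L leaves is deepest when it degenerates into a chain, which gives
    a depth of L - 1.
    """
    max_leaves = max(1, n_samples // min_samples_leaf)
    return max_leaves - 1


print(max_possible_depth(150))     # 149 (Iris: 150 training instances)
print(max_possible_depth(150, 5))  # 29 (same data with min_samples_leaf=5)
```

A reported depth of 2000 is far above either bound, which points at corrupted tree data rather than at a genuinely deep model.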

commented on August 15, 2024

Thank you very much for your prompt response. I have set max_depth to 100 and am still getting the error. My Java version is 1.7.0_79.

commented on August 15, 2024

I have also tested it with Oracle Java 1.8.0_40.

commented on August 15, 2024

The error however has changed to:

Exception in thread "main" java.lang.StackOverflowError
    at sun.misc.FDBigInteger.leftShift(FDBigInteger.java:511)
    at sun.misc.FDBigInteger.valueOfMulPow52(FDBigInteger.java:324)
    at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.dtoa(FloatingDecimal.java:714)
    at sun.misc.FloatingDecimal$BinaryToASCIIBuffer.access$100(FloatingDecimal.java:259)
    at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1785)
    at sun.misc.FloatingDecimal.getBinaryToASCIIConverter(FloatingDecimal.java:1738)
    at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:70)
    at java.lang.Double.toString(Double.java:204)
    at org.jpmml.converter.ValueUtil.formatValue(ValueUtil.java:118)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:81)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)
    at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:96)

which is the same as
https://github.com/jpmml/sklearn2pmml/issues/4

Which Java version should I use?

vruusmann commented on August 15, 2024

You probably can't solve the issue simply by using a different Java version.

The problem is more fundamental. It appears to be an unpickling error (one that manifests on some Java versions and not on others) or something similar. As a result, the unpickled Scikit-Learn data contains invalid cross-references, which make TreeModelUtil#encodeNode jump back and forth between two nodes until the JVM dies with a StackOverflowError.
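The failure mode described here can be illustrated with a small check that walks the child-index arrays and reports the first edge that points at a node that was already reached. This is only a sketch in Python, not the Java code of TreeModelUtil. It assumes that a child index of -1 marks a leaf, as in Scikit-Learn's tree arrays. The function name and the corrupted sample arrays are hypothetical.

```python
def find_cross_reference(children_left, children_right, root=0):
    """Return (parent, child) for the first edge that reaches a node twice.

    In a proper tree every node is reached exactly once from the root, so
    any repeated node means the arrays contain an invalid cross-reference.
    Returns None when the structure is a valid tree.
    """
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in (children_left[node], children_right[node]):
            if child == -1:  # leaf marker
                continue
            if child in seen:
                return (node, child)
            seen.add(child)
            stack.append(child)
    return None


# Hypothetical corrupted arrays: node 1 points back to the root (node 0).
left = [1, 0, -1]
right = [2, -1, -1]
print(find_cross_reference(left, right))  # (1, 0)
```

A naive recursive encoder would bounce between nodes 0 and 1 on this input until the stack runs out, which matches the repeated encodeNode frames in the trace.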

vruusmann commented on August 15, 2024

How were the example pickle files in the Model1.zip file generated? I am unable to unpickle them for closer inspection using either the sklearn.externals.joblib or the pickle module:

>>> from sklearn.externals import joblib
>>> forest = joblib.load("pp_model_1_forest.pkl")

Traceback (most recent call last):
  File "load_joblib.py", line 3, in <module>
    forest = joblib.load("pp_model_1_forest.pkl")
  File "/usr/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 459, in load
    obj = unpickler.load()
  File "/usr/lib64/python3.4/pickle.py", line 1038, in load
    dispatch[key[0]](self)
  File "/usr/lib64/python3.4/pickle.py", line 1384, in load_reduce
    value = func(*args)
  File "sklearn/tree/_tree.pyx", line 579, in sklearn.tree._tree.Tree.__cinit__ (sklearn/tree/_tree.c:6774)
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int'

and

>>> import pickle
>>> forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))

Traceback (most recent call last):
  File "load_pickle.py", line 3, in <module>
    forest = pickle.load(open("pp_model_1_forest.pkl", "rb"))
_pickle.UnpicklingError: invalid load key, 'Z'.
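The 'Z' load key suggests that the file is not a bare pickle stream. One way to tell the formats apart is to look at the first bytes of the file. The sketch below rests on an assumption that should be verified against the joblib version in use: that older joblib releases write the prefix ZF in front of compressed data. The function name and return labels are made up for illustration.

```python
import pickle


def sniff_dump(head):
    """Guess the container format from the first bytes of a dump file.

    The b"ZF" check is an assumption about the prefix written by older
    joblib releases for compressed dumps. A leading 0x80 byte is the PROTO
    opcode that starts every pickle of protocol 2 or higher.
    """
    if head.startswith(b"ZF"):
        return "joblib-compressed"
    if len(head) >= 2 and head[:1] == b"\x80":
        return "pickle-protocol-%d" % head[1]
    return "unknown"


print(sniff_dump(pickle.dumps({"a": 1}, protocol=2)))  # pickle-protocol-2
print(sniff_dump(b"ZF" + b"\x00" * 8))                 # joblib-compressed
```

For a real file, the first bytes can be read with open(path, "rb").read(16) and passed to the function.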

commented on August 15, 2024

test.zip
I receive the same error when loading the pickle, even for the Iris example provided (see test.zip). I have also included the compiled JAR file. So maybe the problem is in the joblib dump of the random forest, not in the converter?

def store_pkl(obj, name):
    joblib.dump(obj, "pkl/" + name, compress=9)

vruusmann commented on August 15, 2024

The JPMML-SkLearn library should be able to consume the following dumps:

1. sklearn.externals.joblib
2. joblib
3. pickle

Option 1 is recommended by the Scikit-Learn documentation (e.g. see http://scikit-learn.org/stable/modules/model_persistence.html). However, this module may be outdated and/or out of sync with other modules.

You could try dumping the RF object manually using options 2 and 3, and use the JPMML-SkLearn command-line application to do the conversion.
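A manual dump with the standard pickle module (the third option) might look like the sketch below. The helper names are hypothetical, and protocol 2 is an assumption chosen because both Python 2 and Python 3 can read it. This thread does not state which protocols the converter accepts. A plain dict stands in for the fitted random forest object.

```python
import os
import pickle
import tempfile


def store_pkl_plain(obj, path):
    """Dump obj with the standard pickle module, uncompressed."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def load_pkl_plain(path):
    """Load an object back, to confirm the dump is readable."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder object; in practice this would be the fitted estimator.
model_stub = {"n_estimators": 10, "max_depth": 100}
path = os.path.join(tempfile.mkdtemp(), "forest.pkl")
store_pkl_plain(model_stub, path)
print(load_pkl_plain(path) == model_stub)  # True
```

Loading the file back in Python before handing it to the converter is a cheap way to rule out a broken dump.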

commented on August 15, 2024

I have tested all three methods for dumping the .pkl files. I still get the StackOverflowError, even with the Iris data. The log file is provided in the attached file.
test.zip

I use Python 2.7 32bit (Anaconda).

This is the code for model 1.zip:
model 1.zip

from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def store_pkl(obj, name):
    # Dump with joblib at the maximum compression level
    joblib.dump(obj, "pkl/" + name, compress=9)

# pp_X, pp_y and pp_mapper come from the data preparation step (not shown)
pp_model_regression = LinearRegression()
pp_model_regression.fit(pp_X, pp_y)

pp_model_forest = RandomForestRegressor(max_depth=100, min_samples_leaf=5)
pp_model_forest.fit(pp_X, pp_y)

store_pkl(pp_mapper, "pp_mapper_1.pkl")
store_pkl(pp_model_regression, "pp_model_1_regression.pkl")
store_pkl(pp_model_forest, "pp_model_1_forest.pkl")

You should be able to load them with joblib. Can you please try again? I also tried different Java versions, so I am really confused.

vruusmann commented on August 15, 2024

"I use Python 2.7 32bit (Anaconda)"

This could be a 32-bit vs. 64-bit compatibility issue.

I'm running a 64-bit OS, and the JPMML-SkLearn project has been tested against 64-bit versions of Python2(.7) and Python3(.4).

My unpickling error message (ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'int') fits perfectly into this picture, as for me SIZE_t is long, not int.
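To confirm which side of this mismatch a given Python installation is on, the pointer size of the interpreter can be checked with the standard library. This is a small sketch, and the function name is made up for illustration.

```python
import ctypes
import struct
import sys


def interpreter_bits():
    """Return 32 or 64, based on the size of a C pointer in this interpreter."""
    return struct.calcsize("P") * 8


print(interpreter_bits())
print(ctypes.sizeof(ctypes.c_size_t) * 8)  # width of the C size_t type
print(sys.maxsize > 2**32)                 # True on a 64-bit build
```

On a 32-bit build such as the Anaconda installation mentioned above, the first two lines should print 32 and the last one False.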

commented on August 15, 2024

Fixed! Thank you very much for all your help. The problem was an incompatibility between 32-bit Python and 64-bit Java.

vruusmann commented on August 15, 2024

Closing this issue in favour of the following one: jpmml/jpmml-sklearn#6
