jpmml / jpmml-converter
Java library for authoring PMML
License: GNU Affero General Public License v3.0
Several transformers expect that the incoming feature is backed by a DataField element. This causes problems with more functional workflows, where the DataField element has been "replaced" with a DerivedField element.
See: https://groups.google.com/d/msg/jpmml/ellpOHvWyrk/7kskrINNAQAJ
Method PMMLUtil#createConstant(Object) must be replaced with method PMMLUtil#createConstant(Object, DataType).
The standard pmml package can convert ksvm objects that have been trained using the kernlab::ksvm function. Unfortunately, the converter implementation is rather limited, because it fails to handle ksvm objects that have been trained by other means.
For example, it is impossible to convert a ksvm object that was trained using the caret package:
library("caret")
library("kernlab")
library("pmml")
# Works: the model object was created by calling kernlab::ksvm directly
iris.ksvm = ksvm(Species ~ ., data = iris)
class(iris.ksvm)
ksvm.pmml = pmml(iris.ksvm, dataset = iris)
# Fails: the same class of model object, created via caret::train
iris_x = iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
iris_y = iris[, c("Species")]
iris.train = train(x = iris_x, y = iris_y, data = iris, method = "svmRadial")
class(iris.train$finalModel)
train.pmml = pmml(iris.train$finalModel, dataset = iris)
# Error in if (field$class[[1]][1] == "numeric") { :
#   argument is of length zero
I am trying to convert a random forest model from PKL to PMML, and I get a stack overflow error. I can convert the regression version of the same model without any problem. Attached are the PKL files for the regression and random forest models, as well as the mapper.
Exception in thread "main" java.lang.StackOverflowError
at java.lang.StrictMath.floorOrCeil(StrictMath.java:355)
at java.lang.StrictMath.floor(StrictMath.java:340)
at java.lang.Math.floor(Math.java:424)
at sun.misc.FloatingDecimal.dtoa(FloatingDecimal.java:629)
at sun.misc.FloatingDecimal.<init>(FloatingDecimal.java:468)
at java.lang.Double.toString(Double.java:196)
at org.jpmml.converter.PMMLUtil.formatValue(PMMLUtil.java:387)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:82)
at sklearn.tree.TreeModelUtil.encodeNode(TreeModelUtil.java:97)
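The trace shows TreeModelUtil.encodeNode calling itself once per tree level, so a sufficiently deep tree exhausts the thread stack. One general remedy (not necessarily the one adopted upstream) is to replace recursion with an explicit work stack. The minimal sketch below assumes a hypothetical array-based tree layout modeled on scikit-learn's `tree_.children_left` / `tree_.children_right` arrays, where -1 marks a leaf; the class name `IterativeTreeEncoder` is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Visits every node of a binary tree without recursion, so the maximum depth is
// bounded by heap memory rather than by the thread stack.
// Hypothetical layout modeled on scikit-learn's tree_ arrays:
// childrenLeft[i] and childrenRight[i] hold child node ids, or -1 for a leaf.
class IterativeTreeEncoder {

    static final int LEAF = -1;

    // Returns node ids in pre-order: a node, then its left subtree, then its right subtree
    static int[] preOrder(int[] childrenLeft, int[] childrenRight) {
        int[] order = new int[childrenLeft.length];
        int count = 0;
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0); // the root node has id 0
        while (!stack.isEmpty()) {
            int node = stack.pop();
            // A real encoder would create the PMML Node here and attach it to its parent
            order[count++] = node;
            if (childrenLeft[node] != LEAF) {
                // Push the right child first so that the left child is popped first
                stack.push(childrenRight[node]);
                stack.push(childrenLeft[node]);
            }
        }
        return Arrays.copyOf(order, count);
    }
}
```

With this layout, a degenerate tree that is hundreds of thousands of levels deep is traversed with a work stack that never holds more than a couple of entries.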
Based on jpmml/jpmml-sklearn#86
At the moment, it's impossible to generate transformer-only pipelines, because the ModelEncoder#encodePMML(Model) method applies a set of visitors that clean all unused preprocessing instructions from the soon-to-be-generated PMML document:
https://github.com/jpmml/jpmml-converter/blob/master/src/main/java/org/jpmml/converter/ModelEncoder.java#L53-L56
Possible solution: class ModelEncoder should provide a "transformer-only" conversion mode.
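The pruning problem can be illustrated without the JPMML-Model API. The sketch below is hypothetical: `FieldPruner` and its `transformerOnly` flag are illustrative names, not ModelEncoder's actual API. It keeps only the derived fields reachable from the fields a model references; with no model, nothing is reachable, which is why a transformer-only mode would have to skip this cleanup step.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, library-free model of unused-field pruning. Names are illustrative
// and do not correspond to ModelEncoder's real API.
class FieldPruner {

    // derivedFields maps each derived field name to the names of the fields it is computed from;
    // modelFields are the fields that the model references directly.
    static Set<String> retainedFields(Map<String, Set<String>> derivedFields,
            Set<String> modelFields, boolean transformerOnly) {
        if (transformerOnly) {
            // No model consumes the preprocessing output, so keep every derived field
            return new TreeSet<>(derivedFields.keySet());
        }
        Set<String> retained = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(modelFields);
        while (!pending.isEmpty()) {
            String name = pending.pop();
            Set<String> inputs = derivedFields.get(name);
            // Raw input fields (null here) are not pruning candidates; each derived field is expanded once
            if (inputs != null && retained.add(name)) {
                pending.addAll(inputs);
            }
        }
        return retained;
    }
}
```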
Hi, I want to use the sklearn2pmml() function to convert a pipeline into a PMML file.
I created the issue below, but I was not able to reopen it, so I have created this new issue and copied the content here.
jpmml/jpmml-sklearn#160
Here is my code to create a pipeline. However, I got an error:
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
How can I solve it? My version is 0.73.1
The output is:
Standard output is empty
Standard error:
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 219 ms.
Jul 01, 2021 8:33:28 PM org.jpmml.sklearn.Main run
INFO: Converting PKL to PMML..
Jul 01, 2021 8:33:30 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException
at org.jpmml.converter.visitors.AbstractTreeModelTransformer.initScore(AbstractTreeModelTransformer.java:173)
at org.jpmml.converter.visitors.TreeModelPruner.exitNode(TreeModelPruner.java:81)
at org.jpmml.converter.visitors.AbstractTreeModelTransformer.popParent(AbstractTreeModelTransformer.java:61)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:120)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.tree.SimpleNode.accept(SimpleNode.java:113)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
at org.dmg.pmml.tree.TreeModel.accept(TreeModel.java:401)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
at org.jpmml.model.visitors.AbstractVisitor.applyTo(AbstractVisitor.java:320)
at org.jpmml.xgboost.Learner.encodeMiningModel(Learner.java:354)
at xgboost.sklearn.BoosterUtil.encodeBooster(BoosterUtil.java:63)
at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:45)
at xgboost.sklearn.XGBClassifier.encodeModel(XGBClassifier.java:27)
at sklearn.Estimator.encode(Estimator.java:83)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:235)
at org.jpmml.sklearn.Main.run(Main.java:226)
at org.jpmml.sklearn.Main.main(Main.java:143)
Hello,
I've been actively using the PySpark2PMML package to write RF Spark models into PMML documents, and I noticed that the output sometimes contains scientific notation:
<ScoreDistribution value="0" recordCount="2.3252954E7"/>
Is there a way to control whether or not scientific notation is used in the output? I'd prefer that it isn't used, as my C++ parser isn't written to accept it. Thanks!
Patrick Hofmann
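Java's `Double.toString` switches to computerized scientific notation for magnitudes of at least 10^7 or below 10^-3, which is presumably where values like 2.3252954E7 come from (the converter's own formatting path is not shown in this issue). A consumer or a patched formatter could emit positional notation instead; the sketch below does this via `BigDecimal.toPlainString()`. `PlainFormat` is a hypothetical helper name.

```java
import java.math.BigDecimal;

// Formats a double without an exponent. Double.toString yields the shortest
// decimal digits that identify the value; BigDecimal keeps exactly those digits.
// PlainFormat is a hypothetical helper, not part of JPMML-Converter.
class PlainFormat {

    static String toPlainString(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            // BigDecimal cannot represent NaN or infinities; leave them to the caller's convention
            return Double.toString(value);
        }
        return new BigDecimal(Double.toString(value)).stripTrailingZeros().toPlainString();
    }
}
```

For example, `PlainFormat.toPlainString(2.3252954E7)` returns `23252954`, and `1.0E-4` becomes `0.0001`.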
Under PMML - General Regression, CoxRegression is listed as a model type. Would it be easy to add support for this modeling framework?
test <- survival::coxph(survival::Surv(futime,fustat) ~ age + rx + ecog.ps, survival::ovarian, x=TRUE)
print(test)
library(r2pmml)
r2pmml(test, "test.pmml")
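For reference, the Cox proportional hazards model fitted by `survival::coxph` can be written in its standard textbook form as below, with x = (age, rx, ecog.ps), β the fitted coefficients, H_0(t) the cumulative baseline hazard and S_0(t) = exp(-H_0(t)). Any covariate centering applied by coxph, and the way PMML's CoxRegression encodes the baseline hazard, are not covered here.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\mathbf{x}^\top\boldsymbol{\beta}\right), \qquad
S(t \mid \mathbf{x}) = \exp\!\left(-H_0(t)\,e^{\mathbf{x}^\top\boldsymbol{\beta}}\right) = S_0(t)^{\exp(\mathbf{x}^\top\boldsymbol{\beta})}
```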
With the update to PMML schema version 4.4, it's time to shake things up some more!
In Scikit-Learn:
sklearn2pmml(pipeline, "MyPipeline.pmml.json", format = "json")
In R:
r2pmml(model, "MyModel.pmml.yaml", format = "yaml")
Hi,
I am trying to generate a PMML file for an Isolation Forest using sklearn2pmml. While generating the PMML file, the variable thresholds are getting changed.
The tree prints correctly when we load the Python pickle file, but in the actual PMML the threshold values are changed.
In Python Tree:
if AA.SUB (#1) <= 526225.988222:
    return [[ 0.36039911]]
else:  # if AA.SUB (#1) > 526225.988222
    if BB.SUB (#1) <= 5192.53104377:
        return [[ 0.41983035]]
    else:  # if BB.SUB (#1) > 5192.53104377
        return [[ 0.88258597]]
In PMML Tree:
<Node id="712">
<SimplePredicate field="AA.SUB" operator="greaterThan" value="526226"/>
<Node id="713" score="17.018322573802887">
<SimplePredicate field="BB.SUB" operator="lessOrEqual" value="5192.5312"/>
</Node>
</Node>
In the above case, 526225.988222 is changed into 526226, and 5192.53104377 into 5192.5312.
I have analyzed the source code of jpmml-converter and found that the way Float values are converted to Double is wrong (in ValueUtil.java).
Can you please check whether this is the issue? If so, can you suggest a workaround or solution?
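The reported values are exactly what one gets by rounding the printed thresholds to 32-bit floats: near 526226, adjacent floats are 0.0625 apart, so 526225.988222 rounds to 526226.0, and 5192.53104377 rounds to 5192.53125, which is the same float that the string 5192.5312 parses to. scikit-learn's tree estimators convert input features to float32 internally, so single-precision thresholds may be deliberate rather than a faulty conversion; this is an inference, not a confirmed diagnosis. The sketch below (`Float32Thresholds` is a hypothetical name) checks the arithmetic with plain Java casts.

```java
// Reproduces the reported threshold changes by narrowing doubles to 32-bit floats.
// Float32Thresholds is a hypothetical helper for illustration only.
class Float32Thresholds {

    // Round to the nearest float, then widen back to double (the widening is exact)
    static double toFloat32(double value) {
        return (double) (float) value;
    }

    // Gap between adjacent floats at the magnitude of the given value
    static double float32Ulp(double value) {
        return Math.ulp((float) value);
    }
}
```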
Hi, when I use jpmml-converter and jpmml-sparkml, the following errors arise:
(1) jpmml-converter-1.3.9: Caused by: java.lang.ClassNotFoundException: org.jpmml.converter.HasNativeConfiguration
(2) jpmml-converter-1.4.6: Caused by: java.lang.IllegalArgumentException: Expected Apache Spark ML version 2.4, got version 2.3 (2.3.1)
So, what is the relationship between jpmml-converter and Spark?
Environment: spark-2.3.1
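Error (2) suggests that each JPMML-SparkML release line is built against one Spark minor version, and error (1) that the jpmml-converter jar on the classpath is older than the one jpmml-sparkml was compiled against; both are inferences from the messages. The hypothetical sketch below (`SparkVersionCheck` is not JPMML-SparkML code) shows how a major.minor check produces a message in the reported format.

```java
// Hypothetical major.minor compatibility check; illustrative only,
// not taken from the JPMML-SparkML sources.
class SparkVersionCheck {

    // Reduce a full version string such as "2.3.1" to its "major.minor" prefix
    static String majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a major.minor version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    // Throws if the running Spark version does not match the one the converter targets
    static void checkVersion(String expected, String actual) {
        String actualMajorMinor = majorMinor(actual);
        if (!expected.equals(actualMajorMinor)) {
            throw new IllegalArgumentException("Expected Apache Spark ML version " + expected
                + ", got version " + actualMajorMinor + " (" + actual + ")");
        }
    }
}
```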
The PMML files generated by sklearn2pmml appear to have a version discrepancy in the header. Here's the header of a PMML file that I just generated:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
<Header>
<Application name="JPMML-SkLearn" version="1.6.3"/>
<Timestamp>2020-07-27T18:08:57Z</Timestamp>
Note the discrepancy between PMML-4_4 and version 4.3. Compare this to the header of an older PMML file we generated a month ago:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
<Header>
<Application name="JPMML-SkLearn" version="1.5.35"/>
<Timestamp>2020-06-15T20:11:06Z</Timestamp>
I don't know which versions we were using last month, but I'm currently using sklearn2pmml version 0.60.0. The Java version is:
openjdk version "1.8.0_152-release"
OpenJDK Runtime Environment (build 1.8.0_152-release-1056-b12)
OpenJDK 64-Bit Server VM (build 25.152-b12, mixed mode)
The problem is easy to fix for our purposes, but I thought I'd inform you of the bug.
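A mismatch like this can be detected mechanically. The sketch below uses only the JDK's DOM parser to compare the version encoded in the namespace URI (http://www.dmg.org/PMML-4_4 becomes 4.4) with the root element's version attribute; the class name `PmmlHeaderCheck` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Checks that the PMML namespace URI and the root version attribute agree.
// PmmlHeaderCheck is a hypothetical helper; it uses only the JDK's DOM parser.
class PmmlHeaderCheck {

    static final String NAMESPACE_PREFIX = "http://www.dmg.org/PMML-";

    static boolean isConsistent(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI() to return the xmlns value
            root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse PMML", e);
        }
        String namespace = root.getNamespaceURI();
        if (namespace == null || !namespace.startsWith(NAMESPACE_PREFIX)) {
            throw new IllegalArgumentException("Not a PMML namespace: " + namespace);
        }
        // "http://www.dmg.org/PMML-4_4" -> "4.4"
        String namespaceVersion = namespace.substring(NAMESPACE_PREFIX.length()).replace('_', '.');
        return namespaceVersion.equals(root.getAttribute("version"));
    }
}
```

Applied to the two headers above, the newer file fails the check and the older one passes.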
It should be possible to represent ets objects using the TimeSeriesModel element:
library("fpp")
library("forecast")
livestock.forecast = forecast(livestock)
print(livestock.forecast)
print(livestock.forecast$model)
plot(livestock.forecast)
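As a point of reference, the simplest member of the ETS family, ETS(A,N,N) (simple exponential smoothing with additive errors), updates a single level ℓ with smoothing parameter 0 < α < 1 and forecasts it flat. PMML's TimeSeriesModel has an ExponentialSmoothing representation which, to our understanding, parameterizes level, trend and seasonality in a similar way. Which ETS model ets() actually selects for livestock depends on the data, so this only illustrates the simplest case:

```latex
\ell_t = \alpha\, y_t + (1 - \alpha)\,\ell_{t-1}, \qquad
\hat{y}_{T+h \mid T} = \ell_T, \quad h = 1, 2, \ldots
```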
Hi @vruusmann, looking through the code, I see that the names of the prediction and probability output fields are hard-coded. Would it be possible to provide a parameter that controls the separator in output field names? For example, changing probability(1) to probability-1.
Below is the existing naming rule, where the separator is a pair of parentheses:
static
public OutputField createProbabilityField(DataType dataType, String value){
return createProbabilityField(FieldName.create("probability(" + value + ")"), dataType, value);
}
Thanks a lot.
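A configurable naming scheme could be as small as a prefix/suffix pair. The sketch below is hypothetical (`OutputFieldNamer` does not exist in JPMML-Converter); its default reproduces the existing `probability(<value>)` convention.

```java
// Hypothetical sketch: build output field names from a configurable prefix/suffix
// pair instead of hard-coded parentheses. OutputFieldNamer is illustrative only.
class OutputFieldNamer {

    private final String open;
    private final String close;

    OutputFieldNamer(String open, String close) {
        this.open = open;
        this.close = close;
    }

    // Reproduces the existing "probability(<value>)" convention
    static OutputFieldNamer parentheses() {
        return new OutputFieldNamer("(", ")");
    }

    String name(String function, String value) {
        return function + open + value + close;
    }

    String probabilityName(String value) {
        return name("probability", value);
    }
}
```

A converter could then pass such a name to the `createProbabilityField(FieldName, DataType, String)` overload that the existing method already delegates to.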
Hi guys,
I wonder whether the conversion of TensorFlow models into PMML is on your roadmap?
Thank you,
Eliano
The JPMML-Converter library is currently stuck in the " file in -> PMML file out" mindset. End users keep asking for more output options, such as URL or plain streaming:
https://stackoverflow.com/questions/74656521/how-to-save-xgboost-lightgbm-model-to-postgresql-database-in-python-for-subseque/74659868#comment131843709_74659868
Conceptually, JPMML converter command-line applications could support UNIX-style piping, which may make integration with non-Java application environments easier:
$ java -jar pmml-converter-executable-${version}.jar < input.file > output.file
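A minimal sketch of the piping idea, assuming a hypothetical command-line class: `-` or a missing argument selects the standard streams, and `convert` is a placeholder that copies bytes, because the real converters' entry points are not part of this issue. `InputStream.transferTo` requires Java 9 or newer.

```java
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical UNIX-style entry point: file paths are optional, "-" means a standard stream.
class PipeFriendlyMain {

    // Placeholder for the actual model-to-PMML conversion (here: copy bytes unchanged)
    static void convert(InputStream in, OutputStream out) throws IOException {
        in.transferTo(out);
    }

    static InputStream openInput(String path) throws IOException {
        return path.equals("-") ? System.in : new FileInputStream(path);
    }

    static OutputStream openOutput(String path) throws IOException {
        return path.equals("-") ? System.out : new FileOutputStream(path);
    }

    public static void main(String[] args) throws IOException {
        String inputPath = args.length > 0 ? args[0] : "-";
        String outputPath = args.length > 1 ? args[1] : "-";
        InputStream in = openInput(inputPath);
        OutputStream out = new BufferedOutputStream(openOutput(outputPath));
        try {
            convert(in, out);
            out.flush();
        } finally {
            // Close file streams, but leave the process-wide standard streams open
            if (!inputPath.equals("-")) in.close();
            if (!outputPath.equals("-")) out.close();
        }
    }
}
```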
Hello,
I trained a PMMLPipeline with OneHotEncoder and XGBClassifier using the following code snippet.
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import OneHotEncoder
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from xgboost.sklearn import XGBClassifier
mapper = DataFrameMapper(
[(col, None) for col in numerical_cols] +
[([col], OneHotEncoder(handle_unknown='ignore')) for col in categorical_cols]
)
pipeline = PMMLPipeline(
steps=[
('mapper', mapper),
('classifier', XGBClassifier())
]
)
pipeline.fit(X, y)
The pipeline seemed to work and I was able to use it to do predictions.
However, I got an error when I tried to convert the pipeline into a PMML file:
sklearn2pmml(pipeline, "testing.pmml", with_repr=True)
Standard error:
Exception in thread "main" org.jpmml.model.MissingAttributeException: Required attribute Value@value is not defined
at org.dmg.pmml.Value.requireValue(Value.java:67)
at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:139)
at org.jpmml.converter.PMMLUtil.getValues(PMMLUtil.java:124)
at org.jpmml.converter.CategoricalFeature.<init>(CategoricalFeature.java:35)
at org.jpmml.converter.WildcardFeature.toCategoricalFeature(WildcardFeature.java:61)
at sklearn.preprocessing.MultiOneHotEncoder.encodeFeatures(MultiOneHotEncoder.java:118)
at sklearn.Transformer.encode(Transformer.java:69)
at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:67)
at sklearn.Transformer.encode(Transformer.java:69)
at sklearn.Composite.encodeFeatures(Composite.java:119)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:212)
at com.sklearn2pmml.Main.run(Main.java:84)
at com.sklearn2pmml.Main.main(Main.java:62)
Can someone give me some advice on what I might have done wrong? Thanks.
Support translating the target from one value space to another (e.g. from integer indices to string labels), or reversing the order of class labels (for binary classification problems), as outlined in jpmml/r2pmml#46 (comment).
It's much easier to generate PMML code based on a transformed label than to try to "rewrite" an existing PMML document to achieve a similar effect.
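A hypothetical sketch of the two transformations, applied to the target before any PMML is generated (`LabelTransform` is an illustrative name, not a JPMML class). Reversing binary labels also means swapping the class probabilities.

```java
import java.util.List;

// Hypothetical label-space transformations; illustrative only.
class LabelTransform {

    // Index i of the model output maps to labels.get(i)
    static String toLabel(int index, List<String> labels) {
        if (index < 0 || index >= labels.size()) {
            throw new IllegalArgumentException("Class index out of range: " + index);
        }
        return labels.get(index);
    }

    // Swap which binary class comes first (for example, which one is treated as "positive")
    static List<String> reverseBinaryLabels(List<String> labels) {
        if (labels.size() != 2) {
            throw new IllegalArgumentException("Expected two class labels");
        }
        return List.of(labels.get(1), labels.get(0));
    }

    // Probabilities must follow the labels: P(new class 0) = P(old class 1)
    static double[] reverseBinary(double[] probabilities) {
        if (probabilities.length != 2) {
            throw new IllegalArgumentException("Expected two class probabilities");
        }
        return new double[]{probabilities[1], probabilities[0]};
    }
}
```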
Not sure this is the right place to ask this question.
When I try to export an SVC model (in a pipeline) with sklearn2pmml, I always get a PMML file with classificationMethod="OneAgainstOne", even though I explicitly specify decision_function_shape as "ovr" in Python. The pipeline is defined as follows:
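from sklearn.svm import SVC
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

# create pipeline (feat_names: list of numeric feature column names, defined elsewhere)
model_pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (feat_names, [ContinuousDomain(with_data=False)])
    ])),
    ("SVC", SVC(probability=True, random_state=2018, decision_function_shape="ovr"))
])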
After checking the source code in converter/support_vector_machine/LibSVMUtil.java:116, I found that SupportVectorMachineModel.ClassificationMethod is initialized as ONE_AGAINST_ONE and is never reset according to the decision_function_shape set in Python.
Please correct me if I have missed anything.
Many thanks for your help. You made a great project!
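According to the scikit-learn documentation, SVC's `decision_function_shape` only controls the shape of the `decision_function` output; internally, one-vs-one is always used as the multi-class training strategy. A OneAgainstOne classification method in the PMML is therefore consistent with the fitted model. The sketch below illustrates one-against-one voting; the pair ordering and sign convention (a positive decision value votes for the first class of the pair, ties go to the lower index) are modeled on libsvm's prediction loop as we understand it, not on PMML's exact semantics, and `OneAgainstOne` is a hypothetical class name.

```java
// Sketch of one-against-one (OvO) voting: with k classes there are
// k * (k - 1) / 2 pairwise classifiers, and each casts one vote.
// decisions[p] belongs to the pair (i, j) in the order (0,1), (0,2), ..., (0,k-1), (1,2), ...
class OneAgainstOne {

    static int predict(int k, double[] decisions) {
        int pairs = k * (k - 1) / 2;
        if (decisions.length != pairs) {
            throw new IllegalArgumentException("Expected " + pairs + " decision values");
        }
        int[] votes = new int[k];
        int p = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                // Positive decision value votes for class i, otherwise for class j
                if (decisions[p++] > 0) votes[i]++; else votes[j]++;
            }
        }
        // Strict comparison keeps the lowest class index on ties
        int best = 0;
        for (int c = 1; c < k; c++) {
            if (votes[c] > votes[best]) best = c;
        }
        return best;
    }
}
```

For three classes, decision values {-1.0, 0.5, 2.0} for the pairs (0,1), (0,2), (1,2) give votes [1, 2, 0], so class 1 wins.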