Comments (8)
@vruusmann Sorry for the off-topic; I will delete the question. But now I have run into another issue: when I call buildFile on the pmmlBuilder object, it fails with:
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o57101.buildFile.
: java.lang.IllegalArgumentException: Expected 3 target categories, got 2 target categories
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'Expected 3 target categories, got 2 target categories'
I cannot understand why. Do you have a clue?
from jpmml-sparkml.
The JPMML-SparkML library assumes that the label column of classification models is a "native" categorical label (in PMML, corresponds to a DataDictionary/DataField element), not a "transformed" categorical label (corresponds to a TransformationDictionary/DerivedField element).
I've been taking this for granted, and forgot to actually implement this "native" vs "transformed" check around ModelConverter.java:82.
It's possible to make your example work by applying the Binarizer transformation to the dataset outside of the pipeline, and then treating its output column "DepDelay_Bin" as a "native" categorical label:
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import Binarizer, StringIndexer, VectorAssembler
from jpmml_sparkml import toPMMLBytes

# Binarize the label outside of the pipeline, so that "DepDelay_Bin" becomes a regular dataset column
binarizer = Binarizer(threshold=15.0, inputCol="DepDelay_Double", outputCol="DepDelay_Bin")
data2007 = binarizer.transform(data2007) # THIS!
stringIndexer = StringIndexer(inputCol="DepDelay_Bin", outputCol="DepDelay_Bin_Label") # THIS!
featuresAssembler = VectorAssembler(inputCols=["Month", "CRSDepTime", "Distance"], outputCol="features")
rfc3 = RandomForestClassifier(labelCol="DepDelay_Bin_Label", featuresCol="features", numTrees=3, maxDepth=5, seed=10305)
pipelineRF3 = Pipeline(stages=[stringIndexer, featuresAssembler, rfc3]) # THIS: start the pipeline with StringIndexer not Binarizer
model3 = pipelineRF3.fit(data2007)
pmmlBytes = toPMMLBytes(sc, data2007, model3)
print(pmmlBytes.decode("UTF-8"))
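The "native" vs "transformed" distinction above can be illustrated with a small, Spark-free sketch. It is only a rule of thumb inferred from this comment, not the check that JPMML-SparkML actually performs: the `Stage` record and the `label_is_native` helper are hypothetical names, and the sketch assumes that a label is acceptable when it is either a dataset column or the output of a StringIndexer whose input is a dataset column.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage: its type name and its columns."""
    kind: str
    input_col: str
    output_col: str


def label_is_native(dataset_cols: Set[str], stages: List[Stage], label_col: str) -> bool:
    """Rule of thumb only: the label is 'native' if it is a dataset column,
    or a StringIndexer output whose input is a dataset column."""
    if label_col in dataset_cols:
        return True
    for stage in stages:
        if stage.output_col == label_col and stage.kind == "StringIndexer":
            return stage.input_col in dataset_cols
    return False


raw_cols = {"DepDelay_Double", "Month", "CRSDepTime", "Distance"}

# Binarizer inside the pipeline: the StringIndexer input is a derived column.
in_pipeline = [
    Stage("Binarizer", "DepDelay_Double", "DepDelay_Bin"),
    Stage("StringIndexer", "DepDelay_Bin", "DepDelay_Bin_Label"),
]
print(label_is_native(raw_cols, in_pipeline, "DepDelay_Bin_Label"))

# Binarizer applied beforehand: "DepDelay_Bin" is already a dataset column.
pre_cols = raw_cols | {"DepDelay_Bin"}
outside = [Stage("StringIndexer", "DepDelay_Bin", "DepDelay_Bin_Label")]
print(label_is_native(pre_cols, outside, "DepDelay_Bin_Label"))
```

The first call models the failing setup (label derived inside the pipeline), the second models the workaround shown above.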
Technically, it shouldn't be much work to make JPMML-SparkML work with "transformed" labels, so keeping this issue open to track progress towards this functionality.
Looks like this can be closed for the current version:
Binarizer binarizer = new Binarizer()
    .setInputCol("Sepal_Length")
    .setOutputCol("Sepal_Length_Binar_")
    .setThreshold(5.0);
StringIndexer labelIndexer = new StringIndexer()
    .setInputCol("Species")
    .setOutputCol("Species_Bin");
VectorAssembler vectorAssembler = new VectorAssembler()
    .setInputCols(new String[]{
        "Sepal_Length_Binar_",
        "Sepal_Width",
        "Petal_Length",
        "Petal_Width"})
    .setOutputCol("features");
RandomForestClassifier classifier = new RandomForestClassifier()
    .setLabelCol("Species_Bin");
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{binarizer, labelIndexer, vectorAssembler, classifier});
PipelineModel model = pipeline.fit(dataset);
PMMLBuilder builder = new PMMLBuilder(schema, model);
final PMML build = builder.build();
JAXBUtil.marshalPMML(build, new StreamResult(System.out));
> Looks like it can be closed for current version
Nope, I'd like to be able to use Sepal_Length_Binar_ as the label column here.
Can someone help me with this error: AttributeError: 'Pipeline' object has no attribute '_transfer_param_map_to_java'? I get it when I try to execute PMMLBuilder():
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="features")
evaluator = MulticlassClassificationEvaluator(labelCol='indexedLabel', predictionCol='prediction', metricName='f1')
paramGrid = (ParamGridBuilder()
    .addGrid(dt.maxDepth, [1, 2, 6])
    .addGrid(dt.maxBins, [570, 570])
    .build())
stages += [dt]
pipeline = Pipeline(stages=stages)
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=3)
cvModel = cv.fit(dataSet)
train_dataset = cvModel.transform(dataSet)
train_dataset.show()
print(evaluator.evaluate(train_dataset))
pmmlBuilder = PMMLBuilder(spark, dataSet, cvModel) \
    .putOption(dt, "compact", True)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")
I cannot find a fix for this. What am I doing wrong?
> AttributeError: 'Pipeline' object has no attribute '_transfer_param_map_to_java' error
This is clearly a low-level PySpark error, which has got nothing to do with PySpark2PMML or JPMML-SparkML.
Maybe your PySpark and Apache Spark versions are out of sync.
@vruusmann Thank you. My PySpark and Apache Spark versions are up to date. The problem was that PMMLBuilder must be given the fitted pipeline's best model rather than the CrossValidatorModel itself; in my case, passing cvModel.bestModel did the work.
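A minimal sketch of that fix, assuming the fitted object exposes the winning model as a `bestModel` attribute (as `cvModel` does in this thread). The `unwrap_best_model` helper and the two stub classes are hypothetical names, used only so the snippet runs without a Spark session; the commented-out lines show where the result would go in the original code.

```python
def unwrap_best_model(model):
    """Follow .bestModel links until reaching a model that has none.

    Hypothetical helper: tuning wrappers such as a fitted CrossValidator keep
    the winning pipeline model in `bestModel`; anything else is returned as-is.
    """
    while hasattr(model, "bestModel"):
        model = model.bestModel
    return model


# Stand-ins for Spark objects, so the sketch is runnable on its own.
class FakePipelineModel:
    pass


class FakeCrossValidatorModel:
    def __init__(self, best_model):
        self.bestModel = best_model


fitted_pipeline = FakePipelineModel()
cv_model = FakeCrossValidatorModel(fitted_pipeline)
print(unwrap_best_model(cv_model) is fitted_pipeline)

# With real Spark objects, the call from the question would become:
# pmmlBuilder = PMMLBuilder(spark, dataSet, unwrap_best_model(cvModel))
# pmmlBuilder.buildFile("DecisionTreeIris.pmml")
```

Passing `cvModel.bestModel` directly, as described above, is equivalent; the helper only makes the unwrapping explicit.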