Comments (5)
This issue is a functional duplicate of jpmml/jpmml-evaluator#84
In brief, your testing data is not "compatible" with the training data: for some categorical feature(s), the testing data contains category values that were not present in the training data. The JPMML-Evaluator library then refuses to score such data records, because the prediction would be nonsensical.
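One way to confirm this diagnosis is to compare the distinct category values of each categorical column between the two datasets. The following is a minimal sketch in plain Scala, assuming the records fit in memory as maps from column name to value. `Record`, `unseenCategories`, and the sample data are hypothetical names for illustration, not part of JPMML or Apache Spark.

```scala
object UnseenCategoryCheck {
  // Hypothetical record type: column name -> categorical value.
  type Record = Map[String, String]

  // For each column, collect the values that occur in `test` but never in `train`.
  // Columns without any unseen value are left out of the result.
  def unseenCategories(
      train: Seq[Record],
      test: Seq[Record],
      columns: Seq[String]): Map[String, Set[String]] =
    columns.map { column =>
      val seen = train.flatMap(_.get(column)).toSet
      val unseen = test.flatMap(_.get(column)).toSet -- seen
      column -> unseen
    }.filter { case (_, unseen) => unseen.nonEmpty }.toMap

  def main(args: Array[String]): Unit = {
    // Made-up sample data: "green" never occurs in the training records.
    val train = Seq(
      Map("color" -> "red", "size" -> "S"),
      Map("color" -> "blue", "size" -> "M"))
    val test = Seq(Map("color" -> "green", "size" -> "S"))
    println(unseenCategories(train, test, Seq("color", "size")))
  }
}
```

Any column reported by such a check is a candidate for the scoring failure described above.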
from jpmml-sparkml.
Also, what is your Apache Spark ML version, and how is the StringIndexer transformation configured?
For example, in Apache Spark ML version 2.2 it is possible to specify how invalid values should be handled by setting the StringIndexer@handleInvalid attribute. In this case, the value of this attribute should be set to "keep".
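To illustrate the difference between the two settings, here is a small model of the indexing behaviour in plain Scala. It is a sketch, not Apache Spark's implementation: `fit` and `index` are hypothetical helpers, ties between equally frequent labels are broken alphabetically as an assumption, and indices are plain `Int` values where Spark produces doubles. It only mirrors the documented idea that labels are indexed by descending frequency and that "keep" places unseen labels in one extra bucket whose index equals the number of known labels.

```scala
object HandleInvalidSketch {
  // Build a label -> index mapping, most frequent label first.
  // Assumption: ties are broken alphabetically to keep the result deterministic.
  def fit(labels: Seq[String]): Map[String, Int] =
    labels.groupBy(identity).toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap

  // Look up a label. With "keep", an unseen label gets the extra index
  // `mapping.size`; any other setting is treated like "error" in this sketch.
  def index(mapping: Map[String, Int], label: String, handleInvalid: String): Int =
    mapping.get(label) match {
      case Some(i) => i
      case None if handleInvalid == "keep" => mapping.size
      case None => throw new IllegalArgumentException(s"Unseen label: $label")
    }
}
```

With three known labels, an unseen label maps to index 3 under "keep", whereas the "error" setting raises an exception for the same input.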
I am using Spark ML version 2.2.0.
This is my StringIndexer configuration:
def getStringIndexer(columnNames: Array[String]): Array[PipelineStage] = {
  var stringIndexers: Array[PipelineStage] = new Array[PipelineStage](0)
  for (i <- 0 until columnNames.length) {
    val indexer = new StringIndexer()
      .setInputCol(columnNames(i))
      .setOutputCol(columnNames(i) + "Indexer")
    stringIndexers = stringIndexers :+ indexer
  }
  stringIndexers
}
> This is my StringIndexer configuration.
You're using the default StringIndexer@handleInvalid attribute value, which is "error". Check Apache Spark ML documentation - it is a scoring error (implemented as InvalidResultException in the JPMML-Evaluator library) if the input contains a previously unseen category value.
Please note that the native Apache Spark ML workflow would also fail to make a prediction in this case.
You need to do the following to make unseen categories scorable:
val indexer = new StringIndexer()
  .setInputCol(columnNames(i))
  .setOutputCol(columnNames(i) + "Indexer")
  .setHandleInvalid("keep") // THIS!
Thanks a lot @vruusmann ! That's really helpful. ^O^