Comments (5)
Does your project include the org.jpmml:pmml-sparkml-lightgbm
dependency? This is where the converters for com.microsoft.azure.synapse.ml.lightgbm.*
estimator classes reside.
from jpmml-sparkml.
Thanks for the fast response. No, it doesn't. I will add now. It wasn't clear that it was needed. Thanks!
Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency?
No, it doesn't. It wasn't clear that it was needed.
The idea is that if some piece of conversion functionality is dependent on external libraries, then it's "isolated" into a separate JPMML-SparkML module. For example, pmml-sparkml-lightgbm and pmml-sparkml-xgboost.
The third-party library for LightGBM support is 3-5X bigger than the JPMML-SparkML library itself, and probably includes some JNI libraries (i.e. it is operating system/platform dependent). So, it doesn't make sense to include it by default.
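For a PySpark session outside a managed cluster, opting in to the LightGBM converter module might look like the sketch below. It assumes the libraries are resolved from Maven through Spark's `spark.jars.packages` setting. The coordinates and versions are the ones reported as working later in this thread, and the helper names `maven_packages` and `build_session` are made up for illustration.

```python
# Maven coordinates reported as working later in this thread.
# The LightGBM converters live in their own module, so it is listed explicitly.
JPMML_SPARKML = "org.jpmml:pmml-sparkml:2.2.0"
JPMML_SPARKML_LIGHTGBM = "org.jpmml:pmml-sparkml-lightgbm:2.2.0"
SYNAPSEML = "com.microsoft.azure:synapseml_2.12:0.9.5"


def maven_packages(*coordinates):
    """Join Maven coordinates into the comma-separated form Spark expects."""
    return ",".join(coordinates)


def build_session(packages):
    """Create a SparkSession that resolves the given packages at startup."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    return (
        SparkSession.builder
        .appName("lightgbm-to-pmml")
        .config("spark.jars.packages", packages)
        .getOrCreate()
    )


packages = maven_packages(JPMML_SPARKML, JPMML_SPARKML_LIGHTGBM, SYNAPSEML)
print(packages)
```

Calling `build_session(packages)` would then start Spark with all three libraries requested at startup. On Databricks the same coordinates would normally be installed as cluster libraries instead.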
@davidmcomfort I can see in my e-mail inbox that you've added some interesting technical comments to this thread overnight, but have deleted them afterwards.
Modifying issue/project history is not a particularly polite behaviour. It will get you blocked next time.
My apologies. I was able to get both the Scala and PySpark versions working, so I can convert SynapseML LightGBM models to PMML. The setup I am using is Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12) with the org.jpmml:pmml-sparkml:2.2.0, com.microsoft.azure:synapseml_2.12:0.9.5, and org.jpmml:pmml-sparkml-lightgbm:2.2.0 libraries installed, plus the pyspark2pmml-0.5.1 wrapper for PySpark.
A code example is:
%scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.RFormula
val irisData = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/Iris_2_classes.csv")
val irisSchema = irisData.schema
val rFormula = new RFormula().setFormula("Species ~ .")
val classifier = new LightGBMClassifier().setFeaturesCol(rFormula.getFeaturesCol)
val pipeline = new Pipeline().setStages(Array(rFormula, classifier))
val pipelineModel = pipeline.fit(irisData)
import org.jpmml.sparkml.PMMLBuilder
val pmml = new PMMLBuilder(irisSchema, pipelineModel).build()
import javax.xml.transform.stream.StreamResult
import org.jpmml.model.JAXBUtil
JAXBUtil.marshalPMML(pmml, new StreamResult(System.out))
And in PySpark:
!pip install pyspark2pmml
from pyspark.ml import Pipeline
from pyspark.ml.feature import RFormula
df = spark.read.csv("dbfs:/FileStore/Iris_2_classes.csv", header = True, inferSchema = True)
formula = RFormula(formula = "Species ~ .")
from synapse.ml.lightgbm import LightGBMClassifier
classifier = LightGBMClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)
from pyspark2pmml import PMMLBuilder
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)
pmmlBuilder.buildFile("/dbfs/FileStore/LightGBMClassificationIris_v2.pmml")
I would like to write to an S3 bucket, but haven't been able to do that yet.
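One possible route to S3, sketched here rather than tested on Databricks, is to keep writing the PMML file locally with `buildFile` and then upload that file. The sketch assumes a client that behaves like `boto3.client("s3")` and uses its `put_object` call. The helper name `upload_pmml`, the bucket and the key are hypothetical.

```python
def upload_pmml(local_path, bucket, key, s3_client):
    """Upload a PMML file from the local filesystem to S3.

    s3_client is expected to behave like boto3.client("s3").
    Returns the s3:// URI the file was written to.
    """
    # Read the file that PMMLBuilder.buildFile() produced.
    with open(local_path, "rb") as pmml_file:
        body = pmml_file.read()
    # put_object stores the bytes under the given bucket and key.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return "s3://{}/{}".format(bucket, key)


# Example usage (not run here; assumes boto3 is installed, AWS credentials
# are configured, and "my-bucket" is a placeholder bucket name):
#
# import boto3
# upload_pmml(
#     "/dbfs/FileStore/LightGBMClassificationIris_v2.pmml",
#     "my-bucket",
#     "models/LightGBMClassificationIris_v2.pmml",
#     boto3.client("s3"),
# )
```

Passing the client in as an argument keeps the helper independent of how credentials are set up on the cluster.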
Thanks again for your help.