I keep getting an error when trying to convert Pipeline models built using SynapseML L

Error with LightGBMClassificationModel about jpmml-sparkml HOT 5 CLOSED

davidmcomfort commented on August 11, 2024

Error with LightGBMClassificationModel

from jpmml-sparkml.

Comments (5)

vruusmann commented on August 11, 2024

Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency? This is where the converters for com.microsoft.azure.synapse.ml.lightgbm.*estimator classes reside.

davidmcomfort commented on August 11, 2024

Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency? This is where the converters for com.microsoft.azure.synapse.ml.lightgbm.*estimator classes reside.

Thanks for the fast response. No, it doesn't; I will add it now. It wasn't clear that it was needed. Thanks!

vruusmann commented on August 11, 2024

Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency?

No, it doesn't. It wasn't clear that it was needed.

The idea is that if some piece of conversion functionality is dependent on external libraries, then it's "isolated" into a separate JPMML-SparkML module. For example, pmml-sparkml-lightgbm and pmml-sparkml-xgboost.

The third-party library for LightGBM support is 3-5x bigger than the JPMML-SparkML library itself, and probably includes some JNI libraries (i.e. it is operating system/platform dependent). So it doesn't make sense to include it by default.

vruusmann commented on August 11, 2024

@davidmcomfort I can see in my e-mail inbox that you've added some interesting technical comments to this thread overnight, but have deleted them afterwards.

Modifying issue/project history is not a particularly polite behaviour. It will get you blocked next time.

davidmcomfort commented on August 11, 2024

My apologies. I was able to get both the Scala and PySpark versions working, so I can now convert SynapseML LightGBM models to PMML. The setup I am using is Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12) with the org.jpmml:pmml-sparkml:2.2.0, com.microsoft.azure:synapseml_2.12:0.9.5 and org.jpmml:pmml-sparkml-lightgbm:2.2.0 libraries installed, plus the pyspark2pmml-0.5.1 wrapper for PySpark.
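For anyone reproducing this outside a Databricks cluster, the library list above can be expressed as a `spark.jars.packages` value. This is only a sketch: the coordinates are the ones reported in this comment, while the helper name `spark_packages_conf` is made up for illustration, and the commented-out session setup is an assumption about how you start Spark.

```python
# Maven coordinates reported as working in this thread (Spark 3.2.1, Scala 2.12).
PACKAGES = [
    "org.jpmml:pmml-sparkml:2.2.0",
    # Separate module that holds the converters for SynapseML LightGBM models.
    "org.jpmml:pmml-sparkml-lightgbm:2.2.0",
    "com.microsoft.azure:synapseml_2.12:0.9.5",
]


def spark_packages_conf(packages):
    """Join Maven coordinates into the comma-separated form Spark expects."""
    for coordinate in packages:
        # Each entry must look like groupId:artifactId:version.
        if len(coordinate.split(":")) != 3:
            raise ValueError(f"not a groupId:artifactId:version coordinate: {coordinate}")
    return ",".join(packages)


if __name__ == "__main__":
    print(spark_packages_conf(PACKAGES))
    # Assumed usage when you create the session yourself (requires pyspark):
    # from pyspark.sql import SparkSession
    # spark = (SparkSession.builder
    #          .config("spark.jars.packages", spark_packages_conf(PACKAGES))
    #          .getOrCreate())
```

On Databricks the same coordinates would normally be installed as cluster libraries instead, which is what the setup described here does.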

A code example in Scala:

%scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier
import javax.xml.transform.stream.StreamResult
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.RFormula
import org.jpmml.model.JAXBUtil
import org.jpmml.sparkml.PMMLBuilder

// Load the two-class Iris dataset; PMMLBuilder needs its schema later on.
val irisData = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/Iris_2_classes.csv")
val irisSchema = irisData.schema

// RFormula produces the label and features columns; the classifier reads the same features column.
val rFormula = new RFormula().setFormula("Species ~ .")
val classifier = new LightGBMClassifier().setFeaturesCol(rFormula.getFeaturesCol)
val pipeline = new Pipeline().setStages(Array(rFormula, classifier))

val pipelineModel = pipeline.fit(irisData)

// Convert the fitted pipeline to PMML and print the XML to standard output.
val pmml = new PMMLBuilder(irisSchema, pipelineModel).build()

JAXBUtil.marshalPMML(pmml, new StreamResult(System.out))

And in PySpark:

!pip install pyspark2pmml
from pyspark.ml import Pipeline
from pyspark.ml.feature import RFormula
from pyspark2pmml import PMMLBuilder
from synapse.ml.lightgbm import LightGBMClassifier

# Load the same two-class Iris dataset used in the Scala example.
df = spark.read.csv("dbfs:/FileStore/Iris_2_classes.csv", header = True, inferSchema = True)

# Build and fit a pipeline of RFormula followed by the SynapseML LightGBM classifier.
formula = RFormula(formula = "Species ~ .")
classifier = LightGBMClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)

# Convert the fitted pipeline to PMML and write it to DBFS.
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)

pmmlBuilder.buildFile("/dbfs/FileStore/LightGBMClassificationIris_v2.pmml")

I would like to write to an S3 bucket, but haven't been able to do that yet.
Thanks again for your help.
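
One possible route to S3, untested in this thread, is to serialize the PMML in memory and upload the bytes with a boto3-style client. The sketch below assumes that `pmmlBuilder.buildByteArray()` from pyspark2pmml is available in the installed version, that boto3 is installed and the cluster has credentials for the bucket, and that the bucket and key names are placeholders. The helper `upload_pmml` is a hypothetical name, not part of any library.

```python
def upload_pmml(pmml_bytes, bucket, key, s3_client):
    """Upload serialized PMML through a boto3-style client and return its S3 URI.

    s3_client is any object exposing put_object(Bucket=..., Key=..., Body=...),
    for example boto3.client("s3").
    """
    if not isinstance(pmml_bytes, (bytes, bytearray)):
        raise TypeError("pmml_bytes must be bytes or bytearray")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=bytes(pmml_bytes),
        ContentType="application/xml",
    )
    return f"s3://{bucket}/{key}"


# Assumed usage in the notebook above (bucket and key are placeholders):
# import boto3
# pmml_bytes = pmmlBuilder.buildByteArray()
# upload_pmml(pmml_bytes, "my-bucket", "models/LightGBMClassificationIris_v2.pmml",
#             boto3.client("s3"))
```

Passing the client in as an argument keeps the helper independent of how credentials are configured, and makes it easy to try with a stand-in object before touching a real bucket.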
