
Comments (5)

vruusmann commented on August 11, 2024

Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency? This is where the converters for com.microsoft.azure.synapse.ml.lightgbm.* estimator classes reside.

from jpmml-sparkml.

davidmcomfort commented on August 11, 2024

> Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency? This is where the converters for com.microsoft.azure.synapse.ml.lightgbm.* estimator classes reside.

Thanks for the fast response. No, it doesn't. I will add it now. It wasn't clear that it was needed. Thanks!


vruusmann commented on August 11, 2024

> Does your project include the org.jpmml:pmml-sparkml-lightgbm dependency?
>
> No, it doesn't. It wasn't clear that it was needed.

The idea is that if some piece of conversion functionality is dependent on external libraries, then it's "isolated" into a separate JPMML-SparkML module. For example, pmml-sparkml-lightgbm and pmml-sparkml-xgboost.

The third-party library for LightGBM support is 3-5X bigger than the JPMML-SparkML library itself, and probably does include some JNI libraries (i.e. is operating system/platform dependent). So, it doesn't make sense to include it by default.
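As a rough illustration of opting in to the extra module, here is a minimal sketch that assembles the Maven coordinates for Spark's `spark.jars.packages` setting. The versions are the ones from the working setup reported later in this thread; other combinations are not verified here, and `spark_packages_conf` is a hypothetical helper name.

```python
# Maven coordinates taken from the setup reported later in this thread.
# Other version combinations are an assumption and are not verified here.
PACKAGES = [
    "org.jpmml:pmml-sparkml:2.2.0",              # core converter library
    "org.jpmml:pmml-sparkml-lightgbm:2.2.0",     # optional LightGBM converters
    "com.microsoft.azure:synapseml_2.12:0.9.5",  # SynapseML (LightGBM estimators)
]


def spark_packages_conf(packages):
    """Join Maven coordinates into the comma-separated form used by spark.jars.packages."""
    return ",".join(packages)


conf_value = spark_packages_conf(PACKAGES)
print(conf_value)
```

The resulting string can be passed as `pyspark --packages <value>` or set through `SparkSession.builder.config("spark.jars.packages", conf_value)`. On a managed platform such as Databricks, the same coordinates would more typically be installed as cluster libraries.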


vruusmann commented on August 11, 2024

@davidmcomfort I can see in my e-mail inbox that you've added some interesting technical comments to this thread overnight, but have deleted them afterwards.

Modifying issue/project history is not a particularly polite behaviour. It will get you blocked next time.


davidmcomfort commented on August 11, 2024

My apologies. I was able to get both the Scala and PySpark versions working, so I can now convert SynapseML LightGBM models to PMML. The setup I am using is Databricks Runtime 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12) with the org.jpmml:pmml-sparkml:2.2.0, com.microsoft.azure:synapseml_2.12:0.9.5, and org.jpmml:pmml-sparkml-lightgbm:2.2.0 libraries installed, plus the pyspark2pmml-0.5.1 wrapper for PySpark.

A code example is:

%scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.RFormula

val irisData = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("dbfs:/FileStore/Iris_2_classes.csv")
val irisSchema = irisData.schema

val rFormula = new RFormula().setFormula("Species ~ .")
val classifier = new LightGBMClassifier().setFeaturesCol(rFormula.getFeaturesCol)
val pipeline = new Pipeline().setStages(Array(rFormula, classifier))

val pipelineModel = pipeline.fit(irisData)

import org.jpmml.sparkml.PMMLBuilder

val pmml = new PMMLBuilder(irisSchema, pipelineModel).build()

import javax.xml.transform.stream.StreamResult
import org.jpmml.model.JAXBUtil

JAXBUtil.marshalPMML(pmml, new StreamResult(System.out))

And in PySpark:

!pip install pyspark2pmml
from pyspark.ml import Pipeline
from pyspark.ml.feature import RFormula

df = spark.read.csv("dbfs:/FileStore/Iris_2_classes.csv", header = True, inferSchema = True)

formula = RFormula(formula = "Species ~ .")

from synapse.ml.lightgbm import LightGBMClassifier
classifier = LightGBMClassifier()
pipeline = Pipeline(stages = [formula, classifier])
pipelineModel = pipeline.fit(df)

from pyspark2pmml import PMMLBuilder
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)

pmmlBuilder.buildFile("/dbfs/FileStore/LightGBMClassificationIris_v2.pmml")

I would like to write to an S3 bucket, but haven't been able to do that yet.
Thanks again for your help.
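On the S3 question, one possible route (a sketch only, not tested on this Databricks setup) is to keep writing the PMML file to a local or DBFS path with buildFile, and then upload that file with boto3. The names `split_s3_uri` and `upload_pmml` are hypothetical helpers, the bucket and key are placeholders, and boto3 with valid AWS credentials is assumed to be available on the cluster.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an 's3://bucket/key' URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError("expected an URI of the form s3://bucket/key, got: " + uri)
    return parsed.netloc, key


def upload_pmml(local_path, s3_uri):
    """Upload an already written PMML file to S3.

    Assumption: boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the URI helper works without boto3

    bucket, key = split_s3_uri(s3_uri)
    boto3.client("s3").upload_file(local_path, bucket, key)


# Placeholder bucket and key, for illustration only.
print(split_s3_uri("s3://my-bucket/models/LightGBMClassificationIris_v2.pmml"))
```

After `pmmlBuilder.buildFile(...)` has written the file, a call such as `upload_pmml("/dbfs/FileStore/LightGBMClassificationIris_v2.pmml", "s3://my-bucket/models/LightGBMClassificationIris_v2.pmml")` would copy it to the bucket, provided the cluster has write access to it.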

