
Comments (6)

vruusmann commented on August 11, 2024

My pandas dataframe has np.nan in it.

DMG.org issued a clarification (between PMML schema versions 4.3 and 4.4) that Double#NaN and Float#NaN values should be treated as invalid values, not missing values.

So, if your data frame contains both numpy.nan and None, the former are treated as invalid values and the latter as missing values. You'd better replace numpy.nan with None manually before invoking JPMML-Evaluator evaluation methods.
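
A minimal sketch of that manual replacement, assuming the arguments are available as plain Python dicts (for example, one dict per data frame row). The helper name nan_to_none is hypothetical and not part of JPMML-Evaluator-Python; numpy.nan is an ordinary Python float, so math.isnan is enough to detect it:

```python
import math

def nan_to_none(record):
    """Return a copy of `record` with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }

# Hypothetical argument records; float("nan") stands in for numpy.nan
records = [{"x1": 1.5, "x2": float("nan"), "x3": "a"}]
cleaned = [nan_to_none(record) for record in records]
print(cleaned)  # [{'x1': 1.5, 'x2': None, 'x3': 'a'}]
```

Non-float values (strings, ints, booleans) pass through unchanged, so only the NaN cells end up as missing values.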

The PMML has missing handling in the mining field

GitHub ate your example code?

I assume that your MiningField@invalidValueTreatment is set to returnInvalid, otherwise there would be no reason to raise an InvalidResultException.

The same data evaluates from a CSV file using org.jpmml.evaluator.example.EvaluationExample

The JPMML-Evaluator command-line application and the JPMML-Evaluator-Python package should behave identically.

If there are any differences, then they are most likely caused by the CSV parsing layer (JPMML-Evaluator's internal CSV parser component maps N/A string values to a Java null reference, whereas Pandas' read_csv maps them to numpy.nan).

from jpmml-evaluator-python.

bindernet commented on August 11, 2024

If there are any differences, then they are most likely caused by the CSV parsing layer (JPMML-Evaluator's internal CSV parser component maps N/A string values to a Java null reference, whereas Pandas' read_csv maps them to numpy.nan).

This explains the difference. Thanks.

I still don't quite have it working though (not a major issue, I can test with Java):

This is what happens when I replace np.nan with None:

---> 18 results = evaluator.evaluateAll(X_test.head(1).replace({np.nan: None}))
     19 
     20 print(results)

~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df)
     89                 result_records = []
     90                 for argument_record in argument_records:
---> 91                         result_record = self.evaluate(argument_record)
     92                         result_records.append(result_record)
     93                 return DataFrame.from_records(result_records)

~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluate(self, arguments)
     78 
     79         def evaluate(self, arguments):
---> 80                 javaArguments = self.backend.dict2map(arguments)
     81                 javaArguments = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "encodeKeys", javaArguments)
     82                 javaResults = self.javaEvaluator.evaluate(javaArguments)

~/.local/lib/python3.8/site-packages/jpmml_evaluator/pyjnius.py in dict2map(self, pyDict)
     36                                 javaValue = self.newObject("java.lang.Boolean", v)
     37                         else:
---> 38                                 raise ValueError("Python data type {0} is not supported".format(type(v)))
     39                         javaMap.put(javaKey, javaValue)
     40                 return javaMap

ValueError: Python data type <class 'NoneType'> is not supported

This BTW was my mining field. PMML v4.3

<MiningField name="xxxxx" missingValueReplacement="-999.01" missingValueTreatment="asValue" />


bindernet commented on August 11, 2024

Interestingly, running with py4j backend and Nones works. Evaluating with np.nans gives an Invalid Result error.


vruusmann commented on August 11, 2024

This BTW was my mining field.

Your MiningField element only clarifies the handling of missing values (using missingValueTreatment and missingValueReplacement attributes).

As explained above, numpy.nan is an invalid value, so you'd need to clarify the handling of invalid values as well (using invalidValueTreatment and invalidValueReplacement attributes).
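
As a rough illustration, the sketch below adds an invalidValueTreatment attribute to the MiningField fragment quoted in this thread, using Python's standard xml.etree.ElementTree. It assumes that asMissing is the desired treatment (so that invalid values fall through to the existing missingValueReplacement), and it works on a standalone fragment; a real PMML document declares a default XML namespace, so element lookups there would need namespace-qualified tag names:

```python
import xml.etree.ElementTree as ET

# The MiningField fragment from the thread, parsed on its own (no PMML namespace)
fragment = (
    '<MiningField name="xxxxx" missingValueReplacement="-999.01" '
    'missingValueTreatment="asValue" />'
)
field = ET.fromstring(fragment)

# Assumption: treat invalid values (such as NaN) the same way as missing values
field.set("invalidValueTreatment", "asMissing")

print(ET.tostring(field, encoding="unicode"))
```

Whether asMissing, asIs, or returnInvalid is appropriate depends on the model; this only shows where the attribute goes.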

Interestingly, running with py4j backend and Nones works

The mapping between Java and Python values is backend-specific:
https://github.com/jpmml/jpmml-evaluator-python/blob/0.3.1/jpmml_evaluator/__init__.py#L15-L19


vruusmann commented on August 11, 2024

Interestingly, running with py4j backend and Nones works.

That's a valuable observation!

Reopening, in the form of a PyJNIus backend bug.


vruusmann commented on August 11, 2024

Additionally, the dict2map method could take an extra flag nan_as_missing, which would perform the conversion from numpy.nan to None automatically.
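
One way such a flag might behave, sketched as a standalone function rather than as the actual dict2map implementation. The function name prepare_arguments and the default value of the flag are assumptions made for illustration:

```python
import math

def prepare_arguments(arguments, nan_as_missing=False):
    """Copy `arguments`, optionally turning float NaN values into None.

    Stand-in for the proposed flag; not the library's dict2map method.
    """
    prepared = {}
    for key, value in arguments.items():
        if nan_as_missing and isinstance(value, float) and math.isnan(value):
            value = None  # NaN becomes a missing value
        prepared[key] = value
    return prepared

arguments = {"x1": float("nan"), "x2": 3}
print(prepare_arguments(arguments, nan_as_missing=True))  # {'x1': None, 'x2': 3}
```

With the flag left at False, NaN values are passed through untouched, so existing callers would see no change in behavior.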

