Comments (6)
My pandas dataframe has np.nan in it.
DMG.org issued a clarification (between PMML schema versions 4.3 and 4.4) that Double#NaN and Float#NaN values should be treated as invalid values, not missing values.
So, if your data frame contains numpy.nan and None, then the former are treated as invalid values and the latter as missing values. You'd better replace numpy.nan manually before invoking JPMML-Evaluator evaluation methods.
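A minimal sketch of such a manual replacement, done at the record level with the standard library only. The helper name nan_to_none and the sample records are hypothetical; the check relies on numpy.nan being an ordinary Python float. With pandas, the records could come from df.to_dict(orient="records").

```python
import math

def nan_to_none(record):
    """Return a copy of one argument record with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }

# Hypothetical input records, standing in for rows of a data frame
records = [
    {"x1": 1.5, "x2": float("nan"), "x3": "a"},
    {"x1": float("nan"), "x2": 2.0, "x3": None},
]

cleaned = [nan_to_none(record) for record in records]
print(cleaned)
```

Each cleaned record could then be passed to the evaluator one at a time. Whether None is accepted depends on the backend in use.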
The PMML has missing handling in the mining field
GitHub ate your example code?
I assume that your MiningField@invalidValueTreatment is set to returnInvalid, otherwise there would be no reason to raise an InvalidResultException.
The same data evaluates from a CSV file using org.jpmml.evaluator.example.EvaluationExample
The JPMML-Evaluator command-line application and the JPMML-Evaluator-Python package should behave identically.
If there are any differences, then they are most likely caused by the CSV parsing layer (JPMML-Evaluator's internal CSV parser component maps N/A string values to the Java null reference, whereas Pandas' read_csv maps them to numpy.nan).
from jpmml-evaluator-python.
If there are any differences, then they are most likely caused by CSV parsing layer (JPMML-Evaluator's internal CSV parser component maps N/A string values to Java null reference, whereas Pandas' read_csv maps them to numpy.nan).
This explains the difference. Thanks.
I still don't quite have it working though (not a major issue, I can test with Java):
This is what happens when I replace np.nan with Nones:
---> 18 results = evaluator.evaluateAll(X_test.head(1).replace({np.nan: None}))
19
20 print(results)
~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df)
89 result_records = []
90 for argument_record in argument_records:
---> 91 result_record = self.evaluate(argument_record)
92 result_records.append(result_record)
93 return DataFrame.from_records(result_records)
~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py in evaluate(self, arguments)
78
79 def evaluate(self, arguments):
---> 80 javaArguments = self.backend.dict2map(arguments)
81 javaArguments = self.backend.staticInvoke("org.jpmml.evaluator.EvaluatorUtil", "encodeKeys", javaArguments)
82 javaResults = self.javaEvaluator.evaluate(javaArguments)
~/.local/lib/python3.8/site-packages/jpmml_evaluator/pyjnius.py in dict2map(self, pyDict)
36 javaValue = self.newObject("java.lang.Boolean", v)
37 else:
---> 38 raise ValueError("Python data type {0} is not supported".format(type(v)))
39 javaMap.put(javaKey, javaValue)
40 return javaMap
ValueError: Python data type <class 'NoneType'> is not supported
This BTW was my mining field. PMML v4.3
<MiningField name="xxxxx" missingValueReplacement="-999.01" missingValueTreatment="asValue" />
Interestingly, running with py4j backend and Nones works. Evaluating with np.nans gives an Invalid Result error.
This BTW was my mining field.
Your MiningField element only clarifies the handling of missing values (using missingValueTreatment and missingValueReplacement attributes).
As explained above, numpy.nan is an invalid value, so you'd need to clarify the handling of invalid values as well (using invalidValueTreatment and invalidValueReplacement attributes).
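As a rough illustration, the MiningField element quoted earlier could be extended with an invalidValueTreatment attribute. The sketch below assumes the asMissing option, which to my understanding of the PMML specification converts invalid values to missing values so that the existing missing value handling applies; check this against the schema version of your document. It works on a bare fragment without the PMML namespace.

```python
import xml.etree.ElementTree as ET

# The MiningField from this thread, as a standalone fragment (no PMML namespace)
mining_field_xml = (
    '<MiningField name="xxxxx" missingValueReplacement="-999.01" '
    'missingValueTreatment="asValue" />'
)

field = ET.fromstring(mining_field_xml)

# Assumption: "asMissing" routes invalid values (such as NaN) into missing value handling
field.set("invalidValueTreatment", "asMissing")

updated_xml = ET.tostring(field, encoding="unicode")
print(updated_xml)
```

An alternative would be to leave the PMML document untouched and replace numpy.nan on the Python side instead.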
Interestingly, running with py4j backend and Nones works
The mapping between Java and Python values is backend-specific:
https://github.com/jpmml/jpmml-evaluator-python/blob/0.3.1/jpmml_evaluator/__init__.py#L15-L19
Interestingly, running with py4j backend and Nones works.
That's a valuable observation!
Reopening, in the form of a PyJNIus backend bug.
Additionally, the dict2map method could take an extra flag nan_as_missing, which would perform the conversion from numpy.nan to None automatically.