

brother-darion commented on September 15, 2024

In the batch prediction scenario, I think it would be better to have an option to choose between raising an exception, or setting this record's result to NaN and continuing to predict the others.

The Java interface o.j.e.Evaluator only supports single-row prediction mode via Evaluator#evaluate(Map).

The Python interface builds its batch prediction mode jpmml_evaluator.Evaluator.evaluateAll(DataFrame) on top of it. The main benefit of the batch interface is that all rows are moved between Python and Java in a single call (instead of many calls, one call per row).

That said, it's actually a good idea for JPMML-Evaluator-Python to provide an option for configuring "what to do about an EvaluationException".

I can quickly think of two options:

  1. "return invalid" aka "as-is". Matches the current behaviour, where the Java exception is propagated to the top, and the evaluation is stopped at that location.
  2. "replace with NaN" aka "ignore". The Java component catches a row-specific exception and replaces the result for that row with Double#NaN (or some other user-specified constant?).
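
A minimal sketch of the two options listed above, written in plain Python rather than against the real JPMML-Evaluator-Python API. Everything here is hypothetical: `evaluate_all`, its `on_error` and `replacement` parameters, `EvaluationError` (a stand-in for the Java EvaluationException) and the toy scoring function. Rows are plain dicts, mirroring the single-row `Evaluator#evaluate(Map)` call.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]


class EvaluationError(Exception):
    """Hypothetical stand-in for the row-specific Java EvaluationException."""


def evaluate_all(rows: List[Row],
                 evaluate_row: Callable[[Row], float],
                 on_error: str = "raise",
                 replacement: float = math.nan) -> List[float]:
    """Evaluate rows one at a time with a configurable error policy.

    on_error="raise":   option 1 ("as-is"), propagate the first error and stop.
    on_error="replace": option 2 ("ignore"), substitute `replacement` for the failed row.
    """
    if on_error not in ("raise", "replace"):
        raise ValueError(f"unsupported on_error mode: {on_error!r}")
    results: List[float] = []
    for row in rows:
        try:
            results.append(evaluate_row(row))
        except EvaluationError:
            if on_error == "raise":
                raise
            results.append(replacement)
    return results


def toy_model(row: Row) -> float:
    """Toy scoring function that rejects negative inputs to simulate a data error."""
    if row["x"] < 0:
        raise EvaluationError(f"invalid value for x: {row['x']}")
    return 2.0 * row["x"]


rows = [{"x": 1.0}, {"x": -1.0}, {"x": 3.0}]
print(evaluate_all(rows, toy_model, on_error="replace"))  # [2.0, nan, 6.0]
```

Option 1 stops at the first failing row; option 2 keeps going and marks each failure with the replacement value.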

Also, in "return invalid" aka "as-is" mode, it should be possible to configure whether partial results can be returned or not. Suppose there is a batch of 10'000 rows, and the evaluation fails on row 8566 because of some data input error. I think it might make sense to return the leading 8565 results in that case.
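
The partial-results idea could look like the following sketch. The names `PartialEvaluationError` and `return_partial` are hypothetical and not part of the library. On failure, the error object carries the results for every row before the failing one, reproducing the 10'000-row example from the paragraph above.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class EvaluationError(Exception):
    """Hypothetical stand-in for the row-specific Java EvaluationException."""


class PartialEvaluationError(Exception):
    """Hypothetical error that also carries the results computed before the failure."""

    def __init__(self, row_index: int, partial_results: List[float], cause: Exception):
        super().__init__(f"evaluation failed on row index {row_index}: {cause}")
        self.row_index = row_index              # zero-based index of the failing row
        self.partial_results = partial_results  # results for rows 0 .. row_index - 1


def evaluate_all(rows: List[Row],
                 evaluate_row: Callable[[Row], float],
                 return_partial: bool = False) -> List[float]:
    """Run in "as-is" mode: stop at the first error, optionally handing back the leading results."""
    results: List[float] = []
    for index, row in enumerate(rows):
        try:
            results.append(evaluate_row(row))
        except EvaluationError as e:
            if return_partial:
                raise PartialEvaluationError(index, results, e) from e
            raise  # current behaviour: the original error, no partial results


    return results


def toy_model(row: Row) -> float:
    """Toy scoring function that rejects negative inputs to simulate a data error."""
    if row["x"] < 0:
        raise EvaluationError(f"invalid value for x: {row['x']}")
    return 2.0 * row["x"]


rows = [{"x": float(i)} for i in range(10_000)]
rows[8565] = {"x": -1.0}  # the 8566th row (1-based) carries a data input error
try:
    evaluate_all(rows, toy_model, return_partial=True)
except PartialEvaluationError as err:
    print(err.row_index, len(err.partial_results))  # 8565 8565
```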

Right, these are really friendly options. And these two options would be added on top of the current behavior, which is just to throw an exception, right? Like you said, clear feedback is important, and these options are important too.

I was also thinking that "replace with NaN" needs a threshold, or a specified number of rows, for deciding whether to stop evaluation, because in some scenarios where people use the wrong data, it would be a little annoying to still evaluate all of it.

What do you think?
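
One way to express the suggested threshold for "replace with NaN" mode is a hypothetical `max_errors` parameter: failed rows are replaced with NaN until the error count exceeds the limit, at which point evaluation stops. This sketch counts failed rows in absolute terms (an assumption; a fraction of the batch would work the same way), and `TooManyErrors` is likewise a made-up name.

```python
import math
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]


class EvaluationError(Exception):
    """Hypothetical stand-in for the row-specific Java EvaluationException."""


class TooManyErrors(Exception):
    """Hypothetical error raised once the failed-row count exceeds the threshold."""

    def __init__(self, error_count: int, partial_results: List[float]):
        super().__init__(f"stopped after {error_count} failed rows")
        self.error_count = error_count
        self.partial_results = partial_results


def evaluate_all(rows: List[Row],
                 evaluate_row: Callable[[Row], float],
                 replacement: float = math.nan,
                 max_errors: Optional[int] = None) -> List[float]:
    """Replace failed rows with `replacement`, stopping once more than `max_errors` fail.

    max_errors=None never stops early; a small cap catches obviously wrong input data.
    """
    results: List[float] = []
    error_count = 0
    for row in rows:
        try:
            results.append(evaluate_row(row))
        except EvaluationError as e:
            error_count += 1
            if max_errors is not None and error_count > max_errors:
                raise TooManyErrors(error_count, results) from e
            results.append(replacement)
    return results


def toy_model(row: Row) -> float:
    """Toy scoring function that rejects negative inputs to simulate a data error."""
    if row["x"] < 0:
        raise EvaluationError(f"invalid value for x: {row['x']}")
    return 2.0 * row["x"]


rows = [{"x": 1.0}, {"x": -1.0}, {"x": -2.0}, {"x": -3.0}, {"x": 5.0}]
try:
    evaluate_all(rows, toy_model, max_errors=2)
except TooManyErrors as err:
    print(err)  # stopped after 3 failed rows
```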

from jpmml-evaluator-python.

vruusmann commented on September 15, 2024

There is a third option - "omit row" aka "drop". If there are evaluation errors, then the corresponding rows are simply omitted from the results batch.

The "omit row" option assumes that the user has assigned custom identifiers to the rows of the arguments batch. So, if there are 156 argument rows, and only 144 result rows (meaning that 12 rows errored out), then the user can locally identify "successful" vs "failed" rows in her application code.

See #23 about row identifiers.
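
A sketch of "omit row" mode under the same assumption: the caller keys each argument row by its own identifier (a plain dict here), failed rows are simply absent from the result, and the caller recovers them by comparing ids. The function names are hypothetical, and the numbers mirror the 156 vs 144 example.

```python
from typing import Callable, Dict, Hashable, List

Row = Dict[str, float]


class EvaluationError(Exception):
    """Hypothetical stand-in for the row-specific Java EvaluationException."""


def evaluate_all_omit(rows: Dict[Hashable, Row],
                      evaluate_row: Callable[[Row], float]) -> Dict[Hashable, float]:
    """In "omit row" mode, failed rows are left out; results stay keyed by the caller's ids."""
    results: Dict[Hashable, float] = {}
    for row_id, row in rows.items():
        try:
            results[row_id] = evaluate_row(row)
        except EvaluationError:
            continue  # drop the row; the caller can detect it from the missing id
    return results


def failed_ids(arguments: Dict[Hashable, Row], results: Dict[Hashable, float]) -> List[Hashable]:
    """Ids that were submitted but have no result, in submission order."""
    return [row_id for row_id in arguments if row_id not in results]


def toy_model(row: Row) -> float:
    """Toy scoring function that rejects negative inputs to simulate a data error."""
    if row["x"] < 0:
        raise EvaluationError(f"invalid value for x: {row['x']}")
    return 2.0 * row["x"]


# 156 argument rows; every 13th one is invalid, so 12 rows error out.
arguments = {f"row-{i}": {"x": -1.0 if i % 13 == 0 else float(i)} for i in range(156)}
results = evaluate_all_omit(arguments, toy_model)
print(len(results), len(failed_ids(arguments, results)))  # 144 12
```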


vruusmann commented on September 15, 2024

As a general comment - my "design assumption" behind the Evaluator.evaluateAll(X) method is that the size of the arguments dataframe is about/up to 10'000 cells (eg. a dataframe of 10 features x 1000 rows).

My thinking is that the data is being moved between Python and Java environments using the Pickle protocol. If the pickle payload gets really big (say, 1'000'000 cells instead of 10'000 cells), then the Java component responsible for loading/dumping might start hitting unexpected memory/processing limitations.

If the dataset is much bigger than 10'000 cells, then it should be partitioned into multiple chunks in Python application code. And the chunking algorithm should be prepared to handle the "omit row" option gracefully.
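
Chunking in application code could look like this sketch. The chunk size comes from the 10'000-cell budget (10 features gives 1000 rows per chunk), and results are merged by row identifier rather than by position, so a chunk that returns fewer rows in "omit row" mode does not shift later results. `evaluate_batch` is a hypothetical callable standing in for one batch call, and the other names are illustrative too.

```python
from itertools import islice
from typing import Callable, Dict, Hashable, Iterator, List

Batch = Dict[Hashable, List[float]]


def rows_per_chunk(n_features: int, max_cells: int = 10_000) -> int:
    """Rows per chunk so that rows x features stays within the cell budget (at least 1 row)."""
    return max(1, max_cells // max(1, n_features))


def iter_chunks(rows: Batch, chunk_size: int) -> Iterator[Batch]:
    """Yield consecutive sub-dicts of at most chunk_size rows, preserving row ids."""
    items = iter(rows.items())
    while True:
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            return
        yield chunk


def evaluate_in_chunks(rows: Batch,
                       evaluate_batch: Callable[[Batch], Dict[Hashable, float]],
                       n_features: int,
                       max_cells: int = 10_000) -> Dict[Hashable, float]:
    """Evaluate chunk by chunk and merge results by row id.

    Merging by id (not by position) keeps results aligned even when a chunk
    returns fewer rows than it received, as in "omit row" mode.
    """
    merged: Dict[Hashable, float] = {}
    for chunk in iter_chunks(rows, rows_per_chunk(n_features, max_cells)):
        merged.update(evaluate_batch(chunk))
    return merged


# Demo with a stub batch function that "omits" every row whose id is divisible by 100.
chunk_sizes: List[int] = []


def stub_batch(chunk: Batch) -> Dict[Hashable, float]:
    chunk_sizes.append(len(chunk))
    return {row_id: sum(row) for row_id, row in chunk.items() if row_id % 100 != 0}


rows = {i: [1.0] * 10 for i in range(2500)}  # 10 features -> 1000 rows per chunk
merged = evaluate_in_chunks(rows, stub_batch, n_features=10)
print(chunk_sizes, len(merged))  # [1000, 1000, 500] 2475
```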


vruusmann commented on September 15, 2024

my "design assumption" behind the Evaluator.evaluateAll(X) method is that the size of the arguments dataframe is about/up to 10'000 cells

The Evaluator.evaluateAll(X) method should have an extra parameter for controlling the batch size. The default would be my design assumption - about 10'000 cells. But the end user can increase or decrease its value if needed.

This way, the chunking logic would be nicely available at the JPMML-Evaluator-Python library level, leaving the actual Python application code clean.
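
A library-level version might hide the chunking behind a `batch_size` parameter. This is a hypothetical sketch, not the jpmml_evaluator API: `ChunkedEvaluator` and `evaluate_batch` are made-up names, and it assumes that `batch_size` counts rows, with a default derived from the 10'000-cell budget divided by the number of columns (measuring the batch size in cells would be an equally valid reading).

```python
from typing import Callable, List, Optional, Sequence

DEFAULT_MAX_CELLS = 10_000  # design assumption from the thread: about 10'000 cells per call

Rows = Sequence[Sequence[float]]


class ChunkedEvaluator:
    """Hypothetical wrapper (not the jpmml_evaluator API) that hides chunking from callers."""

    def __init__(self, evaluate_batch: Callable[[List[Sequence[float]]], List[float]]):
        # evaluate_batch stands in for one Python-to-Java round trip over a list of rows.
        self._evaluate_batch = evaluate_batch

    def evaluate_all(self, rows: Rows, batch_size: Optional[int] = None) -> List[float]:
        """Evaluate all rows, batch_size rows per call (default: about 10'000 cells per call)."""
        if len(rows) == 0:
            return []
        if batch_size is None:
            n_columns = max(1, len(rows[0]))
            batch_size = max(1, DEFAULT_MAX_CELLS // n_columns)
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        results: List[float] = []
        for start in range(0, len(rows), batch_size):
            results.extend(self._evaluate_batch(list(rows[start:start + batch_size])))
        return results


calls: List[int] = []


def stub_batch(batch: List[Sequence[float]]) -> List[float]:
    calls.append(len(batch))
    return [sum(row) for row in batch]


evaluator = ChunkedEvaluator(stub_batch)
data = [[1.0] * 25 for _ in range(1000)]  # 25 columns -> default of 400 rows per call
scores = evaluator.evaluate_all(data)
print(calls)  # [400, 400, 200]
calls.clear()
evaluator.evaluate_all(data, batch_size=500)
print(calls)  # [500, 500]
```

The application code then calls `evaluate_all` once, and only overrides `batch_size` when the default budget is too large or too small for its environment.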

