imixs-ml's People

Contributors

dependabot[bot], rsoika

imixs-ml's Issues

Implement JAX-RS client

Use a JAX-RS client implementation to send the Imixs-ML training and analysis objects.

Provide additional XMLRoot classes

Introduce adapter concept for analyzing entities

Entities stored in a workitem can be represented in different formats in a document. For example, the float value 1042.0 can appear as:

1042,00
1042.00
1.042,00
1,042.00

Another case is the representation of a date or an IBAN.

To support different representations of a value, we need an adapter concept.
Implementation: CDI Observer pattern.
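A minimal sketch of the formatting side of such an adapter, assuming an adapter's job is to produce the textual variants of a numeric entity value so that each variant can be searched in the document text. The class name `NumberVariantAdapter` is hypothetical, and the CDI observer wiring (an event fired by the ML service and observed by adapters) is left out; only the JDK `DecimalFormat` API is used.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical adapter sketch: produces the textual variants of a numeric
// entity value so that each variant can be searched in the document text.
class NumberVariantAdapter {

    // Formats the value with two fraction digits, using the given decimal
    // separator and, if groupSep is not null, the given grouping separator.
    static String format(double value, char decimalSep, Character groupSep) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(decimalSep);
        String pattern = "0.00";
        if (groupSep != null) {
            symbols.setGroupingSeparator(groupSep);
            pattern = "#,##0.00";
        }
        return new DecimalFormat(pattern, symbols).format(value);
    }

    // Returns the four representations listed in the issue.
    static Set<String> variants(double value) {
        Set<String> result = new LinkedHashSet<>();
        result.add(format(value, ',', null)); // 1042,00
        result.add(format(value, '.', null)); // 1042.00
        result.add(format(value, ',', '.'));  // 1.042,00
        result.add(format(value, '.', ','));  // 1,042.00
        return result;
    }

    public static void main(String[] args) {
        // prints [1042,00, 1042.00, 1.042,00, 1,042.00]
        System.out.println(variants(1042.0));
    }
}
```

Dates and IBANs would need their own adapters along the same lines; keeping each format in a separate observer is what the CDI approach would make easy.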

Jakarta EE 8 support

Add Jakarta EE 8 support
Refactor the module setup (imixs-ml-core is in the wrong location!)

imixs-ml-spacy - provide health endpoint

Provide a health endpoint for imixs-ml-spacy to configure a liveness probe in Kubernetes.

The endpoint only needs to verify whether the models are available. If not, we can assume that something went wrong.
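The check itself is small. The real imixs-ml-spacy service is the Python spaCy wrapper, so the following Java sketch only illustrates the logic under stated assumptions: models are stored as subdirectories of one model root directory (an assumed layout), and the endpoint answers HTTP 200 when at least one model exists and 503 otherwise, which is what a Kubernetes liveness probe evaluates. The class `ModelHealthCheck` is hypothetical and the HTTP binding is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical liveness check: healthy (HTTP 200) if at least one model
// directory exists below the model root, otherwise unhealthy (HTTP 503).
class ModelHealthCheck {

    static int healthStatus(Path modelRoot) {
        if (!Files.isDirectory(modelRoot)) {
            return 503; // model root missing: something went wrong
        }
        try (Stream<Path> entries = Files.list(modelRoot)) {
            boolean hasModel = entries.anyMatch(Files::isDirectory);
            return hasModel ? 200 : 503;
        } catch (IOException e) {
            return 503; // an unreadable model root counts as unhealthy
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("models");
        System.out.println(healthStatus(root)); // 503: no model yet
        Files.createDirectory(root.resolve("invoice-de"));
        System.out.println(healthStatus(root)); // 200
    }
}
```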

Refactoring project structure

Refactor the project structure into the following modules:

  • imixs-ml-spacy - wrapper code for spaCy
  • imixs-ml-core - core Java classes (e.g. training data objects)
  • imixs-ml-workflow - workflow integration (analyzing)
  • imixs-ml-training - training api

TrainingService - refine training mode

In the current implementation, training data is only generated for a workitem with a 100% match of all ML entities.
The reason for this is that bad data should not be sent to the ML service, as such data can degrade the model quality.

For example, if the OCR extraction did not extract the IBAN correctly but the workitem has the correct IBAN, then this workitem should NEVER be used for training, as the text is wrong.

The Problem

We recognized that a relevant share of the invoice data used for training does not include cdtr.iban and cdtr.bic; this is true for a lot of invoices. Currently we ignore those workitems because the items cdtr.iban and cdtr.bic are missing.

The Solution

If IBAN / BIC are not included in the workitem, we can assume that this type of entity data is not relevant at all, and that the text data is probably not of bad quality.

But this might not be true for all kinds of entities. For example, cdtr.name, invoice.date and invoice.total are essential for training an invoice workitem.

We can solve this by marking items as 'optional': if an optional item is empty in the workitem, the workitem can still be relevant for training.

An Amazon invoice is a typical example.
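The rule described above can be sketched as a single check. This is a minimal sketch, assuming a workitem can be viewed as a map from item name to value and that "match" means the value occurs verbatim in the extracted text; the class `TrainingEligibility` and its method signature are hypothetical, not the TrainingService API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical check deciding whether a workitem may be used for training.
class TrainingEligibility {

    // - every non-empty entity value must occur in the extracted text
    //   (otherwise the text is considered bad, e.g. an OCR error);
    // - an empty entity is only tolerated if it is marked as optional.
    static boolean isTrainable(Map<String, String> workitem, String text,
                               List<String> entities, Set<String> optional) {
        for (String name : entities) {
            String value = workitem.get(name);
            if (value == null || value.isEmpty()) {
                if (optional.contains(name)) {
                    continue; // e.g. cdtr.iban missing on an Amazon invoice
                }
                return false; // an essential entity is missing
            }
            if (!text.contains(value)) {
                return false; // value not found: text quality is doubtful
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> entities = List.of("cdtr.name", "invoice.total", "cdtr.iban");
        Map<String, String> workitem = Map.of("cdtr.name", "Amazon", "invoice.total", "42.00");
        String text = "Amazon EU S.a.r.l. ... Total 42.00 EUR";
        System.out.println(isTrainable(workitem, text, entities, Set.of("cdtr.iban"))); // true
        System.out.println(isTrainable(workitem, text, entities, Set.of()));            // false
    }
}
```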

Implement a JSF Front-End Integration

Implement a JSF front-end integration to display entity values suggested by the ML analysis,
and provide an Ajax search method to search text phrases within the current document content.

  • provide JSF components / subforms
  • provide a JavaScript library

Example for training data

To get started, we first need some kind of simple example code to train an empty model for entity recognition.

Change API endpoints for multi-model support

For multi-model support, each API endpoint should accept the model name.

In this scenario, a workitem must hold a list of ml.definition objects containing the details of an ML service endpoint, e.g. the locales or the ml.status.
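One way to picture this is a small value object per ml.definition that builds a model-specific URI. This is a sketch only: the class `MLDefinition`, its fields, and the path shape `{endpoint}/{resource}/{model}` (with resource names such as `training` or `analyse`) are assumptions for illustration, not the actual Imixs-ML API; model names are assumed to be URL-safe.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical representation of one ml.definition entry stored in a workitem.
class MLDefinition {
    final String endpoint; // base URL of the ML service
    final String model;    // model name, assumed to be URL-safe
    final String locales;  // e.g. "DE,UK"
    String status;         // ml.status (values not specified here)

    MLDefinition(String endpoint, String model, String locales) {
        this.endpoint = endpoint;
        this.model = model;
        this.locales = locales;
    }

    // Appends the resource and the model name to the endpoint,
    // e.g. https://localhost:8111/api/training/invoice-de
    URI resourceUri(String resource) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        return URI.create(base + resource + "/" + model);
    }

    public static void main(String[] args) {
        // A workitem may hold several definitions, one per model.
        List<MLDefinition> definitions = new ArrayList<>();
        definitions.add(new MLDefinition("https://localhost:8111/api/", "invoice-de", "DE"));
        definitions.add(new MLDefinition("https://localhost:8111/api/", "invoice-en", "UK"));
        for (MLDefinition d : definitions) {
            System.out.println(d.resourceUri("analyse"));
        }
    }
}
```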

spaCy - initialize model with categories

Because categories cannot be added dynamically like NER entities, we need a separate method in the spaCy wrapper to initialize a blank model with a set of categories. This allows us to use a simplified API for incremental training of new workitems.

See also the discussion in explosion/spaCy#6905.

Adapter - text length

The adapter classes should support an optional maximum text length, because in some cases the returned text is too long.
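A minimal sketch of such an option, assuming the limit is a plain character count and that a value of 0 or less means "no limit" so the option stays optional. The helper `TextLengthLimit.limit` is hypothetical; where the adapters would read the configured length from is not specified in the issue.

```java
// Hypothetical helper an adapter could apply before returning its text.
class TextLengthLimit {

    // Cuts the text to at most maxLength characters.
    // A maxLength of 0 or less means "no limit".
    static String limit(String text, int maxLength) {
        if (text == null || maxLength <= 0 || text.length() <= maxLength) {
            return text;
        }
        return text.substring(0, maxLength);
    }

    public static void main(String[] args) {
        System.out.println(limit("Invoice 2021-0815 Amazon EU", 12)); // Invoice 2021
        System.out.println(limit("Invoice", 0));                      // Invoice
    }
}
```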

Change default locale to GERMANY, UK

The current default locale is set to "GERMAN", "UK". In that case, only the language is defined for Germany.
The correct default should be:

GERMANY, UK
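The difference is visible directly in the JDK constants: `java.util.Locale.GERMAN` carries only the language, while `Locale.GERMANY` carries language and country. The holder class `DefaultLocales` below is a hypothetical name used only for illustration.

```java
import java.util.Locale;

class DefaultLocales {

    // Proposed default: both entries define language AND country.
    static final Locale[] DEFAULT_LOCALES = { Locale.GERMANY, Locale.UK };

    public static void main(String[] args) {
        System.out.println(Locale.GERMAN);  // de     (language only)
        System.out.println(Locale.GERMANY); // de_DE  (language and country)
        System.out.println(Locale.UK);      // en_GB
    }
}
```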

MLController - improve findMaches

Improve the MLController method 'findMaches()':

  • increase the suggest size from 32 to 64 characters
  • add more text variants, including variants with spaces
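A possible shape for the extra variants, sketched under two assumptions: "suggest size" is read as the maximum phrase length considered, and useful variants are the phrase as entered, the phrase without spaces, and the phrase grouped in blocks of four characters (the usual printed IBAN format). The class `MatchVariants` is hypothetical and is not the MLController code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of text variants a findMaches() implementation could search for.
class MatchVariants {

    // Assumed meaning of "suggest size": the maximum phrase length considered.
    static final int MAX_SUGGEST_LENGTH = 64;

    static Set<String> variants(String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.length() > MAX_SUGGEST_LENGTH) {
            trimmed = trimmed.substring(0, MAX_SUGGEST_LENGTH);
        }
        Set<String> result = new LinkedHashSet<>();
        result.add(trimmed);                             // as entered
        String compact = trimmed.replaceAll("\\s+", "");
        result.add(compact);                             // without any spaces
        StringBuilder grouped = new StringBuilder();     // blocks of four, IBAN style
        for (int i = 0; i < compact.length(); i++) {
            if (i > 0 && i % 4 == 0) {
                grouped.append(' ');
            }
            grouped.append(compact.charAt(i));
        }
        result.add(grouped.toString());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(variants("DE89370400440532013000"));
    }
}
```

Using a `LinkedHashSet` drops duplicates (for example when the input already has no spaces) while keeping the original phrase first.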

training-service

Add a training-service module providing a microservice to train and maintain models based on training data provided by an Imixs-Workflow instance.

MLService - support training data quality level

Currently the training data quality level is ignored.
But if the quality level FULL is required and the workitem does not match that level, the workitem must not be used for training!

Implement MLAdapter class

Implement a SignalAdapter class that can be added to a BPMN model.

 MLAdapter

This adapter class is used for ml analysis based on the Imixs-ML project.

The Adapter is configured through the model by defining a workflow result item named 'ml'.

Example:

<item name="ml_config">
    <endpoint>https://localhost:8111/api/resource/</endpoint>
    <locales>DE,UK</locales>
</item>
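Reading this configuration needs only the JDK DOM parser. The sketch below assumes the item content is available as an XML string exactly like the example above and that locales are comma-separated; the class `MLConfigSketch` is hypothetical and is not the MLAdapter's actual configuration code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical value object for the ml_config item shown above.
class MLConfigSketch {
    final String endpoint;
    final List<String> locales;

    MLConfigSketch(String endpoint, List<String> locales) {
        this.endpoint = endpoint;
        this.locales = locales;
    }

    // Parses <item name="ml_config"> with its <endpoint> and <locales> elements.
    static MLConfigSketch parse(String xml) {
        try {
            Element item = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String endpoint = firstText(item, "endpoint");
            // "DE,UK" -> [DE, UK], tolerating spaces around the comma
            List<String> locales = Arrays.asList(firstText(item, "locales").split("\\s*,\\s*"));
            return new MLConfigSketch(endpoint, locales);
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid ml_config: " + e.getMessage(), e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given tag, or "".
    private static String firstText(Element parent, String tag) {
        Node node = parent.getElementsByTagName(tag).item(0);
        return node == null ? "" : node.getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<item name=\"ml_config\">\n"
                + "    <endpoint>https://localhost:8111/api/resource/</endpoint>\n"
                + "    <locales>DE,UK</locales>\n"
                + "</item>";
        MLConfigSketch config = parse(xml);
        System.out.println(config.endpoint + " " + config.locales);
    }
}
```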

MLAdapter - aggregate text content

If more than one attachment exists, the MLAdapter should aggregate the text and call the ML API endpoint only once with the complete text.

Therefore, optional filtering by file name should be supported: in this way an event can analyze only a specific file type, selected by a regular expression.
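A minimal sketch of the aggregation step, assuming each attachment is available as a file name plus its already extracted text, that the regular expression must match the whole file name, and that texts are joined with a newline. The class `TextAggregationSketch` and its nested `Attachment` type are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical aggregation of attachment texts before a single ML API call.
class TextAggregationSketch {

    // Hypothetical attachment holder: file name plus extracted text.
    static final class Attachment {
        final String fileName;
        final String text;

        Attachment(String fileName, String text) {
            this.fileName = fileName;
            this.text = text;
        }
    }

    // Joins the text of all attachments whose file name matches the optional
    // regular expression (null = all files), so the endpoint is called once.
    static String aggregate(List<Attachment> attachments, String fileNameRegex) {
        Pattern filter = fileNameRegex == null ? null : Pattern.compile(fileNameRegex);
        return attachments.stream()
                .filter(a -> filter == null || filter.matcher(a.fileName).matches())
                .map(a -> a.text)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Attachment> files = List.of(
                new Attachment("invoice.pdf", "Invoice 4711"),
                new Attachment("notes.txt", "internal notes"));
        System.out.println(aggregate(files, ".*\\.pdf")); // Invoice 4711
    }
}
```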
