imixs / imixs-ml
Machine Learning for Business Process Management
License: GNU General Public License v3.0
MLTrainingScheduler has a typo in config param
rename:
ml.trainng.scheduler.enabled => ml.training.scheduler.enabled
A training entity should not be created if the Training Scheduler is disabled.
Change the module dependency from Imixs-Archive-Documents to Imixs-Archive-ORC.
During the build method the quality is set to TRAININGDATA_QUALITY_LEVEL_PARTIAL even if no entities are found.
If no entities are found, the quality needs to be set to TRAININGDATA_QUALITY_LEVEL_BAD.
Use a JAX-RS client implementation to send the Imixs-ML training and analyse objects.
Provide additional XMLRoot classes
663.52 is not detected with float 663,52
invoice np
Also, an invoice date in German long format is not detected during analysis:
26. Januar 2017
invoice kraxi
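A minimal sketch of how the German long date format could be produced and parsed with the standard java.time API. The class and method names are hypothetical and not part of Imixs-ML; only the pattern and the locale matter here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of the Imixs-ML code base.
final class GermanDateVariants {

    // "d. MMMM uuuu" with a German locale yields texts like "26. Januar 2017".
    private static final DateTimeFormatter LONG_DE =
            DateTimeFormatter.ofPattern("d. MMMM uuuu", Locale.GERMANY);

    // Formats a date the way it may appear in a German invoice text.
    static String longFormat(LocalDate date) {
        return LONG_DE.format(date);
    }

    // Parses a German long format date back into a LocalDate.
    static LocalDate parseLong(String text) {
        return LocalDate.parse(text, LONG_DE);
    }
}
```

An analysis step could search the document text for `longFormat(date)` in addition to the ISO representation of the date.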
Entities stored in a workitem can be represented in different formats in a document. For example, the float value 1042.0 can be written as
1042,00
1042.00
1.042,00
1,042.00
Another case is the representation of a date or an IBAN.
To support different representations of a value we need an Adapter concept.
Implementation: CDI Observer Pattern.
Implement a test resource to test an existing model against a data set of workitems
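The four number representations listed above can be generated from a single float value with java.text.DecimalFormat. The following is only a sketch of what such an adapter might compute; the class name is hypothetical and the CDI event wiring is left out.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the Imixs-ML code base.
final class NumberVariants {

    // Returns the plain and the grouped representation of the value
    // for the German and the UK locale.
    static Set<String> variants(double value) {
        Set<String> result = new LinkedHashSet<>();
        for (Locale locale : new Locale[] { Locale.GERMANY, Locale.UK }) {
            DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
            // without grouping separator, e.g. 1042,00
            result.add(new DecimalFormat("0.00", symbols).format(value));
            // with grouping separator, e.g. 1.042,00
            result.add(new DecimalFormat("#,##0.00", symbols).format(value));
        }
        return result;
    }
}
```

Each variant can then be searched in the document text, so a workitem value of 1042.0 still matches a text containing 1.042,00.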
Upgrade Imixs-Workflow 5.2.7
Add the new feature for "Text Classification"
The MLTrainingScheduler does not remove events after processing
add Jakarta 8 support
refactoring module setup (imixs-ml-core has wrong location!)
Add a sub project based on maven for testing via junit
We need to update the training method in the Python object datatrain.
In our default use case we did not provide more than one training set at a time.
See also:
Provide a health endpoint for imixs-ml-spacy to configure a liveness probe in Kubernetes.
The endpoint just needs to verify that models are available. If not, we can assume that something went wrong.
Refactoring project structure
Add configuration for Docker Hub Images
Update pom.xml
AnalyseText must return the categories including the score.
It's up to the application to interpret the score of a category.
We need to extend the data model returned by the analyseText method.
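A sketch of how an application might interpret the returned scores, assuming the categories arrive as a map of category name to score. The map shape, the class name and the threshold are assumptions for illustration and do not describe the actual Imixs-ML data model.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the Imixs-ML code base.
final class CategoryScores {

    // Returns the category with the highest score, but only if that
    // score reaches the threshold chosen by the application.
    static Optional<String> bestCategory(Map<String, Double> scores, double threshold) {
        return scores.entrySet().stream()
                .filter(entry -> entry.getValue() >= threshold)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

The service only reports the scores; the threshold stays a decision of the calling application.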
Split the code for data objects and training methods into separate modules
In the current implementation training data is only generated for a workitem with a 100% match of all ML entities.
The reason for this is that bad data should not be sent to the ML Service, as this data can degrade the model quality.
For example, if the OCR extraction did not extract the IBAN correctly but the workitem has the correct IBAN, then this workitem should NEVER be used for training, as the text is wrong.
We recognized that if we train invoice data there might be a relevant amount of invoice data not including cdtr.iban and cdtr.bic. This is true for a lot of invoices. Currently we ignore those workitems because of the missing items cdtr.iban and cdtr.bic.
If IBAN / BIC are not included in the workitem, we can assume that this type of entity data is not relevant at all and that the text data is probably not of bad quality.
But this might not be true for all kinds of entities. For example, a cdtr.name or an invoice.date and invoice.total are essential for training an invoice workitem.
So we can solve this by marking items as 'optional' to indicate that if these kinds of items are empty in the workitem, the workitem can still be relevant for training.
An Amazon invoice can be taken as an example.
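The rule could be sketched as follows. This is an assumption about how the quality levels might be derived, not the actual builder code: the class, the enum and the plain substring match are hypothetical simplifications.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Imixs-ML code base.
final class TrainingQuality {

    enum Level { FULL, PARTIAL, BAD }

    // items: entity name -> value stored in the workitem (empty if not set)
    // optionalItems: entity names that may be empty without lowering the quality
    // text: the document text extracted by OCR
    static Level evaluate(Map<String, String> items, Set<String> optionalItems, String text) {
        int matches = 0;
        boolean incomplete = false;
        for (Map.Entry<String, String> item : items.entrySet()) {
            String value = item.getValue();
            if (value == null || value.isEmpty()) {
                // An empty optional item is ignored, an empty required item is not.
                if (!optionalItems.contains(item.getKey())) {
                    incomplete = true;
                }
            } else if (text.contains(value)) {
                matches++;
            } else {
                // The workitem holds a value that the text does not contain.
                incomplete = true;
            }
        }
        if (matches == 0) {
            return Level.BAD; // no entity found at all
        }
        return incomplete ? Level.PARTIAL : Level.FULL;
    }
}
```

With cdtr.iban marked as optional, an invoice without an IBAN can still reach the level FULL as long as cdtr.name, invoice.date and invoice.total are found in the text.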
Replace XMLConfig with ItemCollection to get more flexibility in providing additional config data.
The XMLTrainingData method cleanTextdata should replace newlines with the pilcrow sign (¶).
Also, we should not strip spaces, as they can be a hint for the ML framework to recognize entities with better quality.
Rename the api module to imixs-ml-training
Implement a JSF front-end integration to display entity values suggested by the ML analysis
and provide an Ajax search method to search text phrases within the current document content
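A minimal sketch of the intended behaviour. The class below is a hypothetical stand-in and does not reproduce the existing XMLTrainingData method.

```java
// Hypothetical stand-in for the cleanTextdata method of XMLTrainingData.
final class TextCleaner {

    // Replaces every line break (\r\n, \r or \n) with a single pilcrow sign
    // and leaves all spaces untouched.
    static String cleanTextdata(String text) {
        if (text == null) {
            return "";
        }
        return text.replaceAll("\\r\\n|\\r|\\n", "\u00B6");
    }
}
```

Keeping the spaces preserves the column layout of the original document, which may help the entity recognition.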
To get started, we first need some kind of simple example code to train an empty model for entity recognition.
For multi-model support each API endpoint should consume the model name.
In this scenario it is necessary that a workitem holds a list of ml.definition objects containing the details of an ML service endpoint, e.g. the locales or the ml.status.
Add a TrainingDataBuilder implementing a builder pattern to restructure the code
Because categories cannot be added dynamically like NER entities, we need a separate method in the spaCy wrapper to initialize a blank model with a set of categories. This allows us to use a simplified API for incremental training of new workitems.
See also discussion here: explosion/spaCy#6905 (comment)
Upgrade to spaCy version 3.0
Add optional Tika options to the config to get more configuration options concerning the Tika server.
The MLService endpoint must not include the /analyse/ resource. The resource is added by the corresponding service method only.
The Adapter classes should support an optional text length. In some cases the returned text is too long.
Add maven release management for java modules
Empty Strings in a defaultValue @ConfigProperty are not allowed with microprofile 3.3.
Change MLService
add licence files
The current default locale is set to "GERMAN", "UK". In that case only the language is defined for Germany, but no country.
The correct default should be
GERMANY, UK
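The difference between the two constants can be checked with the standard java.util.Locale API. The helper class is hypothetical and only illustrates the point.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Imixs-ML code base.
final class LocaleDefaults {

    // Locale.GERMAN defines a language only, Locale.GERMANY and Locale.UK
    // define a language and a country.
    static boolean hasCountry(Locale locale) {
        return !locale.getCountry().isEmpty();
    }
}
```

`hasCountry(Locale.GERMAN)` is false, while `hasCountry(Locale.GERMANY)` and `hasCountry(Locale.UK)` are both true.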
Extend the CDI events and provide separate events for text and object adaption
Provide a Docker image to run a basic microservice exposing spaCy functionality.
See discussion here: https://stackoverflow.com/questions/60964785/how-to-expose-spacy-as-an-rest-api
Move spacy wrapper into a separate module Imixs-ML-SpaCy
improve the MLController method 'findMaches()'
It should be possible to reset the ML Status flag by a BPMN event. Example:
<ml-config name="status">suggest</ml-config>
should reset the status to 'suggest'
In case of an ML API error (e.g. from the spaCy wrapper service) the MLAdapter should interrupt the processing life cycle with a ProcessingException.
Improve suggest box with a keyUp/keyDown feature
The MLAdapter must ignore ml items not defined in the workflow model. Otherwise the ml adapter would create irrelevant content for a workitem.
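A sketch of such a filter, assuming the suggested entities and the item names defined in the model are available as plain collections. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Imixs-ML code base.
final class MlItemFilter {

    // Returns a copy of the suggested entities that contains only
    // the items defined in the workflow model.
    static Map<String, String> retainModelItems(Map<String, String> suggested, Set<String> modelItems) {
        Map<String, String> result = new LinkedHashMap<>(suggested);
        result.keySet().retainAll(modelItems);
        return result;
    }
}
```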
Add a training-service module providing a microservice to train and maintain models based on training data provided by an Imixs-Workflow instance.
MLService - typo ITEM_ML_ITEMES
Currently the training data quality level is ignored.
But if the quality level FULL is required and the workitem does not match that level, the workitem should not be used for training!
Implement a Signal Adapter class to be added into a BPMN model
MLAdapter
This adapter class is used for ml analysis based on the Imixs-ML project.
The Adapter is configured through the model by defining a workflow result item named 'ml'.
Example:
<item name="ml_config">
<endpoint>https://localhost:8111/api/resource/</endpoint>
<locales>DE,UK</locales>
</item>
If more than one attachment exists, the MLAdapter should aggregate the text and call the ML API endpoint only once with the complete text.
Therefore an optional filtering by file name should be supported: in this way an event can analyse only a specific file type using a regular expression.
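The aggregation with an optional file name filter could look like the following sketch. It assumes the attachment texts are available as a map of file name to extracted text; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Imixs-ML code base.
final class AttachmentText {

    // Joins the text of all attachments whose file name matches the
    // regular expression. A null or empty expression accepts every file.
    static String aggregate(Map<String, String> textByFileName, String fileNamePattern) {
        Pattern pattern = (fileNamePattern == null || fileNamePattern.isEmpty())
                ? null
                : Pattern.compile(fileNamePattern);
        return textByFileName.entrySet().stream()
                .filter(entry -> pattern == null || pattern.matcher(entry.getKey()).matches())
                .map(Map.Entry::getValue)
                .collect(Collectors.joining("\n"));
    }
}
```

The aggregated text is then sent to the ML API endpoint in a single request. An expression such as `.*\.pdf` would restrict the analysis to PDF files.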
See Issue #44