jube-home / jube

Jube is open-source software designed for monitoring transactions and events. It offers a range of powerful features including real-time data wrangling, artificial intelligence, decision making, and case management. Jube's exceptional performance is particularly evident in its application to fraud prevention and abuse detection scenarios.

Home Page: https://www.jube.io

License: GNU Affero General Public License v3.0

C# 78.70% HTML 9.36% JavaScript 2.40% CSS 7.18% Less 0.02% Rich Text Format 1.50% Smalltalk 0.85% Dockerfile 0.01%
case-management data-mining data-visualization event-monitoring faas faas-platform fraud fraud-detection fraud-prevention machine-learning

jube's Introduction

About Jube

Jube is open-source transaction and event monitoring software. Jube implements real-time data wrangling, artificial intelligence, decision making and case management. Jube is particularly strong when implemented in fraud and abuse detection use cases.

Data wrangling is real-time and is directed by a series of rules created using either a point-and-click rule builder or an intuitive rule coder. Rules are in-memory matching functions tested against data returned from high-performance cache tables; datasets are fetched only once for each key that the rules roll up to for each transaction or event processed, with the matches aggregated using a variety of functions. An alternative means of maintaining lightweight long-term state for data wrangling is Time To Live (TTL) Counters, which are incremented on rule match and then decremented by the same amount on time-lapse.
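
As a concept sketch only - the names, payload fields and rule syntax below are hypothetical, not Jube's actual rule coder - a rule can be pictured as a predicate tested against records fetched once from cache for a key, with the matches then aggregated:

using System;
using System.Collections.Generic;
using System.Linq;

// Records previously fetched once from cache for the key the rules roll up to.
var cachedTransactionsForKey = new List<Dictionary<string, object>>
{
    new() { ["Amount"] = 120.0, ["Country"] = "US" },
    new() { ["Amount"] = 80.0, ["Country"] = "DE" },
    new() { ["Amount"] = 300.0, ["Country"] = "US" }
};

// A rule is an in-memory matching function tested against each record.
Func<Dictionary<string, object>, bool> rule =
    t => (string)t["Country"] == "US" && (double)t["Amount"] > 100;

// Matches are then aggregated (here: count and sum) into wrangled values.
var matches = cachedTransactionsForKey.Where(rule).ToList();
var sum = matches.Sum(t => (double)t["Amount"]);
Console.WriteLine($"Count={matches.Count} Sum={sum}");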

Data wrangling return values are independently available for use as features in artificial intelligence training and real-time recall, or can be tested by rules to perform a specific action (e.g., the rejection of a transaction or event). Wrangled values are returned in the real-time response payload and can facilitate a Function as a Service (FaaS) pattern. Response payload data is also stored in an addressable fashion, improving the experience of advanced analytical reporting while reducing database resource and compute cost.

Jube is developed to be stateless and can support massive horizontal scalability and separation of concerns in the infrastructure.

Jube takes a novel approach to artificial intelligence: ultimately Supervised Learning, yet blending anomaly detection with confirmed class data to ensure datasets contain sufficient class data. Using data archived from its processing, Jube searches for optimal input variables, hidden layers and processing elements. The result is small, optimal, generalised and computationally inexpensive models for efficient real-time recall. This approach makes the benefits of artificial intelligence available very early in an implementation's lifecycle, and it avoids over-fitting models to typologies long since passed.

Transaction and event monitoring is often embedded in business processes that engage humans. Jube is real-time, but this does not forgo the need for manual intervention; hence Jube makes comprehensive and highly customisable case management and visualisation intrinsically available in the user interface.

To ensure the segregation of user responsibilities, a user, role and permission model is in place, which controls access to each page within the Jube user interface. Detailed audit logs are available. Any update to a configuration by a user retains a version history, in which the original is logically deleted and then replaced with the new version.

Jube is multi-tenanted, allowing a single infrastructure to be shared among many logically isolated entities, maintaining total isolation between tenant data with no loss of function in the user interface.

Stargazing

Please consider giving the project a GitHub Star. Thank you in advance!

Quickstart

Jube runs on commodity Linux. The Quickstart has the following prerequisites:

  • .Net 8 Runtime.
  • Postgres database version 13 onwards (tested on 15.4, but there has been no significant database development that would cause a breaking change).
  • Optional but recommended: Redis version 6 or above (it probably works fine on earlier versions, as the commands used are basic; RESP wire compatibility means it is also possible to use KeyDB, DragonflyDB, Garnet or any other RESP-compliant wire protocol database).

Subject to prerequisites, Jube can be up and running in minutes:

git clone https://github.com/jube-home/jube.git
cd jube/Jube.App
export ConnectionString="Host=<host>;Port=<port>;Database=<defaultdb>;Username=<username>;Password=<password>;Pooling=true;Minimum Pool Size=0;Maximum Pool Size=100;SSL Mode=Require;Trust Server Certificate=true;"
export RedisConnectionString="<host>"
export ASPNETCORE_URLS="https://localhost:5001"
export JWTKey='IMPORTANT:_ChangeThisKey_~%pvif3KRo!3Mk|1oMC50TvAPi%{mUt<9"B&|>DP|GZy"YYWeVrNUqLQE}mz{L_UsingThisKeyIsDangerous'
dotnet run

For security, there is no means to pass configuration values other than via Environment Variables, and the contents of those Environment Variables are never - ever - stored by Jube (which is something the CodeQL security scanner tests for).

The use of Redis is encouraged as it provides a 33% improvement in response times, and a marked improvement in response time variance, contrasted against using the Postgres database. Redis also does not require cache table indexing jobs; while such indexing is automatic on existing data for Postgres, it creates some delay in the retroactive creation of Search Keys, whereas Search Keys in Redis can only be created on a forward-only basis against no preexisting data. In general the trade-off between key-value-pair in-memory databases and durable RDBMS databases is not trivial. The Postgres database is probably the right choice for low-volume or cost-sensitive implementations where the staff and infrastructure complexity costs can't be justified, whereas for any serious real-time implementation, given infrastructure technical capacity, Redis is doubtless the better choice. Setting the Redis Environment Variable to false will fall back to using the Postgres database for cache, and is the simpler implementation:

export Redis="False"

There are sensitive cryptographic values that need to be included at startup. At a minimum the JWTKey value is required:

export JWTKey='IMPORTANT:_ChangeThisKey_~%pvif3KRo!3Mk|1oMC50TvAPi%{mUt<9"B&|>DP|GZy"YYWeVrNUqLQE}mz{L_UsingThisKeyIsDangerous'

The JWTKey value is used to sign the access tokens that provide API authentication, and therefore user interface authentication.
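
As an illustration of the key's role - assuming HMAC-SHA256 signing, which is typical for a symmetric key of this kind, and not necessarily Jube's exact token handling - a minimal sketch:

using System;
using System.IdentityModel.Tokens.Jwt;
using System.Text;
using Microsoft.IdentityModel.Tokens;

// The key comes from the Environment Variable, never from stored configuration.
var jwtKey = Environment.GetEnvironmentVariable("JWTKey")
             ?? throw new InvalidOperationException("JWTKey not set.");

var credentials = new SigningCredentials(
    new SymmetricSecurityKey(Encoding.UTF8.GetBytes(jwtKey)),
    SecurityAlgorithms.HmacSha256);

// A weak or default key would let anyone forge this token.
var token = new JwtSecurityToken(
    issuer: "jube",
    audience: "jube",
    expires: DateTime.UtcNow.AddHours(1),
    signingCredentials: credentials);

Console.WriteLine(new JwtSecurityTokenHandler().WriteToken(token));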

While outside of the scope of this installation documentation, other sensitive variables, while optional, are strongly suggested:

export PasswordHashingKey='IMPORTANT:_ChangeThisKey_~%pvif3KRo!3Mk|1oMC50TvAPi%{mUt<9"B&|>DP|GZy"YYWeVrNUqLQE}mz{L_UsingThisKeyIsDangerous'

It is imperative that the keys be changed from their defaults and kept safe in appropriate storage. Jube will not start if the keys above are used.

Change the template values for the ConnectionString and JWTKey Environment Variables, then run the commands as above. Wait for the build to complete, after which observe the welcome message.

Waiting a few moments more will ensure that the embedded Kestrel web server is started correctly. In a web browser, navigate to the bound URL https://localhost:5001/ as per the ASPNETCORE_URLS Environment Variable.

The default user name \ password combination is Administrator \ Administrator, although the password will need to be changed on first login.

A more comprehensive installation guide is available in the Getting Started section of the documentation.

Documentation

The documentation has been drafted to include all features, and there should not be any undocumented know-how. The documentation adopts an instructional style that will explain most features step-by-step with extensive use of screenshots.

Jube is committed to high-quality instructional documentation and maintains it as part of the overall release methodology. If documentation is inadequate, unclear or missing, raise a GitHub Issue.

Training

Jube offers a training program that focuses on achieving proficiency in the effective implementation and utilization of Jube.

For the Americas, a training program is conducted biannually in New York, US, at a venue to be shared closer to the time. For Europe, the Middle East and Africa, a training program is likewise conducted biannually in Larnaca, Cyprus, at a venue to be shared closer to the time. The Larnaca program is more cost-effective owing to foreign travel not being required of the trainer. The training is delivered by Richard Churchman, the author of Jube.

The schedule covers a duration of three days, with the length of each day ranging from 6 to 8 hours depending on the undertaking of Elective Modules. Elective Modules cover in-depth training in advanced administrative concepts using dedicated training servers, and are targeted at technical participants who are likely to assume overall system administrative responsibility for an implementation of Jube.

Day 1:

  • Introduction.
  • User Interface.
  • HTTP Messaging.
  • Models and Payload.
  • Inline Functions.
  • Abstraction Rules.
  • Abstraction Calculations.
  • Lists and Dictionaries.
  • Activation Rules.
  • Elective: Architecture and Caching.
  • Elective: Environment Variables.
  • Elective: Installation and Log Configuration.

Day 2:

  • Suppression.
  • Sanctions Fuzzy Matching.
  • Time To Live (TTL) Counters.
  • Introduction to Artificial Intelligence (AI).
  • Exhaustive AI training.
  • Case Management.
  • Security.
  • Elective: Tracing Transaction Flow and Response Time Analysis.
  • Elective: High Availability.
  • Elective: Performance Counters.
  • Elective: AMQP.

Day 3:

  • SQL database discovery.
  • Performance Monitoring.
  • Visualisation and Reporting.
  • Inline Scripts.
  • Scores via R Plumber (HTTP).
  • Elective: Cache Bottleneck Analysis.
  • Elective: Archive Bottleneck Analysis.
  • Elective: Multi-Tenancy.
  • Elective: Git Definitions Backup and Recovery.

The training program is available on the following dates:

  • New York, US: in 2024, starting Tuesday October 22nd through the end of Thursday October 24th. USD 2950 per participant.
  • Larnaca, Cyprus: in 2024, starting Tuesday December 3rd through the end of Thursday December 5th. EUR 1395 per participant.
  • New York, US: in 2025, starting Tuesday April 22nd through the end of Thursday April 24th. USD 2950 per participant.
  • Larnaca, Cyprus: in 2025, starting Tuesday June 3rd through the end of Thursday June 5th. EUR 1395 per participant.

Each program has a minimum of 4 and a maximum of 8 participants, and includes lunch and refreshments. Additionally, participants will have access to four hours of Commercial Support, valid for six months after completion of the program.

For further details, including the detailed training plan, kindly contact [email protected].

Where confidentiality considerations exist, the same program can be made available at the client's premises for a daily rate of EUR 800 or USD 857, excluding customary business travel and accommodation costs. Remote programs are not generally offered, given observations of reduced practical participation and outcomes.

Support

Free Support is available via GitHub Issues on a best-endeavours basis. Commercial Support is available at a daily rate of EUR 800 or USD 857, prorated. It is uncommon for a client to require more than two days of Commercial Support per month given an active production implementation, although implementation demands vary depending on client technical proficiency. For further details, please contact [email protected].

Reporting Vulnerabilities

Please do not file GitHub issues for security vulnerabilities, as they are public.

Jube takes security issues very seriously. If you have any concerns about Jube or believe you have uncovered a vulnerability, please contact via the e-mail address [email protected]. In the message, try to describe the issue and, ideally, a way of reproducing it.

Please report any security problems to Jube before disclosing them publicly.

Governance

Jube Holdings Limited is a Cyprus company with registration HE404521. Jube Holdings Limited owns the Jube software and trademarks (registered or otherwise). Jube is maintained by Jube Operations Limited, a United Kingdom company with registration 14442207. Jube Operations Limited is a wholly owned subsidiary of Jube Holdings Limited and provides training and support services for Jube. Jube and "Jooby" (the logo) are registered trademarks in Cyprus.

Licence

Jube is distributed under AGPL-3.0-or-later.

jube's People

Contributors

dependabot[bot], richard-churchman

jube's Issues

Replace Binary Serializer with Newtonsoft Json Serialiser (if possible) or find work around

In .Net 7 the BinarySerializer is deprecated. The BinarySerializer is not used extensively, except to save Neural Network model states.

The suggested replacement for BinarySerialization is JSON; however, it is not clear to what extent this completely serializes an object, including internal properties and the total state.

This ticket is to replace all transient and persistent serialized objects with JSON serialisation if possible, or otherwise to figure out a workaround.

This refactoring is currently the main blocker in a path to .Net 7.
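
A minimal sketch of the direction this could take, using a stand-in class rather than the actual model state; whether settings like these capture the total internal state of a Neural Network object is precisely what this ticket must verify:

using System;
using Newtonsoft.Json;

var settings = new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.All, // preserve concrete types in the graph
    PreserveReferencesHandling = PreserveReferencesHandling.Objects // preserve cycles
};

var json = JsonConvert.SerializeObject(new ModelState(), settings);
var restored = JsonConvert.DeserializeObject<ModelState>(json, settings);
Console.WriteLine(json);

// Stand-in for a persisted model state, including a non-public member.
class ModelState
{
    public double[] Weights { get; set; } = { 0.1, 0.2 };
    [JsonProperty] private int _epochsTrained = 42; // private members must be opted in
}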

Create a catalog of fields in use in rules to suggest covered indexes and reduce fetch from database.

Oftentimes the amount of data being used in rules is a small subset of the data that is presented. It follows that forcing the database to go out to Page and Tuple is quite expensive, especially when the index is likely in the buffer cache.

It is not currently possible to see the fields in use by rules that depend on cache data.

Create a function in model sync that will examine the Request XPath elements in existence in rules and create a catalogue of fields in use.

With the catalogue, select back from the database only the data that is required by the rules. Limiting the select statement has a big impact on query performance and jsonb field parsing.

Optionally, build - or at a minimum suggest - indexes which cover the fields to avoid the need to go to Page and Tuple.
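
As a sketch of what a suggested index might look like - the table, column and catalogue names are illustrative, and PostgreSQL 11 onwards supports INCLUDE columns for covering indexes:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical catalogue of fields referenced by the rules of one model.
var fieldsUsedByRules = new List<string> { "Amount", "Currency", "ResponseCode" };

// INCLUDE keeps the key small while the covered fields are served
// index-only, avoiding the trip out to Page and Tuple.
var includeList = string.Join(", ", fieldsUsedByRules.Select(f => "\"" + f + "\""));
var sql = "CREATE INDEX CONCURRENTLY ix_cache_search_key_covered " +
          "ON \"Cache\" (\"SearchKey\") INCLUDE (" + includeList + ");";

Console.WriteLine(sql);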

Improve Support for Docker

Jube can be built for Docker, but there is currently no option to do this in the default build; instead, it is compiled as debug using dotnet run. This works, but it is only really a demonstration and expects the end user to publish for their own release.

Include a Dockerfile and .yaml file as required in the software, compiling Docker to release and not debug.

Most users appear to require that the software is architected to support containerisation, which it is, with differing appetite for Docker but universal appetite for Kubernetes or similar cloud provider scaling options.

Operations that change non-concurrent collections must have exclusive access

2024-06-22 21:13:15,819 Entity Invoke: GUID a7e70157-0002-45d0-8ac3-92d4e65e539c and model 1 has created a general error as System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
   at Jube.Engine.Invoke.EntityAnalysisModelInvoke.WaitReadTasks(List`1 pendingReadTasks) in /home/richard.churchman/RiderProjects/jube/Jube.Engine/Invoke/EntityAnalysisModelInvoke.cs:line 826
   at Jube.Engine.Invoke.EntityAnalysisModelInvoke.Start() in /home/richard.churchman/RiderProjects/jube/Jube.Engine/Invoke/EntityAnalysisModelInvoke.cs:line 748.

This error has long needed attention and relates to the counter functionality's use of non-concurrent collections. More generally, the ints should be of a type safe for multithreading and the collections used should be concurrent. I would prefer not to lock around sections of code if a suitable concurrent tool is already available.
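
A minimal sketch of the concurrent tools alluded to, with ConcurrentDictionary in place of Dictionary and Interlocked for the ints (names are illustrative):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var counters = new ConcurrentDictionary<string, int>();
var processed = 0;

Parallel.For(0, 1000, i =>
{
    // A thread-safe upsert; Dictionary.Add here is what corrupts state.
    counters.AddOrUpdate("RuleMatches", 1, (_, current) => current + 1);

    // A thread-safe increment without an explicit lock.
    Interlocked.Increment(ref processed);
});

Console.WriteLine($"{counters["RuleMatches"]} matches, {processed} processed");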

Model Wizard

Creating a machine learning model in Jube can be a convoluted process involving creating a model, specifying fields to be extracted, specifying tags and then loading data via the HTTP endpoint, before the model is available for training in the embedded Exhaustive machine learning algorithm. These requirements contrast with products which can achieve the same through the application of a CSV file. It follows that, despite Jube having more advanced capabilities, adoption may be reduced relative to other products. While Jube was not designed as an automated machine learning wizard, there appears to be increasing overlap.

It is proposed that a Model Wizard be created to take a CSV file and parse the metadata and the data itself, automatically creating all configuration elements that are otherwise created manually. The file will be parsed for its data to identify the universe of categorical variables, with these being created as Boolean XPath expressions (a process which is currently typically done outside of Jube).

Task: Ensure JSON Path Expression returns a Boolean value

As categorical data pivoting will be done in Jube, JSON Path must be available in the Request XPath Model Configuration to return based on Expression, for example, $.[?(@.=='Politician')].

Task: Create a new page to parse the CSV file

The new page, called Model Wizard and existing under the Models menu item, will accept a CSV file as an upload and proceed to parse the headers. For each header the data will be inspected to determine whether it:

  • Is all numeric, in which case it will be treated as Float for the purpose of model configuration.
  • Has the presence of string data, in which case it will be treated as String for the purpose of model configuration.

In keeping with the stateless nature of the design, the parsing will be stored in tables in the database for recall by the user interface. At this stage, the model will not be created.
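
A minimal sketch of the header inspection described above, assuming a column is Float only when every value parses as numeric (data and names are hypothetical):

using System;
using System.Globalization;
using System.Linq;

// Hypothetical parsed CSV: headers and rows.
string[] headers = { "Amount", "Occupation" };
string[][] rows =
{
    new[] { "19.99", "Politician" },
    new[] { "5", "Engineer" }
};

for (var column = 0; column < headers.Length; column++)
{
    // Float only when every value in the column parses as numeric.
    var allNumeric = rows.All(row => double.TryParse(
        row[column], NumberStyles.Any, CultureInfo.InvariantCulture, out _));

    Console.WriteLine($"{headers[column]}: {(allNumeric ? "Float" : "String")}");
}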

Task: Allocate Dependent Variable

With the metadata established, the page must accept further configuration parameters, specifically including the dependent variable, which will go on to become a tag value with a corresponding Exhaustive Model and Activation Rule.

Task: Create Model

Based on metadata and configuration create the model in Jube comprising:

  • Headers will be transposed to Request XPath configuration elements.
  • For each String in Categorical variables the header will be transposed as an expression (i.e. Categorical Data Pivoting).
  • For each String in the Categorical variable specified as Dependent Variable a Tag element will be created and;
  • An Exhaustive configuration element will be created to target the Tag disposition for machine learning and;
  • For good measure, an Activation Rule element will be created targeting the return value from Exhaustive models, where > 0.5 will drive activation. The Activation Rule is not strictly necessary as the Exhaustive recall values are available in their raw form on recall.

Task: Load Data from CSV into JSON for storage in the Archive

Transpose the CSV file to a JSON representation and store it in the Archive table which will make the data available for Exhaustive training.

Task: Synchronise Model

Insert data to cause the model to synchronise and thus start Exhaustive training.

Make clear Default is Demo Training Dataset on Exhaustive Page

So that the Exhaustive training functionality can be demonstrated in the default installation, Exhaustive training targets a model on a demonstration dataset. The absence of a clear message may lead users to wonder why the platform is not training on data tagged or laid out in the database.

Create a clear splash note on the page that makes it clear that it is targeting demonstration data, make the dataset available for download or inspection, and mention the environment variable that needs to be changed for production data to be used.

SQL Statements against cache for abstraction rules selecting only columns required by abstraction rule

The Abstraction Rule process performs a prepared select statement against the cache tables. At some point this will become an index-only lookup, by reason of improved use of covered indexes.

The select statement is performed only once for each key, and the logic is processed against that dataset in memory. The data fields brought back need only be those required by the rules. Reducing the select will increase performance through less data coming across the wire and less casting of fields that are never used.

This ticket is to perform analysis on rule parsing to ensure that only fields that are used are included in the select.

Instruct index creation out of model synchronisation to a background queue. Build indexes concurrently. Drop Duplicate Indexes.

At the moment indexes on search keys are created inside model synchronisation, which is blocking. Also, and it is a bug, the indexes are built without using the concurrent option, which brings about locking.

Instead of building the index in the model synchronisation routine, send that instruction to a background table queue if the definition of that index does not already exist. A separate thread will poll this table queue and begin the process of concurrent index creation.

Support covered fields in the indexes based upon the fields in use in rules.
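
As a sketch of what the polling thread might execute using Npgsql - the names and SQL are illustrative, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block:

using System;
using Npgsql;

await using var connection = new NpgsqlConnection(
    Environment.GetEnvironmentVariable("ConnectionString"));
await connection.OpenAsync();

// Concurrent builds avoid locking writers while the index is created.
const string indexSql =
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_cache_search_key " +
    "ON \"Cache\" (\"SearchKey\");";

await using var command = new NpgsqlCommand(indexSql, connection);
await command.ExecuteNonQueryAsync();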

Not returning all values uploaded into a list

Hello,

I uploaded the enclosed file with 659 records.

Unfortunately I can only see the first 18 records in the grid and have no way to see the other records (see screenshot enclosed).

When I query the DB I can see the records are there (see enclosed):

select "EntityAnalysisModelListId", count(*), "Deleted"  from "EntityAnalysisModelListValue"  
where "Deleted" is null
group by  "EntityAnalysisModelListId",  "Deleted"

Can you consider adding:

  • A button to allow a user to progress to see more records / edit them
  • A counter stating how many records there are in total

Thank you for your consideration.
Attachments: malicious user agents.txt, a screenshot of the list display not showing all records, and a screenshot of the list DB query showing the records.

Change Cache Indexing to use a Hash Index and not a B-Tree Index

Currently, the indexing on the cache table uses B-Tree indexing (not covered, though that may come in a separate project).

The use of a B-Tree index is redundant, as queries to the cache table are on equality only. It follows that greater performance could be obtained without any penalty other than adding the index type in the creation process.

This change won't be breaking, but existing users would end up with duplicate indexes on upgrade until the old ones are dropped manually.
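
For illustration, the change amounts to specifying the index method at creation time; hash indexes serve equality predicates only, which matches the cache access pattern (names are illustrative):

using System;

// Current form: a B-Tree index, which also supports the range scans that
// cache lookups never perform.
const string bTreeSql =
    "CREATE INDEX ix_cache_search_key ON \"Cache\" (\"SearchKey\");";

// Proposed form: a hash index, serving equality lookups only.
const string hashSql =
    "CREATE INDEX ix_cache_search_key_hash ON \"Cache\" USING hash (\"SearchKey\");";

Console.WriteLine(bTreeSql);
Console.WriteLine(hashSql);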

Replace Kendo ListView control

In Jube's List and Dictionary functionality, each value is presented and managed by a Kendo ListView control. This control used to be used a lot; however, the implementation was always problematic and the user experience was poor. The control was, for example, replaced on the Models page with a much more elegant customised way to maintain lists.

Review the user interface and replace all instances of the Kendo ListView control with either the Kendo Grid (most likely) or the tooling used on the Models page (which would be more elegant but require more work).

Password Requirements too strict

Hello,

I recently spun up jube.io and on the first login screen it asked me to change my password.

This seems to take a password policy (regex) from a social media platform, i.e. 16 characters, upper and lower case, special characters limited.

I would appreciate it if the password policy could allow more complex passwords, i.e. the length being unlimited or increased to 100 and all special characters usable.

Thank you for your consideration.

Build Error After Merge Duplicate Attributes in Accord.net Projects

The most recent merge to master has introduced a problem in the build step relating to duplicate attributes. It is not clear what resources have been moved or ignored for this to happen, such that it built and ran in the development branch but not in master, despite no merge conflicts.

5>Accord.Genetic.AssemblyInfo.cs(14,12): Error CS0579 : Duplicate 'System.Reflection.AssemblyConfigurationAttribute' attribute
5>Accord.Genetic.AssemblyInfo.cs(18,12): Error CS0579 : Duplicate 'System.Reflection.AssemblyTitleAttribute' attribute
5>------- Finished building project: Accord.Genetic. Succeeded: False. Errors: 2. Warnings: 0

7>Accord.MachineLearning.AssemblyInfo.cs(14,12): Error CS0579 : Duplicate 'System.Reflection.AssemblyConfigurationAttribute' attribute
7>Accord.MachineLearning.AssemblyInfo.cs(18,12): Error CS0579 : Duplicate 'System.Reflection.AssemblyTitleAttribute' attribute
7>------- Finished building project: Accord.MachineLearning. Succeeded: False. Errors: 2. Warnings: 0

This did not happen in the branch and appears to be owing to the generation of assembly information for the projects. Adding the property:

<GenerateAssemblyInfo>false</GenerateAssemblyInfo>

in the project files should resolve the issue.

Move PostgreSQL Cache Updated and Inserts to a Single Transaction. Move Case Creation and TTL Counter Entry to Bulk Insert.

During online transaction processing, or rather the invoke process, there are several insert and update interactions with the PostgreSQL database which happen inline. These interactions are expensive but, given that PostgreSQL is being used as cache, unavoidable.

All inserts and updates should be done inside the same transaction to avoid excessive commits. It follows that it is necessary to batch up the inserts and updates where it is not possible to bulk insert (in the case of Time To Live (TTL) Counter Entries and Case Creation) and execute them in a single transaction.

For inserts which are not time-sensitive, such as TTL Counter Entries (used to wind back TTL Counters) and Case Creation, move these to bulk and \ or background processes.
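
A sketch of the intended shape using Npgsql batching, so that there is a single commit per processed event - the statements and names are illustrative:

using System;
using Npgsql;

await using var connection = new NpgsqlConnection(
    Environment.GetEnvironmentVariable("ConnectionString"));
await connection.OpenAsync();

await using var transaction = await connection.BeginTransactionAsync();

// Batch the statements so round trips are shared and there is one commit.
await using (var batch = new NpgsqlBatch(connection, transaction))
{
    batch.BatchCommands.Add(new NpgsqlBatchCommand(
        "UPDATE \"Cache\" SET \"Counter\" = \"Counter\" + 1 WHERE \"SearchKey\" = 'IP:1.2.3.4';"));
    batch.BatchCommands.Add(new NpgsqlBatchCommand(
        "INSERT INTO \"TtlCounterEntry\" (\"SearchKey\") VALUES ('IP:1.2.3.4');"));
    await batch.ExecuteNonQueryAsync();
}

await transaction.CommitAsync(); // a single commit for all inserts and updates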

Remove Accord.net Obsolete Dependencies

Following the migration of the Accord.net open-source code to .Net 6 as part of the Jube solution, several obsolete warnings have emerged:

Serializer.cs(118, 21): [SYSLIB0011] 'BinaryFormatter.Serialize(Stream, object)' is obsolete: 'BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.'

Serializer.cs(122, 17): [SYSLIB0011] 'BinaryFormatter.Serialize(Stream, object)' is obsolete: 'BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.'

Serializer.cs(369, 35): [SYSLIB0011] 'BinaryFormatter.Deserialize(Stream)' is obsolete: 'BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.'

Serializer.cs(373, 31): [SYSLIB0011] 'BinaryFormatter.Deserialize(Stream)' is obsolete: 'BinaryFormatter serialization is obsolete and should not be used. See https://aka.ms/binaryformatter for more information.'

ExtensionMethods.cs(672, 29): [SYSLIB0014] 'WebClient.WebClient()' is obsolete: 'WebRequest, HttpWebRequest, ServicePoint, and WebClient are obsolete. Use HttpClient instead.'

In the case of serialisation, this should not be handled by Accord.net at all now. In the case of the web client, there should not be any use at all. All can be removed.

Convert Invoke Method to support async fully

The Redis project calls for reads to be performed against the Redis cache in parallel to the PostgreSQL cache. More generally, the async methods provide a much better experience where the thread would otherwise be waiting on IO. The async methods have been proven to perform better, by quite a margin, than both blocking methods and the totally linear processing of the invoke method, putting aside the ForkAstractionKeys setting, which is badly implemented.

Upgrade all PostgreSQL cache calls to fully support async methods.

Deprecate ForkAstractionKeys and make this the only processing method, using task completion joining on wait complete.
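
A minimal sketch of the pattern being called for: issue the reads together and join on completion rather than block, with the two read methods as hypothetical placeholders:

using System;
using System.Threading.Tasks;

// Hypothetical placeholders for the Redis and PostgreSQL cache reads.
async Task<string> ReadRedisCacheAsync() { await Task.Delay(10); return "redis"; }
async Task<string> ReadPostgresCacheAsync() { await Task.Delay(20); return "postgres"; }

// Start both reads, then join on completion; the thread is returned to the
// pool while the IO is outstanding rather than blocking.
var pendingReadTasks = new[] { ReadRedisCacheAsync(), ReadPostgresCacheAsync() };
var results = await Task.WhenAll(pendingReadTasks);

Console.WriteLine(string.Join(", ", results));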

Can't turn an Exhaustive model on after training.

If an Exhaustive model is not set to Active at some point before training, it cannot be activated after training. The Update button is hidden by design; however, it would be better if only certain fields were locked for update.

Update the page to lock certain fields rather than remove the Update button altogether.

Partition Strategy in Cache and Archive Tables

Data that is used for real-time processing is already logically separated from slower-moving data in the Archive, a partition of sorts. However, Jube greatly underutilises the partitioning capabilities of Postgres.

The task is to modify the Cache and Archive tables to have a partition hierarchy of Tenant Registry ID \ Default >>> Model \ Default >>> CreatedDate \ Default.

Most ideally, some functionality will exist in the system to create and prune partitions automatically. For example, the Cache may have one-day partitions up to a maximum of 7 days, and the Archive may have monthly partitions up to a maximum of a year. In a similar manner to the index server, this partition management should exist in a thread inside a Jube instance.

Export Models

Hi, thank you for such smart software :) We have just recently started experimenting with it and a question arose: is it possible somehow to export/import a model with all its Request XPaths and rules?

Break controllers out into a service layer

There is a decent data layer that implements the Linq2DB ORM. One of the big projects starting in August is the migration of the user interface to Blazor and Radzen tooling. It would be preferable to invoke services in Blazor rather than carry the overhead of serialisation and HTTP. For each controller, move the code into a service and leave only the validation, authentication and HTTP status code logic. This ticket also has some use in the import and export functionality #6.

Include trace in processed payload when switch is passed

For response time troubleshooting, the method is to enable INFO-level logging in the application, which writes out verbose logging for the transaction. This has some production implications.

Create a switch in the HTTP or AMQP headers that will produce this same trace in the processed payload. This will facilitate more rapid tracking of response time problems for a given transaction.

Move Jobs to Crystal Quartz

There are several jobs that exist in Jube, characterised as very long-running procedures launched periodically. Currently these jobs exist inside a perpetual loop based on long thread sleeps. This works just fine but does not cluster all that well and complicates the tear-down of the process.

It is desirable to move Jobs to Quartz.net, indeed Crystal Quartz.net, to provide a user interface into job execution.

Implement Crystal Quartz.net for the existing jobs, in such a way that additional job binaries can be included to provide for some extensibility of the software without the need for external processes.

All values available in the current Dependency Injection should be available, and passed to, the Quartz.net context such that the new IJob implementations are a near drop-in replacement.

Implement Redis Cache

PostgreSQL is currently used for caching transaction data, and while it cannot be considered an in-memory database, the shared buffers mean that read performance, while slower than an in-memory cache, is not slower to an extent that materially affects response time when traded off against the durability guarantees provided by PostgreSQL. Read performance aside, in-memory databases are extremely expensive to run, a nightmare to administer, and demand a degree of duplication - in Jube at least - given its key-value-pair access paths (while PostgreSQL queries are indexed on multiple keys, these keys would instead be duplicated, with transaction history being stored in the value HSET).

There is no contest in writes, where Jube response times are severely impacted. For example, reading Abstraction keys overall might take 3ms to 6ms, while writing might take 17ms, which is hard to defend in a real-time system. Currently writes to the cache are made in the transaction flow, which is important, as serialisation across requests is required. Ideally all writes would be moved out to background threads performing bulk inserts, but this would not provide the serialisation guarantees from transaction to transaction (consider a velocity rule made up of counts on the PostgreSQL cache). Turning asynchronous commit on provides some relief, but without moving to UNLOGGED tables (which attract their own problems) it still does not come close to the desired write performance.

Redis will be implemented as an in memory database as follows.

In respect to the Abstraction cache:

  • In the event that the cache is enabled, no comparable inserts will be made to the PostgreSQL cache tables. Instead the key will be set in Redis on the basis of the search key and value (e.g. IP:123.456.789), with an HSET of a MessagePack serialisation of the payload dictionary for each transaction. In this respect the key serves to index, while the transactions are covered in the HSET values.
  • On each transaction, using async methods, a request will be made to Redis on the Abstraction key (e.g. IP:123.456.789).
  • A Time To Live (TTL) definition will be created to accompany the specification of the search key. Given the TTL definition, the expiry of the key will be extended on each transaction (otherwise the key will be allowed to expire), removing all HSET values. There is no member expiry supported in Redis at this time, which means that data will not expire while there are further transactions on that key. It follows that in the real-time flow there should be some online pruning of the values and \ or;
  • A background job that serves to prune the expired HSET values also (see the sketch after this list).
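
A sketch of the proposed layout using StackExchange.Redis, with MessagePack serialisation stubbed as bytes and all names illustrative:

using System;
using System.Text;
using StackExchange.Redis;

var redis = await ConnectionMultiplexer.ConnectAsync(
    Environment.GetEnvironmentVariable("RedisConnectionString") ?? "localhost");
var db = redis.GetDatabase();

var key = "IP:123.456.789"; // the search key serves to index
var payload = Encoding.UTF8.GetBytes("{payload}"); // MessagePack bytes in practice

// Each transaction is an HSET entry under the search key.
await db.HashSetAsync(key, new HashEntry[] { new("transactionId-1", payload) });

// Extend the key expiry on each transaction; members cannot expire
// individually, hence the online and background pruning described above.
await db.KeyExpireAsync(key, TimeSpan.FromDays(7));

// On each transaction, all values for the key are fetched in one read.
var entries = await db.HashGetAllAsync(key);
Console.WriteLine(entries.Length);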

In respect to TTL Counter:

  • The transactional incrementing of TTL Counters will be done in a Redis write.
  • The writing of the TTL Counter entry, which is used to decrement TTL Counters in a background process, will be written to an in-memory asynchronous queue for bulk insert, which will be done in a separate thread across all models and all tenants (as above). It follows that PostgreSQL will continue in use for winding back TTL Counters.
  • The background process will update the Redis cache instead of the same table in PostgreSQL (this is to say not duplicated). The durability guarantees provided by Redis cache of the AOF log \ rewriting will ensure that the Redis cache is unlikely to need to be reconstituted, and the risk of 1 second of incremental counter loss can be conveyed as a risk.

In respect to cached Abstraction values:

  • The background process responsible for calculating counters will also write the values to Redis based on the Abstraction key, with the aggregations being stored in the HSET.
  • The transactional process will instead read from the Redis cache rather than the equivalent in PostgreSQL.
  • There are no proposals to deprecate the writing of aggregations to PostgreSQL as this is useful for tracing the calculations, which is a complex process and benefits from the verbose trace.
  • Same durability considerations in respect to guarantees provided by Redis cache of the AOF log \ rewriting.

The functionality will be optional and in the absence of a Redis cache being available, existing functionality will prevail.

Connection strings to Redis should be contained at the application level and fully support multiple endpoints such that FAILOVER can be invoked to resize Redis instances.

Instruct Stop Training of a Neural Network in Exhaustive Adaptation

There is currently no means to instruct the stop of an Exhaustive Adaptation training process. Include a button on the Exhaustive Adaptation training page that will set a flag in the instance to Stop, which will be checked for each new topology exploration. At this stage, it is not proposed to send termination instructions to the thread, as in production this will more than likely be instantiated in a dedicated thread, for a dedicated training instance.

Text box to capture notification body text too small

The text box to capture notification body text throughout the system is far too small.

Style this properly to accept rich text or html style notifications.

Some thought should be given to the dispatch of the body, to ensure that it is HTML.

Prune Accord.Net Libraries

It is hard to imagine finding time to work on this ticket; however, it is created anyway.

Recently a project was concluded to upgrade from .Net 6 to .Net 8. This was a big project, as some of the machine learning libraries in use were archived. The archived code contained uses of the BinarySerializer that were unsafe and would not build after .Net 6. The Accord.Net libraries, being written in C#, were brought into the solution and built through to .Net 8, removing all of the references to BinarySerializer and any other obsolete code. Serialisation of Neural Networks was further complicated by there being no drop-in replacement for BinarySerializer, and modifications needed to be made to make it work with Newtonsoft JSON serialization.

The Accord.Net libraries are massive, and Jube's use of them is highly partial. At some point this library code needs to be pruned to remove any methods that are not in use, to reduce the ongoing maintenance cost. Mostly, obsolete methods are being removed and their use swapped with not-supported exceptions, and this does not appear to have caused any breaking changes in Jube.

This ticket is to examine the use of Accord.Net in Jube, remove any code that is not used, then set about refactoring the code that is in use. The code is not all that bad, as it builds and works under .Net 8, hence this ticket is not an immediate priority.

Upgrade to .Net 8

The software is currently written in .Net 6. It should be trivial to upgrade this to .Net 7 as part of a general Nuget package upgrade.

There are known issues relating to the use of the BinarySerializer for saving Neural Network models in the database, as BinarySerializer is deprecated in .Net 7. A separate research and implementation ticket is open to replace the use of the BinarySerializer.

Upgrade LINQ2DB to latest version

An observation from other projects worked on recently: the patterns and version of LINQ2DB in use are slightly old and would benefit from being upgraded to the very latest version. As part of this, explore the manner in which the LINQ2DB context is being instantiated.

This task is part of a wider effort to support .Net 7 and bring all Nuget packages current.

Improve Startup Time by having Assembly Hash Cache in the database too

Jube is written to support scalable cloud operations, meaning that many small instances of Jube, perhaps containers, can be created to achieve scalability. A principal requirement of the strategy, where containers are created dynamically to handle bursts, is that new instances of the software must load very quickly, in under a minute. A requirement that comes up extensively is the support of cloud operations and dynamic scalability via Kubernetes (although the use of containers less so; more so very lightweight VMs amounting to the same thing).

The software does load quickly; however, new instances will bring back all configurations from the database, for all models, and proceed to lay that out in the instance memory, a process that is very fast (not to mention unavoidable). The issue is that each configuration that contains rule code will compile to an assembly and then be stored locally in the hash cache (a dictionary of code hash to its assembly), so as not to duplicate the compilation of identical code.

The task is to refactor the hash cache to be part of the compilation class first, moving the compilation class to an instance in place of the hash cache. This compile class, which will now also include the hash cache, will also use a table in Postgres storing byte arrays against the code hash.

On a call to compile, as now, the hash cache will first be inspected for the key-value combination (hashed code vs assembly); in the absence of that, it will fall back to a table in Postgres for the same (noting that the initial Roslyn compilation is a byte array that should be trivial to store), and only in the event of unavailability will the code go on to be compiled to an assembly. It is of course the case that newly compiled code must be made available to the hash cache in both Postgres and the instance.
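
A minimal sketch of that lookup order, with the Postgres fallback elided as a comment and all names hypothetical:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

var hashCache = new ConcurrentDictionary<string, byte[]>();

byte[] GetAssemblyBytes(string code)
{
    var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(code)));

    // 1. Check the instance hash cache; 2. a real implementation would fall
    // back to the proposed Postgres byte-array table here; 3. only then
    // compile with Roslyn.
    return hashCache.GetOrAdd(hash, _ =>
    {
        var compilation = CSharpCompilation.Create(
            "Rule_" + hash,
            new[] { CSharpSyntaxTree.ParseText(code) },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using var stream = new MemoryStream();
        if (!compilation.Emit(stream).Success)
            throw new InvalidOperationException("Compilation failed.");
        return stream.ToArray(); // a byte[] is trivial to persist against the hash
    });
}

var assembly = Assembly.Load(GetAssemblyBytes(
    "public class Rule { public bool Match() { return true; } }"));
Console.WriteLine(assembly.FullName);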

The approach will remove the need for code to be recompiled as new instances are created in the cluster, which should improve the startup time, making models available soon after instantiation.

It would also be advantageous to include more compile-time data and errors for the purpose of monitoring and production support. At the moment the compile errors only appear in logs.
