opendatahub-io-contrib / genai-llm-rag-pattern
Home Page: https://opendatahub-io-contrib.github.io/genai-llm-rag-pattern/
License: Apache License 2.0
Have reference code for deploying an instance and credential configuration.
Motivation:
DoD:
GPU utilisation is important for admins:
Dashboard should include:
Primary utilisation metric is GPU memory
Secondary is FLOPs / GPU processor.
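The metric priority above can be sketched as a small helper that ranks GPUs by memory utilisation first and carries processor utilisation along as a secondary column. This is only an illustration: the `GpuSample` type, its field names, and the sample values are hypothetical and not taken from the repository or from any particular metrics exporter.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One reading for one GPU (hypothetical shape, for illustration only)."""
    gpu_id: str
    mem_used_mib: float    # GPU memory currently in use
    mem_free_mib: float    # GPU memory currently free
    proc_util_pct: float   # GPU processor utilisation, 0-100


def memory_utilisation_pct(sample: GpuSample) -> float:
    """Primary metric: share of GPU memory in use, as a percentage."""
    total = sample.mem_used_mib + sample.mem_free_mib
    if total <= 0:
        return 0.0
    return 100.0 * sample.mem_used_mib / total


def dashboard_rows(samples):
    """Return (gpu_id, memory %, processor %) rows, busiest memory first."""
    rows = [
        (s.gpu_id, round(memory_utilisation_pct(s), 1), s.proc_util_pct)
        for s in samples
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    readings = [
        GpuSample("gpu-0", 4096, 12288, 35.0),
        GpuSample("gpu-1", 12000, 4000, 80.0),
    ]
    for row in dashboard_rows(readings):
        print(row)
```

In a real dashboard the readings would come from whatever GPU metrics source the cluster exposes; only the ordering of primary and secondary metric is taken from the issue text.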
Clusters that are lifecycle-managed via OCM, such as ROSA, ARO and OSD, use machine pools and autoscaling managed from the OCM console.
InstaScale does allow the use of a pool managed from OCM.
This requires a workflow to set up the pool. (This may be partially done by hand initially, and potentially in the future via an Ansible job or similar.)
As an AI developer, I want a simple process to pull model X from Hugging Face and convert it to Caikit format.
Most likely by a data pipeline.
Definition of Done:
As a model developer, I want to be able to fine-tune separate specialised AI agents for specific tasks, and orchestrate them.
Investigate https://flowiseai.com tooling or similar to support orchestration of open source LLMs
Today many of our demos use MCAD, but they are blocking: for example, a user needs to keep their workbench running and has to step through shutting down the Ray cluster / MCAD job.
Provide a workflow where the MCAD batch job (e.g. fine tuning) is done asynchronously.
Document how to use the artifacts (e.g. jupyter notebooks) both inline and in docs website.
In a production context, it is important to have an evaluation framework included as part of the CI/CD pipeline to allow continuous evaluation.
Evaluate this or similar: https://github.com/explodinggradients/ragas
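One way such a framework could plug into a CI/CD pipeline is a threshold gate: the evaluation step produces scores, and the pipeline fails if any score is missing or below an agreed minimum. The sketch below is a minimal illustration and does not call ragas or any other library; the metric names, scores, and thresholds are placeholder assumptions.

```python
def failing_metrics(scores, thresholds):
    """Return {metric: (score, minimum)} for metrics that are missing or too low."""
    failures = {}
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None or score < minimum:
            failures[name] = (score, minimum)
    return failures


def gate(scores, thresholds):
    """Print any failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = failing_metrics(scores, thresholds)
    for name, (score, minimum) in sorted(failures.items()):
        print(f"FAIL {name}: score={score} minimum={minimum}")
    return 1 if failures else 0


if __name__ == "__main__":
    # Hypothetical values; a real pipeline would load these from the
    # evaluation step's output and from repository configuration.
    example_scores = {"faithfulness": 0.90, "answer_relevancy": 0.60}
    example_thresholds = {"faithfulness": 0.80, "answer_relevancy": 0.70}
    raise SystemExit(gate(example_scores, example_thresholds))
```

Returning an exit code keeps the gate independent of the CI system: any pipeline step that treats a non-zero exit as a failure can use it.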
AV to specify which EC2 to provide:
Bryon to deploy ROSA instance
OpenLLMetry is a set of extensions built on top of OpenTelemetry that gives complete observability over your LLM application.
We should integrate our stack with OpenTelemetry Collector, instrumenting the LLM Providers / VectorDB.
Update as appropriate to ubi8 or ubi9 based node20 e.g.:
https://catalog.redhat.com/software/containers/ubi8/nodejs-20/647466671b1440a9c7cd4704
My understanding is that Traceloop is a SaaS platform. Today the default deployment is not configured, and I believe it should be a no-op. TO BE TESTED.
Either:
Presumes 'replacement' based batch ingestion.
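As a rough illustration of what 'replacement' based batch ingestion could mean, the sketch below stages a complete new batch and then swaps it in for the previous contents, rather than appending or merging. The `InMemoryStore` class and the `replace_collection` function are hypothetical stand-ins, not the project's loaders or any vector database API.

```python
class InMemoryStore:
    """Toy stand-in for a document or vector store (hypothetical)."""

    def __init__(self):
        self.collections = {}


def replace_collection(store, name, documents):
    """Replace the whole collection with the new batch; return its size.

    The batch is fully staged before the swap, so a bad batch leaves the
    previous contents untouched.
    """
    staged = [doc.strip() for doc in documents if doc.strip()]
    if not staged:
        raise ValueError("refusing to replace a collection with an empty batch")
    store.collections[name] = staged
    return len(staged)


if __name__ == "__main__":
    store = InMemoryStore()
    replace_collection(store, "docs", ["first batch, doc 1", "first batch, doc 2"])
    replace_collection(store, "docs", ["second batch, doc 1"])
    print(store.collections["docs"])
```

The trade-off of this approach is simplicity over efficiency: every run re-ingests everything, but there is no need to track which documents changed.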
@tnscorcoran to populate
Create reference recipe for deploying an instance and credential configuration.
DoD:
Historical repo was here:
https://github.com/tnscorcoran/rhoai-llm-demos-gitops
Migrate the infrastructure layer content across.
Today the PR review stage is not mandatory. Require the PR pipeline to pass before merging.
Config manages the use case defined by backend code (app/engines/loaders)
Open question as to whether S2I or Tekton can be used to build images
Overall footprint:
V100 minimum spec for non-TGIS
Flash attention minimum e.g. required for MoE Models
Implications of flash attention not being present.
Use drawio please
Potentially with a hardware partner who has access to a lab and would be able to use it.
Currently minio is (mostly) deploying.
Automate the configuration of Minio including a default set of access credentials (configured from the validated patterns secrets file).
DoD
Today the build is done inside the container. Instead, do the build upstream and copy in the required artifacts.
Preconfigure a data science project or projects* including:
Open questions:
The repository root README.md file has had a number of violations of both the superlinter and mdformat.
This is a problem because all "README.md" files had to be excluded in order for superlinter runs to be consistent.