kg-llm-interface's People

Contributors

cmdoret, daniilzhyrov, rmfranken, supermaxiste

Forkers

datenmischwerk

kg-llm-interface's Issues

KG-LLM: Add SPARQL query generation

Currently the KG-LLM module only injects semantically related triples into the prompt via chromaDB. To ask questions about the data, we need to generate SPARQL queries that can be executed against the triplestore.

Objective: Add SPARQL generation capability.

Requirements:

  • Use existing injection method to retrieve relevant portions of the ontology / schema to be used in the query.
    • This is because the ontology will often be too large to fit in the prompt.
  • Generate SPARQL queries
  • Improve results with chain-of-thought (CoT) and/or few-shot prompting if needed
  • Run against triple store
  • Return results
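
A minimal sketch of how the prompt for query generation could be assembled, assuming the relevant schema chunks have already been retrieved from ChromaDB. The function name `build_sparql_prompt`, the prompt wording, and the shape of the few-shot examples are illustrative assumptions, not part of the existing module.

```python
from typing import List, Sequence, Tuple


def build_sparql_prompt(
    question: str,
    schema_chunks: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt asking an LLM to write a SPARQL query.

    Hypothetical helper: only the retrieved schema chunks are included,
    since the full ontology may not fit in the prompt.
    """
    parts: List[str] = [
        "Write a SPARQL query that answers the question.",
        "Use only the classes and properties listed in the schema.",
        "",
        "Schema:",
    ]
    parts.extend(schema_chunks)
    # Optional few-shot examples, given as (question, query) pairs.
    for example_question, example_query in examples:
        parts += ["", f"Question: {example_question}", f"SPARQL: {example_query}"]
    # The prompt ends with an open "SPARQL:" slot for the model to complete.
    parts += ["", f"Question: {question}", "SPARQL:"]
    return "\n".join(parts)
```

The generated text would still need to be validated before it is run against the triplestore, since nothing here guarantees that the model returns a well-formed query.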

KG-LLM: Add SPARQL query generation

Add the ability to generate and execute SPARQL queries.

Requirements:

  • Tune SPARQL config to allow 2 graphs: ontology and instances
  • Only embed ontology graph into ChromaDB
  • LLM for query generation with ontology chunks from chromaDB
  • Query execution via SPARQLWrapper
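
The two-graph setup and the execution step might look roughly like this. `SparqlConfig`, its field names, and the helper functions are hypothetical; only the `SPARQLWrapper` calls (`setQuery`, `setReturnFormat`, `query().convert()`) come from that library.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SparqlConfig:
    """Hypothetical config with separate ontology and instance graphs."""

    endpoint: str
    ontology_graph: str
    instance_graph: str


def ontology_query(cfg: SparqlConfig) -> str:
    """Query returning only ontology triples, i.e. the part to embed in ChromaDB."""
    return (
        "CONSTRUCT { ?s ?p ?o } "
        f"WHERE {{ GRAPH <{cfg.ontology_graph}> {{ ?s ?p ?o }} }}"
    )


def run_select(cfg: SparqlConfig, query: str) -> List[Dict[str, Any]]:
    """Run a SELECT query and return the JSON result bindings."""
    # Imported lazily so the config helpers work without SPARQLWrapper installed.
    from SPARQLWrapper import JSON, SPARQLWrapper

    client = SPARQLWrapper(cfg.endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]
```

Keeping the graph IRIs in the config means the embedding step can read only the ontology graph, while generated queries still run against the whole endpoint.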

Allow remote LLM

Currently, aikg.config.chat does not support connecting to a remote API (e.g. OpenAI).
We should modify the config to support this, so that kg-llm can connect to an independent LLM service or to ChatGPT.
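
One possible shape for such a config, as a sketch only. The actual fields of aikg.config.chat are not shown here, so `ChatConfig`, `endpoint`, and `api_key_env` are all assumed names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ChatConfig:
    """Hypothetical chat config; field names are illustrative assumptions."""

    model_id: str
    # Base URL of a remote LLM service; None means "load the model locally".
    endpoint: Optional[str] = None
    # Name of the environment variable holding the API key, if one is needed.
    api_key_env: Optional[str] = None

    @property
    def is_remote(self) -> bool:
        return self.endpoint is not None

    def api_key(self, env: Mapping[str, str] = os.environ) -> Optional[str]:
        """Look up the API key without storing the secret in the config."""
        if self.api_key_env is None:
            return None
        return env.get(self.api_key_env)
```

Referring to the key by environment variable name keeps the secret out of config files that may be committed to the repository.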

Update prefect version

We are currently pinning anyio 3.7.1 as a temporary fix for the version of prefect we use (2.10.6). See this issue from the prefect repo.
To prevent issues stemming from deprecated functions and vulnerabilities, we should use at least prefect 2.12.0, which is compatible with the latest version of anyio. We should also check for conflicts with other dependencies.

[k8s] setup openllm service

Instead of deploying an LLM inside the kg-llm service (gateway server), we should use a dedicated openllm service.

Current setup:

Note: diamond shape means "needs GPU"

flowchart TD
    L[llmchat] <--> S
    S{kg-llm} <--> G[graphdb]
    S <--> C[chroma]

Desired setup:

flowchart TD
    L[llmchat] <--> S
    S[kg-llm] <--> G[graphdb]
    S <--> C[chroma]
    S <--> E{openllm}

allow multiple chroma collections

We use a chroma collection (currently named test, should be named schema) to embed the ontology/schema of the knowledge graph. In many cases, there may be multiple layers of schema, or taxonomies / picklists which are potentially very large.

Storing all those layers in the same collection poses a problem, as large picklists / schemas will be over-represented, making it impossible to fetch terms from the smaller layers.

Langchain has an ensemble retriever and a merger retriever that address this issue: they allow us to create multiple collections and fetch a predefined number of items from each collection based on a single query.

Objective: support multi-collection chroma via ensemble or merger retriever.

Requirements:

  • Update chroma_build flow to take multiple input files (?) and create 1 chroma collection per input file
  • Update chroma config to take a list of collection names, instead of a single one
    • optionally a weight / top k associated with each collection
  • Update generation functions to use langchain's ensemble/merger retriever
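
To illustrate the idea of a per-collection quota, here is a plain-Python sketch. It stands in for langchain's retrievers rather than reproducing them: each "retriever" is just a callable taking a query and a top k, and the names `retrieve_per_collection`, `top_k`, and `default_k` are hypothetical.

```python
from typing import Callable, Dict, List

# A retriever takes (query, k) and returns up to k documents, best first.
Retriever = Callable[[str, int], List[str]]


def retrieve_per_collection(
    query: str,
    retrievers: Dict[str, Retriever],
    top_k: Dict[str, int],
    default_k: int = 3,
) -> List[str]:
    """Fetch a fixed number of results from each collection and merge them.

    Each collection gets its own quota, so a large picklist cannot crowd
    out terms from a smaller schema layer.
    """
    merged: List[str] = []
    seen = set()
    for name, retrieve in retrievers.items():
        k = top_k.get(name, default_k)
        # Slice defensively in case a retriever returns more than k items.
        for doc in retrieve(query, k)[:k]:
            if doc not in seen:  # drop duplicates across collections
                seen.add(doc)
                merged.append(doc)
    return merged
```

For example, with a `schema` collection capped at 5 results and a `picklist` collection capped at 2, the merged context never contains more than 2 picklist entries, however large that collection is.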

[k8s] use init container to preload graphdb+chromadb

We currently use k8s jobs to load the schema into chromadb and schema+data into graphdb. This is causing issues where the job keeps failing while the services are starting.

We should use an init container for this instead.

KG-LLM: Provide SPARQL endpoint service

The docker-compose setup added in #3 only packages the chat-server and ChromaDB. A complete backend also needs a SPARQL endpoint from which to feed ChromaDB. To make the services "production-ready", we are converting the docker-compose setup to Kubernetes manifests managed with Kustomize.

Objective: Add an Apache Jena Fuseki service to the deployment config

Requirements:

  • Identify a triple store that is easy to containerize and deploy (most likely GraphDB Free or Fuseki)
    • Fuseki has a more permissive license, is open source, and is recommended by Zazuko. It is also used by the Renku team, which maintains a docker image and helm chart for it in renku-jena
  • Add manifests for jena-fuseki
  • Ensure sparql endpoint is accessible from chat-server
  • Ensure a repository is accessible (add bootstrapping script if needed)

Resources:

KG-LLM: Streamline data ingestion

Currently, ChromaDB can be populated from an RDF file or a SPARQL endpoint using chroma_build.py. In practice, populating Chroma directly from RDF files is extremely slow. Ingestion should be split into 2 separate steps:

  1. RDF files -> SPARQL endpoint (trivial)
  2. SPARQL endpoint -> ChromaDB

Requirements:

  • Drop support for RDF -> Chroma
  • Simplify Chroma config
  • Helper script to load RDF files into SPARQL endpoint
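
The helper script for step 1 could be sketched with the standard library alone, assuming the triplestore implements the SPARQL 1.1 Graph Store HTTP Protocol. The function names and the example URL are assumptions; the exact graph store path depends on the triplestore and dataset name.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_upload_request(
    graph_store_url: str,
    rdf_bytes: bytes,
    graph_iri: Optional[str] = None,
    content_type: str = "text/turtle",
) -> urllib.request.Request:
    """Build a Graph Store Protocol POST that merges RDF into a graph.

    With no graph IRI, the data goes to the default graph (?default);
    otherwise to the named graph (?graph=<iri>).
    """
    if graph_iri is None:
        target = f"{graph_store_url}?default"
    else:
        target = f"{graph_store_url}?{urlencode({'graph': graph_iri})}"
    return urllib.request.Request(
        target,
        data=rdf_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def upload_file(graph_store_url: str, path: str, graph_iri: Optional[str] = None) -> int:
    """Send one Turtle file to the endpoint and return the HTTP status code."""
    with open(path, "rb") as handle:
        request = build_upload_request(graph_store_url, handle.read(), graph_iri)
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `upload_file("http://localhost:3030/ds/data", "ontology.ttl", "http://example.org/ontology")` would load the ontology into its own named graph, where both the URL and the graph IRI are placeholders.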
