sdsc-ordes / kg-llm-interface

Langchain-powered natural language interface to knowledge-graphs.

License: Apache License 2.0

Python 42.52% Makefile 1.44% Jupyter Notebook 53.92% Dockerfile 0.90% Shell 1.22%

k8s knowledge-graph llm question-answering rest-api

kg-llm-interface's Issues

KG-LLM: Add SPARQL query generation

Currently the KG-LLM module only injects semantically related triples into the prompt via chromaDB. To ask questions about the data, we need to generate SPARQL queries that can be executed against the triplestore.

Objective: Add SPARQL generation capability.

Requirements:

Use existing injection method to retrieve relevant portions of the ontology / schema to be used in the query.
- This is because the ontology will often be too large to fit in the prompt.
Generate SPARQL queries
Improve using chain-of-thought (CoT) and/or few-shot prompting if needed
Run against triple store
Return results
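The requirements above could be wired together roughly as follows. This is a minimal sketch, not the project's implementation: `build_sparql_prompt`, `answer`, the `FEW_SHOT` example and the three injected callables (`retrieve`, `generate`, `execute`) are hypothetical names introduced for illustration. In practice `retrieve` would wrap the existing ChromaDB lookup, `generate` the LLM, and `execute` the triplestore client.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical few-shot pair; real examples would come from the target knowledge graph.
FEW_SHOT: List[Tuple[str, str]] = [
    (
        "How many people are in the graph?",
        "SELECT (COUNT(?p) AS ?n) WHERE { ?p a <http://schema.org/Person> }",
    ),
]


def build_sparql_prompt(
    question: str,
    schema_chunks: Sequence[str],
    examples: Sequence[Tuple[str, str]] = FEW_SHOT,
) -> str:
    """Assemble a prompt from retrieved schema chunks, few-shot pairs and the question."""
    lines = ["Write a SPARQL query answering the question.", "Relevant schema:"]
    lines += [f"- {chunk}" for chunk in schema_chunks]
    for example_question, example_query in examples:
        lines += [f"Question: {example_question}", f"SPARQL: {example_query}"]
    lines += [f"Question: {question}", "SPARQL:"]
    return "\n".join(lines)


def answer(
    question: str,
    retrieve: Callable[[str], Sequence[str]],
    generate: Callable[[str], str],
    execute: Callable[[str], list],
) -> Tuple[str, list]:
    """Retrieve schema chunks, generate a query, run it, return (query, results)."""
    prompt = build_sparql_prompt(question, retrieve(question))
    query = generate(prompt).strip()
    return query, execute(query)
```

Passing the three steps in as callables keeps the prompt logic testable without a running LLM or triplestore; only the retrieved chunks enter the prompt, which is the point of the first requirement.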

KG-LLM: Few shot learning

Objective: Improve results of current KG-LLM query system using few shot prompts.

KG-LLM: Add SPARQL query generation

Add the ability to generate and execute SPARQL queries.

Requirements:

Tune SPARQL config to allow 2 graphs: ontology and instances
Only embed ontology graph into ChromaDB
LLM for query generation with ontology chunks from chromaDB
Query execution via SPARQLWrapper
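A sketch of what the two-graph config and the execution step might look like. `SparqlConfig`, `scope_to_graph`, `bindings_to_rows` and `run_select` are hypothetical names, and the assumption that both graphs live behind one endpoint as named graphs is mine, not stated in the issue. Only the `SPARQLWrapper` calls (`setQuery`, `setReturnFormat`, `query().convert()`) come from that library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SparqlConfig:
    """Hypothetical config: one endpoint, two named graphs."""

    endpoint: str
    ontology_graph: str
    instance_graph: str


def scope_to_graph(where_body: str, graph_uri: str) -> str:
    """Build a SELECT query restricted to a single named graph."""
    return f"SELECT * WHERE {{ GRAPH <{graph_uri}> {{ {where_body} }} }}"


def bindings_to_rows(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_select(config: SparqlConfig, query: str) -> List[Dict[str, str]]:
    """Send a SELECT query to the endpoint and return flattened rows."""
    # Imported lazily so the helpers above work without the dependency installed.
    from SPARQLWrapper import JSON, SPARQLWrapper

    client = SPARQLWrapper(config.endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    return bindings_to_rows(client.query().convert())
```

With this layout, the embedding step would read only from `ontology_graph`, while generated queries would typically target `instance_graph`.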

Allow remote LLM

Currently, aikg.config.chat does not support connecting to a remote API (e.g. OpenAI).
We should extend the config to support this, so that the service can connect to an independent LLM service or to ChatGPT.
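One possible shape for such a config, as a sketch only. The `ChatConfig` class below is a hypothetical stand-in, not the actual contents of aikg.config.chat, and the field names and `CHAT_*` environment variables are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ChatConfig:
    """Hypothetical chat config; field names are illustrative, not the project's."""

    model_id: str = "local-model"
    endpoint: Optional[str] = None  # None means: load the model in-process
    api_key: Optional[str] = None

    @property
    def is_remote(self) -> bool:
        """True when an endpoint is set, i.e. the LLM runs as a separate service."""
        return self.endpoint is not None


def config_from_env(env: Mapping[str, str] = os.environ) -> ChatConfig:
    """Read settings from (hypothetical) CHAT_* environment variables."""
    return ChatConfig(
        model_id=env.get("CHAT_MODEL_ID", "local-model"),
        endpoint=env.get("CHAT_ENDPOINT") or None,
        api_key=env.get("CHAT_API_KEY") or None,
    )
```

Keeping the endpoint optional means existing local deployments keep working unchanged, and the API key stays out of the config file.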

Update prefect version

We currently pin anyio 3.7.1 as a temporary workaround for the prefect version in use (2.10.6). See this issue from the prefect repo.
To prevent issues stemming from deprecated functions and vulnerabilities, we should upgrade to at least prefect 2.12.0, which is compatible with the latest version of anyio. We should also check for conflicts with other dependencies.
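A small guard like the following could verify the minimum prefect version at startup or in CI. It is a sketch with hypothetical helper names; it compares only the leading numeric release segment, so pre-release suffixes are ignored. The `packaging.version` module handles those cases properly and would be the more robust choice.

```python
import re
from importlib import metadata
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.12.0' -> (2, 12, 0)."""
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(package: str, minimum: str) -> bool:
    """True if the installed distribution of `package` is at least `minimum`."""
    return meets_minimum(metadata.version(package), minimum)
```

For example, `check_installed("prefect", "2.12.0")` raises `PackageNotFoundError` when prefect is missing, which is usually the desired failure mode in a CI check.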

[k8s] setup openllm service

Instead of deploying an LLM inside the kg-llm service (gateway server), we should use a dedicated openllm service.

Current setup:

Note: diamond shape means "needs GPU"

flowchart TD
    L[llmchat] <--> S
    S{kg-llm} <--> G[graphdb]
    S <--> C[chroma]

Desired setup:

flowchart TD
    L[llmchat] <--> S
    S[kg-llm] <--> G[graphdb]
    S <--> C[chroma]
    S <--> E{openllm}

allow multiple chroma collections

We use a chroma collection (currently named test, should be named schema) to embed the ontology/schema of the knowledge graph. In many cases, there may be multiple layers of schema, or taxonomies / picklists which are potentially very large.

Storing all those layers in the same collection poses a problem, as large picklists / schemas will be over-represented, making it impossible to fetch terms from the smaller layers.

LangChain has an ensemble retriever and a merger retriever specifically to address this issue: they allow us to create multiple collections and fetch a predefined number of items from each collection based on a single query.

Objective: support multi-collection chroma via ensemble or merger retriever.

Requirements:

Update chroma_build flow to take multiple input files (?) and create 1 chroma collection per input file
Update chroma config to take a list of collection names, instead of a single one
- optionally a weight / top k associated with each collection
Update generation functions to use langchain's ensemble/merger retriever
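To illustrate the idea behind these requirements independently of LangChain, here is a plain-Python sketch of fetching a fixed number of hits per collection and merging them. It is not LangChain's ensemble or merger retriever and does not use their APIs; `Retriever` and `merge_per_collection` are hypothetical names, and each retriever is assumed to wrap one Chroma collection.

```python
from typing import Callable, Dict, List, Tuple

# A retriever maps (query, k) to at most k documents.
# In practice each one would wrap a single Chroma collection.
Retriever = Callable[[str, int], List[str]]


def merge_per_collection(
    query: str, retrievers: Dict[str, Tuple[Retriever, int]]
) -> List[str]:
    """Fetch a fixed number of hits from every collection, then merge without duplicates."""
    merged: List[str] = []
    seen = set()
    for retrieve, top_k in retrievers.values():
        # Slice defensively in case a retriever returns more than requested.
        for doc in retrieve(query, top_k)[:top_k]:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

Because every collection contributes its own quota, a large picklist collection can no longer crowd out terms from a small schema collection. The dictionary keys correspond to the list of collection names in the config, and the integer to the optional per-collection top k.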

[k8s] use init container to preload graphdb+chromadb

We currently use k8s jobs to load the schema into chromadb and schema+data into graphdb. This is causing issues where the job keeps failing while the services are starting.

We should use an init container for this instead.

KG-LLM: Provide SPARQL endpoint service

The docker-compose setup added in #3 only packages the chat-server and ChromaDB. A complete backend also needs a SPARQL endpoint from which to feed ChromaDB. To make the services "production-ready", we are converting the docker-compose setup to Kubernetes manifests managed with Kustomize.

Objective: Add an Apache Jena Fuseki service to the deployment config

Requirements:

Identify a triple-store that is easy to containerize and deploy (most likely GraphDB-free or Fuseki)
- Fuseki has a more permissive license, is open source and is recommended by Zazuko. It is also used by the Renku team, which maintains a docker image and helm chart for it in renku-jena.
Add manifests for jena-fuseki
Ensure the SPARQL endpoint is accessible from chat-server
Ensure a repository is accessible (add a bootstrapping script if needed)

Resources:

https://jena.apache.org/documentation/fuseki2/fuseki-docker.html

KG-LLM: Streamline data ingestion

Currently, ChromaDB can be populated from an RDF file or a SPARQL endpoint using chroma_build.py. In practice, populating Chroma directly from RDF files is extremely slow. Ingestion should be split into 2 separate steps:

RDF files -> SPARQL endpoint (trivial)
SPARQL endpoint -> ChromaDB

Requirements:

Drop support for RDF -> Chroma
Simplify Chroma config
Helper script to load RDF files into SPARQL endpoint
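The helper script for the first step could be as small as the sketch below, which uploads a file with the SPARQL 1.1 Graph Store HTTP Protocol using only the standard library. The function names are hypothetical, and it assumes the endpoint exposes a graph store URL (for Fuseki this is typically `/{dataset}/data`) and needs no authentication.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Media types for common RDF serializations, keyed by file extension.
MEDIA_TYPES = {
    ".ttl": "text/turtle",
    ".nt": "application/n-triples",
    ".rdf": "application/rdf+xml",
    ".jsonld": "application/ld+json",
}


def build_upload_request(
    store_url: str, rdf_path: Path, graph_uri: str, data: bytes
) -> Request:
    """Build a Graph Store Protocol POST that adds `data` to the named graph."""
    media_type = MEDIA_TYPES[rdf_path.suffix.lower()]
    url = f"{store_url}?{urlencode({'graph': graph_uri})}"
    return Request(url, data=data, method="POST", headers={"Content-Type": media_type})


def upload_file(store_url: str, rdf_path: Path, graph_uri: str) -> int:
    """Send one RDF file to the endpoint and return the HTTP status code."""
    request = build_upload_request(store_url, rdf_path, graph_uri, rdf_path.read_bytes())
    with urlopen(request) as response:
        return response.status
```

POST merges the file into the target graph; switching the method to PUT would replace the graph's contents instead. Separating request construction from sending keeps the URL and header logic testable without a running endpoint.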
