sdsc-ordes / kg-llm-interface
Langchain-powered natural language interface to knowledge-graphs.
License: Apache License 2.0
Currently, the KG-LLM module only injects semantically related triples into the prompt via ChromaDB. To ask questions about the data, we need to generate SPARQL queries that can be executed against the triplestore.
Objective: Add SPARQL generation capability.
Requirements:
Objective: Improve results of current KG-LLM query system using few shot prompts.
Add the ability to generate and execute SPARQL queries.
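As a rough illustration of the few-shot idea, the sketch below assembles a prompt from retrieved schema triples and a handful of question/query pairs. The function name `build_sparql_prompt`, the prompt wording, and the example pairs (which use schema.org terms) are all assumptions made for illustration; none of them come from the repository.

```python
from typing import List, Tuple

# Hypothetical example pairs; real ones would be written against the actual schema.
EXAMPLES: List[Tuple[str, str]] = [
    (
        "How many people are in the graph?",
        "SELECT (COUNT(?p) AS ?n) WHERE { ?p a <http://schema.org/Person> }",
    ),
    (
        "List the names of all organizations.",
        "SELECT ?name WHERE { ?o a <http://schema.org/Organization> ; "
        "<http://schema.org/name> ?name }",
    ),
]


def build_sparql_prompt(
    question: str,
    schema_triples: List[str],
    examples: List[Tuple[str, str]] = EXAMPLES,
) -> str:
    """Assemble a few-shot prompt asking an LLM for a SPARQL query."""
    parts = [
        "Write a SPARQL query that answers the question.",
        "",
        "Relevant schema triples:",
    ]
    # Triples retrieved from the vector store give the model the vocabulary.
    parts.extend(f"- {triple}" for triple in schema_triples)
    # Each example shows the expected question -> query mapping.
    for example_question, example_query in examples:
        parts += ["", f"Question: {example_question}", f"SPARQL: {example_query}"]
    # The prompt ends on an open "SPARQL:" line for the model to complete.
    parts += ["", f"Question: {question}", "SPARQL:"]
    return "\n".join(parts)
```

The returned string would be sent to the LLM, and the completion executed against the triplestore. Validating the generated query before execution is left out of this sketch.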
Requirements:
Currently, aikg.config.chat does not support connecting to a remote (e.g. OpenAI) API.
We should modify the config to support this, with the goal of connecting to an independent LLM service, or ChatGPT.
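One possible shape for such a config is sketched below. The class name `ChatConfig`, its fields, and the `CHAT_*` environment variable names are hypothetical; the actual contents of aikg.config.chat are not shown here, so this is only an assumption about how a remote endpoint could be made optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ChatConfig:
    """Hypothetical chat settings; field names are illustrative only."""

    model_id: str = "local-model"
    endpoint: Optional[str] = None  # base URL of a remote LLM service, if any
    api_key: Optional[str] = None  # credential for the remote service, if needed

    @property
    def is_remote(self) -> bool:
        # Without an endpoint, the model is assumed to run in-process.
        return self.endpoint is not None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ChatConfig":
        # Empty strings are treated the same as unset variables.
        return cls(
            model_id=env.get("CHAT_MODEL_ID", "local-model"),
            endpoint=env.get("CHAT_ENDPOINT") or None,
            api_key=env.get("CHAT_API_KEY") or None,
        )
```

With this layout, the calling code can branch on `is_remote` to decide whether to load a local model or build a client for the remote service.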
Currently we're using anyio 3.7.1 as a temporary fix for the version of prefect we're on (2.10.6). See this issue from the prefect repo.
To prevent issues stemming from deprecated functions and vulnerabilities, we should use at least prefect 2.12.0, which is compatible with the latest version of anyio. We should also check for conflicts with other dependencies.
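A small guard like the following could make the minimum version explicit, for instance in a startup check or a test. It is a minimal sketch that only handles plain dotted releases such as `2.10.6`; the helper names are hypothetical, and pre-release or local version strings would need a proper parser such as `packaging.version`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '2.10.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed release is at least the minimum release."""
    # Tuples compare element by element, so (2, 10, 6) < (2, 12, 0).
    return parse_version(installed) >= parse_version(minimum)
```

In practice the installed version string could be read with `importlib.metadata.version("prefect")` and passed to `meets_minimum(..., "2.12.0")`.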
Instead of deploying an LLM inside the kg-llm service (gateway server), we should use a dedicated openllm service.
Current setup:
Note: diamond shape means "needs GPU"
flowchart TD
L[llmchat] <--> S
S{kg-llm} <--> G[graphdb]
S <--> C[chroma]
Desired setup:
flowchart TD
L[llmchat] <--> S
S[kg-llm] <--> G[graphdb]
S <--> C[chroma]
S <--> E{openllm}
We use a chroma collection (currently named test, should be named schema) to embed the ontology/schema of the knowledge graph. In many cases, there may be multiple layers of schema, or taxonomies / picklists which are potentially very large.
Storing all those layers in the same collection poses a problem, as large picklists / schemas will be over-represented, making it impossible to fetch terms from the smaller layers.
Langchain has an ensemble retriever and a merger retriever specifically to address this issue: they allow us to create multiple collections and fetch a predefined number of items from each collection based on a single query.
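The underlying idea can be illustrated in plain Python, independent of Langchain's actual classes. In the sketch below, a "retriever" is simply a function returning ranked hits for a query; the function name `fetch_per_collection`, the per-collection quota dictionary, and the round-robin interleaving are assumptions chosen for illustration and are not how Langchain implements its retrievers.

```python
from typing import Callable, Dict, List

# A retriever here is any function mapping (query, k) to a ranked list of terms.
Retriever = Callable[[str, int], List[str]]


def fetch_per_collection(
    query: str,
    retrievers: Dict[str, Retriever],
    k_per_collection: Dict[str, int],
) -> List[str]:
    """Take a fixed number of hits from each collection, then interleave them."""
    ranked = []
    for name, retrieve in retrievers.items():
        k = k_per_collection[name]
        # Truncate defensively in case a retriever returns more than k hits.
        ranked.append(retrieve(query, k)[:k])

    merged: List[str] = []
    seen = set()
    longest = max((len(hits) for hits in ranked), default=0)
    # Round-robin over ranks so that no single collection crowds out the others.
    for rank in range(longest):
        for hits in ranked:
            if rank < len(hits) and hits[rank] not in seen:
                seen.add(hits[rank])
                merged.append(hits[rank])
    return merged
```

Because each collection has its own quota, a very large picklist collection cannot push terms from a small schema collection out of the result.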
Objective: support multi-collection chroma via ensemble or merger retriever.
Requirements:
We currently use k8s jobs to load the schema into chromadb and schema+data into graphdb. This is causing issues where the job keeps failing while the services are starting.
We should use an init container for this instead.
The docker-compose setup added in #3 only packages the chat-server and ChromaDB. A complete backend also needs a SPARQL endpoint from which to feed ChromaDB. To make the services "production-ready", we are converting the docker-compose setup to Kubernetes manifests managed with Kustomize.
Objective: Add an Apache Fuseki service to the deployment config.
Requirements:
Resources:
Currently, ChromaDB can be populated from an RDF file or SPARQL endpoint using chroma_build.py. In practice populating Chroma from RDF files directly is impractical and extremely slow. This should be split into 2 separate steps:
Requirements: