databee's Introduction

GPUtopia Databee

Python server for ingesting & parsing documents and other data magic

Running locally

  • poetry install
  • poetry shell
  • poetry run uvicorn databee:app --reload --port 8001

Running tests

  • poetry run pytest

databee's Issues

Add L402 support

We want to add L402 support to any API endpoint so we can move beyond the "account and balance" model to simply paying with bitcoin for the services you need, no questions asked.

We will run a databee, and anyone else can run their own, specifying the Lightning address at which they receive payments.

Probably worth adapting https://github.com/Kodylow/matador
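As a rough illustration of the request flow L402 implies, here is a minimal sketch using only the standard library. It assumes the usual shape of the protocol: an unpaid request gets a 402 response whose `WWW-Authenticate` header carries a token and a Lightning invoice, and the client retries with `Authorization: L402 <token>:<preimage>`. The function names are hypothetical, the HMAC-signed token is a stand-in for a real macaroon, and creating the invoice on a Lightning node is out of scope.

```python
import base64
import hashlib
import hmac
from typing import Dict, Tuple


def _sign(payment_hash_hex: str, secret: bytes) -> str:
    """Bind a payment hash to this server with an HMAC tag (macaroon stand-in)."""
    tag = hmac.new(secret, payment_hash_hex.encode(), hashlib.sha256).hexdigest()
    raw = f"{payment_hash_hex}:{tag}".encode()
    return base64.urlsafe_b64encode(raw).decode()


def make_challenge(payment_hash_hex: str, invoice: str, secret: bytes) -> Tuple[int, Dict[str, str]]:
    """Return the status code and headers for a request that has not paid yet."""
    token = _sign(payment_hash_hex, secret)
    header = f'L402 macaroon="{token}", invoice="{invoice}"'
    return 402, {"WWW-Authenticate": header}


def is_paid(authorization: str, secret: bytes) -> bool:
    """Check an 'Authorization: L402 <token>:<preimage>' header value."""
    scheme, _, credentials = authorization.partition(" ")
    if scheme != "L402":
        return False
    token, _, preimage_hex = credentials.partition(":")
    try:
        decoded = base64.urlsafe_b64decode(token).decode()
        preimage = bytes.fromhex(preimage_hex)
    except (ValueError, UnicodeDecodeError):
        return False
    payment_hash_hex, _, tag = decoded.partition(":")
    expected = hmac.new(secret, payment_hash_hex.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return False  # token was not issued by this server
    # Paying the invoice reveals the preimage; its SHA-256 must equal the payment hash.
    return hashlib.sha256(preimage).hexdigest() == payment_hash_hex
```

An endpoint would call `make_challenge` when the `Authorization` header is missing or `is_paid` returns False, and serve the response otherwise.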

Extract topics from documents

Given a text document (PDF, Markdown, etc.) of any length, we need to extract a list of relevant topics, perhaps 5-20 or more.

This is to help us programmatically generate question/answer pairs from a document, which can then be used for finetuning a model.

Once we have a list of topics, we can query the entire document for each topic to gather enough context to generate relevant question/answer pairs via an LLM.

The question is, what is the best way to generate the list of topics?

Conceivably this is what topic modeling is for, but our current implementation using Gensim and LDA gives bad results. We could explore third-party services, but we are unsure topic modeling is even the right approach, and we prefer to minimize dependence on third parties.

Maybe we should generate topics another way, such as semantic search via a service like Vectara, or an LLM API like Claude that works with long documents.
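One possible shape for the LLM route, offered as a sketch rather than a proposed solution: split the document into chunks, ask a model for candidate topics per chunk, and keep the topics that recur across the most chunks. The `propose_topics` callable is hypothetical; it stands in for whatever LLM or search API ends up being used, so no specific service is assumed.

```python
from collections import Counter
from typing import Callable, Iterable, List


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def extract_topics(
    text: str,
    propose_topics: Callable[[str], Iterable[str]],  # hypothetical LLM wrapper
    max_topics: int = 20,
    max_words: int = 200,
) -> List[str]:
    """Rank topics by the number of chunks in which they were proposed."""
    votes: Counter = Counter()
    for chunk in chunk_text(text, max_words):
        # Normalize and de-duplicate so each chunk votes once per topic.
        proposed = {t.strip().lower() for t in propose_topics(chunk) if t.strip()}
        votes.update(proposed)
    # Most votes first; ties broken alphabetically for a stable result.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:max_topics]]
```

Counting chunks rather than raw mentions favors topics that run through the whole document over ones that dominate a single passage; whether that is the right ranking for finetuning data is an open question.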

Suggestions welcome!

Bounty: 1M sats to whoever describes an algorithm we end up using for this, or 2M sats if it's fully implemented with a PR we can easily merge.

Integrate RAG via Arguflow

As discussed, let's get function(s) added to this databee repo enabling RAG over documents (PDF, Markdown, text).

You can use your discretion about what parts of Arguflow should be added in what order, whatever maximally supports the goals explained below.

Code can go within an arguflow.py file in the databee folder.

Additional background:

Our forthcoming agents UI enables users to create one or more agents, upload docs to their "brain", ask them questions about the docs, give them tasks based on the docs, and finetune models based on the docs.

Our implementation is based partly on the Agent Protocol, but adds support for multiple agents per server.

Here are the (tentative) data models and relationships:

  • User has many Agents
  • Agent has many Tasks
  • Agent belongs to a Brain
  • Brain belongs to many Agents
  • Brain has many Documents
  • Brain has many Memories
  • Tasks have Steps
  • Steps have input and output
  • Tasks and Steps have Artifacts
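The relationships above could be sketched with plain dataclasses as follows. This is only an illustration of the tentative model: every field name is an assumption, and "Brain belongs to many Agents" is represented by several Agents holding a reference to the same Brain object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    name: str  # assumed field


@dataclass
class Step:
    input: str
    output: str = ""
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Task:
    steps: List[Step] = field(default_factory=list)
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Document:
    title: str  # assumed field
    text: str   # assumed field


@dataclass
class Memory:
    text: str  # assumed field


@dataclass
class Brain:
    documents: List[Document] = field(default_factory=list)
    memories: List[Memory] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    brain: Brain  # may be shared with other Agents
    tasks: List[Task] = field(default_factory=list)


@dataclass
class User:
    agents: List[Agent] = field(default_factory=list)
```

For example, two Agents built with the same `Brain` instance both see a Document appended to `brain.documents`.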

RAG will help to enable:

  • Users can ask Agents questions about the Documents in their Brain
  • Users can assign Tasks to Agents who will consult their Documents
  • Agents can ask questions of other Agents
  • Additional reflection/planning like from the Generative Agents paper
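To make the first item concrete, here is a minimal retrieval sketch. It does not use Arguflow's API: plain word-count cosine similarity stands in for whatever embedding search the integration provides, and `retrieve` and `build_prompt` are hypothetical names. The idea is only to show the retrieve-then-prompt shape of RAG.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vector(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return up to k chunks most similar to the question, best first."""
    q = _vector(question)
    scored = [(_cosine(q, _vector(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, chunk) for score, chunk in scored[:k] if score > 0]


def build_prompt(question: str, chunks: List[str], k: int = 3) -> str:
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n\n".join(chunk for _, chunk in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Here `chunks` would come from the Documents in an Agent's Brain, and the resulting prompt would be passed to whichever model the Agent uses.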

Thorough integration of Arguflow to enable RAG in support of the above workflow will earn a bounty of 25M sats. (Less for an incomplete integration that needs further work from us before we can include it.)
