
Comments (2)

pieroit commented on June 9, 2024

Currently, all documents uploaded into the rabbit hole are utilized to generate a user's response. However, in certain situations, there's a need to prioritize specific documents, especially when working with a large number of documents. This is where metadata can play a crucial role in facilitating this process.

I propose two necessary actions:

  • Introduce the capability to set metadata during the document upload API process;

Can be done via before_rabbithole_stores_documents or before_rabbithole_insert_memory hooks, not yet directly via API, have a look here:

def store_documents(self, stray, docs: List[Document], source: str) -> None:

The difference is that in the first hook you have access to all the chunks and can change them (including their metadata), summarize them, or delete them.
In the second you deal with a single chunk.
In the chunk metadata you should use the same key you will use on the recall side (see below).
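A minimal sketch of the per-chunk approach, assuming the hook receives the chunk and the cat instance and returns the (possibly modified) chunk. The `Chunk` class, the `category` key, the extension mapping, and the presence of a `source` entry in the chunk metadata are all assumptions made for illustration; in a real plugin the function would be registered with the plugin system's hook decorator rather than called directly.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for the chunk object passed to the hook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical mapping from file extension to a category label.
CATEGORY_BY_EXTENSION = {".pdf": "manual", ".txt": "note"}


def before_rabbithole_insert_memory(doc, cat):
    """Tag a single chunk with a category before it is stored.

    Assumes doc.metadata may already contain a "source" entry
    (file name or URL); chunks without a match are left untouched.
    """
    source = doc.metadata.get("source", "")
    for extension, category in CATEGORY_BY_EXTENSION.items():
        if source.endswith(extension):
            # Use the same key later on the recall side.
            doc.metadata["category"] = category
            break
    return doc
```

Deriving the value from something already attached to the chunk (here the assumed `source`) is one way to vary metadata per file without changing plugin settings on every upload.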

  • Introduce the capability to apply filters when a user sends a message.

You can do it via hook, using before_cat_recalls_<collection_name>_memories.
You can see how it is used in Cat Advanced Tools plugin

# hooks to change recall configs for each memory

In the hook you should be able to add "metadata": {....} to the recall config and that stuff should filter the retrieval, because it ends up here:

query_filter=self._qdrant_filter_from_dict(metadata),

If necessary the metadata you pass can be obtained via websocket, accessible via cat.user_message_json.xxx, or as an output of an entity extraction chain (that you can also place in the hook).
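The recall side could look roughly like this. It is only a sketch: it assumes the collection holding uploaded documents is named `declarative`, that the hook receives the recall config as a dict and returns it, and that the client sends a hypothetical `category` field in the websocket message. The `SimpleNamespace` object in the usage lines only simulates the cat instance.

```python
from types import SimpleNamespace


def before_cat_recalls_declarative_memories(recall_config, cat):
    """Add a metadata filter to the recall config when the client asks for one.

    "category" is a hypothetical key; it must match the key written
    into the chunk metadata at upload time.
    """
    wanted = getattr(cat.user_message_json, "category", None)
    if wanted:
        recall_config["metadata"] = {"category": wanted}
    return recall_config


# Simulated call, with a stand-in for the cat instance.
fake_cat = SimpleNamespace(user_message_json=SimpleNamespace(category="manual"))
config = before_cat_recalls_declarative_memories({"k": 3}, fake_cat)
```

When the message carries no `category`, the config is returned unchanged, so retrieval falls back to searching all documents.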

Let me know if all clear!
Thanks


Fede91 commented on June 9, 2024

Everything is clear @pieroit, thank you. However, using a plugin to set metadata implies that I won't be able to customize values depending on the file I'm uploading. Correct?

For the project I'm working on, we're uploading various documents categorized by type. If I create a plugin to set the documents' metadata, I would need to change the plugin's settings with every upload. On the other hand, if I could set the metadata directly via API, the process would be much faster.

Regarding filtering, I believe I've got it. I had modified the core of Cheshire Cat to automatically handle filters based on user messages, but perhaps this can also be achieved through a plugin. I'll give it a try. Thank you!
