Comments (5)

pieroit avatar pieroit commented on May 18, 2024 1

@calebgcc I introduced a customizable TextSplitter (chunk_size and chunk_overlap) in the rabbit_hole, so Cat users can decide for themselves how long they want their text chunks.

In the docs here you'll find a list of langchain Documents (a Document is just an object with text and metadata) to experiment with for file summarization.
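To make the chunk_size / chunk_overlap behavior concrete, here is a minimal, self-contained sketch of a character-based splitter. This is illustrative only: the Cat actually uses a langchain TextSplitter, and the function name here is hypothetical.

```python
def split_text(text: str, chunk_size: int = 400, chunk_overlap: int = 100) -> list[str]:
    """Naive character splitter mimicking chunk_size/chunk_overlap semantics.

    Illustrative stand-in for the langchain TextSplitter used in the
    rabbit_hole; consecutive chunks share `chunk_overlap` characters.
    """
    assert 0 <= chunk_overlap < chunk_size
    chunks, start = [], 0
    step = chunk_size - chunk_overlap  # how far the window advances each time
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

With chunk_size=400 and chunk_overlap=100, a 1000-character text yields chunks starting at offsets 0, 300, 600, and 900.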

from core.

pieroit avatar pieroit commented on May 18, 2024 1

Increasing k is a good test; still, if somebody uploads a doc and chooses a large chunk size, the problem remains.

There should be a check before inserting memories into the prompt: if they are "too long", they should be summarized.
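The proposed check might look like the following sketch. The budget constant and function names are hypothetical, and `summarize` stands in for a real LLM summarization call.

```python
MAX_MEMORY_CHARS = 1000  # assumed budget, not a real Cat setting

def summarize(text: str) -> str:
    # Stand-in for an LLM summarization chain; here we just truncate.
    return text[:200] + "..."

def prepare_memories(memories: list[str]) -> list[str]:
    """Summarize any recalled memory that would blow up the prompt."""
    return [m if len(m) <= MAX_MEMORY_CHARS else summarize(m) for m in memories]
```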

We can postpone the problem and close this issue, since we are mostly covered, or, if you feel like it, you can also tackle the above.

Thanks 🙏


calebgcc avatar calebgcc commented on May 18, 2024

I closed issue #49 since this issue is more specific; we can use this one to discuss how to further implement summarization 🙌.

About your comment in PR #52:

I can test other chain_type values to see whether I get the same problem with large files.

I'm going to dig a little deeper into the docs you left (about llama-index) to better understand how to implement the custom summary chain, but if I understand correctly, the basic idea is:

  • get a list of strings in input
  • group them in different docs
  • get summary from docs (which becomes new input)
  • repeat until we have one single short summary
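The steps above can be sketched as a simple reduce loop. `summarize` is a stand-in for the actual LLM summary chain, and `group_size` is a hypothetical parameter.

```python
def summarize(texts: list[str]) -> str:
    # Stand-in for the LLM summary chain: joins truncated inputs.
    return " ".join(t[:50] for t in texts)

def recursive_summary(chunks: list[str], group_size: int = 3) -> str:
    """Group chunks, summarize each group, and repeat until one summary remains."""
    while len(chunks) > 1:
        groups = [chunks[i:i + group_size] for i in range(0, len(chunks), group_size)]
        chunks = [summarize(g) for g in groups]  # summaries become the new input
    return chunks[0]
```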


pieroit avatar pieroit commented on May 18, 2024

PR #68 is merged, and file uploads now do summarization.
The next step is to do summaries when the list of memories recalled here makes the prompt too large.
Leaving this issue open.


calebgcc avatar calebgcc commented on May 18, 2024

@pieroit I was trying to trigger this error, but I think summarizing and chunking the documents solved it.

The documents retrieved from the Cat are often too small to cause problems, and on top of that k defaults to 5.

Maybe prompt summarization is no longer necessary; let me know how to proceed. For example, we can try increasing the value of k to see how it affects the prompt.
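The suggested experiment can be sketched as follows: vary k (the number of recalled memories) and measure how the prompt grows. Retrieval is stubbed here; the real Cat recalls memories via vector similarity search, and all names below are hypothetical.

```python
def recall(query: str, memories: list[str], k: int = 5) -> list[str]:
    # Stub: pretend the first k memories are the most similar to the query.
    return memories[:k]

def build_prompt(query: str, recalled: list[str]) -> str:
    """Assemble a prompt from recalled memories plus the user question."""
    return "Context:\n" + "\n".join(recalled) + f"\n\nQuestion: {query}"

mems = [f"memory {i}: " + "x" * 100 for i in range(20)]
small = build_prompt("q", recall("q", mems, k=5))
large = build_prompt("q", recall("q", mems, k=15))
```

Comparing `len(small)` and `len(large)` shows how quickly the prompt budget is consumed as k increases.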

