RAG Resources (work in progress)

A collection of curated RAG (Retrieval Augmented Generation) resources.

What is RAG?

Retrieval Augmented Generation, or RAG for short, is the process of having a Large Language Model (LLM) generate text based on a given context.

  • Retrieval = Find relevant data (texts, images, etc.) for a given query.
  • Augmented = Add the retrieved relevant data as context information for the query.
  • Generation = Generate responses based on the query and retrieved information.

The goal of RAG is to reduce hallucinations in LLMs (as in, prevent them from making up information that looks right but isn't).

Think of RAG as a tool to improve your calculator for words.

The workflow looks like this:

Query (e.g. a question) -> Find relevant resources to query (retrieval) -> Add relevant resources to query (augment) -> LLM creates a response to the query based on the context (generate).
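
As a rough sketch of the "augment" step, the retrieved resources can simply be pasted into the prompt ahead of the query. The function name `build_augmented_prompt`, the prompt wording, and the placeholder chunks below are all assumptions for illustration; real prompts vary by model and task.

```python
def build_augmented_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user query into one prompt string."""
    # One bullet per retrieved chunk so the LLM can tell them apart.
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the query using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Query: {query}\n"
        "Answer:"
    )


# Placeholder chunks standing in for whatever the retrieval step returned.
prompt = build_augmented_prompt(
    "What does the augment step do?",
    ["Placeholder chunk one.", "Placeholder chunk two."],
)
print(prompt)
```

The resulting string is what would be sent to the LLM in the "generate" step.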

The following resources are my own collection, which I've been saving in notes.

If I find something valuable, I'll add it here too (feel free to open a pull request with your own).

A focus will be on papers and blog posts with code attached.

Papers

  • May 2020 | [Original RAG paper] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Paper
  • Dec 2022 | Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE - Hypothetical Document Embedding) | Paper
  • Oct 2023 | SELF-RAG: Learning to Retrieve, Generate and Critique through Self-reflection | Paper, GitHub

Guides and Recipes

Tutorials

Blog posts

10 second walkthrough

The following is a basic RAG workflow.

It can be implemented in ~100-200 lines of Python code (maybe less).

  1. Get a corpus of target data (e.g. pages of Wikipedia, transcripts of your favourite podcaster, your own notes lost in the void)
  2. Split target data into chunks (e.g. groups of 10 sentences, paragraphs, overlapping or not)
  3. Embed your chunks into vector space (e.g. text -> model from Hugging Face -> vectors)
  4. Store your embedded chunks in memory (e.g. this could be a simple np.array or a Python dictionary or a vector database)
  5. Make a query (e.g. "How many minutes a day should I spend in a cold plunge?")
  6. Embed the query using the same model from step 3 (crucial)
  7. Search across your embedded chunks from step 4 for items that are similar or related to your query (retrieval). To measure similarity between two vectors, use the dot product or cosine similarity (what you're doing here is "semantic similarity" search)
  8. Optional: Use the top-n (e.g. 3-5 or more if your LLM context window allows) results from step 7 to answer the question directly
  9. Use the top-n (e.g. 3-5 or more if your LLM context window allows) results from step 7 as input to your LLM prompt (augment) to help it generate an answer
  10. Hopefully reduce hallucinations in the LLM response thanks to the added context from the retrieved chunks
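
The steps above can be sketched end to end in a few dozen lines. This is a minimal illustration, not a reference implementation: `toy_embed` is a hypothetical stand-in that hashes words into buckets, where a real pipeline would call an embedding model in steps 3 and 6. The corpus, the query, and the names `split_into_chunks` and `retrieve` are likewise made up for the example, and the LLM call itself is left out.

```python
import re
import zlib

import numpy as np

DIM = 4096  # width of the toy embedding space (arbitrary choice)


def split_into_chunks(text, sentences_per_chunk=2):
    """Step 2: split text into non-overlapping groups of sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def toy_embed(texts):
    """Steps 3 and 6: ASSUMPTION, a toy replacement for a real embedding model.

    Each word is hashed into one of DIM buckets, then rows are L2-normalised,
    so the dot product of two rows equals their cosine similarity.
    """
    vectors = np.zeros((len(texts), DIM), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            vectors[row, zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def retrieve(query, chunks, chunk_embeddings, top_n=1):
    """Steps 6 and 7: embed the query the same way, rank chunks by dot product."""
    query_embedding = toy_embed([query])[0]
    scores = chunk_embeddings @ query_embedding
    top_indices = np.argsort(scores)[::-1][:top_n]
    return [(chunks[i], float(scores[i])) for i in top_indices]


# Step 1: a tiny illustrative corpus.
corpus = (
    "RAG retrieves relevant chunks for a query. "
    "The retrieved chunks are added to the prompt. "
    "Bread dough rises when yeast ferments sugar. "
    "A warm kitchen speeds up the rise."
)

chunks = split_into_chunks(corpus)    # step 2
chunk_embeddings = toy_embed(chunks)  # steps 3 and 4 (stored as an np.array)

query = "What gets added to the prompt?"                       # step 5
results = retrieve(query, chunks, chunk_embeddings, top_n=1)   # steps 6 and 7

# Step 9: put the retrieved chunks into the prompt that would go to the LLM.
context = "\n".join(chunk for chunk, _ in results)
prompt = f"Context:\n{context}\n\nQuery: {query}\nAnswer:"
print(prompt)
```

Swapping `toy_embed` for a real model should leave the rest of the flow unchanged, as long as the same model embeds both the chunks and the query.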

FAQ

RAG sounds like a hack, is it?

Yes, it's a nice hack that works.

Is RAG just another form of prompt engineering?

Yes.

Which embedding model should I use?

Start with a model somewhere near the top of the Hugging Face MTEB (Massive Text Embedding Benchmark) leaderboard. The sentence-transformers library is a great option too.

Do I need a vector database?

Under 100,000 chunks?

Probably not (use np.array instead).

100,000+ chunks?

See how you go without one first. Calculating np.dot over thousands of embeddings is quite fast.
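
To get a feel for this, here is a rough sketch of brute-force search over 100,000 random vectors held in a plain np.array. The corpus size, the embedding width of 128, and the `top_k` helper are assumptions for illustration, and the vectors are random noise rather than real embeddings. Timings will depend on your machine.

```python
import time

import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical corpus: 100,000 chunks, each a 128-dimensional unit vector.
num_chunks, dim = 100_000, 128
embeddings = rng.standard_normal((num_chunks, dim), dtype=np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Make the query a slightly noisy copy of chunk 42 so the right answer is known.
query = embeddings[42] + 0.01 * rng.standard_normal(dim, dtype=np.float32)
query /= np.linalg.norm(query)


def top_k(query_vec, matrix, k=5):
    """Return indices and scores of the k rows most similar to query_vec."""
    scores = matrix @ query_vec                     # one dot product per chunk
    candidates = np.argpartition(scores, -k)[-k:]   # top k, in no particular order
    order = candidates[np.argsort(scores[candidates])[::-1]]  # sort just those k
    return order, scores[order]


start = time.perf_counter()
indices, scores = top_k(query, embeddings)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Best match: chunk {indices[0]}, searched {num_chunks} chunks in {elapsed_ms:.1f} ms")
```

If this kind of search is fast enough for your corpus, a vector database may be unnecessary overhead.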

Which LLM should I use?

Generally, the biggest one you can afford will give the best results.

Log

  • 1 Nov 2023 - Start repo to collect my own resources.

Contributors

  • mrdbourke
