
whatd-i-miss's Introduction

Anthropic AI Hackathon 2023

Project for 2023 Anthropic AI Hackathon from lablab: https://lablab.ai/event/anthropic-ai-hackathon

Demo

Demo is live! https://wim.victorsothervector.com/

Note that the supplied Anthropic API key can only handle one request at a time; consider supplying your own.

Setup

The current version should run out of the box (it gets precomputed parts via GitHub's releases). Follow these directions:

  1. Clone the repo: git clone git@github.com:MrGeislinger/anthropic-ai-hackathon-2023.git
  2. (Optional) Create an environment (e.g. with conda: `conda create --name wim python=3.11`)
  • Note: I used Python 3.11; other recent versions might work, but no guarantees.
  3. Install requirements: pip install -r requirements.txt
  4. Run the app: streamlit run app.py
  5. Enjoy! (localhost:8501)

Loading Config Example

For a simple example data config file, see config.json from v0.1.1. Simply download the file to your workspace to use it in the deployed Streamlit app.
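
As a sketch, the config could also be fetched programmatically; the asset URL below assumes GitHub's standard release-download pattern for v0.1.1, so verify it against the actual release page:

import json
import urllib.request

# Hypothetical URL following GitHub's release-asset pattern.
url = (
    "https://github.com/MrGeislinger/anthropic-ai-hackathon-2023"
    "/releases/download/v0.1.1/config.json"
)
with urllib.request.urlopen(url) as response:
    config = json.load(response)
print(list(config))  # inspect top-level keys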

Creating Your Own transcripts from Audio

See the associated repo for creating transcripts for this tool from a YouTube playlist of videos: https://github.com/MrGeislinger/whisper-extract. The repo will be updated periodically, independent of this project.

I suggest using the small model first since larger models can take a significant amount of time.
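
For instance, a minimal sketch with the openai-whisper package (the audio filename is a placeholder):

import whisper

# Start with the "small" model; larger ones are slower but more accurate.
model = whisper.load_model("small")
result = model.transcribe("episode.mp3")  # placeholder filename
print(result["text"])  # full transcript
for segment in result["segments"]:  # per-segment timestamps
    print(segment["start"], segment["end"], segment["text"])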


whatd-i-miss's Issues

Advanced UI

More options in the UI for the user to modify how the model processes the data. This can help adjust the tool to the situation.

  • More sentence ingestion from transcript (filter out fewer sentences)
  • Modify the "buffer" surrounding the similar sentences chosen
  • Model selection for generation
  • Select file for config (data/transcripts to be used)
  • Verification
    • Estimated number of tokens to be used (within limits of chosen model?)
  • User supplied API key

Ideally these options could be determined automatically and/or globally. However, exposing them would give more flexibility and improve the user experience; a rough sketch of such controls follows.
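
A minimal Streamlit sketch of what these controls might look like (widget labels, model names, and the token heuristic are illustrative assumptions, not the app's actual implementation):

import streamlit as st

api_key = st.text_input("Anthropic API key (optional)", type="password")
model = st.selectbox("Model", ["claude-v1", "claude-v1-100k", "claude-instant-v1"])
buffer = st.slider("Sentence buffer around similar sentences", 0, 10, 2)
config_file = st.file_uploader("Config file (data/transcripts)", type="json")

# Rough pre-flight token estimate (~4 characters per token).
prompt_text = "..."  # placeholder for the assembled prompt
st.write(f"Estimated tokens: {len(prompt_text) // 4}")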

LLM output in computer readable format (JSON)

LLM should output a format that can be processed easily (such as JSON, YAML, etc.). This way the output of the LLM can be used directly by a script to include extra details such as time stamps to the video.

This will likely require prompts that more reliably produce JSON of the actual results. However, even fine-tuned LLMs are notoriously unreliable at giving pure JSON responses without careful prompting.
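
A hedged sketch of the parsing side, with a fallback for output wrapped in prose (the prompt wording and key names are assumptions):

import json

# Hypothetical instruction nudging the model toward pure JSON.
prompt_suffix = (
    'Respond with ONLY a JSON object of the form '
    '{"key_points": [{"point": "...", "evidence": "..."}]} '
    'and no other text.'
)

def parse_llm_json(raw: str) -> dict | None:
    """Parse model output as JSON; salvage an embedded object if needed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Models often wrap JSON in prose; try the outermost braces.
        start, end = raw.find("{"), raw.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(raw[start:end + 1])
            except json.JSONDecodeError:
                pass
    return None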

Taking a subset of transcript(s)

As mentioned in #3, multiple transcripts could prove too large for the context window. Without addressing this, the model will not be able to take in the full context needed to properly answer a user-provided question.

Methods could include:

  • Brute-force cut-off of long inputs (keeping the same prompt but truncating the transcript portion; sketched below)
  • #7
  • Feed transcripts iteratively to LLM
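
A minimal sketch of the brute-force option (the ~4 characters/token heuristic and the budget value are assumptions, not exact tokenizer math):

def truncate_transcript(transcript: str, max_tokens: int = 90_000) -> str:
    # Keep the prompt intact; trim only the transcript to a rough budget.
    chars_per_token = 4  # heuristic, not an exact tokenizer
    return transcript[: max_tokens * chars_per_token]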

Update Anthropic Client

Anthropic changed their Python SDK, making this line of code outdated:

client = anthropic.Client(api_key=api_key)
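
For reference, a minimal sketch of the equivalent call in the current anthropic SDK (v1.x Messages API; the model name is illustrative):

import anthropic

client = anthropic.Anthropic(api_key=api_key)  # replaces anthropic.Client(...)
message = client.messages.create(
    model="claude-3-haiku-20240307",  # illustrative model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(message.content[0].text)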


Would love to know if this might help - https://github.com/BerriAI/litellm

A simple I/O library that standardizes all the LLM API calls to the OpenAI call format:

import os
from litellm import completion

## set ENV variables
# ENV variables can be set in a .env file, too. Example in .env.example
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["ANTHROPIC_API_KEY"] = "anthropic key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# anthropic call
response = completion(model="claude-v-2", messages=messages)

Use more tags around sections & episodes/transcripts

Use a proven prompting method to help the LLM know that sections belong to different episodes/transcripts and that there are different sections of the transcript used in the prompt.

Unknown if this will change the quality of the output and to what extent and under what circumstances.
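
A sketch of the kind of tagging meant here, using the XML-style tags Anthropic recommends for Claude prompts (the tag names and data structure are assumptions):

def build_context(episodes: list[dict]) -> str:
    # Hypothetical input: [{"title": str, "sections": list[str]}, ...]
    parts = []
    for i, ep in enumerate(episodes, start=1):
        sections = "\n".join(f"<section>{s}</section>" for s in ep["sections"])
        parts.append(
            f'<episode index="{i}" title="{ep["title"]}">\n{sections}\n</episode>'
        )
    return "\n".join(parts)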

Extract timestamp from "evidence" output

Ultimately want to use the "evidence" for each "key point" generated by the model to be used to create a timestamp from the transcript.

The most brute-force solution is to check each sentence of the transcript to see whether it contains the evidence sentence.

However, sometimes the model will slightly alter the quoted evidence. Thus something like a similarity metric could be used to compare the sentences. This can help find modified quotes and reduce "hallucinated" quotes.
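
A minimal sketch using difflib from the standard library (the transcript structure with start times is an assumption):

import difflib

def find_timestamp(evidence: str,
                   transcript: list[tuple[float, str]],
                   threshold: float = 0.8) -> float | None:
    """Return the start time of the transcript sentence most similar
    to the quoted evidence, or None if nothing clears the threshold."""
    best_time, best_score = None, threshold
    for start, sentence in transcript:  # (start_seconds, sentence) pairs
        score = difflib.SequenceMatcher(
            None, evidence.lower(), sentence.lower()
        ).ratio()
        if score >= best_score:
            best_time, best_score = start, score
    return best_time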

Select specific/multiple episodes/transcripts for context

Could be useful for the user to select the relevant transcript(s) to ask questions about.

Context window can be very large with Anthropic's claude-v1-100k model (100,000 tokens as context), so more than one podcast could be given.

However, there would have to be some sort of limiting criteria since choosing something like "ALL" transcripts could end up being larger than the fairly large context.
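
A sketch of such a selection with a rough budget guard in Streamlit (variable names and the ~4 chars/token heuristic are assumptions):

import streamlit as st

# Hypothetical: transcripts maps episode titles to transcript text.
chosen = st.multiselect("Episodes to include", list(transcripts))
combined = "\n\n".join(transcripts[title] for title in chosen)

# claude-v1-100k allows ~100,000 tokens; ~4 chars/token is a rough guide.
if len(combined) // 4 > 100_000:
    st.warning("Selection likely exceeds the 100k-token context window.")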

Cache/store sentence embeddings for transcript

Processing multiple transcripts to get the sentence embeddings can take quite some time, so speeding up repeated runs on the same transcripts would benefit the user experience.

The Streamlit app already uses a local cache but something more persistent would be ideal.

These are potential steps to help with multiple runs:

  • App uses local cache (considers the batch of transcripts as identifier to cache)
  • App uses local cache per transcript loaded
    • Don't repeat whole process when loading just additional transcript
  • Store embeddings as local file
    • Ensure that something like get_embeddings reads the stored embeddings (see the sketch after this list)
  • Use vector database
    • Could be deployed but also used locally for testing
    • Can help with filtering sentences by similarity (though that has not yet been a bottleneck)
    • Could be leveraged for other uses (e.g. similar questions already asked)
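
A sketch of the local-file option, caching embeddings per transcript keyed by a content hash (the cache layout is an assumption; embed_fn stands in for the app's existing get_embeddings):

import hashlib
import os
import numpy as np

CACHE_DIR = "embedding_cache"  # hypothetical location

def cached_embeddings(transcript: str, embed_fn) -> np.ndarray:
    """Load per-transcript embeddings from disk, computing them only once."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(transcript.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, f"{key}.npy")
    if os.path.exists(path):
        return np.load(path)
    embeddings = embed_fn(transcript)
    np.save(path, embeddings)
    return embeddings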

Order of multiple transcripts to context

Multiple transcripts from the same series can have a time dependence relevant to answering the user-provided question. Transcripts should be fed into the context in order, and/or the order should be identified within the context.
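
A minimal sketch, assuming each transcript carries an episode number to sort and label by:

def ordered_context(transcripts: list[dict]) -> str:
    # Hypothetical input: [{"episode": int, "text": str}, ...]
    ordered = sorted(transcripts, key=lambda t: t["episode"])
    return "\n\n".join(
        f'Episode {t["episode"]}:\n{t["text"]}' for t in ordered
    )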
