
gptflix's Introduction

GPTflix source code for deployment on Streamlit

What are we going to build?

This is the source code of www.gptflix.ai

We will build GPTflix, a question-answering (QA) bot, with OpenAI, Pinecone DB and Streamlit. You will learn how to prepare text to send to an embedding model, then capture the embeddings and text returned from the model. Afterwards you will set up a Pinecone DB index and upload the OpenAI embeddings to it so the bot can search over them.

Finally, we will set up a QA bot frontend chat app with Streamlit. When the user asks the bot a question, the bot will search over the movie text in your Pinecone DB and answer the question based on the text it finds there.
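
The retrieval flow described above can be sketched in a few lines. This is a minimal illustration and not the code in chat/main.py: the toy 3-dimensional vectors stand in for OpenAI embeddings, the in-memory list stands in for the Pinecone index, the movie texts are invented placeholders, and the names cosine_similarity, top_k_matches and build_prompt are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, records, k=2):
    """Return the k records whose embeddings are most similar to query_vec."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(query_vec, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, matches):
    """Put the retrieved text first, then the question (assumed Q/A layout)."""
    context = "\n".join(m["text"] for m in matches)
    return f"{context}\n\n Q: {question}\n A:"


# Invented placeholder records; real embeddings have far more dimensions.
records = [
    {"text": "Space Movie A: a crew is stranded on a spaceship.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Romance Movie B: two strangers meet in a bookshop.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Space Movie C: an astronaut drifts in orbit.", "embedding": [0.8, 0.3, 0.1]},
]

question = "Which movies are set in space?"
query_vec = [1.0, 0.2, 0.0]  # pretend this is the embedding of the question

matches = top_k_matches(query_vec, records, k=2)
prompt = build_prompt(question, matches)
print(prompt)
```

In the deployed app the ranking is done by Pinecone rather than in Python; the sketch only shows why text with nearby embeddings ends up in the prompt.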


What is the point?

This is meant as basic scaffolding for building your own knowledge-retrieval systems. It's super basic for now!

This repo contains the GPTflix source code and a Streamlit deployment guide.


Setup prerequisites

This repo is set up for deployment on Streamlit. You will want to set your environment variables in Streamlit like this:

  1. Fork the GPTflix repo to your GitHub account.

  2. Set up an account on Pinecone.io

  3. Set up an account on Streamlit cloud

  4. Create a new app on Streamlit. Link it to your fork of the repo on GitHub, then point the app to /chat/main.py as the main executable.

  5. Go to your app settings, and navigate to Secrets. Set up the secret like this:

[API_KEYS]
pinecone = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
openai = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  6. Make a .env file in the root of the project on your local machine with your Pinecone and OpenAI API keys:
PINECONE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Those need to be your pinecone and openai API keys of course ;)
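
The two key locations above can be read with one small helper. This is a sketch under assumptions, not the repo's code: load_api_keys is a hypothetical name, the secrets argument is assumed to behave like a nested mapping with an API_KEYS section as in the Streamlit secret shown above, and the environment variable names are the ones from the .env example. Note that Python does not read a .env file on its own, so the fallback only works if those variables were exported or loaded by a separate tool.

```python
import os


def load_api_keys(secrets=None, environ=None):
    """Return (pinecone_key, openai_key).

    Looks first in a Streamlit-style secrets mapping (section "API_KEYS"),
    then falls back to the PINECONE_API_KEY / OPENAI_API_KEY variables.
    """
    secrets = secrets or {}
    environ = os.environ if environ is None else environ
    section = secrets.get("API_KEYS", {})

    pinecone_key = section.get("pinecone") or environ.get("PINECONE_API_KEY")
    openai_key = section.get("openai") or environ.get("OPENAI_API_KEY")

    # Fail early with a readable message instead of a later connection error.
    missing = [
        name
        for name, value in (("pinecone", pinecone_key), ("openai", openai_key))
        if not value
    ]
    if missing:
        raise KeyError(f"Missing API key(s): {', '.join(missing)}")
    return pinecone_key, openai_key


# Usage with plain dicts standing in for the real secret stores.
keys = load_api_keys(
    secrets={"API_KEYS": {"pinecone": "pc-placeholder", "openai": "sk-placeholder"}},
    environ={},
)
print(keys)
```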


How to add data?

This repo is set up to walk through a demo using the MPST data in /data_sample. These are the steps:

  1. Run p1.generate_index_mpst.py to prepare the text from ./data_sample/d0.mpst_1k_raw.csv into a format we can pass to a model to get its embedding.
    python p1.generate_index_mpst.py
  2. Run p2.make_jsonl_for_requests_mpst.py to convert your new d1.mpst_1k_converted.csv file to a JSONL file with instructions to run the embeddings requests against the OpenAI API.
    python p2.make_jsonl_for_requests_mpst.py
  3. Run p3.api_request_parallel_processor.py on the JSONL file from (2) to get embeddings.
    python src/p3.api_request_parallel_processor.py \
      --requests_filepath data_sample/d2.embeddings_maker.jsonl \
      --save_filepath data_sample/d3.embeddings_maker_results.jsonl \
      --request_url https://api.openai.com/v1/embeddings \
      --max_requests_per_minute 1500 \
      --max_tokens_per_minute 6250000 \
      --token_encoding_name cl100k_base \
      --max_attempts 5 \
      --logging_level 20
  4. Run p4.convert_jsonl_with_embeddings_to_csv.py with the new JSONL file to make a readable CSV with the text and embeddings. This step is optional: if you are only going to upload data to the index once, you don't need to keep track of the index of each embedding and can skip the CSV. If you are going to update the index and add more data, or need an offline, readable format to keep track of things, then making the CSV makes sense :)
    python p4.convert_jsonl_with_embeddings_to_csv.py
  5. Run p5.upload_to_pinecone.py with your API key and database settings to upload all that text data and embeddings.
    python p5.upload_to_pinecone.py
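
To make steps 2 to 4 more concrete, here is a rough sketch of what one line of each JSONL file might look like. It is not taken from the p2/p3/p4 scripts. The assumptions are: each request line is a JSON body for the OpenAI embeddings endpoint, each results line is a JSON array of [request, response], and the model name text-embedding-ada-002 is only a placeholder for whatever model the repo actually uses. The function names and the sample text are hypothetical.

```python
import json

# Assumption: placeholder model name, the repo may use a different one.
EMBEDDING_MODEL = "text-embedding-ada-002"


def make_request_line(text, model=EMBEDDING_MODEL):
    """Serialize one embeddings request as a single JSONL line."""
    return json.dumps({"model": model, "input": text})


def parse_result_line(line):
    """Extract (text, embedding) from one results line.

    Assumes the line is a JSON array [request, response, ...] and that the
    response follows the OpenAI embeddings format:
    {"data": [{"embedding": [...]}]}.
    """
    request, response = json.loads(line)[:2]
    return request["input"], response["data"][0]["embedding"]


# Build a request line from invented placeholder text.
request_line = make_request_line("Title: Example Movie. Plot: placeholder synopsis.")

# Fake a results line locally so the sketch runs without calling the API.
fake_result_line = json.dumps(
    [json.loads(request_line), {"data": [{"embedding": [0.1, 0.2, 0.3]}]}]
)

text, embedding = parse_result_line(fake_result_line)
print(text, embedding)
```

The (text, embedding) pairs are what step 4 would write to a CSV and what step 5 would upload to Pinecone.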

You can run the app locally, but you'll need to remove the images (the paths are different on Streamlit Cloud).


What is included?

At the moment there is some data in data_sample, all taken from Kaggle as examples.


To do

[] Add memory: summarize previous questions / answers and prepend to prompt
[] Add different modes: wider search in database
[] Add different modes: AI tones / characters for responses
[] Better docs

BETTER DOCS COMING SOON! Feel free to contribute them :)

License

MIT License

Copyright (c) 2023 Stephan Sturges

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

gptflix's People

Contributors

kylemcmearty, stephansturges


gptflix's Issues

error while creating app

I got the following error after creating the app. To be clear, I wrote the API keys for OpenAI and Pinecone in the secrets file as specified, but nowhere in the README does it say where to put the environment and the index name for the Pinecone index. Please reply with how to fix this. Am I missing something? Where can I put the index name and the Pinecone environment?

pinecone.core.exceptions.PineconeProtocolError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/app/gptbot/chat/main.py", line 271, in <module>
output = answer_query_with_context_pinecone(user_input)
File "/app/gptbot/chat/main.py", line 222, in answer_query_with_context_pinecone
prompt = construct_prompt_pinecone(query) + "\n\n Q: " + query + "\n A:"
File "/app/gptbot/chat/main.py", line 140, in construct_prompt_pinecone
res = pineconeindex.query([xq], top_k=30, include_metadata=True, namespace="movies")
File "/home/appuser/venv/lib/python3.9/site-packages/pinecone/core/utils/error_handling.py", line 25, in inner_func
raise PineconeProtocolError(f'Failed to connect; did you specify the correct index name?') f

Avatar Modifications

This app is so awesome. The code is super clean and it's really easy to use. I thought I might put some information in here from things I've learned along the way. If this isn't appropriate just let me know (I'm extremely new to all this.)

I wanted to have a little more control over the avatars that appear on the chatbot. So, in here:
https://github.com/stephansturges/GPTflix/blob/main/chat/main.py

I updated the random integer code a bit like this:

# random user picture
# user_av = random.randint(0, 100)
user_av = 3

# random bott picture
# bott_av = random.randint(0, 100)
bott_av = 10

And then at the very bottom of main.py I can change the avatar_style here:

if st.session_state['generated']:
    for i in range(len(st.session_state['generated'])-1, -1, -1):
        message(st.session_state['generated'][i], seed=bott_av, key=str(i))
        message(st.session_state['past'][i], is_user=True, avatar_style="personas", seed=user_av, key=str(i) + '_user')

Honestly, I am still not exactly sure how the avatars work. I think it has something to do with these:
https://www.dicebear.com/styles

If I get to the bottom of it further I'll let you know.
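
The fixed-versus-random choice above could also be wrapped in a small helper, so switching back to random avatars does not mean editing comments. This is only a sketch: pick_avatar_seed is a hypothetical name that does not exist in main.py, and the 0 to 100 range is copied from the commented-out random.randint calls.

```python
import random


def pick_avatar_seed(fixed=None, low=0, high=100, rng=random):
    """Return `fixed` when given, otherwise a random integer in [low, high]."""
    if fixed is not None:
        return fixed
    return rng.randint(low, high)


# Fixed avatars, as in the modification above.
user_av = pick_avatar_seed(fixed=3)
bott_av = pick_avatar_seed(fixed=10)

# Random avatar, as in the original behaviour (seeded here for repeatability).
random_av = pick_avatar_seed(rng=random.Random(0))
print(user_av, bott_av, random_av)
```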
