This project enables you to chat with multiple PDF documents using a conversational AI model. The system extracts text from uploaded PDFs, creates a vector store of the text chunks, and utilizes a conversational chain model to provide interactive chat capabilities. It is built with Streamlit, a Python framework for building web applications.
- Clone the repository: `git clone`
- Install the required dependencies: `pip install -r requirements.txt`
- Set up environment variables:
  - Create a `.env` file in the root directory.
  - Add the required variables to the `.env` file in the form `VARIABLE_NAME=VALUE`.
  - Replace `VARIABLE_NAME` with the actual variable name and `VALUE` with the corresponding value.
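As an illustration, apps like this one typically keep model-provider credentials in the `.env` file. The variable names below are assumptions for the sake of example, not confirmed by this project; check the source for the exact names it reads:

```
# Illustrative only -- verify the variable names against the project source.
OPENAI_API_KEY=your-openai-key
HUGGINGFACEHUB_API_TOKEN=your-huggingface-token
```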
- Run the script: `streamlit run main.py`
- Open the provided URL in your web browser.
- Upload PDF Documents:
- In the sidebar, use the file uploader to upload your PDFs.
- You can upload multiple PDFs by selecting them together.
- Ask Questions:
- In the main chat window, enter your question about the documents in the text input field.
- Press Enter or click outside the input field to submit the question.
- View Responses:
- The system will process your question and generate a response.
- The conversation history will be displayed in the chat window.
- User messages are displayed on the left, and bot responses are displayed on the right.
- Explore More Questions:
- You can ask additional questions by entering them in the text input field.
- The system will maintain the conversation history and provide relevant responses.
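Conceptually, the conversation history described in the steps above works like a buffer that accumulates turns and replays them as context for each new question. The sketch below is a plain-Python illustration of that idea, not the app's actual implementation:

```python
# Minimal sketch of a conversation buffer: it records alternating
# user/bot turns and renders them as a single context string.
class ConversationBuffer:
    """Stores alternating user/bot turns and renders them as context."""

    def __init__(self):
        self.turns = []  # list of (speaker, message) tuples

    def add_user_message(self, message):
        self.turns.append(("user", message))

    def add_bot_message(self, message):
        self.turns.append(("bot", message))

    def as_context(self):
        # The full dialogue so far, oldest turn first, so the model
        # can resolve follow-up questions against earlier answers.
        return "\n".join(f"{speaker}: {msg}" for speaker, msg in self.turns)

buffer = ConversationBuffer()
buffer.add_user_message("What is the report about?")
buffer.add_bot_message("It summarizes the Q3 sales figures.")
print(buffer.as_context())
```

Each new question is answered with this accumulated context, which is why follow-up questions can refer back to earlier answers.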
- `main.py`: The main script that handles the Streamlit UI and user interactions.
- `langchain`: This directory contains the language chain implementation and related modules.
- `htmlTemplates`: This directory contains HTML templates used for displaying chat messages.
- The `ConversationChain` class from the `langchain.chains` module is used to manage the conversational flow and retrieve relevant responses.
- The `PyPDF2` library is used to extract text from the uploaded PDF documents.
- The `CharacterTextSplitter` class from the `langchain.text_splitter` module is used to split the extracted text into smaller chunks.
- The `OpenAIEmbeddings` class from the `langchain.embeddings` module is used to generate embeddings for the text chunks.
- The `FAISS` class from the `langchain.vectorstores` module is used to create a vector store from the text chunks and their embeddings.
- The `ChatOpenAI` class from the `langchain.chat_models` module is an alternative model for conversational responses.
- The `ConversationBufferMemory` class from the `langchain.memory` module is used to store and retrieve the conversation history.
- The `HuggingFaceHub` class from the `langchain.llms` module is used to load the conversational model from the Hugging Face model hub.
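The chunk-and-retrieve pipeline above can be sketched in plain Python. The snippet below is a conceptual illustration only: fixed-size chunking with overlap stands in for `CharacterTextSplitter`, and a crude word-overlap score stands in for real embeddings and the FAISS vector store.

```python
# Conceptual sketch of the chunk-and-retrieve pipeline. Embeddings and
# FAISS are replaced by simple word-overlap scoring for illustration.

def split_text(text, chunk_size=100, overlap=20):
    """Split text into fixed-size chunks; consecutive chunks share
    `overlap` characters so sentences are not cut off at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def score(question, chunk):
    """Crude relevance score: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(question, chunks):
    """Return the chunk that best matches the question."""
    return max(chunks, key=lambda ch: score(question, ch))

text = ("Streamlit makes it easy to build web apps in Python. "
        "FAISS indexes embeddings for fast similarity search. "
        "PyPDF2 extracts text from PDF files.")
chunks = split_text(text, chunk_size=60, overlap=10)
print(retrieve("How is text extracted from PDF files?", chunks))
```

In the real app, the embedding model maps questions and chunks into the same vector space, so retrieval works on meaning rather than exact word overlap.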