
LearningLion

Description of the project

The project is a study on the use of generative AI to improve the services of SSC-ICT by supporting employees and optimizing internal processes. The focus is on generative large language models (LLMs) in particular, because they can have the most significant impact on the daily work of SSC-ICT employees. The assignment (Speech Recognition & AI) has been approved by the SpB and has been running since the beginning of 2023.

Repository

The current repository contains a selection of project documents as well as the code for a Proof of Concept (PoC) chatbot demo. The demo is an example of Retrieval-Augmented Generation (RAG) and allows open-source LLMs to be used for CPU inference on a local machine. It uses the LangChain and FAISS libraries, among others, to perform document Q&A. A schematic overview of how the application works is shown here:

(Schematic overview of the RAG application)
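The RAG flow can be sketched without any libraries: document chunks and the question are embedded, the closest chunks are retrieved, and a prompt is assembled from them. A minimal bag-of-words sketch of the idea (the real application uses Sentence-Transformers embeddings and a FAISS index instead of this toy similarity search):

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    # The real app uses a 384-dimensional Sentence-Transformers model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=1):
    # Rank document chunks by similarity to the question
    # (FAISS does this efficiently over dense vectors).
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The cafeteria opens at nine in the morning.",
    "VPN access requires a hardware token.",
]
sources = retrieve("When does the cafeteria open?", chunks)
prompt = (
    f"Answer using only these sources:\n{sources[0]}\n\n"
    "Question: When does the cafeteria open?"
)
```

The LLM then answers from the assembled prompt; replacing the toy term-frequency vectors with dense embeddings is exactly what FAISS accelerates at scale.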


Running Locally

Quickstart

  • Ensure you have downloaded the model of your choice in GGUF format and placed it in the models/ folder (for example, a quantized Llama-2-7B-Chat build)

  • Fill the data/ folder with the .pdf, .docx, or .txt files you want to ask questions about

  • To build a vectorstore database of your files, launch a terminal from the project directory and run the following command:
    python db_build.py

  • To start the application, run the following command:
    streamlit run main_st.py

  • Use the interface to choose a model and adjust the parameters

  • You can now start asking questions about your files

(Screenshot of the application interface)


Complete walkthrough (work in progress)

1. Clone Repository

  • Open a terminal

  • Navigate to the location where you want the cloned directory to be

  • Input the git clone command with the LearningLion repository link and press Enter to create your local clone:

git clone https://github.com/SSC-ICT-Innovatie/LearningLion.git

2. Download your models


  • Download the models you want and place them in the models/ folder

3. Input files for your database

  • Choose the files you want to include in your RAG knowledge base and place them in the data/ folder; the model will use these files to answer questions

  • Use .pdf, .docx, or .txt files

4. Install requirements

  • Create a virtual environment using conda or venv

  • Install the required packages and libraries to your virtual environment using the pip install command

pip install -r requirements.txt

Note

If you want to run this in an offline environment, read the following instructions first: Using offline embeddings

5. Build your vectorstore database

  • Run the db_build script in the terminal to build your vectorstore:
    python db_build.py
  • Wait for the script to finish; this may take up to several minutes depending on the size of your data/ folder
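Behind the scenes, a build step like this typically splits each document into overlapping chunks before embedding them, so that sentences cut at a chunk boundary still appear whole in at least one chunk. A simplified sketch of fixed-size chunking with overlap (the actual splitter and chunk size used in db_build.py may differ):

```python
def chunk_text(text, size=200, overlap=50):
    # Split text into fixed-size character windows that overlap by
    # `overlap` characters, so content near a boundary is not lost.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "".join(str(i % 10) for i in range(500))  # stand-in for a document
chunks = chunk_text(doc)
```

Each chunk is then embedded and stored in the FAISS index; more or larger files mean more chunks to embed, which is why the build time grows with the size of the data/ folder.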

6. Run the application

  • Start the application using Streamlit:
    streamlit run main_st.py
  • The application will open automatically in your browser. You can also access it in a different browser at http://localhost:8501

7. Adjust your settings

  • Using the panel on the left side of the interface you can adjust your settings

  • The first setting allows you to select a model from the models/ folder

  • The temperature setting controls the 'creativity' or randomness of the model

    • Low temperature (0) = deterministic, precise, focused
    • High temperature (1) = diverse, creative
  • max_length sets the maximum number of tokens the model may generate in a response

    • A token is a chunk of text that a model reads or generates; OpenAI publishes a general introduction to tokens
    • A model will not always use every available token, but allowing more tokens generally leads to longer responses and longer generation times
    • This setting is also useful with commercial models, where you often pay per token
  • n_sources sets how many document chunks are fed to the model to generate an answer

    • Using more sources leads to more input context and a longer runtime
  • prompt allows you to adjust the system prompt
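The effect of temperature can be illustrated with softmax over the model's next-token scores: dividing the scores by the temperature before normalizing makes the distribution more peaked (low temperature) or flatter (high temperature). A temperature of exactly 0 is handled in practice by picking the top token directly rather than dividing by zero. A minimal sketch:

```python
from math import exp

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution toward the top
    # token; higher temperature flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                  # scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic sampling
hot = softmax_with_temperature(logits, 2.0)   # more diverse sampling
```

At low temperature almost all probability mass lands on the top-scoring token, which is why responses become precise and repeatable; at high temperature the alternatives keep meaningful probability, giving more varied output.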

8. Asking questions

  • Now you're all set to use the application

  • You can use the clear history button to clear the chat history

    • The application has a memory function, meaning that it will use the previous questions and answers as context to answer follow-up questions
    • This also means that the context size, and therefore the runtime, increases with every question
  • To shut down the application, go to the terminal and press Ctrl + C
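The memory function described above amounts to prompt assembly: each past question-and-answer pair is prepended to the new question, which is why the context, and with it the runtime, grows every turn. A toy sketch (the actual prompt template used by the application may differ):

```python
def build_prompt(system_prompt, history, question):
    # history is a list of (question, answer) pairs; every past turn
    # is included, so the prompt grows with each follow-up question.
    turns = "".join(f"Q: {q}\nA: {a}\n" for q, a in history)
    return f"{system_prompt}\n{turns}Q: {question}\nA:"

history = []
p1 = build_prompt("You are a helpful assistant.", history, "What is FAISS?")
history.append(("What is FAISS?", "A similarity-search library."))
p2 = build_prompt("You are a helpful assistant.", history, "Who maintains it?")
```

Clearing the chat history resets `history` to an empty list, shrinking the prompt back to just the system prompt and the current question.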

9. Adding files to your database

  • It is possible to add files to the vector store: simply add your new files to the data/ folder and run python db_build.py again

  • If you want to remove files, or if some other error occurs, delete your existing vectorstore by running python db_clear.py, then create a new one with python db_build.py
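The delete-and-rebuild step amounts to removing the index folder and re-running the build. The vectorstore/ folder name below matches the repository layout, but exactly which files db_clear.py removes is an assumption here:

```python
import shutil
from pathlib import Path

def clear_vectorstore(path="vectorstore"):
    # Remove the FAISS index folder so db_build.py can recreate it
    # from scratch on the next run. Safe to call if it doesn't exist.
    p = Path(path)
    if p.exists():
        shutil.rmtree(p)

# After clearing, rerun:  python db_build.py
```

Rebuilding from scratch is necessary because the FAISS index only supports appending new vectors cleanly; removing a source document's vectors is simplest to handle by regenerating the whole store.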


Using offline embeddings

The required embedding model is usually downloaded automatically when running the application. This works for most use cases, but not when the application must run without any connection to the internet.

In those cases, perform the following steps:

  1. Download the desired embedding files from https://sbert.net/models
    • This repo uses all-MiniLM-L6-v2.zip
    • Unzip to folder: sentence-transformers_all-MiniLM-L6-v2/
    • If you want to use different embeddings, you should adjust the folder name and the reference to it in db_build.py (line 74)
  2. Go to the .cache/ folder on your offline machine
    • Can be found in C:/Users/[User]/ for most Windows machines
  3. Within this folder, create torch/sentence_transformers/ if nonexistent
  4. Place embedding folder from step 1 inside of /sentence_transformers/

If all steps were performed correctly, the application will find the embeddings locally and will not attempt to download them.
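You can sanity-check the cache location from the steps above with a short script. The folder name matches the all-MiniLM-L6-v2 example, and the exact cache layout sentence-transformers expects can vary between versions, so treat this as a check rather than a guarantee:

```python
from pathlib import Path

def offline_embedding_path(model_folder="sentence-transformers_all-MiniLM-L6-v2"):
    # Steps 2-4 above place the unzipped model here:
    #   ~/.cache/torch/sentence_transformers/<model_folder>/
    return Path.home() / ".cache" / "torch" / "sentence_transformers" / model_folder

p = offline_embedding_path()
ready = p.is_dir()   # True once the unzipped model folder is in place
```

Path.home() resolves to C:/Users/[User]/ on Windows and the user's home directory on Linux/macOS, matching step 2.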


Tools

  • LangChain: Framework for developing applications powered by language models
  • LlamaCPP: Python bindings for llama.cpp, which implements Transformer model inference in C/C++
  • FAISS: Open-source library for efficient similarity search and clustering of dense vectors.
  • Sentence-Transformers (all-MiniLM-L6-v2): Open-source pre-trained transformer model for embedding text to a 384-dimensional dense vector space for tasks like clustering or semantic search.
  • Llama-2-7B-Chat: Open-source fine-tuned Llama 2 model designed for chat dialogue. Leverages publicly available instruction datasets and over 1 million human annotations.

Files and Content

  • /assets: Images relevant to the project
  • /config: Configuration files for LLM application
  • /data: Dataset used for this project
  • /models: Binary files of GGUF quantized LLM model (i.e., Llama-2-7B-Chat)
  • /src: Python code for key components of the LLM application, namely llm.py and utils.py
  • /vectorstore: FAISS vector store for documents
  • db_build.py: Python script to ingest dataset and generate FAISS vector store
  • db_clear.py: Python script to clear the previously built database
  • main_st.py: Main Python script to launch the Streamlit application
  • requirements.txt: List of Python dependencies (and version)

Acknowledgements

This is a fork of Kenneth Leung's original repository and gratefully makes use of Dennis V's work.

Contributors

jellevane, jitsegoutbeek, vgevers
