
dot-formor's Introduction

HELLO!


This is Dot, a standalone, open-source app for easily using local LLMs, and RAG in particular, to interact with documents and files, similar to Nvidia's Chat with RTX. Dot is completely standalone and comes packaged with all its dependencies, including a copy of Mistral 7B. This keeps the app as accessible as possible: no prior knowledge of programming or local LLMs is required to use it. You can install the app (available for Apple Silicon and Windows) here: Dot website

What does it do?

Dot can load multiple documents into an LLM and let you interact with them in a fully local environment through Retrieval Augmented Generation (RAG). Supported document formats are PDF, DOCX, PPTX, XLSX, and Markdown. Apart from RAG, users can also switch to Big Dot for interactions unrelated to their documents, similar to ChatGPT.
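The retrieve-then-generate loop behind RAG can be sketched in a few lines of Python. This is a toy illustration, not Dot's code: it scores chunks by shared words instead of using an embedding model and a FAISS index, and the function names, chunk sizes, and prompt wording are all hypothetical.

```python
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (sizes are arbitrary here)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # A window starting at or after len(words) - overlap would only repeat the
    # tail already covered by the previous window, so the range stops there.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def word_counts(text: str) -> Counter:
    """Lower-cased word counts: a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks that share the most words with the question."""
    q = word_counts(question)
    scored = [(sum((q & word_counts(chunk)).values()), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep document order
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved context before it goes to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "The invoice total is 420 euros.",
        "The office cat is called Biscuit.",
        "Meetings happen every Monday.",
    ]
    question = "What is the cat called?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

Only the retrieved chunk about the cat reaches the prompt, which is the point of RAG: the LLM sees a small, relevant slice of your documents rather than all of them.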

How does it work?

Dot is built with Electron JS, but its main functionality comes from a bundled Python installation that contains all the required libraries and files. Many libraries are involved, but the most important ones to be aware of are llama-cpp, which runs the LLM; FAISS, which creates local vector stores; and LangChain and Hugging Face, which set up the conversation chains and the embedding process.
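At its core, a vector store like FAISS answers one question: which stored embeddings are closest to the query embedding? The sketch below is a pure-Python illustration of exhaustive (flat) L2 search, the simplest kind of index FAISS offers; FAISS itself is far faster and also provides approximate indexes. The class name `FlatL2Index` and the tiny 2-D vectors are hypothetical stand-ins for real embeddings, not part of Dot or FAISS.

```python
class FlatL2Index:
    """Exhaustive nearest-neighbour search by squared L2 distance (toy, pure Python)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        """Store vectors; each must have exactly `dim` components."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(list(v))

    def search(self, query: list[float], k: int) -> list[tuple[int, float]]:
        """Return (position, squared distance) pairs for the k closest stored vectors."""
        if len(query) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(query)}")
        # Compare the query against every stored vector: no shortcuts, exact results.
        dists = [
            (i, sum((a - b) ** 2 for a, b in zip(query, v)))
            for i, v in enumerate(self.vectors)
        ]
        dists.sort(key=lambda pair: pair[1])
        return dists[:k]


if __name__ == "__main__":
    index = FlatL2Index(dim=2)
    index.add([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    print(index.search([0.9, 1.2], k=2))
```

Here the query lands closest to the vector stored at position 1. In a real pipeline, each stored vector would be the embedding of a document chunk, and the returned positions would map back to the chunk text.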

Install

You can either install the packaged app from the Dot website or set up the project for development. To set it up, follow these steps:

  • Clone the repository: $ git clone https://github.com/alexpinel/Dot.git
  • Install Node.js, then run npm install inside the project repository. If you face any issues at this stage, you can run npm install --force

Next, add a full Python bundle to the app. The purpose of this is to create a distributable environment with all the necessary libraries. If you only plan on using Dot from the console, you might not need this step, but in that case make sure to replace the Python path locations specified in src/index.js. Creating the Python bundle is covered in detail here: https://til.simonwillison.net/electron/python-inside-electron , and prebuilt bundles can also be downloaded from here: https://github.com/indygreg/python-build-standalone/releases/tag/20240224

Once you have the bundle, rename it to 'python' and place it inside the llm directory. Next, install all the necessary libraries. Keep in mind that a plain pip install will not work, because it installs into whichever Python is on your PATH rather than into the bundle. Invoke pip through the bundle's own interpreter instead: path/to/python/.bin/or/.exe -m pip install
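To make the "use the bundle's interpreter" rule concrete, here is a small helper sketch. The function names are hypothetical, and the interpreter locations are an assumption based on the layout of python-build-standalone "install_only" archives (python/python.exe on Windows, python/bin/python3 elsewhere); check your extracted bundle and adjust if it differs.

```python
import subprocess
import sys
from pathlib import Path


def bundled_python(llm_dir: Path, platform_name: str = sys.platform) -> Path:
    """Path of the interpreter inside llm/python.

    Assumes the python-build-standalone "install_only" layout:
    python/python.exe on Windows, python/bin/python3 elsewhere.
    """
    if platform_name.startswith("win"):
        return llm_dir / "python" / "python.exe"
    return llm_dir / "python" / "bin" / "python3"


def pip_install_command(python: Path, packages: list[str]) -> list[str]:
    """Build '<bundle python> -m pip install ...' so packages land inside the bundle."""
    return [str(python), "-m", "pip", "install", *packages]


def install_into_bundle(llm_dir: Path, packages: list[str]) -> None:
    """Run pip with the bundled interpreter; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(bundled_python(llm_dir), packages), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; call install_into_bundle() to install.
    print(" ".join(pip_install_command(bundled_python(Path("llm")), ["pypdf", "docx2txt"])))
```

Running `-m pip` with an explicit interpreter is what guarantees the packages end up in that interpreter's site-packages, regardless of which `pip` happens to be first on your PATH.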

Required python libraries:

  • PyTorch (the CPU version is recommended, as it is lighter than the GPU version)
  • LangChain
  • FAISS
  • Hugging Face
  • llama-cpp (use the CUDA build if you have an Nvidia GPU!)
  • pypdf
  • docx2txt
  • Unstructured (use pip install "unstructured[pptx,md,xlsx]" for the file formats)
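Since every library has to live inside the bundle, it can help to confirm that they are all importable by running a small check script with the bundled interpreter rather than the system one. The mapping from library names to import names below is an assumption; in particular, "HuggingFace" is taken to mean sentence-transformers, which may differ from the package linked in the original list.

```python
import importlib.util

# Assumed import names for the libraries listed above (adjust if yours differ).
REQUIRED_MODULES = {
    "pytorch": "torch",
    "langchain": "langchain",
    "FAISS": "faiss",
    "HuggingFace": "sentence_transformers",  # assumption, see the note above
    "llama-cpp": "llama_cpp",
    "pypdf": "pypdf",
    "docx2txt": "docx2txt",
    "Unstructured": "unstructured",
}


def missing_modules(modules: dict[str, str]) -> list[str]:
    """Return the library names whose top-level module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All libraries found." if not missing else f"Missing: {', '.join(missing)}")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages like PyTorch.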

Python should now be set up and running! However, there are still a few more steps left before the final magic is added to Dot. First, create a folder inside the llm directory and name it mpnet. This folder holds the sentence-transformers model used for the document embeddings: fetch all the files from the following link and place them inside the new folder: sentence-transformers/all-mpnet-base-v2
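Because the folder must contain every file from the model repository, a quick check can catch an incomplete download. The sketch below is hypothetical: the file list is an assumption based on the usual sentence-transformers model layout, not a manifest taken from Dot, so adjust it if the repository contents differ.

```python
from pathlib import Path

# Files assumed to be present in a sentence-transformers model folder such as
# all-mpnet-base-v2; this is a sanity check, not an exhaustive manifest.
EXPECTED_FILES = ["modules.json", "config.json", "tokenizer.json", "1_Pooling/config.json"]
# At least one of these weight files is assumed to be present.
WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]


def check_model_folder(folder: Path) -> list[str]:
    """Return a list of problems found in the model folder (empty if it looks complete)."""
    problems = [f"missing {name}" for name in EXPECTED_FILES if not (folder / name).is_file()]
    if not any((folder / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing model weights (" + " or ".join(WEIGHT_FILES) + ")")
    return problems


if __name__ == "__main__":
    issues = check_model_folder(Path("llm") / "mpnet")
    print("mpnet folder looks complete." if not issues else "\n".join(issues))
```

Note that files in subfolders (such as 1_Pooling) need to keep their folder structure when copied into mpnet.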

Finally, download the Mistral 7B LLM from the following link and place it inside the llm/scripts directory, alongside the Python scripts used by Dot: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
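A common mistake with large downloads is saving a web page or an error response under the model's file name. Since GGUF files begin with the four ASCII bytes "GGUF", a header check can rule that out. This is a hypothetical helper, not part of Dot; it does not assume a specific quantization or file name, and it cannot detect a download that was cut off partway through.

```python
from pathlib import Path

# GGUF files begin with the four ASCII bytes "GGUF".
GGUF_MAGIC = b"GGUF"


def is_gguf_header(data: bytes) -> bool:
    """True if the given leading bytes carry the GGUF magic number."""
    return data[:4] == GGUF_MAGIC


def find_gguf_models(scripts_dir: Path) -> list[Path]:
    """Return the .gguf files in scripts_dir whose header looks like GGUF."""
    if not scripts_dir.is_dir():
        return []
    models = []
    for path in sorted(scripts_dir.glob("*.gguf")):
        with path.open("rb") as f:
            if is_gguf_header(f.read(4)):
                models.append(path)
    return models


if __name__ == "__main__":
    found = find_gguf_models(Path("llm") / "scripts")
    if found:
        for model in found:
            print(f"Found GGUF model: {model.name}")
    else:
        print("No valid .gguf model found in llm/scripts")
```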

That's it! If you follow these steps, you should be able to get everything running. Please let me know if you face any issues :)

Contributors

alexpinel
