
localgpt's Introduction

LocalGPT: Secure, Local Conversations with Your Documents 🌐

🚨🚨 You can run localGPT on a pre-configured Virtual Machine. Make sure to use the code: PromptEngineering to get 50% off. I will get a small commission!

LocalGPT is an open-source initiative that allows you to converse with your documents without compromising your privacy. With everything running locally, you can be assured that no data ever leaves your computer. Dive into the world of secure, local document interactions with LocalGPT.

Features 🌟

  • Utmost Privacy: Your data remains on your computer, ensuring 100% security.
  • Versatile Model Support: Seamlessly integrate a variety of open-source models, including HF, GPTQ, GGML, and GGUF.
  • Diverse Embeddings: Choose from a range of open-source embeddings.
  • Reuse Your LLM: Once downloaded, reuse your LLM without the need for repeated downloads.
  • Chat History: Remembers your previous conversations (in a session).
  • API: LocalGPT has an API that you can use for building RAG Applications.
  • Graphical Interface: LocalGPT comes with two GUIs, one uses the API and the other is standalone (based on streamlit).
  • GPU, CPU & MPS Support: Supports multiple platforms out of the box; chat with your data using CUDA, CPU, or MPS.

Dive Deeper with Our Videos 🎥

Technical Details 🛠️

By selecting the right local models and leveraging the power of LangChain, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.

  • ingest.py uses LangChain tools to parse the documents and create embeddings locally using InstructorEmbeddings. It then stores the result in a local vector database using the Chroma vector store.
  • run_localGPT.py uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
  • You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format. A minimal sketch of this flow follows below.
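
As a rough illustration, here is a minimal sketch of that ingest-and-query flow using the LangChain APIs of that era. This is not the repository's actual code; the file path, chunking parameters, and model names are assumptions made for the example.

from langchain.document_loaders import PDFMinerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

# Ingest: parse a document, split it into chunks, embed the chunks locally,
# and persist them in a Chroma vector store on disk.
docs = PDFMinerLoader("SOURCE_DOCUMENTS/constitution.pdf").load()  # assumed file name
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",  # assumed embedding model
    model_kwargs={"device": "cuda"},
)
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")

# Query: load a local GGUF model via llama-cpp-python and answer questions
# with context retrieved from the vector store.
llm = LlamaCpp(model_path="models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
print(qa.run("What does the document say about free speech?"))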

This project was inspired by the original privateGPT.

Built Using 🧩

Environment Setup 🌍

  1. 📥 Clone the repo using git:
git clone https://github.com/PromtEngineer/localGPT.git
  2. 🐍 Install conda for virtual environment management. Create and activate a new virtual environment.
conda create -n localGPT python=3.10.0
conda activate localGPT
  3. 🛠️ Install the dependencies using pip

To set up your environment to run the code, first install all requirements:

pip install -r requirements.txt

Installing LLAMA-CPP:

LocalGPT uses LlamaCpp-Python for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.

If you want to use BLAS or Metal with llama-cpp-python, you can set the appropriate flags:

For NVIDIA GPU support, use cuBLAS:

# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

For Apple Metal (M1/M2) support, use:

# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on"  FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

For more details, please refer to the llama-cpp-python documentation.

Docker 🐳

Installing the required packages for GPU inference on NVIDIA GPUs, such as gcc 11 and CUDA 11, may cause conflicts with other packages on your system. As an alternative to Conda, you can use Docker with the provided Dockerfile, which includes CUDA; your system only needs Docker, BuildKit, the NVIDIA GPU driver, and the NVIDIA container toolkit. Build the image with docker build -t localgpt . (this requires BuildKit; note that Docker BuildKit does not currently support GPU access during docker build, only during docker run). Run the container with docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt.

Test dataset

For testing, this repository comes with the Constitution of the USA as an example file to use.

Ingesting your OWN Data.

Put your files in the SOURCE_DOCUMENTS folder. You can put multiple folders within the SOURCE_DOCUMENTS folder and the code will recursively read your files.

Supported file formats:

LocalGPT currently supports the following file formats. LocalGPT uses LangChain for loading these file formats. The code in constants.py uses a DOCUMENT_MAP dictionary to map a file format to the corresponding loader. To add support for another file format, simply add an entry to this dictionary mapping the file extension to the corresponding loader from LangChain (see the example after the dictionary below).

DOCUMENT_MAP = {
    ".txt": TextLoader,
    ".md": TextLoader,
    ".py": TextLoader,
    ".pdf": PDFMinerLoader,
    ".csv": CSVLoader,
    ".xls": UnstructuredExcelLoader,
    ".xlsx": UnstructuredExcelLoader,
    ".docx": Docx2txtLoader,
    ".doc": Docx2txtLoader,
}
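
For example, HTML support could hypothetically be added by mapping the extension to one of LangChain's loaders; the choice of UnstructuredHTMLLoader below is an assumption for illustration, not something the repository ships with.

from langchain.document_loaders import UnstructuredHTMLLoader  # assumed loader choice

DOCUMENT_MAP = {
    # ... existing entries from above ...
    ".html": UnstructuredHTMLLoader,  # new entry: extension -> LangChain loader
}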

Ingest

Run the following command to ingest all the data.

If you have CUDA set up on your system:

python ingest.py

You will see output showing the ingestion progress.

Use the --device_type argument to specify a given device. To run on CPU:

python ingest.py --device_type cpu

To run on Apple silicon (M1/M2):

python ingest.py --device_type mps

Use --help for a full list of supported devices:

python ingest.py --help

This will create a new folder called DB and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database. If you want to start from an empty database, delete the DB folder and re-ingest your documents.

Note: When you run this for the first time, it will need internet access to download the embedding model (default: Instructor Embedding). In subsequent runs, no data will leave your local environment, and you can ingest data without an internet connection.

Ask questions to your documents, locally!

In order to chat with your documents, run the following command (by default, it will run on cuda).

python run_localGPT.py

You can also specify the device type, just like with ingest.py:

python run_localGPT.py --device_type mps # to run on Apple silicon

This will load the ingested vector store and embedding model. You will be presented with a prompt:

> Enter a query:

After typing your question, hit Enter. LocalGPT will take some time to respond, depending on your hardware, and then print the answer.

Once the answer is generated, you can ask another question without re-running the script; just wait for the prompt to appear again.

Note: When you run this for the first time, it will need an internet connection to download the LLM (default: TheBloke/Llama-2-7b-Chat-GGUF). After that you can turn off your internet connection, and inference will still work. No data leaves your local environment.

Type exit to finish the script.

Extra Options with run_localGPT.py

You can use the --show_sources flag with run_localGPT.py to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks; you can change the number of retrieved sources/chunks in the code.

python run_localGPT.py --show_sources

Another option is to enable chat history. Note: This is disabled by default and can be enabled with the --use_history flag. The context window is limited, so keep in mind that enabling history consumes part of it and may cause it to overflow.

python run_localGPT.py --use_history

You can store user questions and model responses in a CSV file at /local_chat_history/qa_log.csv by using the --save_qa flag. Every interaction will be stored.

python run_localGPT.py --save_qa
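
A minimal sketch for inspecting the saved log afterwards; the exact column layout of qa_log.csv is an assumption, so the rows are printed generically.

import csv

# Print every saved question/answer row from the log file produced by --save_qa.
with open("local_chat_history/qa_log.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)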

Run the Graphical User Interface

  1. Open constants.py in an editor of your choice and set the LLM you want to use. By default, the following model will be used:

    MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
    MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
  2. Open up a terminal and activate your python environment that contains the dependencies installed from requirements.txt.

  3. Navigate to the /LOCALGPT directory.

  4. Run the following command: python run_localGPT_API.py. The API should begin to run.

  5. Wait until everything has loaded in. You should see something like INFO:werkzeug:Press CTRL+C to quit.

  6. Open up a second terminal and activate the same python environment.

  7. Navigate to the /LOCALGPT/localGPTUI directory.

  8. Run the command python localGPTUI.py.

  9. Open up a web browser and go to the address http://localhost:5111/.

How to select different LLM models?

To change the models you will need to set both MODEL_ID and MODEL_BASENAME.

  1. Open up constants.py in the editor of your choice.

  2. Change the MODEL_ID and MODEL_BASENAME. If you are using a quantized model (GGML, GPTQ, GGUF), you will need to provide MODEL_BASENAME. For unquantized models, set MODEL_BASENAME to None.

  3. There are a number of example models from HuggingFace that have already been tested, both original trained models (ending with HF or with a .bin file in their "Files and versions") and quantized models (ending with GPTQ or with a .no-act-order or .safetensors file in their "Files and versions").

  4. For models that end with HF or have a .bin file in their "Files and versions" on their HuggingFace page:

    • Make sure you have a MODEL_ID selected. For example -> MODEL_ID = "TheBloke/guanaco-7B-HF"
    • Go to the HuggingFace Repo
  5. For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension in their "Files and versions" on their HuggingFace page:

    • Make sure you have a MODEL_ID selected. For example -> MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
    • Go to the corresponding HuggingFace Repo and select "Files and versions".
    • Pick one of the model names and set it as MODEL_BASENAME. For example -> MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
  6. Follow the same steps for GGUF and GGML models; see the example configurations after this list.
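
Putting the steps above together, a constants.py configuration might look like one of the following sketches. The model names are taken from the examples above; treat them as illustrations rather than recommendations, and note that None for the unquantized case follows step 2 above.

# Unquantized HF model: no MODEL_BASENAME needed.
MODEL_ID = "TheBloke/guanaco-7B-HF"
MODEL_BASENAME = None

# GPTQ quantized model: pick a file name from "Files and versions".
MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"

# GGUF quantized model (the repository default).
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"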

GPU and VRAM Requirements

Below is the VRAM requirement for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which use an additional 2 GB-7 GB of VRAM depending on the model.

Model Size (B) | float32  | float16  | GPTQ 8bit       | GPTQ 4bit
7B             | 28 GB    | 14 GB    | 7 GB - 9 GB     | 3.5 GB - 5 GB
13B            | 52 GB    | 26 GB    | 13 GB - 15 GB   | 6.5 GB - 8 GB
32B            | 130 GB   | 65 GB    | 32.5 GB - 35 GB | 16.25 GB - 19 GB
65B            | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB | 32.6 GB - 35 GB
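
The numbers above follow a simple rule of thumb: weight memory is roughly the parameter count times the bytes per parameter (4 for float32, 2 for float16, about 1 for 8-bit and 0.5 for 4-bit quantization), plus some overhead. A quick sketch:

def approx_weight_vram_gb(params_billions, bytes_per_param):
    # Weight-only estimate; excludes activations, the KV cache, and the
    # 2 GB-7 GB used by the embedding model mentioned above.
    return params_billions * bytes_per_param

print(approx_weight_vram_gb(7, 2))    # float16 7B -> ~14 GB, matching the table
print(approx_weight_vram_gb(7, 0.5))  # 4-bit 7B   -> ~3.5 GB, matching the table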

System Requirements

Python Version

To use this software, you must have Python 3.10 or later installed. The code will not run on earlier versions of Python.

C++ Compiler

If you encounter an error while building a wheel during the pip install process, you may need to install a C++ compiler on your computer.

For Windows 10/11

To install a C++ compiler on Windows 10/11, follow these steps:

  1. Install Visual Studio 2022.
  2. Make sure the following components are selected:
    • Universal Windows Platform development
    • C++ CMake tools for Windows
  3. Download the MinGW installer from the MinGW website.
  4. Run the installer and select the "gcc" component.

NVIDIA Driver Issues:

Follow this page to install NVIDIA Drivers.


Disclaimer

This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model, so it is subject to the original Llama license.

Common Errors

localgpt's People

Contributors

ahmedhathout, allaye, chrisaylen, conacts, gdxz123, hauntedness, huseyinzorlu, imjwang, jorge-campo, karthikcs, kerenk-exrm, kevinmgamboa, konradhoeffner, kpbird, kyrbrbik, leafmanz, marook, nidhi-chipre, phdykd, promtengineer, ptanov, romilbhardwaj, saneld, scitechenthusiast, simi, tchekda, tedcochran, teleprint-me, ujwpi, wind010


localgpt's Issues

Running localGPT

I have tried running localGPT and get the following error: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 67108864 bytes.

C:\AI\LocalGPT>python run_localGPT.py
Running on: cuda
load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: C:\AI\LocalGPT/DB
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\AI\LocalGPT\run_localGPT.py:88 in │
│ │
│ 85 │
│ 86 │
│ 87 if name == "main": │
│ ❱ 88 │ main() │
│ 89 │
│ │
│ C:\Python\lib\site-packages\click\core.py:1130 in call
│ │
│ 1127 │ │
│ 1128 │ def call(self, *args: t.Any, **kwargs: t.Any) -> t.Any: │
│ 1129 │ │ """Alias for :meth:main.""" │
│ ❱ 1130 │ │ return self.main(*args, **kwargs) │
│ 1131 │
│ 1132 │
│ 1133 class Command(BaseCommand): │
│ │
│ C:\Python\lib\site-packages\click\core.py:1055 in main │
│ │
│ 1052 │ │ try: │
│ 1053 │ │ │ try: │
│ 1054 │ │ │ │ with self.make_context(prog_name, args, **extra) as ctx: │
│ ❱ 1055 │ │ │ │ │ rv = self.invoke(ctx) │
│ 1056 │ │ │ │ │ if not standalone_mode: │
│ 1057 │ │ │ │ │ │ return rv │
│ 1058 │ │ │ │ │ # it's not safe to ctx.exit(rv) here! │
│ │
│ C:\Python\lib\site-packages\click\core.py:1404 in invoke │
│ │
│ 1401 │ │ │ echo(style(message, fg="red"), err=True) │
│ 1402 │ │ │
│ 1403 │ │ if self.callback is not None: │
│ ❱ 1404 │ │ │ return ctx.invoke(self.callback, **ctx.params) │
│ 1405 │ │
│ 1406 │ def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]: │
│ 1407 │ │ """Return a list of completions for the incomplete value. Looks │
│ │
│ C:\Python\lib\site-packages\click\core.py:760 in invoke │
│ │
│ 757 │ │ │
│ 758 │ │ with augment_usage_errors(__self): │
│ 759 │ │ │ with ctx: │
│ ❱ 760 │ │ │ │ return __callback(*args, **kwargs) │
│ 761 │ │
│ 762 │ def forward( │
│ 763 │ │ __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any # noqa: B902 │
│ │
│ C:\AI\LocalGPT\run_localGPT.py:61 in main │
│ │
│ 58 │ # Prepare the LLM │
│ 59 │ # callbacks = [StreamingStdOutCallbackHandler()] │
│ 60 │ # load the LLM for generating Natural Language responses. │
│ ❱ 61 │ llm = load_model() │
│ 62 │ qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, r │
│ 63 │ # Interactive questions and answers │
│ 64 │ while True: │
│ │
│ C:\AI\LocalGPT\run_localGPT.py:21 in load_model │
│ │
│ 18 │ model_id = "TheBloke/vicuna-7B-1.1-HF" │
│ 19 │ tokenizer = LlamaTokenizer.from_pretrained(model_id) │
│ 20 │ │
│ ❱ 21 │ model = LlamaForCausalLM.from_pretrained(model_id, │
│ 22 │ │ │ │ │ │ │ │ │ │ │ # load_in_8bit=True, # set these options i │
│ 23 │ │ │ │ │ │ │ │ │ │ │ # device_map=1#'auto', │
│ 24 │ │ │ │ │ │ │ │ │ │ │ # torch_dtype=torch.float16, │
│ │
│ C:\Python\lib\site-packages\transformers\modeling_utils.py:2611 in from_pretrained │
│ │
│ 2608 │ │ │ init_contexts.append(init_empty_weights()) │
│ 2609 │ │ │
│ 2610 │ │ with ContextManagers(init_contexts): │
│ ❱ 2611 │ │ │ model = cls(config, *model_args, **model_kwargs) │
│ 2612 │ │ │
│ 2613 │ │ # Check first if we are from_pt
│ 2614 │ │ if use_keep_in_fp32_modules: │
│ │
│ C:\Python\lib\site-packages\transformers\models\llama\modeling_llama.py:615 in init
│ │
│ 612 class LlamaForCausalLM(LlamaPreTrainedModel): │
│ 613 │ def init(self, config): │
│ 614 │ │ super().init(config) │
│ ❱ 615 │ │ self.model = LlamaModel(config) │
│ 616 │ │ │
│ 617 │ │ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) │
│ 618 │
│ │
│ C:\Python\lib\site-packages\transformers\models\llama\modeling_llama.py:446 in init
│ │
│ 443 │ │ self.vocab_size = config.vocab_size │
│ 444 │ │ │
│ 445 │ │ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.pad │
│ ❱ 446 │ │ self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num │
│ 447 │ │ self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps) │
│ 448 │ │ │
│ 449 │ │ self.gradient_checkpointing = False │
│ │
│ C:\Python\lib\site-packages\transformers\models\llama\modeling_llama.py:446 in │
│ │
│ 443 │ │ self.vocab_size = config.vocab_size │
│ 444 │ │ │
│ 445 │ │ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.pad │
│ ❱ 446 │ │ self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num │
│ 447 │ │ self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps) │
│ 448 │ │ │
│ 449 │ │ self.gradient_checkpointing = False │
│ │
│ C:\Python\lib\site-packages\transformers\models\llama\modeling_llama.py:256 in init
│ │
│ 253 │ def init(self, config: LlamaConfig): │
│ 254 │ │ super().init() │
│ 255 │ │ self.hidden_size = config.hidden_size │
│ ❱ 256 │ │ self.self_attn = LlamaAttention(config=config) │
│ 257 │ │ self.mlp = LlamaMLP( │
│ 258 │ │ │ hidden_size=self.hidden_size, │
│ 259 │ │ │ intermediate_size=config.intermediate_size, │
│ │
│ C:\Python\lib\site-packages\transformers\models\llama\modeling_llama.py:179 in init
│ │
│ 176 │ │ │ ) │
│ 177 │ │ self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=F │
│ 178 │ │ self.k_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=F │
│ ❱ 179 │ │ self.v_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=F │
│ 180 │ │ self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=F │
│ 181 │ │ self.rotary_emb = LlamaRotaryEmbedding(self.head_dim, max_position_embeddings=se │
│ 182 │
│ │
│ C:\Python\lib\site-packages\torch\nn\modules\linear.py:96 in init
│ │
│ 93 │ │ super().init() │
│ 94 │ │ self.in_features = in_features │
│ 95 │ │ self.out_features = out_features │
│ ❱ 96 │ │ self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwarg │
│ 97 │ │ if bias: │
│ 98 │ │ │ self.bias = Parameter(torch.empty(out_features, **factory_kwargs)) │
│ 99 │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you
tried to allocate 67108864 bytes.

Program is running on CPU while set on GPU

Running on a Threadripper + RTX A6000 with 48 GB of VRAM.

I did the installation but went through some issues.

  • First one: couldn't ingest because I got the error: Torch not compiled with CUDA enabled

    • I tried to fix it by installing PyTorch via conda but couldn't get conda installed on my computer (if someone can help me), so I went with this solution:
      pip uninstall torch torchvision
      and then reinstall with
      pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  • After that the program ingested and launched. I was able to query my document.

BUT, the program was only using my CPU. I didn't touch anything in the code, nor did I specify anything with python run_localGPT.py.

Does someone know where this issue comes from?

Error: 'abs_out_mps()' operation does not support input type 'int64' in MPS backend on Apple M1 Mac

Issue Description

When running the ingest.py script on an Apple M1 MacBook Pro with 32GB of memory, an error occurs due to the 'abs_out_mps()' operation not supporting the input type 'int64' in the MPS backend. This issue prevents the script from executing successfully.

Reproduction Steps

  1. Clone the repository to a local directory.
  2. Install the required dependencies by running pip install -r requirements.txt and wait for the installation to complete.
  3. Open the ingest.py file and navigate to line 37.
  4. Change the value of device from 'cuda' to 'mps'.
  5. Run the command python ingest.py in the terminal.

Error Message

python ingest.py
Loading documents from /Users/lrodrrol/Documents/Projects/LocalGpt/localGPT/SOURCE_DOCUMENTS
Loaded 1 documents from /Users/lrodrrol/Documents/Projects/LocalGpt/localGPT/SOURCE_DOCUMENTS
Split into 72 chunks of text
load INSTRUCTOR_Transformer
max_seq_length  512
Using embedded DuckDB with persistence: data will be stored in: /Users/lrodrrol/Documents/Projects/LocalGpt/localGPT/DB
Traceback (most recent call last):
  File "/Users/lrodrrol/Documents/Projects/LocalGpt/localGPT/ingest.py", line 57, in <module>
    main()
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/lrodrrol/Documents/Projects/LocalGpt/localGPT/ingest.py", line 51, in main
    db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, client_settings=CHROMA_SETTINGS)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 413, in from_documents
    return cls.from_texts(
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 381, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 158, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 148, in embed_documents
    embeddings = self.client.encode(instruction_pairs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/InstructorEmbedding/instructor.py", line 539, in encode
    out_features = self.forward(features)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/InstructorEmbedding/instructor.py", line 269, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1846, in forward
    encoder_outputs = self.encoder(
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1040, in forward
    layer_outputs = layer_module(
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 673, in forward
    self_attention_outputs = self.layer[0](
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 579, in forward
    attention_output = self.SelfAttention(
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 521, in forward
    position_bias = self.compute_bias(real_seq_length, key_length, device=scores.device)
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 428, in compute_bias
    relative_position_bucket = self._relative_position_bucket(
  File "/Users/lrodrrol/.pyenv/versions/3.10.4/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 399, in _relative_position_bucket
    relative_position = torch.abs(relative_position)
TypeError: Operation 'abs_out_mps()' does not support input type 'int64' in MPS backend.

Offer to Help

I am willing to contribute to resolving this issue by providing a pull request to the repository's readme file if someone can suggest a fix. Please let me know if there are any specific changes or steps needed to address this problem.

Running successfully but not on my 1080TI GPU

I would like to express my appreciation for the excellent work you have done with this project. I admire your use of the Vicuna-7B model and InstructorEmbeddings to enhance performance and privacy.

While using your software, I have encountered an issue related to hardware compatibility. Specifically, my GPU, a 1080TI, appears to not be working with the project. While your project functions perfectly on the CPU, I've been unsuccessful in making it work on my 1080TI hardware. I suspect this may be due to some nuances related to this specific GPU model or the CUDA libraries.

While the Python script does run and completes without any errors, I am able to get the expected results when querying documents. The exact symptoms are that my GPU use is zero, and generating answers sometimes takes more than an hour.

Here are my system specifications for reference:

Operating System: Windows 10
Python Version: Python 3.10.9
CUDA Version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

NVIDIA Driver Version: 531.14 DCH running on an NVIDIA GeForce GTX 1080 TI (11GB VRAM)

I've tried the following troubleshooting action because I suspect this is where the issue is:

  1. Uncommenting the following lines:
    model = LlamaForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,  # set these options if your GPU supports them!
        device_map=1,  # 'auto',
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )

Ingesting works well, however running the main program gives the following error:

python run_localGPT.py
load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: C:\Users\username\localGPT
loading model
setting tokenizer
loading LlamaForCausalLM.from_pretrained

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
False
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "C:\Users\username\localGPT\run_localGPT.py", line 85, in
main()
File "C:\Users\username\localGPT\run_localGPT.py", line 58, in main
llm = load_model()
File "C:\Users\username\localGPT\run_localGPT.py", line 26, in load_model
model = LlamaForCausalLM.from_pretrained(model_id,
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py", line 2621, in from_pretrained
from .utils.bitsandbytes import get_keys_to_not_convert, replace_8bit_linear
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\utils\bitsandbytes.py", line 9, in
import bitsandbytes as bnb
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes_init_.py", line 6, in
from . import cuda_setup, utils, research
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research_init_.py", line 1, in
from . import nn
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research\nn_init_.py", line 1, in
from .modules import LinearFP8Mixed, LinearFP8Global
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in
from bitsandbytes.optim import GlobalOptimManager
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\optim_init_.py", line 6, in
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 20, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Some further information:

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

nvidia-smi

Mon May 29 12:21:08 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 531.14 Driver Version: 531.14 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1080 Ti WDDM | 00000000:09:00.0 On | N/A |
| 37% 63C P0 65W / 250W| 1585MiB / 11264MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 808 C+G ...1.0_x64__8wekyb3d8bbwe\Video.UI.exe N/A |
| 0 N/A N/A 5188 C+G ...es\Contents\Windows\Illustrator.exe N/A |
| 0 N/A N/A 6068 C+G ...s\mattermost-desktop\Mattermost.exe N/A |
| 0 N/A N/A 9736 C+G ...ft Office\root\Office16\ONENOTE.EXE N/A |
| 0 N/A N/A 10208 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 11708 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 12364 C+G ....Search_cw5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 12428 C+G ...ows\CEPHtmlEngine\CEPHtmlEngine.exe N/A |
| 0 N/A N/A 14004 C+G ...GeForce Experience\NVIDIA Share.exe N/A |
| 0 N/A N/A 14012 C+G ...aam7r\AcrobatNotificationClient.exe N/A |
| 0 N/A N/A 14204 C+G ...GeForce Experience\NVIDIA Share.exe N/A |
| 0 N/A N/A 14976 C+G ...CBS_cw5n1h2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 16428 C+G ...t5de4sc\Sepiro.WakeOnLan.WinApp.exe N/A |
| 0 N/A N/A 19140 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
| 0 N/A N/A 21224 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 N/A N/A 29132 C+G ...ejd91yc\AdobeNotificationClient.exe N/A |
| 0 N/A N/A 34480 C+G ...soft\EdgeWebView\msedgewebview2.exe N/A |
| 0 N/A N/A 34948 C+G ....Search_cw5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 36888 C+G ...64__8wekyb3d8bbwe\CalculatorApp.exe N/A |
| 0 N/A N/A 37084 C+G ...oogle\Chrome\Application\chrome.exe N/A |
| 0 N/A N/A 40836 C+G ...soft Office\root\Office16\EXCEL.EXE N/A |
| 0 N/A N/A 41816 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 44436 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
| 0 N/A N/A 44992 C+G ...t\SelfServicePlugin\SelfService.exe N/A |
| 0 N/A N/A 47084 C+G ...61.0_x64__8wekyb3d8bbwe\GameBar.exe N/A |
| 0 N/A N/A 47880 C+G ...crosoft\Edge\Application\msedge.exe N/A |
| 0 N/A N/A 53188 C+G ...on\113.0.1774.57\msedgewebview2.exe N/A |
| 0 N/A N/A 53440 C+G ...on\113.0.1774.50\msedgewebview2.exe N/A |
| 0 N/A N/A 57444 C+G ....0_x64__8wekyb3d8bbwe\HxOutlook.exe N/A |
| 0 N/A N/A 57592 C+G ...siveControlPanel\SystemSettings.exe N/A |
| 0 N/A N/A 58236 C+G ...0_x64__8wekyb3d8bbwe\HxAccounts.exe N/A |
| 0 N/A N/A 59788 C+G ...yb3d8bbwe\Microsoft.Msn.Weather.exe N/A |
| 0 N/A N/A 59824 C+G ...ft Office\root\Office16\WINWORD.EXE N/A |
+---------------------------------------------------------------------------------------+

Thank you in advance for your help and support!

Provide CPU only how-to and implement an easy CPU only option

Due to the choice of both encoders and LLM, your implementation has benefits over privateGPT beyond GPU vs CPU usage (as you explain here #7 (comment)), and someone has apparently been able to make it run in CPU mode: see this comment under your helpful YouTube video https://www.youtube.com/watch?v=MlyoObdIHyo


Please provide a how-to for running localGPT in CPU mode, and please consider an easily accessible CPU-only option. I am aware of #6 but think this goes beyond what's discussed there.

Feature request

How can we save the embeddings of a particular file and use it later for question-answering?

Also, I receive this warning

I receive this error message in a new conda environment:

C:\Users(myname)\anaconda3\envs\privategpt\lib\site-packages\transformers\generation\utils.py:1255: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(

The conda environment is called privategpt but it is in fact fully dedicated to localGPT.

Other LLM models?

Is it possible to run other LLM models? What needs to be modified to run, for example, Red Pajama with this? Thanks for all your work c:

download the vicuna-7B model manually?

Hi,
I have a poor network, so I couldn't download PyTorch and the other models without disconnecting several times.
Is there any way to download the vicuna-7B model with another app (like IDM) and put it in the files manually?

Option for usage of other AI models (or maybe just 13-b)

I thought that it would be cool to use the latest version of Vicuna (vicuna-13b) instead of 7B, as it can be more efficient. Then I thought that it would be useful for some people to use other models too. Therefore, I suggest adding functionality to support several models (or at least vicuna-13b🙃) in the future and to select them via arguments.

I checked the Vicuna docs and it seems like the 13B version is configurable the same way as the 7B one. So, presumably, it can be easily implemented.
I can implement and contribute this, but I need to know your opinion first!

Running on google colab

Hello, I'm trying to run it on Google Colab:

  • The first script, ingest.py, finishes quite fast (around 1 min)
  • Unfortunately, the second script, run_localGPT.py, gets stuck for 7 min before it stops on Using embedded DuckDB with persistence: data will be stored in: /content/localGPT/DB
2023-05-31 08:29:14.848942: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Running on: cuda
load INSTRUCTOR_Transformer
max_seq_length  512
Using embedded DuckDB with persistence: data will be stored in: /content/localGPT/DB

Bug on Ubuntu 22: software doesn't work

Installation smooth, no problem

So I do a python ingest.py
and everything is fine, but then later:

load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: /mnt/6903a017-f604-4f90-9652-324e10b3e675/work/h2oai/localgpt/localGPT
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed

(note: I am not doing that in my home folder)

UnboundLocalError: local variable 'loader' referenced before assignment

Loading documents from D:\localGPT-main/SOURCE_DOCUMENTS
Traceback (most recent call last):
File "D:\localGPT-main\ingest.py", line 57, in
main()
File "C:\Users\KaiHongTech\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1130, in call
return self.main(*args, **kwargs)
File "C:\Users\KaiHongTech\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\KaiHongTech\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\KaiHongTech\AppData\Roaming\Python\Python310\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "D:\localGPT-main\ingest.py", line 41, in main
documents = load_documents(SOURCE_DIRECTORY)
File "D:\localGPT-main\ingest.py", line 27, in load_documents
return [load_single_document(f"D:\localGPT-main\SOURCE_DOCUMENTS") for file_path in all_files if file_path[-4:] in ['.txt', '.pdf', '.csv'] ]
File "D:\localGPT-main\ingest.py", line 27, in
return [load_single_document(f"D:\localGPT-main\SOURCE_DOCUMENTS") for file_path in all_files if file_path[-4:] in ['.txt', '.pdf', '.csv'] ]
File "D:\localGPT-main\ingest.py", line 21, in load_single_document
return loader.load()[0]
UnboundLocalError: local variable 'loader' referenced before assignment

Process is Killed on CPU

[localGPT] main*% [2d,16h,12m] →

$ python3 ingest.py --device_type cpu
Loading documents from /home/ni-user/Desktop/localGPT/SOURCE_DOCUMENTS
Loaded 1 documents from /home/ni-user/Desktop/localGPT/SOURCE_DOCUMENTS
Split into 148 chunks of text
load INSTRUCTOR_Transformer
Killed
[localGPT] main*% [2d,16h,12m] →

$ python3 run_localGPT.py --device_type cpu
Running on: cpu
load INSTRUCTOR_Transformer
Killed
[localGPT] main*% [2d,16h,13m] →

run_localGPT.py: chromadb.errors.NoIndexException: Index not found, please create an instance before querying

I've been able to run ingest.py and it seems to work. A chroma-collections.parquet and a chroma-embeddings.parquet are created in the same folder as ingest.py.

When I run run_localGPT.py, it generates a "chromadb.errors.NoIndexException: Index not found, please create an instance before querying" error on line 66: res = qa(query).

Should the ingest.py be generating a third file with an index or is something else wrong?

running out of resources?

I have the following problem, and I'm on a MacBook Air M2 with 16 GB of RAM.

➜ localGPT git:(main) ✗ python run_localGPT.py --device_type cpu
Running on: cpu
load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: /Users/andi/localGPT/DB
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s][1] 78030 killed python run_localGPT.py --device_type cpu
/Users/andi/miniconda3/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

This is the error I get when I start ingest.py:

─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ E:\localGPT\ingest.py:49 in │
│ │
│ 46 │
│ 47 │
│ 48 if name == "main": │
│ ❱ 49 │ main() │
│ 50 │
│ │
│ E:\localGPT\ingest.py:43 in main │
│ │
│ 40 │ embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl", │
│ 41 │ │ │ │ │ │ │ │ │ │ │ │ model_kwargs={"device": "cuda"}) │
│ 42 │ │
│ ❱ 43 │ db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, c │
│ 44 │ db.persist() │
│ 45 │ db = None │
│ 46 │
│ │
│ E:\Python\Python310\lib\site-packages\langchain\vectorstores\chroma.py:413 in from_documents │
│ │
│ 410 │ │ """ │
│ 411 │ │ texts = [doc.page_content for doc in documents] │
│ 412 │ │ metadatas = [doc.metadata for doc in documents] │
│ ❱ 413 │ │ return cls.from_texts( │
│ 414 │ │ │ texts=texts, │
│ 415 │ │ │ embedding=embedding, │
│ 416 │ │ │ metadatas=metadatas, │
│ │
│ E:\Python\Python310\lib\site-packages\langchain\vectorstores\chroma.py:381 in from_texts │
│ │
│ 378 │ │ │ client_settings=client_settings, │
│ 379 │ │ │ client=client, │
│ 380 │ │ ) │
│ ❱ 381 │ │ chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids) │
│ 382 │ │ return chroma_collection │
│ 383 │ │
│ 384 │ @classmethod
│ │
│ E:\Python\Python310\lib\site-packages\langchain\vectorstores\chroma.py:158 in add_texts │
│ │
│ 155 │ │ │ ids = [str(uuid.uuid1()) for _ in texts] │
│ 156 │ │ embeddings = None │
│ 157 │ │ if self._embedding_function is not None: │
│ ❱ 158 │ │ │ embeddings = self._embedding_function.embed_documents(list(texts)) │
│ 159 │ │ self._collection.add( │
│ 160 │ │ │ metadatas=metadatas, embeddings=embeddings, documents=texts, ids=ids │
│ 161 │ │ ) │
│ │
│ E:\Python\Python310\lib\site-packages\langchain\embeddings\huggingface.py:148 in embed_documents │
│ │
│ 145 │ │ │ List of embeddings, one for each text. │
│ 146 │ │ """ │
│ 147 │ │ instruction_pairs = [[self.embed_instruction, text] for text in texts] │
│ ❱ 148 │ │ embeddings = self.client.encode(instruction_pairs) │
│ 149 │ │ return embeddings.tolist() │
│ 150 │ │
│ 151 │ def embed_query(self, text: str) -> List[float]: │
│ │
│ E:\Python\Python310\lib\site-packages\InstructorEmbedding\instructor.py:521 in encode │
│ │
│ 518 │ │ if device is None: │
│ 519 │ │ │ device = self._target_device │
│ 520 │ │ │
│ ❱ 521 │ │ self.to(device) │
│ 522 │ │ │
│ 523 │ │ all_embeddings = [] │
│ 524 │ │ if isinstance(sentences[0],list): │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1145 in to │
│ │
│ 1142 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ 1143 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1144 │ │ │
│ ❱ 1145 │ │ return self._apply(convert) │
│ 1146 │ │
│ 1147 │ def register_full_backward_pre_hook( │
│ 1148 │ │ self, │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:820 in _apply │
│ │
│ 817 │ │ │ # track autograd history of param_applied, so we have to use │
│ 818 │ │ │ # with torch.no_grad():
│ 819 │ │ │ with torch.no_grad(): │
│ ❱ 820 │ │ │ │ param_applied = fn(param) │
│ 821 │ │ │ should_use_set_data = compute_should_use_set_data(param, param_applied) │
│ 822 │ │ │ if should_use_set_data: │
│ 823 │ │ │ │ param.data = param_applied │
│ │
│ E:\Python\Python310\lib\site-packages\torch\nn\modules\module.py:1143 in convert │
│ │
│ 1140 │ │ │ if convert_to_format is not None and t.dim() in (4, 5): │
│ 1141 │ │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() els │
│ 1142 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ ❱ 1143 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1144 │ │ │
│ 1145 │ │ return self.apply(convert) │
│ 1146 │
│ │
│ E:\Python\Python310\lib\site-packages\torch\cuda_init
.py:239 in _lazy_init │
│ │
│ 236 │ │ │ │ "Cannot re-initialize CUDA in forked subprocess. To use CUDA with " │
│ 237 │ │ │ │ "multiprocessing, you must use the 'spawn' start method") │
│ 238 │ │ if not hasattr(torch._C, '_cuda_getDeviceCount'): │
│ ❱ 239 │ │ │ raise AssertionError("Torch not compiled with CUDA enabled") │
│ 240 │ │ if _cudart is None: │
│ 241 │ │ │ raise AssertionError( │
│ 242 │ │ │ │ "libcudart functions unavailable. It looks like you have a broken build? │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Torch not compiled with CUDA enabled

Issue when running ingest.py "Unable to load weights from pytorch checkpoint"," If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True."

C:\Users\sanke\Downloads\localGPT>python ingest.py
Loading documents from C:\Users\sanke\Downloads\localGPT/SOURCE_DOCUMENTS
Loaded 1 documents from C:\Users\sanke\Downloads\localGPT/SOURCE_DOCUMENTS
Split into 72 chunks of text
load INSTRUCTOR_Transformer
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_u │
│ tils.py:446 in load_state_dict │
│ │
│ 443 │ │ │ ) │
│ 444 │ │ return safe_load_file(checkpoint_file) │
│ 445 │ try: │
│ ❱ 446 │ │ return torch.load(checkpoint_file, map_location="cpu") │
│ 447 │ except Exception as e: │
│ 448 │ │ try: │
│ 449 │ │ │ with open(checkpoint_file) as f: │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py: │
│ 797 in load │
│ │
│ 794 │ │ │ # If we want to actually tail call to torch.jit.load, we need to │
│ 795 │ │ │ # reset back to the original position. │
│ 796 │ │ │ orig_position = opened_file.tell() │
│ ❱ 797 │ │ │ with _open_zipfile_reader(opened_file) as opened_zipfile: │
│ 798 │ │ │ │ if _is_torchscript_zip(opened_zipfile): │
│ 799 │ │ │ │ │ warnings.warn("'torch.load' received a zip file that looks like a To │
│ 800 │ │ │ │ │ │ │ │ " dispatching to 'torch.jit.load' (call 'torch.jit.loa │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py: │
│ 283 in init
│ │
│ 280 │
│ 281 class _open_zipfile_reader(_opener): │
│ 282 │ def init(self, name_or_buffer) -> None: │
│ ❱ 283 │ │ super().init(torch._C.PyTorchFileReader(name_or_buffer)) │
│ 284 │
│ 285 │
│ 286 class _open_zipfile_writer_file(_opener): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_u │
│ tils.py:450 in load_state_dict │
│ │
│ 447 │ except Exception as e: │
│ 448 │ │ try: │
│ 449 │ │ │ with open(checkpoint_file) as f: │
│ ❱ 450 │ │ │ │ if f.read(7) == "version": │
│ 451 │ │ │ │ │ raise OSError( │
│ 452 │ │ │ │ │ │ "You seem to have cloned a repository without having git-lfs ins │
│ 453 │ │ │ │ │ │ "git-lfs and run git lfs install followed by git lfs pull in │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py:23 in decode │
│ │
│ 20 │
│ 21 class IncrementalDecoder(codecs.IncrementalDecoder): │
│ 22 │ def decode(self, input, final=False): │
│ ❱ 23 │ │ return codecs.charmap_decode(input,self.errors,decoding_table)[0] │
│ 24 │
│ 25 class StreamWriter(Codec,codecs.StreamWriter): │
│ 26 │ pass │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1821: character maps to

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\sanke\Downloads\localGPT\ingest.py:57 in │
│ │
│ 54 │
│ 55 │
│ 56 if name == "main": │
│ ❱ 57 │ main() │
│ 58 │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py:1130 in │
call
│ │
│ 1127 │ │
│ 1128 │ def call(self, *args: t.Any, **kwargs: t.Any) -> t.Any: │
│ 1129 │ │ """Alias for :meth:main.""" │
│ ❱ 1130 │ │ return self.main(*args, **kwargs) │
│ 1131 │
│ 1132 │
│ 1133 class Command(BaseCommand): │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py:1055 in │
│ main │
│ │
│ 1052 │ │ try: │
│ 1053 │ │ │ try: │
│ 1054 │ │ │ │ with self.make_context(prog_name, args, **extra) as ctx: │
│ ❱ 1055 │ │ │ │ │ rv = self.invoke(ctx) │
│ 1056 │ │ │ │ │ if not standalone_mode: │
│ 1057 │ │ │ │ │ │ return rv │
│ 1058 │ │ │ │ │ # it's not safe to ctx.exit(rv) here! │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py:1404 in │
│ invoke │
│ │
│ 1401 │ │ │ echo(style(message, fg="red"), err=True) │
│ 1402 │ │ │
│ 1403 │ │ if self.callback is not None: │
│ ❱ 1404 │ │ │ return ctx.invoke(self.callback, **ctx.params) │
│ 1405 │ │
│ 1406 │ def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]: │
│ 1407 │ │ """Return a list of completions for the incomplete value. Looks │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py:760 in │
│ invoke │
│ │
│ 757 │ │ │
│ 758 │ │ with augment_usage_errors(__self): │
│ 759 │ │ │ with ctx: │
│ ❱ 760 │ │ │ │ return __callback(*args, **kwargs) │
│ 761 │ │
│ 762 │ def forward( │
│ 763 │ │ __self, __cmd: "Command", args: t.Any, **kwargs: t.Any # noqa: B902 │
│ │
│ C:\Users\sanke\Downloads\localGPT\ingest.py:48 in main │
│ │
│ 45 │ print(f"Split into {len(texts)} chunks of text") │
│ 46 │ │
│ 47 │ # Create embeddings │
│ ❱ 48 │ embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl", │
│ 49 │ │ │ │ │ │ │ │ │ │ │ │ model_kwargs={"device": device}) │
│ 50 │ │
│ 51 │ db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, c │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\embeddings\hu │
│ ggingface.py:127 in init
│ │
│ 124 │ │ try: │
│ 125 │ │ │ from InstructorEmbedding import INSTRUCTOR │
│ 126 │ │ │ │
│ ❱ 127 │ │ │ self.client = INSTRUCTOR( │
│ 128 │ │ │ │ self.model_name, cache_folder=self.cache_folder, **self.model_kwargs │
│ 129 │ │ │ ) │
│ 130 │ │ except ImportError as e: │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\sentence_transformers\S │
│ entenceTransformer.py:95 in init
│ │
│ 92 │ │ │ │ │ │ │ │ │ │ use_auth_token=use_auth_token) │
│ 93 │ │ │ │
│ 94 │ │ │ if os.path.exists(os.path.join(model_path, 'modules.json')): #Load as Sen │
│ ❱ 95 │ │ │ │ modules = self._load_sbert_model(model_path) │
│ 96 │ │ │ else: #Load with AutoModel │
│ 97 │ │ │ │ modules = self._load_auto_model(model_path) │
│ 98 │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\InstructorEmbedding\ins │
│ tructor.py:474 in load_sbert_model │
│ │
│ 471 │ │ │ │ module_class = INSTRUCTOR_Pooling │
│ 472 │ │ │ else: │
│ 473 │ │ │ │ module_class = import_from_string(module_config['type']) │
│ ❱ 474 │ │ │ module = module_class.load(os.path.join(model_path, module_config['path'])) │
│ 475 │ │ │ modules[module_config['name']] = module │
│ 476 │ │ │
│ 477 │ │ return modules │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\InstructorEmbedding\ins │
│ tructor.py:306 in load │
│ │
│ 303 │ │ │
│ 304 │ │ with open(sbert_config_path) as fIn: │
│ 305 │ │ │ config = json.load(fIn) │
│ ❱ 306 │ │ return INSTRUCTOR_Transformer(model_name_or_path=input_path, **config) │
│ 307 │ │
│ 308 │ def tokenize(self, texts): │
│ 309 │ │ """ │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\InstructorEmbedding\ins │
│ tructor.py:240 in init
│ │
│ 237 │ │ │ config = AutoConfig.from_pretrained(os.path.join(model_name_or_path,'with_pr │
│ 238 │ │ else: │
│ 239 │ │ │ config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache

│ ❱ 240 │ │ self._load_model(self.model_name_or_path, config, cache_dir, **model_args) │
│ 241 │ │ │
│ 242 │ │ self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokeniz │
│ 243 │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\sentence_transformers\m │
│ odels\Transformer.py:47 in _load_model │
│ │
│ 44 │ def _load_model(self, model_name_or_path, config, cache_dir): │
│ 45 │ │ """Loads the transformer model""" │
│ 46 │ │ if isinstance(config, T5Config): │
│ ❱ 47 │ │ │ self._load_t5_model(model_name_or_path, config, cache_dir) │
│ 48 │ │ else: │
│ 49 │ │ │ self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=confi │
│ 50 │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\sentence_transformers\m │
│ odels\Transformer.py:55 in _load_t5_model │
│ │
│ 52 │ │ """Loads the encoder model from T5""" │
│ 53 │ │ from transformers import T5EncoderModel │
│ 54 │ │ T5EncoderModel._keys_to_ignore_on_load_unexpected = ["decoder.*"] │
│ ❱ 55 │ │ self.auto_model = T5EncoderModel.from_pretrained(model_name_or_path, config=conf │
│ 56 │ │
│ 57 │ def __repr__(self): │
│ 58 │ │ return "Transformer({}) with Transformer model: {} ".format(self.get_config_dict │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_u │
│ tils.py:2568 in from_pretrained │
│ │
│ 2565 │ │ if from_pt: │
│ 2566 │ │ │ if not is_sharded and state_dict is None: │
│ 2567 │ │ │ │ # Time to load the checkpoint │
│ ❱ 2568 │ │ │ │ state_dict = load_state_dict(resolved_archive_file) │
│ 2569 │ │ │ │
│ 2570 │ │ │ # set dtype to instantiate the model under: │
│ 2571 │ │ │ # 1. If torch_dtype is not None, we use that dtype │
│ │
│ C:\Users\sanke\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_u │
│ tils.py:462 in load_state_dict │
│ │
│ 459 │ │ │ │ │ │ "model. Make sure you have saved the model properly." │
│ 460 │ │ │ │ │ ) from e │
│ 461 │ │ except (UnicodeDecodeError, ValueError): │
│ ❱ 462 │ │ │ raise OSError( │
│ 463 │ │ │ │ f"Unable to load weights from pytorch checkpoint file for '{checkpoint_f │
│ 464 │ │ │ │ f"at '{checkpoint_file}'. " │
│ 465 │ │ │ │ "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please s │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: Unable to load weights from pytorch checkpoint file for
'C:\Users\sanke/.cache\torch\sentence_transformers\hkunlp_instructor-xl\pytorch_model.bin' at
'C:\Users\sanke/.cache\torch\sentence_transformers\hkunlp_instructor-xl\pytorch_model.bin'. If you tried to load a
PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

C:\Users\sanke\Downloads\localGPT>

I have Python 3.10 installed. I'm not sure what's causing this.
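This OSError usually means the cached pytorch_model.bin for instructor-xl was only partially downloaded or got corrupted. A minimal sketch (assuming the cache path shown in the error message above; adjust the user name) that deletes the cached copy so it is re-downloaded on the next run:

# Minimal sketch: remove the possibly corrupted instructor-xl cache so that
# sentence_transformers re-downloads it the next time ingest.py runs.
import shutil
from pathlib import Path

cache_dir = Path.home() / ".cache" / "torch" / "sentence_transformers" / "hkunlp_instructor-xl"
if cache_dir.exists():
    shutil.rmtree(cache_dir)
    print(f"Removed {cache_dir}; rerun python ingest.py to re-download the model")
else:
    print(f"No cached model found at {cache_dir}")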

RuntimeError: out of memory

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

AssertionError: Torch not compiled with CUDA enabled

As per your request, here is the error I get after installing localGPT:

PS C:\localGPT> python ingest.py
C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
warnings.warn("loaded more than 1 DLL from .libs:"
Loading documents from C:\localGPT/SOURCE_DOCUMENTS
Loaded 2 documents from C:\localGPT/SOURCE_DOCUMENTS
Split into 1536 chunks of text
load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: C:\localGPT
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\localGPT\ingest.py:52 in <module> │
│ │
│ 49 │
│ 50 │
│ 51 if __name__ == "__main__": │
│ ❱ 52 │ main() │
│ 53 │
│ │
│ C:\localGPT\ingest.py:46 in main │
│ │
│ 43 │ │ │ │ │ │ │ │ │ │ │ │ model_kwargs={"device": "cuda"}) │
│ 44 │ │
│ 45 │ │
│ ❱ 46 │ db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, c │
│ 47 │ db.persist() │
│ 48 │ db = None │
│ 49 │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\vectorstores\c │
│ hroma.py:422 in from_documents │
│ │
│ 419 │ │ """ │
│ 420 │ │ texts = [doc.page_content for doc in documents] │
│ 421 │ │ metadatas = [doc.metadata for doc in documents] │
│ ❱ 422 │ │ return cls.from_texts( │
│ 423 │ │ │ texts=texts, │
│ 424 │ │ │ embedding=embedding, │
│ 425 │ │ │ metadatas=metadatas, │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\vectorstores\c │
│ hroma.py:390 in from_texts │
│ │
│ 387 │ │ │ client_settings=client_settings, │
│ 388 │ │ │ client=client, │
│ 389 │ │ ) │
│ ❱ 390 │ │ chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids) │
│ 391 │ │ return chroma_collection │
│ 392 │ │
│ 393 │ @classmethod
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\vectorstores\c │
│ hroma.py:159 in add_texts │
│ │
│ 156 │ │ │ ids = [str(uuid.uuid1()) for _ in texts] │
│ 157 │ │ embeddings = None │
│ 158 │ │ if self._embedding_function is not None: │
│ ❱ 159 │ │ │ embeddings = self._embedding_function.embed_documents(list(texts)) │
│ 160 │ │ self._collection.add( │
│ 161 │ │ │ metadatas=metadatas, embeddings=embeddings, documents=texts, ids=ids │
│ 162 │ │ ) │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\embeddings\hug │
│ gingface.py:148 in embed_documents │
│ │
│ 145 │ │ │ List of embeddings, one for each text. │
│ 146 │ │ """ │
│ 147 │ │ instruction_pairs = [[self.embed_instruction, text] for text in texts] │
│ ❱ 148 │ │ embeddings = self.client.encode(instruction_pairs) │
│ 149 │ │ return embeddings.tolist() │
│ 150 │ │
│ 151 │ def embed_query(self, text: str) -> List[float]: │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\InstructorEmbedding\inst │
│ ructor.py:521 in encode │
│ │
│ 518 │ │ if device is None: │
│ 519 │ │ │ device = self._target_device │
│ 520 │ │ │
│ ❱ 521 │ │ self.to(device) │
│ 522 │ │ │
│ 523 │ │ all_embeddings = [] │
│ 524 │ │ if isinstance(sentences[0],list): │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:1145 in to │
│ │
│ 1142 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ 1143 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1144 │ │ │
│ ❱ 1145 │ │ return self._apply(convert) │
│ 1146 │ │
│ 1147 │ def register_full_backward_pre_hook( │
│ 1148 │ │ self, │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:797 in _apply │
│ │
│ 794 │ │
│ 795 │ def _apply(self, fn): │
│ 796 │ │ for module in self.children(): │
│ ❱ 797 │ │ │ module._apply(fn) │
│ 798 │ │ │
│ 799 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 800 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:820 in _apply │
│ │
│ 817 │ │ │ # track autograd history of param_applied, so we have to use │
│ 818 │ │ │ # with torch.no_grad():
│ 819 │ │ │ with torch.no_grad(): │
│ ❱ 820 │ │ │ │ param_applied = fn(param) │
│ 821 │ │ │ should_use_set_data = compute_should_use_set_data(param, param_applied) │
│ 822 │ │ │ if should_use_set_data: │
│ 823 │ │ │ │ param.data = param_applied │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module. │
│ py:1143 in convert │
│ │
│ 1140 │ │ │ if convert_to_format is not None and t.dim() in (4, 5): │
│ 1141 │ │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() els │
│ 1142 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ ❱ 1143 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1144 │ │ │
│ 1145 │ │ return self._apply(convert) │
│ 1146 │
│ │
│ C:\Users\Name\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py:2 │
│ 39 in _lazy_init │
│ │
│ 236 │ │ │ │ "Cannot re-initialize CUDA in forked subprocess. To use CUDA with " │
│ 237 │ │ │ │ "multiprocessing, you must use the 'spawn' start method") │
│ 238 │ │ if not hasattr(torch._C, '_cuda_getDeviceCount'): │
│ ❱ 239 │ │ │ raise AssertionError("Torch not compiled with CUDA enabled") │
│ 240 │ │ if _cudart is None: │
│ 241 │ │ │ raise AssertionError( │
│ 242 │ │ │ │ "libcudart functions unavailable. It looks like you have a broken build? │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Torch not compiled with CUDA enabled
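This AssertionError means the installed PyTorch build is CPU-only, so passing a "cuda" device cannot work. A quick hedged check (plain Python, independent of localGPT) to confirm which build is installed and which --device_type flag is safe to use:

# Minimal sketch: verify whether this PyTorch build can actually see a GPU.
import torch

print(torch.__version__)           # CPU-only wheels typically report a "+cpu" suffix
print(torch.cuda.is_available())   # False -> run ingest.py with --device_type cpu

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Suggested command: python ingest.py --device_type {device}")

If a GPU is expected, installing a CUDA-enabled PyTorch wheel that matches your CUDA version should resolve it.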

python ingest.py gets stuck on "load INSTRUCTOR_Transformer"

After running pip install -r requirements.txt and putting a new PDF into the SOURCE_DOCUMENTS folder, I ran python ingest.py. The first time, it installed additional requirements but got stuck at "load INSTRUCTOR_Transformer".

Here is a full screen shot.

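In many of these reports the process is not actually hung: hkunlp/instructor-xl is a large checkpoint (several GB) that is downloaded on first use, which can take a long time on slow connections. A minimal sketch, assuming the hang is just the initial download, that loads the embedding model on its own so the download can be watched directly:

# Minimal sketch: load the instructor-xl embedding model outside of ingest.py.
# On first use this downloads the checkpoint into ~/.cache; later runs are fast.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-xl")
print(model.max_seq_length)   # should print 512 once the model has loaded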

multi folders

I have about a hundred folders, with about six hundred PDF files in each folder. How do I modify the SOURCE_DOCUMENTS directory? See the sketch below.
Thanks
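localGPT ingests the files found under its SOURCE_DOCUMENTS folder, so one option is to point the ingest script at the root of your folder tree and collect files recursively. A minimal sketch, independent of localGPT's internals (you would still need to adapt ingest.py's document-loading loop), that gathers all PDFs under a hypothetical root directory:

# Minimal sketch (adapt inside ingest.py's document-loading code): walk a root
# folder recursively instead of reading a single flat SOURCE_DOCUMENTS directory.
import os

ROOT = r"C:\my_documents"   # hypothetical root containing the ~100 folders

pdf_paths = []
for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        if name.lower().endswith(".pdf"):
            pdf_paths.append(os.path.join(dirpath, name))

print(f"Found {len(pdf_paths)} PDFs under {ROOT}")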

Chroma collection langchain contains fewer than 4 elements.

Chroma collection langchain contains fewer than 4 elements.
C:\Python310\lib\site-packages\transformers\generation\utils.py:1255: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(

Error when running run_localGPT.py with the --device_type cpu flag

Actions taken:

Ran the command python run_localGPT.py --device_type cpu

ingest.py --device_type cpu was run before this with no issues.

Expected result:

For the "> Enter a query:" prompt to appear in terminal

Actual Result:

OSError: Unable to load weights from pytorch checkpoint file for 'C:\Users\<USERNAME>/.cache\huggingface\hub\models--TheBloke--vicuna-7B-1.1-HF\snapshots\c3efe0b1dd78716c6bfc288a997026354bce441a\pytorch_model-00001-of-00002.bin' at 'C:\Users\<USERNAME>/.cache\huggingface\hub\models--TheBloke--vicuna-7B-1.1-HF\snapshots\c3efe0b1dd78716c6bfc288a997026354bce441a\pytorch_model-00001-of-00002.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Additional info:

Adding from_tf=True to the call model = LlamaForCausalLM.from_pretrained(model_path, ...)

gives me the error: OSError: TheBloke/vicuna-7B-1.1-HF does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

I have re-downloaded the model multiple times now.
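Even after re-downloading, from_pretrained keeps loading from the Hugging Face cache, so a partially written shard there can keep triggering the same OSError (and from_tf=True is not the fix, since the second error shows the repo ships no TF weights). A minimal sketch, assuming a recent huggingface_hub (already a transformers dependency), that forces a clean pull of the checkpoint before rerunning run_localGPT.py:

# Minimal sketch: force a fresh download of the sharded checkpoint into the
# Hugging Face cache, replacing any partially written files.
from huggingface_hub import snapshot_download

path = snapshot_download("TheBloke/vicuna-7B-1.1-HF", force_download=True)
print(f"Fresh snapshot at: {path}")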

Mac M1 Pro — No module named transformers, Dependencies for InstructorEmbedding not found

MacBook M1 Pro, Ventura 13, Python 3.10.
When trying to start ingest.py, I get this error:

Loading documents from /Users/artur/localGPT/SOURCE_DOCUMENTS
Loaded 1 documents from /Users/artur/localGPT/SOURCE_DOCUMENTS
Split into 72 chunks of text
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 125, in __init__
    from InstructorEmbedding import INSTRUCTOR
  File "/Users/artur/instructor-embedding/InstructorEmbedding/__init__.py", line 1, in <module>
    from .instructor import *
  File "/Users/artur/instructor-embedding/InstructorEmbedding/instructor.py", line 9, in <module>
    from sentence_transformers import SentenceTransformer
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sentence_transformers/__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sentence_transformers/datasets/__init__.py", line 3, in <module>
    from .ParallelSentencesDataset import ParallelSentencesDataset
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sentence_transformers/datasets/ParallelSentencesDataset.py", line 4, in <module>
    from .. import SentenceTransformer
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 11, in <module>
    import transformers
ModuleNotFoundError: No module named 'transformers'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/artur/localGPT/ingest.py", line 57, in <module>
    main()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/artur/localGPT/ingest.py", line 48, in main
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl",
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 131, in __init__
    raise ValueError("Dependencies for InstructorEmbedding not found.") from e
ValueError: Dependencies for InstructorEmbedding not found.

The thing is, I definitely have transformers installed for this Python (3.10).
And InstructorEmbedding seems fine.

By the way, I can't start the original privateGPT either (No module named 'transformers', could not import the sentence_transformers Python package). Where should I dig?
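The traceback above mixes a local instructor-embedding checkout (/Users/artur/instructor-embedding) with the framework Python's site-packages, which usually means two different environments are involved. A small hedged sketch to confirm which interpreter and which package copies ingest.py is actually using:

# Minimal sketch: print the active interpreter and where each package resolves
# from; run it with the exact same python used for ingest.py.
import sys

print(sys.executable)

for name in ("transformers", "sentence_transformers", "InstructorEmbedding"):
    try:
        mod = __import__(name)
        print(name, "->", getattr(mod, "__file__", "?"))
    except ImportError as exc:
        print(name, "-> MISSING:", exc)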

Cannot import name 'LlamaTokenizer'

Excuse me, I'm a newbie and I'm trying to use LocalGPT, but I have this problem.
Many thanks!

(base) Armando@iMac-di-Martina ~ % python Documents/LocalGPT/run_localGPT.py
Traceback (most recent call last):
File "/Users/Armando/Documents/LocalGPT/run_localGPT.py", line 7, in <module>
from transformers import LlamaTokenizer, LlamaForCausalLM, pipeline
ImportError: cannot import name 'LlamaTokenizer' from 'transformers' (/Users/Armando/anaconda3/lib/python3.10/site-packages/transformers/__init__.py)
(base) Armando@iMac-di-Martina ~ %
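LlamaTokenizer only exists in fairly recent transformers releases (roughly 4.28 and later), and the anaconda base environment shown here may have an older copy than the one pinned in requirements.txt. A quick hedged check:

# Minimal sketch: confirm which transformers version the failing interpreter sees.
import transformers

print(transformers.__version__)
# If this is older than the version pinned in requirements.txt, upgrading it in
# the same environment (e.g. pip install -U transformers) should restore LlamaTokenizer.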

HuggingFace timeout

Hello,

A timeout exception occurred from HuggingFace when running ingest.py.

Any suggestions?

[screenshot: LLM download error]

williamj
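Timeouts while ingest.py fetches models from Hugging Face are usually network-side, so retrying often works. If the connection is just slow, newer huggingface_hub versions also honour an HF_HUB_DOWNLOAD_TIMEOUT environment variable; a hedged sketch of setting it (at the very top of ingest.py before any Hugging Face imports, or exported in the shell):

# Minimal sketch: give slow connections more time before Hugging Face downloads
# raise a timeout. Must run before any huggingface/transformers import.
import os

os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"   # seconds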

Can other models and embeddings be used?

I encountered some problems after switching to a LlamaCpp model, so I would like to ask whether switching to another model is supported.
Running run_localGPT.py gives the error: InvalidDimensionException: Dimensionality of (384) does not match index dimensionality (768)
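The InvalidDimensionException means the persisted Chroma index was built with one embedding model (768-dimensional vectors) while the query side is now using another (384-dimensional). After changing the embedding model, the index has to be rebuilt so both sides match. A minimal sketch, assuming the index lives in the default DB folder seen elsewhere in these logs:

# Minimal sketch: delete the old vector store, then re-run ingest.py with the
# same embedding model that run_localGPT.py will use at query time.
import shutil
from pathlib import Path

persist_dir = Path("DB")   # default persist directory; adjust if yours differs
if persist_dir.exists():
    shutil.rmtree(persist_dir)
print("Old index removed; run: python ingest.py, then python run_localGPT.py")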

Vulnerability in protobuf 3.20.0

Snyk found a vulnerability:

protobuf: 3.20.0
Known security vulnerability: 1
Security advisory: 0
Exploits: unavailable
Highest severity: medium
Recommendation: 4.23.2

xformers can't load C++/CUDA extensions

I'm running this on Apple Silicon (M2) with the CPU flag on. After ingesting, it asks me for a query, then it gives me this error and just hangs:

(my_env) ➜  localGPT git:(main) ✗ python3 run_localGPT.py --device_type cpu
Running on: cpu
load INSTRUCTOR_Transformer
max_seq_length  512
Using embedded DuckDB with persistence: data will be stored in: /Users/matthewberman/Desktop/localGPT/DB
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:51<00:00, 25.91s/it]
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0 with CUDA None (you have 2.0.0)
    Python  3.11.3 (you have 3.11.3)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details

Enter a query: what is this document about?
/opt/homebrew/lib/python3.11/site-packages/transformers/generation/utils.py:1255: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(

charmap codec error for input file

Hello,

This error occurs even after removing illegal characters.

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 3765: character maps to <undefined>

Attached is a screenshot.
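On Windows, text files opened without an explicit encoding fall back to the 'charmap' (cp1252) codec, which cannot decode byte 0x9d, so removing "illegal characters" by hand does not change the codec being used. A minimal sketch, with a hypothetical file name, of reading the document as UTF-8 with replacement instead (the same idea can be applied in whatever loader ingest.py uses for .txt files):

# Minimal sketch: decode the file as UTF-8 and substitute anything undecodable,
# instead of relying on Windows' default "charmap" codec.
with open("SOURCE_DOCUMENTS/problem_file.txt", encoding="utf-8", errors="replace") as f:
    text = f.read()

print(f"Read {len(text)} characters")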

williamj

localGPT file read error
