
gpt4all's Introduction

GPT4All

Open-source large language models that run locally on your CPU and nearly any GPU

GPT4All Models · GPT4All Documentation · Discord

🦜️🔗 Official Langchain Backend

Sign up for updates and news

GPT4All is made possible by our compute partner Paperspace.

phorm.ai

Run on an M1 macOS Device (not sped up!)

GPT4All: An ecosystem of open-source on-edge large language models.

GPT4All is an ecosystem for running powerful, customized large language models locally on consumer-grade CPUs and any GPU. Note that your CPU needs to support AVX or AVX2 instructions.

Learn more in the documentation.

A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models.
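As a quick illustration of plugging a downloaded model into the ecosystem, here is a minimal sketch using the GPT4All Python bindings; the model filename is a placeholder from the model gallery, and the bindings should download it on first use if it is not already present.

```python
# Minimal sketch using the GPT4All Python bindings (pip install gpt4all).
# The model filename below is a placeholder; substitute any GPT4All-compatible
# GGUF file you have downloaded or want fetched from the gallery.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")  # hypothetical filename
with model.chat_session():
    reply = model.generate("Name three uses of a local LLM.", max_tokens=128)
    print(reply)
```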

What's New (Issue Tracker)

  • Latest Release
  • October 19th, 2023: GGUF Support Launches with Support for:
    • Mistral 7b base model, an updated model gallery on gpt4all.io, several new local code models including Rift Coder v1.5
    • Nomic Vulkan support for Q4_0 and Q4_1 quantizations in GGUF.
    • Offline build support for running old versions of the GPT4All Local LLM Chat Client.
  • September 18th, 2023: Nomic Vulkan launches supporting local LLM inference on AMD, Intel, Samsung, Qualcomm and NVIDIA GPUs.
  • August 15th, 2023: GPT4All API launches allowing inference of local LLMs from docker containers.
  • July 2023: Stable support for LocalDocs, a GPT4All Plugin that allows you to privately and locally chat with your data.

Chat Client

Run any GPT4All model natively on your home desktop with the auto-updating desktop chat client. See GPT4All Website for a full list of open-source models you can run with this powerful desktop application.

Direct Installer Links:

Find the most up-to-date information on the GPT4All Website

Chat Client building and running

  • Follow the visual instructions on the chat client build_and_run page

Bindings

Integrations

Contributing

GPT4All welcomes contributions, involvement, and discussion from the open-source community! Please see CONTRIBUTING.md and follow the issue, bug report, and PR markdown templates.

Check the project Discord, talk with the project owners, or look through existing issues and PRs to avoid duplicate work. Please make sure to tag all of the above with the relevant project identifiers, or your contribution could get lost. Example tags: backend, bindings, python-bindings, documentation, etc.

GPT4All 2024 Roadmap

To contribute to the development of any of the below roadmap items, make or find the corresponding issue and cross-reference the in-progress task.

Each item should have an issue link below.

  • Chat UI Language Localization (localize UI into the native languages of users)

    • Chinese
    • German
    • French
    • Portuguese
    • Your native language here.
  • UI Redesign: an internal effort at Nomic to improve the UI/UX of gpt4all for all users.

    • Design new user interface and gather community feedback
    • Implement the new user interface and experience.
  • Installer and Update Improvements

    • Seamless native installation and update process on OSX
    • Seamless native installation and update process on Windows
    • Seamless native installation and update process on Linux
  • Model discoverability improvements:

    • Support huggingface model discoverability
    • Support Nomic hosted model discoverability
  • LocalDocs (towards a local perplexity)

    • Multilingual LocalDocs Support
      • Create a multilingual experience
      • Incorporate a multilingual embedding model
      • Specify a preferred multilingual LLM for localdocs
    • Improved RAG techniques
      • Query augmentation and re-writing
      • Improved chunking and text extraction from arbitrary modalities
        • Custom PDF extractor beyond the Qt default (charts, tables, text)
      • Faster indexing and local exact search with v1.5 hamming embeddings and reranking (skip ANN index construction!)
    • Support queries like 'summarize X document'
    • Multimodal LocalDocs support with Nomic Embed
    • Nomic Dataset Integration with real-time LocalDocs
      • Include an option to allow the export of private LocalDocs collections to Nomic Atlas for debugging data/chat quality
      • Allow optional sharing of LocalDocs collections between users.
      • Allow the import of a LocalDocs collection from an Atlas Dataset
        • Chat with a live version of Wikipedia, chat with PubMed, chat with the latest snapshot of world news.
  • First class Multilingual LLM Support

    • Recommend and set a default LLM for German
    • Recommend and set a default LLM for English
    • Recommend and set a default LLM for Chinese
    • Recommend and set a default LLM for Spanish
  • Server Mode improvements

    • Improved UI and new requested features:
      • Fix outstanding bugs and feature requests around networking configurations.
      • Support Nomic Embed inferencing
      • First class documentation
      • Improving developer use and quality of server mode (e.g. support larger batches)

Technical Reports

📗 Technical Report 3: GPT4All Snoozy and Groovy

📗 Technical Report 2: GPT4All-J

📗 Technical Report 1: GPT4All

Citation

If you utilize this repository, models or data in a downstream project, please consider citing it with:

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}

gpt4all's People

Contributors

adtreat, andriymulyar, apage43, bmschmidt, bstadt, cebtenzzre, cosmic-snow, dpsalvatierra, eltociear, felix-zaslavskiy, gukoff, imaami, jacoobes, justinwangx, kuvaus, lakkn, lh0x00, malikmalna, manyoso, mudler, mvenditto, niansa, qnixsynapse, redthing1, rguo123, thebloke, thilotee, yarikoptic, yuvanesh-ux, zanussbaum


gpt4all's Issues

Stuck in an infinite loop

I followed the README and downloaded the bin file, copied it into the chat folder and ran ./gpt4all-lora-quantized-linux-x86.
Then started asking questions. After a few questions I asked for a joke and it has been stuck in a loop repeating the same lines over and over (maybe that's the joke! it's making fun of me!).

I can share the seed and the exact questions that I asked if that would help.

How can I provide access to local files?

I would like to use this AI to translate the files of a whole repository from Python to TypeScript. ChatGPT can do that, but only file by file.

How can I provide (limited) access to a local folder of my computer (mac)?

git clone permission denied

$ git clone --recurse-submodules git@github.com:nomic-ai/gpt4all.git
...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

But this worked OK:
$ git clone --recurse-submodules https://github.com/nomic-ai/gpt4all.git
Cloning into 'gpt4all'...
remote: Enumerating objects: 315, done.
...
Submodule path 'transformers': checked out 'cae78c46d658a8e496a815c2ee49b9b178fb9c9a'

Use your own data

I know it has been covered elsewhere, but what people need to understand is that you can use your own data, but you need to train on it.

So I suggest adding a little guide, as simple as possible:

  1. gather sample.data
  2. train on sample.data
  3. use the chatbot with sample.data

There are thousands and thousands of people waiting for this.

Please add a LICENSE file

Adding MIT License to the repository will make it clear that anyone can use, copy and modify this software for any purpose without any restrictions. It also makes it easier for people who want to contribute or collaborate with others on open-source projects using these licenses.

Disclaimer: the above comment was generated using gpt4all ;)

Issue using generate.py

Hi, I am trying to build a Flask API version of generate.py. Before that, I tried to run it on my server and encountered an error, which is probably linked to the default YAML file distributed.

```yaml
# model/tokenizer
model_name: # REPLACE HERE with the base llama model
tokenizer_name: # REPLACE HERE with the llama tokenizer
lora: true
lora_path: "/nomic-ai/gpt4all-lora"

max_new_tokens: 512
temperature: 0
prompt: null
```

The script generates this error:

╭─────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ /Users/michel/micromamba/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:23 │
│ 9 in hf_raise_for_status │
│ │
│ 236 │ │
│ 237 │ """ │
│ 238 │ try: │
│ ❱ 239 │ │ response.raise_for_status() │
│ 240 │ except HTTPError as e: │
│ 241 │ │ error_code = response.headers.get("X-Error-Code") │
│ 242 │
│ │
│ /Users/michel/micromamba/lib/python3.9/site-packages/requests/models.py:1021 in │
│ raise_for_status │
│ │
│ 1018 │ │ │ ) │
│ 1019 │ │ │
│ 1020 │ │ if http_error_msg: │
│ ❱ 1021 │ │ │ raise HTTPError(http_error_msg, response=self) │
│ 1022 │ │
│ 1023 │ def close(self): │
│ 1024 │ │ """Releases the connection back to the pool. Once this method has been │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
HTTPError: 404 Client Error: Not Found for url:
https://huggingface.co/gpt4all-lora/resolve/main/config.json

Can you please help me?

By the way, gpt4all-lora-quantized.bin is perfectly working using ./gpt4all-lora-quantized-OSX-m1
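The 404 above is most likely a consequence of the model_name/tokenizer_name placeholders never being filled in, so the Hugging Face Hub is asked for a repository that does not exist. Below is a minimal sketch of a guard one could add before setup_model is called, assuming the config layout shown above; the check itself is not part of generate.py.

```python
# Hypothetical pre-flight check for generate.py: fail early with a clear
# message when the YAML placeholders were never replaced.
import yaml

with open("configs/generate/generate.yaml") as f:
    config = yaml.safe_load(f)

for key in ("model_name", "tokenizer_name"):
    if not config.get(key):
        raise SystemExit(
            f"Config field '{key}' is empty; set it to a real Hugging Face "
            "repo id (the base llama model/tokenizer) before running generate.py."
        )
```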

Small date typo in paper

Year should probably be 2023:

"We collected roughly one million prompt response pairs using the GPT-3.5-Turbo OpenAI API between March 20, 2022 and March 26th, 2022"

Error DeepSpeed is not installed => run `pip3 install deepspeed` or build it from source when trying to run accelerate launch

And when I try to install deepspeed it throws:

error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      test.c
      LINK : fatal error LNK1181: no se puede abrir el archivo de entrada 'aio.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\fer\AppData\Local\Temp\pip-install-rl2m083a\deepspeed_0a392c06e34c405d970b33e0372ff5a3\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\fer\AppData\Local\Temp\pip-install-rl2m083a\deepspeed_0a392c06e34c405d970b33e0372ff5a3\setup.py", line 48, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  One can disable async_io with DS_BUILD_AIO=0
       [ERROR]  Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

This is happening on Windows 11.

Distributed package doesn't have NCCL?

Hi y'all, thank you for releasing this work!

I was trying to re-run the training using the base finetuning.yml (+1 on #30 btw) but got this error below, did I skip a step or do something wrong?

RuntimeError    : raise RuntimeError("Distributed package doesn't have NCCL " "built in")Distributed package doesn't have NCCL built in

RuntimeError: Distributed package doesn't have NCCL built in
    torch.distributed.init_process_group(backend="nccl", **kwargs)
  File "/Users/ericnograles/Library/Python/3.9/lib/python/site-packages/torch/distributed/distributed_c10d.py", line 895, in init_process_group
    PartialState(cpu, **kwargs)
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
  File "/Users/ericnograles/Library/Python/3.9/lib/python/site-packages/accelerate/state.py", line 117, in __init__
RuntimeError: Distributed package doesn't have NCCL built in
    self.state = AcceleratorState(

Cannot decompress the bin file

The model comes as a .bin file, and my laptop is a Mac with an M1.
At first I just clicked it to decompress it with the usual app 'The Unarchiver', but it failed.
Then I used the terminal to decompress it with the command 'chmod a+x filename.bin', but it failed again.

How to execute gpt4all from bash script or Node.js process instead of interactive prompt?

I went through the readme on my Mac M2 and brew installed python3 and pip3. Then replaced all the commands saying python with python3 and pip with pip3. I got to the point of running this command:

python generate.py --config configs/generate/generate.yaml --prompt "Write a script to reverse a string in Python"

Setting up model
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1160, in hf_hub_download
    metadata = get_hf_file_metadata(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1501, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/opt/homebrew/Cellar/[email protected]/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6424332a-26533405190ac1c961e12ed4)

Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/me/gpt/gpt4all/generate.py", line 52, in <module>
    model, tokenizer = setup_model(config)
                       ^^^^^^^^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/generate.py", line 20, in setup_model
    model = AutoModelForCausalLM.from_pretrained(config["model_name"], device_map="auto", torch_dtype=torch.float16)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/models/auto/configuration_auto.py", line 905, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "/Users/me/gpt/gpt4all/transformers/src/transformers/utils/hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

What am I missing, what do I do now? How do I get it to generate some output without using the interactive prompt? I was able to successfully download that 4GB file and put it in the chat folder and run the interactive prompt, but I would like to get this to be runnable as a shell or Node.js script, so I can programmatically make some calls. How can it be done?

Thanks!

P.S. For clarity, I basically want to do something like this:

gpt4all --prompt "List some dogs" > output.md

API Access?

I don't see any API access?
I'd like a RESTful interface rather than using the webpage. Is there any way to do this now? Any plans to add it soon?

Thanks.
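For what it's worth, the "What's New" section above mentions a GPT4All API, and the desktop chat client can expose a local OpenAI-compatible server. A rough sketch of calling such an endpoint from Python follows; the port (4891), the path, and the model name are assumptions and may need to be adjusted for a given setup.

```python
# Hedged sketch: query a locally running OpenAI-compatible GPT4All endpoint.
# Port 4891, the /v1/chat/completions path, and the model name are assumptions.
import requests

resp = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "mistral-7b-instruct-v0.1.Q4_0.gguf",  # placeholder model name
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```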

faq.md

Have an FAQ doc for repetitive questions in the Discord.

Process gets killed

When running the macOS version on an M1 Pro with 32 GB RAM under macOS Ventura 13.2.1 (22D68), the application crashes immediately after starting. See the screenshot attached.

gpt4all-lora-quantized.bin is located in ./chat/


Gets stuck in infinite loops when asked to generate code

I used ./gpt4all-lora-quantized-OSX-m1

> Give me a C++ algorithm for the integer square root (isqrt).
The following is an implementation of iSqrt() function which returns the integer Square Root of its argument, in terms of unsigned long ints on 32-bit systems. It uses Newton's method to find a better approximation as it iterates through each iteration until convergence or exceeding maximum number of iterations allowed (10).
```cpp
#include <iostream>
using namespace std; // for cout << endl;
unsigned long int iSqrt( unsigned long int n ) {
    if (n <= 2) return n *= 3/2 + sqrt((double)(n*n)); else{
        double x = pow(-1.0, floor(.5*(log(fabs(n)))+1./4))/(sqrt(6)*pow(x,3)/pi); // Newton's method for integer square root calculation;
    unsigned long int result= (unsigned)floor((double)(x)); 	// convert double to signed integer.
        if ((result-fabs(n/2+1))) { x = pow(-1.0,( floor(.5*(log(fabs(x)))) + .75))/(sqrt(6)*pow(x,3)/pi); } // Newton's method for double square root calculation;
        if (result-floor((double)(n/2+1))) { x = pow(-1.0,( floor(.5*(log(fabs(x)))) + .75))/(sqrt(6)*pow(x,3)/pi); } // Newton's method for double square root calculation;
        if (result-floor((double)(n/2+1))) { x = pow(-1.0,( floor(.5*(log(fabs(x)))) + .75))/(sqrt(6)*pow(x,3)/pi); } // Newton's method for double square root calculation;
```

... keeps repeating the last line until stopped ...

Another session:

> Give me a C++ algorithm for the integer square root (isqrt).
Here's one way to implement iSqrt in C++, using an array of integers and dynamic programming techniques:
```cpp
int sqr(const int& num) { // returns squared value, not just a number
    constexpr auto MAX = (std::numeric_limits<decltype>::max)*2;// max possible square root for any integer <= this limit

    if (num < 0 || std::abs(int64_t{1}) == num) { // handle special cases like -ve numbers and 1 itself
        return sqr(-std::fmod<decltype>(-std::floor, int64_t{2}, MAX));//return floor of the square root for negative number or just take it as is if positive.
    } else { // handle normal cases like nonnegative numbers > 1 (iSqrt)
        constexpr auto sqr = std::pow(std::fabs<decltype>(num),2);//return squared value of the input integer number for isqrt calculation

        if ((int64_t{0} == num && int64_t{1}) || (sqr <= MAX)) { //handle special cases like 0 and sqrt(1) which are both irrational numbers.
            return std::fmod<decltype>(-std::floor, sqr);//return floor of the square root for these two values as they have no rational value to represent them in iSqrt calculation.
        } else { //handle normal cases like nonnegative integers > 1 (iSquare)
            constexpr auto num2 = std::fabs<decltype>(num);//return absolute value of the input integer number for sqr(x).

            if ((sqr <= MAX && int64_t{0} == num || (int64_t{1}) > 3) //handle special cases like sqrt(2), which is irrational and has no rational value to represent it in iSqrt calculation.
                return std::fmod<decltype>(-std::floor, int64_t{num/int64_t{1}});//return floor of the square root for these two values as they have irrational numbers and no rational value to represent them in iSqrt calculation.
            else { //handle normal cases like nonnegative integers > 2 (iSquare)
                constexpr auto num3 = std::fabs<decltype>(num);//return absolute value of the input integer number for sqr(x).

                if ((sqr <= MAX && int64_t{0} == num || (int64_t{1}) > 2) //handle special cases like sqrt(3), which is irrational and has no rational value to represent it in iSqrt calculation.
                    return std::fmod<decltype>(-std::floor, int64_t{num/int64_t{1}});//return floor of the square root for these two values as they have irrational numbers and no rational value to represent them in iSqrt calculation.
                else { //handle normal cases like nonnegative integers > 3 (iSquare)
                    constexpr auto num4 = std::fabs<decltype>(num);//return absolute value of the input integer number for sqr(x).

```

Easy server up

Make it simple to get a FastAPI server running with a chat endpoint.
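A minimal sketch of what such a server could look like, wrapping the GPT4All Python bindings with FastAPI; the endpoint name, request shape, and model filename are illustrative choices, not an existing API.

```python
# Hypothetical FastAPI server exposing a /chat endpoint backed by the
# GPT4All Python bindings (pip install fastapi uvicorn gpt4all).
from fastapi import FastAPI
from pydantic import BaseModel
from gpt4all import GPT4All

app = FastAPI()
model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")  # placeholder model file

class ChatRequest(BaseModel):
    prompt: str
    max_tokens: int = 128

@app.post("/chat")
def chat(req: ChatRequest):
    # Run a single completion for the prompt and return it as JSON.
    return {"response": model.generate(req.prompt, max_tokens=req.max_tokens)}

# Run with: uvicorn server:app --port 8000
```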

No Windows binary

The file listed is not a binary that runs on Windows:

cd chat;./gpt4all-lora-quantized-linux-x86 on Windows/Linux

Resuming the process

If I quit the process (Ctrl+C), is there a way to resume it, since the process keeps running?

Cannot rerun the model

I tried to rerun the model (it worked fine the first time) and I got this error:

main: seed = ****76542
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: failed to open 'gpt4all-lora-quantized.bin'
main: failed to load model from 'gpt4all-lora-quantized.bin'

Illegal Instruction on 2012 Intel i7

Hi there,

I'm trying to run this on a Mac mini i7 (4-core/16 GB) running Ubuntu (I know this is going to perform really badly!).

When I run it I get "Illegal instruction (core dumped)".

I assume this is because this old CPU is missing something, perhaps AVX?

Does anyone know what I would change in the makefile for gpt4all.cpp to get around this?

Thanks!
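As a quick way to confirm the missing-AVX theory on Linux, the kernel's reported CPU flags can be checked; this is just a diagnostic sketch.

```python
# Diagnostic sketch: list AVX-related CPU flags on Linux. If neither "avx"
# nor "avx2" shows up, the prebuilt binaries will likely die with SIGILL.
with open("/proc/cpuinfo") as f:
    flags = set()
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())

avx_flags = sorted(flag for flag in flags if flag.startswith("avx"))
print(avx_flags if avx_flags else "no AVX flags found")
```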

Verbosity (verborrea) when answering a simple coding request

write x86 assembler code compilable in MASM to make an "hello world" program
The response is the following assembly language source file that can be compiled using MASM (Microsoft Macro Assembler) and executed on a 32-bit Windows system. This example shows how to print out “Hello World” in Assembly Language, which demonstrates basic programming concepts such as input/output operations:

    .model flat, prop=0x18c
    
    section         = "hello"      ; Section name for the code segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "data"        ; Section name for the data segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "bss"        ; Section name for the bss segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "heap"        ; Section name for the heap segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "stack"        ; Section name for the stack segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "text"        ; Section name for the code segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "data"        ; Section name for the data segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "bss"        ; Section name for the bss segment
    flags            = secsig32   ; Flags that indicate this is a 64-bit executable (segsize/segvsize)
                          | segalign, nopsz=1024       // Segment alignment and size in bytes.
    
section         = "heap"        ; Section name for the heap segment
    flags            = secsig32   ; Flags that indicate this is a 64-^C

Incorrect run command

The executable appears to require the model to be in the current directory:

(venv) ➜  gpt4all git:(main) ✗ ./chat/gpt4all-lora-quantized-OSX-m1
main: seed = 1680048739
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: failed to open 'gpt4all-lora-quantized.bin'
main: failed to load model from 'gpt4all-lora-quantized.bin'

crash when prompt is longer than 256 characters

On Windows 11, use the prompt:

Create a detailed step by step plan for a kichen robot to make toast. Each step should be an individual action in machine readable format. The available actions: Grab(object), Move(object, next_to), Drop(object), Push(button), Cut(object, number_of_pieces).

Debugger output:

(5620.c8c): Security check failure or stack buffer overrun - code c0000409 (!!! second chance !!!)
Subcode: 0x8 FAST_FAIL_RANGE_CHECK_FAILURE 
gpt4all_lora_quantized_win64+0x23361:
00007ff6`7dfe3361 cd29            int     29h
0:000> k
 # Child-SP          RetAddr               Call Site
00 00000048`2b2fe2d0 00007ff6`7dfe333e     gpt4all_lora_quantized_win64+0x23361
01 00000048`2b2fe300 00007ff6`7dfcb760     gpt4all_lora_quantized_win64+0x2333e
02 00000048`2b2fe330 00007ff6`7dfe3adc     gpt4all_lora_quantized_win64+0xb760
03 00000048`2b2ffb30 00007ff8`0f923db1     gpt4all_lora_quantized_win64+0x23adc
04 00000048`2b2ffb70 00007ff8`101f32a1     KERNEL32!BaseThreadInitThunk+0x21
05 00000048`2b2ffba0 00000000`00000000     ntdll!RtlUserThreadStart+0x21 

Setting up llama models for generate

I can't seem to find a llama model on HF that works with generate.py. The closest I got was this error:

Setting up model
Traceback (most recent call last):
  File "/home/ubuntu/gpt4all/generate.py", line 52, in <module>
    model, tokenizer = setup_model(config)
  File "/home/ubuntu/gpt4all/generate.py", line 20, in setup_model
    model = AutoModelForCausalLM.from_pretrained(config["model_name"], device_map="auto", torch_dtype=torch.float16)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 917, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 623, in __getitem__
    raise KeyError(key)
KeyError: 'llama'

Any tips or tricks on what the config.json needs to look like for this to work?

Illegal instruction (core dumped) on Linux Virtual Machine (KVM)

user@gpt4:~/gpt4all/chat$ ./gpt4all-lora-quantized-linux-x86 
main: seed = 1680120667
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
Illegal instruction (core dumped)

dmesg shows:

[  104.211520] systemd[1]: systemd 249.11-0ubuntu3.7 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[  104.211578] systemd[1]: Detected virtualization kvm.
[  104.211582] systemd[1]: Detected architecture x86-64.
[ 5620.273116] show_signal: 22 callbacks suppressed
[ 5620.273119] traps: gpt4all-lora-qu[17654] trap invalid opcode ip:423d62 sp:7ffe451f4828 error:0 in gpt4all-lora-quantized-linux-x86[400000+55000]
[ 5647.501626] traps: gpt4all-lora-qu[17668] trap invalid opcode ip:423d62 sp:7fffdfc29678 error:0 in gpt4all-lora-quantized-linux-x86[400000+55000]

strace tail shows:

...
loading libs, reading gpt4all-lora-quantized.bin
...
brk(0x13d5000)                          = 0x13d5000
brk(0x13f6000)                          = 0x13f6000
read(3, "\0\0\340\245\244\2\0\0\0\321\220\3\0\0\0\341\276\266\3\0\0\0\342\236\226\3\0\0\0\345\272\247"..., 8191) = 8191
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x423d62} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)

ILL_ILLOPN = Illegal operand. I suppose some CPU instruction is not available.

The CPU is AMD Epyc 7313, running Ubuntu 22.04 inside of a VM.

From the VM, the following cpu flags are enabled:

    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni
                          cx16 x2apic hypervisor cmp_legacy 3dnowprefetch vmmcall
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full

Unfortunately I'm not very experienced with VMs; however, I would like to run GPT chat on a server.

Is it possible to get the source of gpt4all-lora-quantized-linux-x86 so I can recompile it?

Fails to load ggml weights

gpt4all fails to load ggml weights (both old and new formats) downloaded for llama.cpp and alpaca.cpp. Is there a way to convert them to a format gpt4all understands, or is this a bug?

ModuleNotFoundError: No module named 'torch._six'

Thanks for putting this repo together!

I am encountering an issue when running this command - python generate.py --config configs/generate/generate.yaml --prompt "Write a script to reverse a string in Python":

Traceback (most recent call last):
  File "/home/Owner/Developer/gpt4all/generate.py", line 2, in <module>
    from peft import PeftModelForCausalLM
  File "/home/Owner/Developer/gpt4all/peft/src/peft/__init__.py", line 22, in <module>
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
  File "/home/Owner/Developer/gpt4all/peft/src/peft/mapping.py", line 16, in <module>
    from .peft_model import (
  File "/home/Owner/Developer/gpt4all/peft/src/peft/peft_model.py", line 22, in <module>
    from accelerate import dispatch_model, infer_auto_device_map
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/accelerate/accelerator.py", line 27, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/accelerate/utils/__init__.py", line 96, in <module>
    from .other import (
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/accelerate/utils/other.py", line 29, in <module>
    from deepspeed import DeepSpeedEngine
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/__init__.py", line 14, in <module>
    from . import module_inject
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/module_inject/__init__.py", line 1, in <module>
    from .replace_module import replace_transformer_layer, revert_transformer_layer, ReplaceWithTensorSlicing
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 15, in <module>
    from ..runtime.zero import GatheredParameters
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/__init__.py", line 6, in <module>
    from .partition_parameters import ZeroParamType
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 22, in <module>
    from .linear import LinearModuleForZeroStage3, zero3_linear_wrap
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/zero/linear.py", line 20, in <module>
    from deepspeed.runtime.utils import noop_decorator
  File "/home/Owner/anaconda3/lib/python3.9/site-packages/deepspeed/runtime/utils.py", line 19, in <module>
    from torch._six import inf
ModuleNotFoundError: No module named 'torch._six'

Couple of different errors

First, it does not seem to want to load the .bin model from the link (see attached screenshots). So I thought, okay, let's at least see if the generate function works. No dice there either (screenshot attached): it's asking for a Hugging Face login, which was not part of the instructions. Eager to see how this compares to base llama, which is quite fun to play with and fine-tune.

How many GPU hours were used to train the model?

My initial estimates from TRAINING_LOG.md and Meta's research paper:

  • Test run: either >0.05 or >59 GPU-hours
  • Full model: either >0.13 or >135 GPU-hours

I want to determine how costly it is to fine-tune the 7B model for local execution. My GPU trains models ~10x slower than one A100.

TypeError: len() of a 0-d tensor

The training dataset (gpt4all_curated_data_without_p3_2022_03_27.tar.gz) has an issue with an empty prompt

    raise TypeError("len() of a 0-d tensor")                                                                                                                                                                                                                  TypeError: len() of a 0-d tensor

It's a minor problem, but you can't train with the data because it crashes. A simple fix is to trim the bad record out:

egrep -v 'prompt": ""' data.jsonl

I'd submit a pull request, but the data isn't in the repo.
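The same cleanup can be done with a few lines of Python that actually parse each record; the field name and file names below mirror the report above.

```python
# Drop records whose "prompt" field is empty or missing from a JSONL file.
import json

with open("data.jsonl") as src, open("data.cleaned.jsonl", "w") as dst:
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("prompt"):  # keep only records with a non-empty prompt
            dst.write(line)
```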

Is there any chance to use it as a JavaScript library?

As a JS developer I'd like to use this project in my personal open-source library, but unfortunately I don't speak Python and have no idea how to use or change it. Is there any chance the project will provide a WebSocket or at least a REST API?

Could not read from remote repository.

I got the following error on Windows 11 when running this command:
F:>git clone --recurse-submodules git@github.com:nomic-ai/gpt4all.git
Cloning into 'gpt4all'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
