
openchatkit's People


adamsch1, azahed98, csris, devin-ai-integration[bot], eltociear, justusc, leclem, lorrinwww, martindevans, orangetin, patrickhwood, qrpike, shirayu, vipulved, xzyaoi, zhangce



openchatkit's Issues

LoRA Training

Hi, it would be great if you provided LoRA training to reduce the computational cost, because 8x A100 (80 GB) is too expensive.
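
To make the cost argument concrete, here is a minimal sketch that counts trainable parameters for a single weight matrix under full fine-tuning versus a rank-r LoRA adapter, where the frozen weight of shape (d_out, d_in) is augmented with a low-rank product of shapes (d_out, r) and (r, d_in). The layer size and rank below are illustrative assumptions, not OpenChatKit or GPT-NeoXT settings.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from any specific model).
    d_out, d_in, rank = 4096, 4096, 8
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full:,} trainable parameters")
    print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
    print(f"fraction trained: {lora / full:.4%}")
```

Fewer trainable parameters mainly shrink the gradient and optimizer state; the frozen base weights still have to fit in memory.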

[feature] Do you support RLHF training?

After reading your code, I found that you don't support RLHF training yet. Your code is mainly about distributed training using pipeline and data parallelism.
Do you plan to support RLHF training? Do you think it is necessary?

Issue Converting Weights to Huggingface Format

I'm trying to convert the weights as per the example but running into an issue.

After running mkdir huggingface_models \ && python tools/ \ --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path /huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6

I'm getting this error:
Traceback (most recent call last):
  File "/mnt/c/Users/name/OpenChatKit/tools/", line 102, in <module>
    assert args.save_path is not None
AssertionError
--save-path: command not found
--n-stages: command not found
--n-layer-per-stage: command not found

I'm using Ubuntu 22.04.2 LTS under WSL on Windows 11.
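
The `command not found` lines suggest that the shell ran `--save-path`, `--n-stages`, and `--n-layer-per-stage` as separate commands, which can happen when a line continuation is broken (for example, by a space after a trailing backslash). In that case the script would never receive `--save-path`, which matches the failed assertion. The sketch below reproduces this with a hypothetical argparse parser: the flag names come from the command above, but the parser itself is an assumption and not the actual conversion script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names are taken from the issue.
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt-path", type=str, default=None)
    parser.add_argument("--save-path", type=str, default=None)
    parser.add_argument("--n-stages", type=int, default=None)
    parser.add_argument("--n-layer-per-stage", type=int, default=None)
    return parser


def parse(argv):
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    # What the script sees if the command is cut off after --ckpt-path.
    broken = parse(["--ckpt-path", "model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5"])
    print("save_path with broken continuation:", broken.save_path)

    # What it sees when the whole command is passed on one line.
    complete = parse([
        "--ckpt-path", "model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5",
        "--save-path", "huggingface_models/GPT-NeoXT-Chat-Base-20B",
        "--n-stages", "8",
        "--n-layer-per-stage", "6",
    ])
    print("save_path with full command:", complete.save_path)
```

Typing the whole command on a single line, or making sure nothing follows each trailing backslash, avoids the problem.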

Does it support Chinese Q&A?

ChatGPT supports multi-language question answering and reasoning, although in most cases, English answers are generated first and then translated into other languages. So I want to ask whether OpenChatKit supports direct Chinese Q&A, or whether I need to train on a Chinese dataset before I can do Chinese Q&A.

OpenChatKit Feedback Report

My question:


Bot response:


Ideal bot response:


Bot response was:

  • Factually incorrect
  • Not helpful
  • Harmful, inappropriate or unsafe

python inference/ Killed

git clone
run python inference/ --model GPT-NeoXT-Chat-Base-20B
Loading GPT-NeoXT-Chat-Base-20B to cuda:0...

run python inference/
OSError: Can't load the configuration of '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B'. If you were trying to load it from '', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B' is the correct path to a directory containing a config.json file

So I ran cp -r GPT-NeoXT-Chat-Base-20B huggingface_models/

root@msi:~/test/OpenChatKit-main# python inference/
Loading /root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...

I am confused. It is running in Docker; does the GPU not have enough video memory?
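
A rough back-of-the-envelope check can help here: the weights alone need about (number of parameters) x (bytes per parameter). The sketch below assumes roughly 20 billion parameters, as the model name suggests, and ignores activations and framework overhead, so real usage would be higher. A process that is `Killed` without a Python traceback is often terminated by the operating system for exhausting host RAM rather than GPU memory, though that is only an assumption about this report.

```python
# Bytes used to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n_params = 20e9  # assumed from the "20B" in the model name
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(n_params, dtype):.1f} GiB")
```

Comparing these figures with both the GPU memory and the RAM available to the container shows which limit is likely being hit.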

Why does instruction tuning calculate the loss over the whole sentence?

I noticed that the OIG dataset adds human and bot tags to each sample. In your code, you pack samples directly up to the max sequence length and calculate cross entropy over the whole sequence. Will this make the model output the human and bot tags itself and not know when to stop? Would calculating the loss only on the last bot response be more suitable?
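
One way to restrict the loss to the bot's replies is to build a label sequence in which every non-bot token is replaced by an ignore index, so those positions contribute nothing to the cross entropy. The sketch below does this on plain Python lists. The segment format, the function name `mask_labels`, and the value -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) are illustrative assumptions, not a description of how OpenChatKit's training code works.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(segments: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate dialogue segments and mask everything not spoken by the bot.

    segments: (speaker, token_ids) pairs, where speaker is "human" or "bot".
    Returns (input_ids, labels); labels hold IGNORE_INDEX on non-bot tokens.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for speaker, token_ids in segments:
        input_ids.extend(token_ids)
        if speaker == "bot":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


if __name__ == "__main__":
    # Made-up token ids for a two-turn conversation.
    dialogue = [("human", [11, 12, 13]), ("bot", [21, 22]),
                ("human", [14]), ("bot", [23, 24, 25])]
    ids, labels = mask_labels(dialogue)
    print(ids)
    print(labels)
```

Training on only the last bot response would be a variant of the same idea: mask every segment except the final one.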

How do I manage my own domain knowledge articles?

I want to know what format my documents should be in if I want to fine-tune a model on my domain knowledge.
If my documents are complete articles, should I split them into many small question-and-answer pairs:
: questions from articles :answers from articles

Or can I feed the model the original articles (and if so, how do I feed it a whole article)?

Many thanks!

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 432.00 MiB (GPU 2; 23.65 GiB total capacity; 20.88 GiB already allocated; 259.56 MiB free;

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Can't install nccl package

Describe the bug
When trying to set up the conda environment, it is failing to install the nccl package.

(base) PS D:\OpenChatKit> conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

  - nccl=

To Reproduce
Steps to reproduce the behavior:

  1. Enter conda env create -f environment.yml

Expected behavior
It should install all of the packages

Desktop (please complete the following information):

  • Windows 11 Pro

Training LLaMa?

In theory the LLaMa 30b & 65b should be much more capable than the GPT-NeoX 20b.

Does OpenChatKit support LLaMa? If not, is it on the roadmap?

I appreciate that togethercomputer might not be able to release pretrained LLaMa weights due to the licence, but it'd be great if researchers could at least play with it.

Is it possible to have a Chinese version of the README?

Is your feature request related to a problem? Please describe.
It looks like there are no clear installation instructions in Chinese.

Describe the solution you'd like
I can help translate it into Chinese.

Describe alternatives you've considered

Additional context

Build a docker image for openchatkit

Is your feature request related to a problem? Please describe.
A docker image might be easier for people to use.

Describe the solution you'd like
We could add a /docker folder or a simple Dockerfile to the repo, so people could build the image themselves. And maybe we could push the image to Docker Hub so they could just pull and test it.

How can I check the progress of training?

I started a training process on 4x V100S (32 GB VRAM each) at 18:00 and got a "training starts..." prompt.
With nvidia-smi, I can see that 3 GPUs are running at 100% utilization.
The next morning, the processes were still running, but there was nothing in the output folder and no log messages.
So, is there some way to see how the training job is going?

Exception in

I ran the following command:

The result was as follows:

error: RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
Traceback (most recent call last):
File "", line 18, in
process =
File "/root/miniconda3/lib/python3.8/", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git clone /www/wwwroot/OpenChatKit/data/OIG/files' returned non-zero exit status 128.

curl works, and the permissions on /www/wwwroot/OpenChatKit/data/OIG/files are 777.


Login error

Training script needs to print more progress

Feedback from a user: When running the training script, it's not clear that it's making progress. The only way to know that it's doing something is by looking at nvidia-smi.
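
A minimal sketch of what such progress output could look like: a small helper that formats the step, loss, and throughput, called every few steps. The names `format_progress` and `should_log`, the logging interval, and the reported fields are hypothetical and not part of the existing training script.

```python
import time


def format_progress(step: int, total_steps: int, loss: float, elapsed_s: float) -> str:
    """Build a one-line progress message for a training step."""
    pct = 100.0 * step / total_steps
    rate = step / elapsed_s if elapsed_s > 0 else 0.0
    return f"step {step}/{total_steps} ({pct:.1f}%) loss={loss:.4f} {rate:.2f} steps/s"


def should_log(step: int, every: int) -> bool:
    """Log on every `every`-th step."""
    return step % every == 0


if __name__ == "__main__":
    start = time.monotonic()
    total = 20
    for step in range(1, total + 1):
        loss = 1.0 / step  # placeholder standing in for a real loss value
        if should_log(step, every=5):
            # flush=True so the line appears promptly even when stdout is redirected
            print(format_progress(step, total, loss, time.monotonic() - start), flush=True)
```

Flushing each line matters when output is piped to a file, since buffered output can otherwise make a running job look idle.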

RuntimeError: Failed to import transformers.optimization

Describe the bug
I've downloaded the corpus and the model weights. I ran the command bash training/ and got the following:

To Reproduce
Steps to reproduce the behavior:

  1. Download weights
  2. Download corpus
  3. Run bash training/
  4. Bam, error

Expected behavior

It should fine-tune the model, or fail with an out-of-memory error.

If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Pop os

Additional context
Add any other context about the problem here.

Is there an error in this line of code? "python inference/ --model togethercomputer/Pythia-Chat-Base-7B" has no parameter for specifying an offline directory


What’s the roadmap for the project becoming a true open alternative to ChatGPT?

While its capabilities are impressive on their own, stacked against ChatGPT there’s a lot lacking.

For example…

  • it can’t really generate useful code
  • the code it generates is only python
  • its reasoning capabilities need a lot of improvement
  • its knowledge of the world needs a lot of improvement

To me it seems that it is good at generating coherent sentences, but massively lacks reasoning.

Hopefully this feedback doesn’t come across as harsh or critical. It seems this project is the closest there is to a ChatGPT alternative. Impressive work by everyone who has contributed so far. I’m rooting for this project’s success and hope it will truly rival ChatGPT someday.

One issue on env ResolvePackageNotFound

Describe the bug
(base) samchen@Sams-MacBook-Pro OpenChatKit % conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed


  • cupy=10.4.0
  • nccl=
  • faiss-gpu=1.7.2
  • cudatoolkit=11.6.0

To Reproduce
Steps to reproduce the behavior:

Expected behavior


Desktop (please complete the following information):

Smartphone (please complete the following information):

Additional context
Add any other context about the problem here.

Add documentation for running inference on multiple GPUs

While trying out python inference/ --retrieval --model togethercomputer/GPT-NeoXT-Chat-Base-20B
I got this error on an A100 GPU:

  File "inference/", line 185, in <module>
  File "inference/", line 173, in main
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/", line 217, in onecmd
    return func(arg)
  File "inference/", line 87, in do_say
    output = self._model.do_inference(
  File "inference/", line 32, in do_inference
    outputs = self._model.generate(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/autograd/", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/", line 1326, in generate
    return self.sample(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/", line 1944, in sample
    outputs = self(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/", line 619, in forward
    outputs = self.gpt_neox(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/", line 511, in forward
    outputs = layer(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/", line 115, in forward
    qkv = self.query_key_value(hidden_states)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
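
`CUBLAS_STATUS_NOT_INITIALIZED` can be a symptom of a GPU running out of memory, which is one reason multi-GPU documentation would help. Hugging Face transformers accepts a `max_memory` mapping together with `device_map="auto"` in `from_pretrained` (this relies on the accelerate package) to spread a model over several devices. The sketch below only builds such a mapping. The helper name `build_max_memory`, the headroom value, and the GPU sizes are assumptions, and whether the inference script exposes this option is not established here.

```python
from typing import Dict, List, Union


def build_max_memory(gpu_gib: List[int], cpu_gib: int,
                     headroom_gib: int = 2) -> Dict[Union[int, str], str]:
    """Build a per-device memory budget such as {0: "22GiB", 1: "22GiB", "cpu": "64GiB"}.

    gpu_gib: total memory of each GPU in GiB, indexed by device id.
    headroom_gib: memory left free on each GPU for activations and CUDA context.
    """
    max_memory: Dict[Union[int, str], str] = {
        device: f"{max(total - headroom_gib, 0)}GiB"
        for device, total in enumerate(gpu_gib)
    }
    max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


if __name__ == "__main__":
    # Assumed hardware: two 24 GiB GPUs and 64 GiB of host RAM.
    print(build_max_memory([24, 24], cpu_gib=64))
```

The resulting dictionary could then be passed as `max_memory=...` alongside `device_map="auto"` when loading the model.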

Conda takes too long to install dependencies

Describe the bug
One user reported conda env create -f environment.yml taking over 60 minutes. We need a better solution.

To Reproduce
Steps to reproduce the behavior:

  1. Run conda env create -f environment.yml from the root of the repo.

Expected behavior
Should finish in a "reasonable" amount of time.

Add print statements to `pretrained/GPT-NeoX-20B/` to show progress

Describe the bug
pretrained/GPT-NeoX-20B/ can take a long time to prepare the base model. It should print progress as it's converting.

To Reproduce
Steps to reproduce the behavior:

  1. run python pretrained/GPT-NeoX-20B/ from the root of the repo.

Expected behavior
The script should print progress.

FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/'

Ubuntu 22.04.2 LTS
After downloading the model, I am now trying to convert it:

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python3.10 tools/ --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6
loading stage 0
Traceback (most recent call last):
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/", line 110, in <module>
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/", line 43, in load_decentralized_checkpoint
    checkpoint = torch.load(os.path.join(input_path, f'prank_{i}'), map_location=torch.device("cpu"))
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/'

Any ideas?
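
Before converting, it may help to confirm that the checkpoint directory exists and holds one file per pipeline stage. The traceback shows the loader joining the checkpoint path with a name that starts with `prank_{i}`, so the sketch below looks for entries with that prefix for each stage. The helper names are hypothetical, and the exact file suffix is not shown in the traceback, so only the prefix is matched.

```python
import re
from pathlib import Path
from typing import Dict, List


def find_stage_files(ckpt_dir: str, n_stages: int) -> Dict[int, List[str]]:
    """Map each stage index to the entries in ckpt_dir named prank_<i> or prank_<i><non-digit>..."""
    root = Path(ckpt_dir)
    names = sorted(p.name for p in root.iterdir()) if root.is_dir() else []
    return {
        i: [n for n in names if re.fullmatch(rf"prank_{i}(\D.*)?", n)]
        for i in range(n_stages)
    }


def missing_stages(ckpt_dir: str, n_stages: int) -> List[int]:
    """Return the stage indices for which no matching file was found."""
    return [i for i, files in find_stage_files(ckpt_dir, n_stages).items() if not files]


if __name__ == "__main__":
    # Path and stage count copied from the command in the issue.
    ckpt = "model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5"
    print("directory exists:", Path(ckpt).is_dir())
    print("stages without files:", missing_stages(ckpt, 8))
```

If the directory itself is reported missing, it is worth comparing the spelling in the command (`GPT-Neo-XT-...`) with the directory name actually on disk.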

Can't prepare pretrained model

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python pretrained/GPT-NeoX-20B/
Downloading config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 613/613 [00:00<00:00, 272kB/s]
Downloading tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 156/156 [00:00<00:00, 55.0kB/s]
Downloading vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.03M/1.03M [00:01<00:00, 748kB/s]
Downloading merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 446k/446k [00:00<00:00, 555kB/s]
Downloading tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02M/2.02M [00:01<00:00, 1.61MB/s]
Downloading special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 90.0/90.0 [00:00<00:00, 39.5kB/s]
Downloading pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 56.4k/56.4k [00:00<00:00, 3.44MB/s]
Downloading pytorch_model-00001-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 883M/883M [01:16<00:00, 12.1MB/s]
Downloading pytorch_model-00002-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00003-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00004-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00005-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00006-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00007-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00008-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00009-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00010-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.1MB/s]
Downloading pytorch_model-00011-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00012-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00013-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00014-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00015-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:19<00:00, 11.5MB/s]
Downloading pytorch_model-00016-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00017-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.6MB/s]
Downloading pytorch_model-00018-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00019-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00020-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00021-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00022-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.1MB/s]
Downloading pytorch_model-00023-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00024-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00025-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00026-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00027-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00028-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00029-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00030-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 12.0MB/s]
Downloading pytorch_model-00031-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:17<00:00, 11.7MB/s]
Downloading pytorch_model-00032-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00033-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.7MB/s]
Downloading pytorch_model-00034-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00035-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00036-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:26<00:00, 10.5MB/s]
Downloading pytorch_model-00037-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:24<00:00, 10.7MB/s]
Downloading pytorch_model-00038-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.6MB/s]
Downloading pytorch_model-00039-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:29<00:00, 10.2MB/s]
Downloading pytorch_model-00040-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.7MB/s]
Downloading pytorch_model-00041-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:21<00:00, 11.2MB/s]
Downloading pytorch_model-00042-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:25<00:00, 10.6MB/s]
Downloading pytorch_model-00043-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00044-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:23<00:00, 10.9MB/s]
Downloading pytorch_model-00045-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576M/576M [00:50<00:00, 11.9MB/s]
Downloading pytorch_model-00046-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 591M/591M [00:54<00:00, 11.3MB/s]

(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ ls
config.json  special_tokens_map.json  tokenizer_config.json  tokenizer.json
(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ 

However, /home/georgi/.cache/huggingface/transformers is 41.3 GB. Any idea what is going wrong?

Exception in

python data/OIG/
File "data/OIG/", line 27, 'rb') as infile,
SyntaxError: invalid syntax

Can you tell me why? Thank you!
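
One possible cause, assuming the script opens several files in a single parenthesized `with` statement, is the Python version: that syntax is only officially supported from Python 3.10, and older interpreters report `SyntaxError: invalid syntax` on it. The sketch below shows a form that also works on older versions, using `contextlib.ExitStack`. The function name and file names are illustrative and not taken from the script.

```python
import os
import shutil
import tempfile
from contextlib import ExitStack


def copy_file(src: str, dst: str) -> None:
    """Open two files in one block without parenthesized multi-item `with` syntax."""
    with ExitStack() as stack:
        infile = stack.enter_context(open(src, "rb"))
        outfile = stack.enter_context(open(dst, "wb"))
        shutil.copyfileobj(infile, outfile)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.bin")
        dst = os.path.join(tmp, "out.bin")
        with open(src, "wb") as f:
            f.write(b"example data")
        copy_file(src, dst)
        print(os.path.getsize(dst), "bytes copied")
```

Running `python --version` in the environment used for the script shows whether this explanation applies.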

Bug when running inference with retrieval augmented model

Describe the bug
Using retrieval-augmented models, a sequence of prompts leads to a runtime error (size mismatch between two tensors).

To Reproduce
Steps to reproduce the behavior:

  1. After downloading the Wikipedia index, run inference using python inference/ --retrieval
  2. In the OpenChatKit Shell, run the following set of queries:
>>> Where is Bern?
>>> Where is Switzerland?
>>> Is Switzerland in Europe or in America?

The queries lead to the following error:

Traceback (most recent call last):
  File "/home/fsuser/OpenChatKit/inference/", line 185, in <module>
  File "/home/fsuser/OpenChatKit/inference/", line 181, in main
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/", line 217, in onecmd
    return func(arg)
  File "/home/fsuser/OpenChatKit/inference/", line 87, in do_say
    output = self._model.do_inference(
  File "/home/fsuser/OpenChatKit/inference/", line 32, in do_inference
    outputs = self._model.generate(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/autograd/", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/", line 1326, in generate
    return self.sample(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/", line 1944, in sample
    outputs = self(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/", line 619, in forward
    outputs = self.gpt_neox(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/", line 511, in forward
    outputs = layer(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/", line 153, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/", line 220, in _attn
    attn_scores = torch.where(causal_mask, attn_scores, mask_value)
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2247) at non-singleton dimension 3

Setup using mamba in root dir: mamba env create -f environment.yml


  • OS: Ubuntu 20.04.5 LTS
  • 1x A100 80G GPU
  • 8 vCPU with 128GB RAM
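
A note on the likely cause: the final frame compares a causal mask of width 2048 with attention scores of width 2247, which suggests the prompt plus the generated tokens grew past the model's 2048-position context window. Below is a minimal sketch of one workaround, assuming the conversation's token IDs are available as a plain Python list. `fit_to_context` and its parameters are hypothetical names, not part of OpenChatKit or transformers.

```python
def fit_to_context(token_ids, max_positions=2048, max_new_tokens=128):
    """Keep only the most recent tokens so prompt + generation fits the window."""
    budget = max_positions - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than max_positions")
    # Drop the oldest tokens and keep the tail of the conversation.
    return token_ids[-budget:]


prompt = list(range(2247))        # stand-in for a 2247-token conversation
trimmed = fit_to_context(prompt)
print(len(trimmed))               # 1920, leaving room for 128 new tokens
```

Trimming from the front keeps the most recent turns of the chat, which is usually what matters for the next reply, but it does discard earlier context.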

CuPy error while training (`CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal`)

Describe the bug
The bash training script fails with a CuPy error:

(OpenChatKit-Test) user@pc:~/OpenChatKit$ bash training/
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/", line 358, in <module>
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
Traceback (most recent call last):
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
Traceback (most recent call last):
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Initialize NCCLCommunicator: < pipeline_group_0 >; rank: 0
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/", line 358, in <module>
  File "/home/user/OpenChatKit/training/", line 275, in main
  File "/home/user/OpenChatKit/training/comm/", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/", line 31, in __init__
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal

To Reproduce
Steps to reproduce the behavior:

  1. Set up the code on Ubuntu under WSL, inside a conda environment
  2. Run the bash script bash training/
  3. The error above is produced

Expected behavior
The training script should run without errors.


Desktop (please complete the following information):

  • OS: Windows 11
  • Ubuntu-WSL
  • Miniconda
  • Nvidia GeForce 3060 (Could this be the issue?)

Additional context
The earlier steps to download the data and weights also failed:

python data/OIG/
python pretrained/GPT-NeoX-20B/

Both ended, after anywhere from a few minutes to a few hours, with the message "Killed". I was able to fetch the datasets with a plain wget command, but the failure still seemed odd.
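
For context, `cudaErrorInvalidDevice: invalid device ordinal` is the CUDA runtime's error for a device index that does not exist on the machine. The repeated tracebacks suggest several worker processes were launched, each passing its own `args.cuda_id`; on a desktop with a single GPU only ordinal 0 is valid. Below is a small pre-flight check, sketched under the assumption that the requested IDs and the visible device count are known up front. `invalid_ordinals` is a hypothetical helper, and in practice the count might come from something like `torch.cuda.device_count()`.

```python
def invalid_ordinals(requested_ids, device_count):
    """Return the requested CUDA device IDs that do not exist on this machine."""
    return [i for i in requested_ids if not 0 <= i < device_count]


# Hypothetical launch plan: eight workers, one GPU each, on a single-GPU desktop.
bad = invalid_ordinals(range(8), device_count=1)
print(bad)  # [1, 2, 3, 4, 5, 6, 7]
```

If the returned list is non-empty, the launch configuration asks for more GPUs than the machine exposes.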

Cannot create conda environment

Describe the bug
I followed the instructions, but

conda env create -f environment.yml

fails because of these packages:

  - cudatoolkit=11.6.0
  - faiss-gpu=1.7.2
  - nccl=
  - cupy=10.4.0

To Reproduce
Steps to reproduce the behavior:

  1. Install Miniconda
  2. Run conda env create -f environment.yml

Expected behavior
An environment called OpenChatKit should be created, but creation fails.

Resources required to launch

What is the minimum hardware specification needed to run inference (not training) on a local machine at a reasonable speed?
Thank you
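
A rough lower bound can be estimated from the weights alone. The sketch below assumes the 20-billion-parameter GPT-NeoX-20B base referenced elsewhere on this page and counts only parameter storage; activations, the attention cache, and framework overhead come on top. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


n_params = 20e9  # assumed parameter count for a 20B model
for name, width in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gib(n_params, width):.1f} GiB")
```

At 2 bytes per parameter (fp16) the weights alone come to roughly 37 GiB, which is more than a typical consumer GPU provides.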

EnvironmentFileNotFound when creating the conda environment

Describe the bug
(base) samchen@Sams-MacBook-Pro miniconda3 % conda env create -f environment.yml

EnvironmentFileNotFound: '/Users/samchen/miniconda3/environment.yml' file not found

To Reproduce
Steps to reproduce the behavior:

  1. Install miniconda3
  2. Run conda env create -f environment.yml

Expected behavior
The environment should be created so that I can move on to the next step.

Desktop (please complete the following information):

  • OS: macOS
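
For reference, conda resolves the path given to `-f` relative to the current working directory, and the prompt in the report shows the command being run from `miniconda3` rather than from the cloned repository. A quick check, sketched with a hypothetical helper `has_environment_file`:

```python
from pathlib import Path


def has_environment_file(directory="."):
    """True if environment.yml exists directly inside `directory`."""
    return (Path(directory) / "environment.yml").is_file()


if not has_environment_file():
    print("environment.yml not found here; cd into the repository root first")
```

Running the command from the directory that contains `environment.yml`, or passing the file's full path to `-f`, should avoid the error.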
