Code Monkey home page Code Monkey logo

openchatkit's People

Contributors

adamsch1 avatar azahed98 avatar csris avatar devin-ai-integration[bot] avatar eltociear avatar justusc avatar leclem avatar lorrinwww avatar martindevans avatar orangetin avatar patrickhwood avatar qrpike avatar shirayu avatar vipulved avatar xzyaoi avatar zhangce avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

openchatkit's Issues

LORA Training

Hi, it will be super nice if you provide LORA training, to reduce the computational cost. Because 8x80 A100 is too expensive

[feature]Do you support RLHF training ?

After viewing your code , I found that you haven't support RLHF training yet. Your code is mainly about distributed training using pipeline & data parallel.
Do you have the plan to support RLHF training?Do you think it is necessary?

Issue Converting Weights to Huggingface Format

I'm trying to convert the weights as per the example but running into an issue.

After mkdir huggingface_models \ && python tools/convert_to_hf_gptneox.py \ --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path /huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6

I'm getting this error:
Traceback (most recent call last): File "/mnt/c/Users/name/OpenChatKit/tools/convert_to_hf_gptneox.py", line 102, in <module> assert args.save_path is not None AssertionError --save-path: command not found --n-stages: command not found --n-layer-per-stage: command not found

I'm using Windows 11 WSL Ubuntu 22.04.2 LTS

Does it support Chinese Q&A?

ChatGPT supports multi-language question answering and reasoning, although in most cases, English answers are generated first and then translated into other languages. So I want to ask whether OpenChatKit supports direct Chinese Q&A, or do I need to use Chinese data set for training before I can conduct Chinese Q&A?

OpenChatKit Feedback Report

My question:

Test

Bot response:

Test

Ideal bot response:

Test!

Bot response was:

  • Factually incorrect
  • Not helpful
  • Harmful, inappropriate or unsafe

python inference/bot.py Killed

git clone https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B
run python inference/bot.py --model GPT-NeoXT-Chat-Base-20B
Loading GPT-NeoXT-Chat-Base-20B to cuda:0...
Killed

run python inference/bot.py
OSError: Can't load the configuration of '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B' is the correct path to a directory containing a config.json file

so cp -r GPT-NeoXT-Chat-Base-20B huggingface_models/

root@msi:~/test/OpenChatKit-main# python inference/bot.py
Loading /root/test/OpenChatKit-main/inference/../huggingface_models/GPT-NeoXT-Chat-Base-20B to cuda:0...
Killed

I am confused, it is running in docker, is the gpu not enough video memory?

Why instruction tuning calculate whole sentence loss?

I noticed that OIG dataset adds human and bot tag in each sample. In your code, you directly pack samples to max seq length and calculate cross entropy on whole sentence. Will this make the model output human, bot tag and not knowing when to stop? Does only calculate the last bot response loss be more suitable?

How I manage my own domain knowledge articles?

I want to know the format of my documents if I want to fine-tune a model on my domain knowledge.
If my documents are many complete articles should I split them into many small questions :
: questions from articles :answers from articles

or can I feed the model with original article(how can I feed the model with my whole article?).

many thanks!

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 432.00 MiB (GPU 2; 23.65 GiB total capacity; 20.88 GiB already allocated; 259.56 MiB free;

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Can't install nccl package

Describe the bug
When trying to set up the conda environment, it is failing to install the nccl package.

(base) PS D:\OpenChatKit> conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - nccl=2.12.12.1

To Reproduce
Steps to reproduce the behavior:

  1. Enter conda env create -f environment.yml

Expected behavior
It should install all of the packages

Desktop (please complete the following information):

  • Windows 11 Pro

Training LLaMa?

In theory the LLaMa 30b & 65b should be much more capable than the GPT-NeoX 20b.

Does OpenChatKit support LLaMa? If not, is it on the roadmap?

I appreciate that togethercomputer might not be able to release pretrained LLaMa weights due to the licence, but it'd be great if researches can at least play with it.

Is that possible to have Chinese version of README?

Is your feature request related to a problem? Please describe.
Looks like there are not clear on installation in Chinese

Describe the solution you'd like
I can help to translate it into Chinese

Describe alternatives you've considered

Additional context

Build a docker image for openchatkit

Is your feature request related to a problem? Please describe.
A docker image might be easier for people to use.

Describe the solution you'd like
We could add a /docker folder or a simple dockerfile to the repo, so people could build the image by themselves. And maybe we could push the image to dockerhub so they could just pull and test.

how to identify the process of training?

I started a training process with 4*V100S(32GB VRAM each) at 18:00, and i got a "training starts..." prompt.
With nvidia-smi, i can see that 3 GPUs are running with utils 100%.
The next morning, the processes are still running, but nothing in output folder, neither the log message.
So, is there someway to see how the training job is going?

Exception in subprocess.py

I run the following command:
python prepare.py

The result is as follows:

error: RPC failed; curl 56 GnuTLS recv error (-110): The TLS connection was non-properly terminated.
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
Traceback (most recent call last):
File "prepare.py", line 18, in
process = subprocess.run(
File "/root/miniconda3/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git clone https://huggingface.co/datasets/laion/OIG /www/wwwroot/OpenChatKit/data/OIG/files' returned non-zero exit status 128.

Add:
curl https://huggingface.co/datasets/laion/OIG is OK.
And Permission is 777 in /www/wwwroot/OpenChatKit/data/OIG/files

Why?

error de login

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Training script needs to print more progress

Feedback from a user: When running the training script, it's not clear that it's making progress. The only way to know that it's doing something is by looking at nvidia-smi.

RuntimeError: Failed to import transformers.optimization

Describe the bug
I've downloaded the corpus and the model weights, I ran the command bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh and I got the following:
https://gist.github.com/riatzukiza/0930307fc90bf940103364be2d3db5c1

To Reproduce
Steps to reproduce the behavior:

  1. Download weights
  2. download corpus
  3. run bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
  4. Bam error

Expected behavior

To fine tune the model, or get an out of memory error

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: Pop os

Additional context
Add any other context about the problem here.

OpenChatKit Feedback Report

My question:

胜多负少的

Bot response:

所得到的多多

Ideal bot response:

点对点

Bot response was:

  • Factually incorrect
  • Not helpful
  • Harmful, inappropriate or unsafe

Is there an error in this line of code?“python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B”No parameters specified offline directory

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Roadmap

What’s the roadmap for the project becoming a true open alternative to chatgpt?

While its capabilities are impressive on their own, stacked against ChatGPT there’s lot lacking.

For example…

  • it can’t really generate useful code
  • the code it generates is only python
  • its reasoning capabilities need a lot of improvement
  • its knowledge of the world needs a lot of improvement

To me it seems that it is good at generating coherent sentences, but massively lacks reasoning.

Hopefully this feedback doesn’t come across as harsh or critical. It seems this project is the closest there is to a ChatGPT alternative. Impressive work everyone who contributed so far. I’m rooting for this projects success and hope it will truly rival ChatGPT someday.

One issue on env ResolvePackageNotFound

Describe the bug
(base) samchen@Sams-MacBook-Pro OpenChatKit % conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • cupy=10.4.0
  • nccl=2.12.12.1
  • faiss-gpu=1.7.2
  • cudatoolkit=11.6.0

To Reproduce
Steps to reproduce the behavior:

Expected behavior
.

Screenshots

Desktop (please complete the following information):

Smartphone (please complete the following information):

Additional context
Add any other context about the problem here.

Add documentation for running inference on multiple GPUs

While trying out python inference/bot.py --retrieval --model togethercomputer/GPT-NeoXT-Chat-Base-20B
I got this error on A100 GPU:

File "inference/bot.py", line 185, in <module>
    main()
  File "inference/bot.py", line 173, in main
    OpenChatKitShell(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/cmd.py", line 217, in onecmd
    return func(arg)
  File "inference/bot.py", line 87, in do_say
    output = self._model.do_inference(
  File "inference/bot.py", line 32, in do_inference
    outputs = self._model.generate(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/generation_utils.py", line 1944, in sample
    outputs = self(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 619, in forward
    outputs = self.gpt_neox(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 511, in forward
    outputs = layer(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 115, in forward
    qkv = self.query_key_value(hidden_states)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/admin/home/anaconda3/envs/openkit/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Conda takes too long to install dependencies

Describe the bug
One user reported conda env create -f environment.yml taking over 60 minutes. We need a better solution.

To Reproduce
Steps to reproduce the behavior:

  1. Run conda env create -f environment.yml from the root of the repo.

Expected behavior
Should finish in a "reasonable" amount of time.

Add print statements to `pretrained/GPT-NeoX-20B/prepare.py` to show progress

Describe the bug
pretrained/GPT-NeoX-20B/prepare.py can take a long time to prepare the base model. It should print progress as it's converting.

To Reproduce
Steps to reproduce the behavior:

  1. run python pretrained/GPT-NeoX-20B/prepare.py from the root of the repo.

Expected behavior
The script should print progress.

FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/prank_0_checkpoint.pt'

Ubuntu Ubuntu 22.04.2 LTS
After downloading the model and now trying to convert:

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python3.10 tools/convert_to_hf_gptneox.py --ckpt-path model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5 --save-path huggingface_models/GPT-NeoXT-Chat-Base-20B --n-stages 8 --n-layer-per-stage 6
loading stage 0
Traceback (most recent call last):
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/convert_to_hf_gptneox.py", line 110, in <module>
    load_decentralized_checkpoint(
  File "/home/georgi/Documents/GitHub/OpenChatKit/tools/convert_to_hf_gptneox.py", line 43, in load_decentralized_checkpoint
    checkpoint = torch.load(os.path.join(input_path, f'prank_{i}_checkpoint.pt'), map_location=torch.device("cpu"))
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 771, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 270, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/georgi/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/serialization.py", line 251, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'model_ckpts/GPT-Neo-XT-Chat-Base-20B/checkpoint_5/prank_0_checkpoint.pt'

Any ideas?

Can't prepare pretrained model

(OpenChatKit) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit$ python pretrained/GPT-NeoX-20B/prepare.py
Downloading config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 613/613 [00:00<00:00, 272kB/s]
Downloading tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 156/156 [00:00<00:00, 55.0kB/s]
Downloading vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.03M/1.03M [00:01<00:00, 748kB/s]
Downloading merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 446k/446k [00:00<00:00, 555kB/s]
Downloading tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02M/2.02M [00:01<00:00, 1.61MB/s]
Downloading special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 90.0/90.0 [00:00<00:00, 39.5kB/s]
Downloading pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 56.4k/56.4k [00:00<00:00, 3.44MB/s]
Downloading pytorch_model-00001-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 883M/883M [01:16<00:00, 12.1MB/s]
Downloading pytorch_model-00002-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00003-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00004-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00005-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00006-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00007-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00008-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00009-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00010-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.1MB/s]
Downloading pytorch_model-00011-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00012-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00013-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00014-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00015-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:19<00:00, 11.5MB/s]
Downloading pytorch_model-00016-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00017-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.6MB/s]
Downloading pytorch_model-00018-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00019-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00020-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00021-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.0MB/s]
Downloading pytorch_model-00022-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.1MB/s]
Downloading pytorch_model-00023-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00024-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00025-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00026-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00027-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:14<00:00, 12.2MB/s]
Downloading pytorch_model-00028-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00029-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00030-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 12.0MB/s]
Downloading pytorch_model-00031-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:17<00:00, 11.7MB/s]
Downloading pytorch_model-00032-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00033-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.7MB/s]
Downloading pytorch_model-00034-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00035-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:15<00:00, 12.1MB/s]
Downloading pytorch_model-00036-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:26<00:00, 10.5MB/s]
Downloading pytorch_model-00037-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:24<00:00, 10.7MB/s]
Downloading pytorch_model-00038-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.6MB/s]
Downloading pytorch_model-00039-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:29<00:00, 10.2MB/s]
Downloading pytorch_model-00040-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:18<00:00, 11.7MB/s]
Downloading pytorch_model-00041-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:21<00:00, 11.2MB/s]
Downloading pytorch_model-00042-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:25<00:00, 10.6MB/s]
Downloading pytorch_model-00043-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:16<00:00, 11.9MB/s]
Downloading pytorch_model-00044-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 868M/868M [01:23<00:00, 10.9MB/s]
Downloading pytorch_model-00045-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576M/576M [00:50<00:00, 11.9MB/s]
Downloading pytorch_model-00046-of-00046.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 591M/591M [00:54<00:00, 11.3MB/s]
Killed


(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ ls
config.json  special_tokens_map.json  tokenizer_config.json  tokenizer.json
(base) georgi@georgi-hackintosh:~/Documents/GitHub/OpenChatKit/pretrained/GPT-NeoX-20B/EleutherAI_gpt-neox-20b$ 

However /home/georgi/.cache/huggingface/transformers is 41.3 GB. Any ideas what goes wrong?

Exception in prepare.py

python data/OIG/prepare.py
File "data/OIG/prepare.py", line 27
gzip.open(f, 'rb') as infile,
^
SyntaxError: invalid syntax

Tell me why, thank you!

Bug when running inference with retrieval augmented model

Describe the bug
Using retrieval-augmented models, a sequence of prompts leads to a runtime error (size mismatch between two tensors).

To Reproduce
Steps to reproduce the behavior:

  1. After downloading the Wikipedia index, run inference using python inference/bot.py --retrieval
  2. In the OpenChatKit Shell, run the following set of queries:
>>> Where is Bern?
...
>>> Where is Switzerland?
...
>>> Is Switzerland in Europe or in America?

Traceback
The queries lead to the following error:

Traceback (most recent call last):
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 185, in <module>
    main()
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 181, in main
    ).cmdloop()
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 217, in onecmd
    return func(arg)
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 87, in do_say
    output = self._model.do_inference(
  File "/home/fsuser/OpenChatKit/inference/bot.py", line 32, in do_inference
    outputs = self._model.generate(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1944, in sample
    outputs = self(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 619, in forward
    outputs = self.gpt_neox(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 511, in forward
    outputs = layer(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 319, in forward
    attention_layer_outputs = self.attention(
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 153, in forward
    attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_mask)
  File "/home/fsuser/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 220, in _attn
    attn_scores = torch.where(causal_mask, attn_scores, mask_value)
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2247) at non-singleton dimension 3

Environment
Setup using mamba in root dir: mamba env create -f environment.yml

Hardware:

  • OS: Ubuntu 20.04.5 LTS
  • 1x A100 80G GPU
  • 8 vCPU with 128GB RAM

Cupy error while training (`CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal`)

Describe the bug
The bash script to train the model does not work because of a Cupy error:

(OpenChatKit-Test) user@pc:~/OpenChatKit$ bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
Traceback (most recent call last):
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
Traceback (most recent call last):
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
Initialize NCCLCommunicator: < pipeline_group_0 >; rank: 0
Traceback (most recent call last):
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 358, in <module>
    main()
  File "/home/user/OpenChatKit/training/dist_clm_train.py", line 275, in main
    init_communicators(args)
  File "/home/user/OpenChatKit/training/comm/comm_utils.py", line 103, in init_communicators
    _PIPELINE_PARALLEL_COMM = NCCLCommunicator(_PIPELINE_PARALLEL_RANK, args.cuda_id, args.pipeline_group_size,
  File "/home/user/OpenChatKit/training/comm/nccl_backend.py", line 31, in __init__
    cupy.cuda.Device(cuda_id).use()
  File "cupy/cuda/device.pyx", line 196, in cupy.cuda.device.Device.use
  File "cupy/cuda/device.pyx", line 222, in cupy.cuda.device.Device.use
  File "cupy_backends/cuda/api/runtime.pyx", line 365, in cupy_backends.cuda.api.runtime.setDevice
  File "cupy_backends/cuda/api/runtime.pyx", line 142, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal

To Reproduce
Steps to reproduce the behavior:

  1. Run code on WSL-Ubuntu in a Conda Env
  2. Run the bash script bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh
  3. The error above is produced

Expected behavior
The code is supposed to execute.

Screenshots
NA

Desktop (please complete the following information):

  • OS: Windows 11
  • Ubuntu-WSL
  • Miniconda
  • Nvidia GeForce 3060 (Could this be the issue?)

Additional context
Also, the previous steps to download the data and weights also gave me errors. These steps:

python data/OIG/prepare.py
python pretrained/GPT-NeoX-20B/prepare.py

Ended after a couple minutes/hours with the error message "Killed". I was able to acquire the data sets with a simple wget command but I thought that was weird too.

can not create conda environment

Describe the bug
Followed the instructions but could not get

conda env create -f environment.yml

to work because of

ResolvePackageNotFound: 
  - cudatoolkit=11.6.0
  - faiss-gpu=1.7.2
  - nccl=2.12.12.1
  - cupy=10.4.0

To Reproduce
Steps to reproduce the behavior:
Intall miniconda
run
conda env create -f environment.yml

Expected behavior
Create an environment called OpenChatKit but can't create

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):
Mac

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Resources required to launch

Hello
What is minimum specification to launch (but not train) it on local machine with normal speed?
Thank you

Having issue on EnvironmentFileNotFound

Describe the bug
(base) samchen@Sams-MacBook-Pro miniconda3 % conda env create -f environment.yml

EnvironmentFileNotFound: '/Users/samchen/miniconda3/environment.yml' file not found

To Reproduce
Steps to reproduce the behavior:

  1. Install miniconda3
  2. run " conda env create -f environment.yml"

Expected behavior
Should be move to next step

Screenshots
(base) samchen@Sams-MacBook-Pro miniconda3 % conda env create -f environment.yml

EnvironmentFileNotFound: '/Users/samchen/miniconda3/environment.yml' file not found

Desktop (please complete the following information):

  • OS: [MacOS]

Additional context
Add any other context about the problem here.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.