Comments (8)
There is a comment in the code about needing authorization by Meta and passing a token to Hugging Face.
model_name = "NousResearch/llama-2-7b-chat-hf" # ungated mirror; swap in the official "meta-llama/Llama-2-7b-chat-hf" if you have access, though keep in mind you'll then need to pass a Hugging Face token
QUESTION 1: Does this mean that the default code assumes that you have HF/Meta access to 2-7b-chat-hf? And if you don't have it, does that explain the memory error?
QUESTION 2: Has anyone plugged anything else in here with success? If so, what?
from gpt-llm-trainer.
Has anyone found a solution to this? Same here.
I am still unable to merge the model because I am getting the same error as @smilinrobin. I have Colab Pro and am running on a V100. I have implemented the recommendations that are readily available on the Internet, such as setting max_split_size_mb to 250, and none of them have made much difference.
Can we have a team effort to improve this situation? If you have llm-trainer working on Colab, can you please share your configuration?
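To make the allocator setting concrete, this is how I'm applying it (the option name is `max_split_size_mb`, and the env var has to be set before torch first touches CUDA; the 250 value is just what others have suggested, not a verified fix):

```python
import os

# The PyTorch CUDA caching allocator reads this env var; it must be set
# before the first CUDA allocation (i.e., before importing/initializing torch).
# Value is in MB; smaller values reduce fragmentation at some speed cost.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:250"
```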
Yes, I am having the same issue as others at the Merge section: it gets to about 50% before running out of memory, even though I am using a V100 GPU on Colab Pro. I hope this gets fixed soon, as this is a very useful project.
UPDATE: I went through the sign-up and have verified that I have access to Meta, to NousResearch, to Hugging Face, and am successfully passing the login.
I also set max_split_size_mb to 32 and experimented with various values of the memory fraction.
I upgraded to Google Colab Pro+. No benefit, since they still only issue one 16GB GPU.
At this point I have to conclude that the model can't be run on standard Google Colab Pro accounts, unless I am missing something. Can anyone prove me wrong?
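Some rough back-of-envelope arithmetic supports that conclusion (this ignores activations and the extra buffers allocated while merging LoRA weights, so the real footprint is higher still):

```python
# Back-of-envelope VRAM math for a 7B-parameter model.
params = 7e9
fp16_gib = params * 2 / 1024**3    # 2 bytes/param  -> ~13.0 GiB for weights alone
int4_gib = params * 0.5 / 1024**3  # 4-bit quantized -> ~3.3 GiB
print(f"fp16 weights: {fp16_gib:.1f} GiB, 4-bit weights: {int4_gib:.1f} GiB")
```

The fp16 base weights alone nearly fill a 16GB card, so the merge step tipping over into OOM is what I'd expect; a 4-bit quantized load leaves far more headroom, but merging still needs the full-precision weights somewhere.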
Same conclusion here: I got access to Llama 2 from Meta and am still getting the "CUDA out of memory" error.
If anyone has successfully plugged in anything other than Llama 2, that would be very helpful.
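One untested idea, offered only as a sketch: swap in a smaller ungated checkpoint. The name below is an example of an ungated model on the Hub, not something I've verified with gpt-llm-trainer (the tokenizer and prompt format differ from Llama 2 chat, so quality may suffer):

```python
# Hypothetical substitution: a 3B ungated checkpoint that should fit a 16GB
# GPU far more comfortably than a 7B model. Untested with gpt-llm-trainer.
model_name = "openlm-research/open_llama_3b"
```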
Yes, the issue continues. The GPT-3.5 llm-trainer does work well on Colab, though. But I would love to have a model that doesn't incur such outrageous costs once it is spun into a production environment.
Related Issues (18)
- Token generation limit HOT 5
- The model `gpt-4` does not exist or you do not have access to it
- Logging into wandb.ai HOT 1
- NousResearch/llama-2-7b-chat-hf NOT AVAILABLE HOT 1
- ㅂㅂ
- Cost estimate? HOT 3
- Merge the model and store in Google Drive (Section) HOT 3
- the model before lora load and after lora load is diff HOT 1
- llm
- API not working even after upgrading to gpt 4 HOT 1
- Problem with workflow
- hello, would you have time for a chat? HOT 1
- error :You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 HOT 1
- without openai !!! HOT 1
- Add 'LLM Knowledge Distillation' to Readme or Topic Tags
- which GPU? HOT 1
- Can we use GPT3.5? HOT 6