Comments (26)
Make the following changes if you want to run on the CPU:

example.py

# torch.distributed.init_process_group("nccl")  # removed: NCCL requires properly set up GPUs
torch.distributed.init_process_group("gloo")  # the gloo backend runs on the CPU
# torch.cuda.set_device(local_rank)  # removed for the same reason
# torch.set_default_tensor_type(torch.cuda.HalfTensor)
torch.set_default_tensor_type(torch.FloatTensor)  # fp32; half precision is poorly supported on CPU

generation.py

# tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).cuda().long()
tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).long()  # drop .cuda() so the buffer stays on the CPU

model.py

self.cache_k = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)  # .cuda() removed so the KV cache lives in CPU memory
self.cache_v = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)  # .cuda() removed
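If you want to sanity-check the distributed setup before touching the model, here's a minimal sketch (my own example, not from the repo; the env vars mimic what torchrun would set for a single process):

```python
import os
import torch
import torch.distributed as dist

# Mimic what torchrun would set for a single process
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group("gloo", rank=0, world_size=1)  # gloo needs no GPUs
torch.set_default_tensor_type(torch.FloatTensor)       # fp32; CPU half support is limited

x = torch.full((2, 4), 0).long()  # created on the CPU, no .cuda() anywhere
print("backend:", dist.get_backend(), "| device:", x.device)
dist.destroy_process_group()
```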
> Try changing
> `torch.distributed.init_process_group("nccl")`
> to
> `torch.distributed.init_process_group("gloo")`

Running on a MacBook Pro M1, after changing to this I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
> Running on a MacBook Pro M1, after changing to this I got a new error:
> AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

> I'm getting the same error. Have you been able to resolve it?

To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.
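For anyone doing that pass over the repo, the usual idiom is to resolve the device once instead of hard-coding `.cuda()` everywhere. A minimal sketch of the pattern (illustrative shapes, not the repo's actual code):

```python
import torch

# Resolve the device once, then pass it around instead of hard-coding .cuda()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokens = torch.full((1, 8), 0, dtype=torch.long, device=device)
cache = torch.zeros((1, 512, 32, 128), device=device)  # shapes are illustrative
print(tokens.device, cache.device)
```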
Have the same issue, running on a MacBook Pro.
from llama.
I am now stuck with the following issue after following the instructions provided by @b0kch01:
Any guidance?
If you'd like, you can try my llama-cpu fork. It's tested to work on my MacBook Pro M1 Max.
> Try changing
> `torch.distributed.init_process_group("nccl")`
> to
> `torch.distributed.init_process_group("gloo")`
> Running on a MacBook Pro M1, after changing to this I got a new error:
> AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

> I'm getting the same error. Have you been able to resolve it?

> To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.

Doesn't PyTorch support the Apple Silicon GPU (MPS), so it wouldn't be slow as a snail?
> Running on a MacBook Pro M1, after changing to this I got a new error:
> AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I'm getting the same error. Have you been able to resolve it?
Thank you! @b0kch01
Just in case someone's going to ask for MPS (M1/M2 GPU) support: the code uses `view_as_complex`, which is not supported on MPS and has no `PYTORCH_ENABLE_MPS_FALLBACK` path due to memory-sharing issues. Even modifying the code to use MPS does not enable GPU support on Apple Silicon until pytorch/pytorch#77764 is fixed. So it's CPU only for now, but @b0kch01's version works nicely 🙂
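For context, a minimal sketch of the blocking operation (illustrative shapes; this runs on CPU, while the same call on an MPS tensor fails because complex dtypes aren't supported there):

```python
import torch

x = torch.randn(4, 8, 2)       # last dim of 2 holds (real, imag) pairs
xc = torch.view_as_complex(x)  # a complex *view* aliasing the same memory
print(xc.shape, xc.dtype)      # torch.Size([4, 8]) torch.complex64

# Because xc shares storage with x, a CPU fallback would have to copy the
# data back and forth, breaking the aliasing the rotary embedding relies on.
```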
Unfortunately, I'm still unable to run anything using @b0kch01's llama-cpu repo on Linux.
[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
Locating checkpoints
Found MP=1 checkpoints
Creating checkpoint instance...
Grabbing params...
Loading model arguments...
Creating tokenizer...
Creating transformer...
-- Creating embedding
-- Creating transformer blocks (32)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 13822) of binary: /gpfs/gibbs/project/frank/maw244/conda_envs/llama/bin/python3.10
Traceback (most recent call last):
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
example.py FAILED
------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time : 2023-03-03_17:31:31
  host : c14n02.grace.hpc.yale.internal
  rank : 0 (local_rank: 0)
  exitcode : -9 (pid: 13822)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 13822
======================================================
Here's the version on the cluster:
LSB Version: :core-4.1-amd64:core-4.1-ia32:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-ia32:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-ia32:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo
Edit: Never mind, it was an issue with the memory available on the interactive node (exit code -9 is SIGKILL, which here was the OOM killer). Got it working by adjusting the batch size and seq len and submitting it as a job. Thank you, @b0kch01!
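For anyone who hits the same thing: the preallocated KV cache is the biggest memory offender, since it is sized for the full batch and sequence length up front. A back-of-the-envelope sketch for the 7B model (32 layers, 32 heads, head dim 128; the batch/seq values below are just illustrative):

```python
# Rough fp32 KV-cache footprint for 7B: 32 layers, 32 heads, head_dim 128
layers, heads, head_dim, bytes_per_elem = 32, 32, 128, 4

def kv_cache_gib(max_batch_size: int, max_seq_len: int) -> float:
    # cache_k and cache_v together => factor of 2, one pair per layer
    elems = 2 * layers * max_batch_size * max_seq_len * heads * head_dim
    return elems * bytes_per_elem / 2**30

print(kv_cache_gib(32, 512))  # 16.0 GiB -- easily OOM-killed on a login node
print(kv_cache_gib(1, 512))   # 0.5 GiB  -- fits after shrinking batch size
```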
Thank you @Urammar
Same, running on MacBook Pro M1.
> Running on a MacBook Pro M1, after changing to this I got a new error:
> AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

> I'm getting the same error. Have you been able to resolve it?

> To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.

I am trying to do the same. Can you tell me exactly how you achieved this? Also, if you could share your example.py, that would be great.
After following @b0kch01's advice, I'm stuck here. Can anyone help out?
> After following @b0kch01's advice, I'm stuck here. Can anyone help out?

When the log says `MP=0`, it means it cannot find any of the weights in the path you provided.
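For context, the `Found MP=N checkpoints` line simply counts the `*.pth` shards the loader can glob in the checkpoint directory. A minimal sketch of that check (the path below is hypothetical):

```python
from pathlib import Path

ckpt_dir = "downloads/7B"  # hypothetical; point at your actual weights folder
checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
print(f"Found MP={len(checkpoints)} checkpoints")  # MP=0: no *.pth files found
```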
Those errors both typically indicate you haven't actually pointed at either the model or the tokenizer. I noticed the tokenizer doesn't seem to get downloaded with the 7B weights.

I'll upload it here to save you the trouble; this should be extracted to whatever directory you set your download to (one level above your 7B model directory itself), or you can just manually point to it with that command line.
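A quick way to confirm the tokenizer file is where you're pointing the script (hypothetical path; the repo loads it via sentencepiece):

```python
from pathlib import Path
from sentencepiece import SentencePieceProcessor

tokenizer_path = "downloads/tokenizer.model"  # hypothetical; one level above 7B/
assert Path(tokenizer_path).is_file(), f"missing {tokenizer_path}"

sp = SentencePieceProcessor(model_file=tokenizer_path)
print("vocab size:", sp.vocab_size())  # loads only if the file is a valid model
```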
> Those errors both typically indicate you haven't actually pointed at either the model or the tokenizer. I noticed the tokenizer doesn't seem to get downloaded with the 7B weights.
> I'll upload it here to save you the trouble; this should be extracted to whatever directory you set your download to (one level above your 7B model directory itself), or you can just manually point to it with that command line.

Thanks! Apparently, download.sh needs to be edited to function on macOS. Luckily, this guy made it work: https://github.com/facebookresearch/llama/pull/39/files

I implemented the changes and am now downloading the weights.
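If you also want to verify the download without relying on the platform's md5 tool (the part that trips up download.sh on macOS), here's a portable check in Python (hypothetical file name; compare the digest against the values in the checklist.chk that ships with the weights):

```python
import hashlib
from pathlib import Path

def md5sum(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

print(md5sum(Path("downloads/7B/consolidated.00.pth")))  # hypothetical path
```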
I'm getting stuck in a different way after following @b0kch01's changes. I'm not running into the MP=0 issue, but I get some warnings about being unable to initialize the client and server sockets (the errno 97 messages usually just mean IPv6 is unavailable and are harmless). Otherwise it looks similar to @jaygdesai's issue, but with exit code 7 instead of -9.
Hi, I'm stuck here following @b0kch01's llama-cpu repo on my Mac. Any suggestions? Thanks.
> Hi, I'm stuck here following @b0kch01's llama-cpu repo on my Mac. Any suggestions? Thanks.

I'm facing the same issue, but only with the 7B model. It seems the consolidated.01.pth file doesn't get downloaded when running download.sh: consolidated.01.pth is the only file for which I get a 403 status instead of 200.
I opened a new issue here.
When I was using my own Jetson AGX Orin developer kit, I also had this error. I checked online and found that Jetson does not seem to support NCCL. Is this normal? I don't want to have to run on the CPU.
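NCCL is only the inter-process communication backend; with a single process you can initialize the group with gloo and still keep everything on the GPU. A hedged sketch (assumes your Jetson build of PyTorch has CUDA enabled):

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)  # no NCCL needed

device = torch.device("cuda")         # tensors still live on the GPU
x = torch.randn(2, 2, device=device)
print(x.device)                       # cuda:0
dist.destroy_process_group()
```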
> Try changing
> `torch.distributed.init_process_group("nccl")`
> to
> `torch.distributed.init_process_group("gloo")`

Where do I make this change?
Closing as the author is inactive. If anyone has further questions, feel free to open a new issue. For future reference, check both the llama and llama-recipes repos for getting-started guides.
This solution works for me: #947