Comments (11)
Hi @liechtym,
Thanks for reporting this. We've reproduced the problem and have a fix in an upcoming release. We'll respond here and close this issue once the release is out.
from transformers-neuronx.
@mrnikwaws Thank you very much! I appreciate the quick response and look forward to the release.
2.16 is now released and should address your issue. Please respond on this ticket if the issue is not resolved. If we don't hear back, we'll close the issue.
Thank you very much!
@mrnikwaws I just tried with the following demo code and I'm still getting the same error.
I verified with pip freeze that my installation is from the latest commit in the repo:
transformers-neuronx @ git+https://github.com/aws-neuron/transformers-neuronx.git@426629648481095dfbb4f6bd993f25b88a87b505
I only changed a couple of things from the demo. Instead of 'llama-2-13b' I used 'meta-llama/Llama-2-7b-chat-hf' in LlamaForCausalLM.from_pretrained(). The only other change was tp_degree=2 in LlamaForSampling.from_pretrained().
Traceback:
Traceback (most recent call last):
File "run.py", line 11, in <module>
neuron_model = LlamaForSampling.from_pretrained('./Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/module.py", line 148, in from_pretrained
state_dict = torch.load(state_dict_path)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Llama-2-7b-chat-hf-split/pytorch_model.bin'
Again, I'm on the same instance, AMI, and setup as before.
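The traceback shows module.py calling torch.load on ./Llama-2-7b-chat-hf-split/pytorch_model.bin, so the split directory evidently does not contain a file by that name. A quick way to see what a checkpoint directory actually holds before loading is a sketch like the following (inspect_checkpoint_dir is an illustrative helper of mine, not part of transformers-neuronx):

```python
from pathlib import Path

def inspect_checkpoint_dir(split_dir):
    """List the entries in a split checkpoint directory.

    Illustrative helper only -- not part of transformers-neuronx.
    Comparing this listing against the path in the traceback shows
    whether the file the loader expects is actually present.
    """
    root = Path(split_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{split_dir} is not a directory")
    return sorted(entry.name for entry in root.iterdir())
```

Running this on the -split directory and comparing against the path in the FileNotFoundError makes the mismatch between what was saved and what from_pretrained expects easy to spot.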
@liechtym Sorry for the inconvenience. We have a fix for this in the transformers-neuronx GitHub repo, which was updated today. Can you please check with the latest version?
@shebbur-aws Yes I'll check with the latest and update you soon.
@shebbur-aws This issue seems to be resolved after reinstalling from the GitHub repo.
However, I am now getting the following error while running meta-llama-2-13b-sampling.ipynb with the modifications I described in the previous comment. Let me know if you'd like me to create a new issue for this.
2024-01-04 14:33:59.000295: 4197 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000383: 4198 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000471: 4199 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000492: 4197 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_9e281341e7845ee2287f+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000563: 4200 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000601: 4198 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_a4faa198082ac5b8d787+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000623: 4201 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000703: 4202 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000754: 4203 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000755: 4199 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d5006487226e226573ea+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000756: 4204 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000790: 4205 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000862: 4206 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:34:00.000087: 4200 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1bf56f238691e0fd88c8+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4202 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_70d1a1ce4d52a869b9e6+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4201 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_c46e110ea38cea049c6d+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4203 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_b9a15c837cee1bf59e24+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4204 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1f6eaa498df4dc58af20+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4205 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d750f56f8d6a41f0372e+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4206 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_e22db4da23e4fde86dd1+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-Jan-04 14:34:00.727597 4120:4181 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727647 4120:4181 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
2024-Jan-04 14:34:00.727686 4120:4182 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727716 4120:4182 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
Traceback (most recent call last):
File "run.py", line 12, in <module>
neuron_model.to_neuron()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 72, in to_neuron
self.setup()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 63, in setup
nbs.setup()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 335, in setup
self.program.setup(self.layers, self.pre_layer_parameters, self.ln_lm_head_params)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1449, in setup
super().setup(layers, pre_layer_params, ln_lm_head_params)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1325, in setup
kernel.load(io_ring_cache_size)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 454, in load
self.model.load()
RuntimeError: nrt_load_collectives status=10
@liechtym Looks like there is a mismatch between the compiler and runtime/tools versions you are using. Can you please upgrade your runtime packages to 2.16 as well? That should fix the issue you are seeing.
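The NEFF errors in the log are consistent with such a mismatch: the runtime advertises a bitmask of supported features (0x80000000000000ff) and refuses to load a NEFF that requests a bit outside it (0x100). A rough sketch of that kind of compatibility check (the masks are copied from the log; the function is illustrative, not the actual aws-neuronx-runtime code):

```python
# Feature masks copied from the NEFF error messages in the log.
REQUIRED_FEATURES = 0x100                # requested by the compiled NEFF
SUPPORTED_FEATURES = 0x80000000000000ff  # advertised by the installed runtime

def neff_features_supported(required: int, supported: int) -> bool:
    """True if every feature bit the NEFF needs is in the runtime's mask.

    Illustrative sketch of a bitmask compatibility check, not the
    actual runtime implementation.
    """
    return required & ~supported == 0

# Bit 8 (0x100) lies outside the supported mask, so the load is refused,
# matching the nrt_load_collectives status=10 failure in the traceback.
assert not neff_features_supported(REQUIRED_FEATURES, SUPPORTED_FEATURES)
```

A newer compiler emits NEFFs using feature bits an older runtime does not advertise, which is why upgrading the runtime packages to match the 2.16 compiler resolves it.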
Thanks @shebbur-aws. I will try this out and report back soon.
It's working great, thanks! If I run into any additional problems I'll file a new issue. Thanks again.