Comments (11)
Hi @liechtym,
Thanks for reporting this. We've reproduced the problem and have a fix in an upcoming release. We'll respond here and close this issue once the release is out.
from transformers-neuronx.
@mrnikwaws Thank you very much! I appreciate the quick response and look forward to the release.
2.16 is now released and should address your issue. Please respond on this ticket if the issue is not resolved. If we don't hear back, we'll close the issue.
Thank you very much!
@mrnikwaws I just tried with the following demo code and I'm still getting the same error.
I verified with pip freeze that my installation is from the latest commit in the repo:
transformers-neuronx @ git+https://github.com/aws-neuron/transformers-neuronx.git@426629648481095dfbb4f6bd993f25b88a87b505
I only changed a couple of things from the demo. Instead of 'llama-2-13b' I used 'meta-llama/Llama-2-7b-chat-hf' in LlamaForCausalLM.from_pretrained(). The only other change was tp_degree=2 in LlamaForSampling.from_pretrained().
Traceback:
Traceback (most recent call last):
File "run.py", line 11, in <module>
neuron_model = LlamaForSampling.from_pretrained('./Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/module.py", line 148, in from_pretrained
state_dict = torch.load(state_dict_path)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Llama-2-7b-chat-hf-split/pytorch_model.bin'
Again, I'm on the same instance, AMI, and setup as before.
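The traceback shows module.py calling torch.load on ./Llama-2-7b-chat-hf-split/pytorch_model.bin, so the split directory evidently does not contain a file by that name. A quick way to see what a checkpoint directory actually holds before loading is a sketch like the following (inspect_checkpoint_dir is an illustrative helper of mine, not part of transformers-neuronx):

```python
from pathlib import Path

def inspect_checkpoint_dir(split_dir):
    """List the entries in a split checkpoint directory.

    Illustrative helper only -- not part of transformers-neuronx.
    Comparing this listing against the path in the traceback shows
    whether the file the loader expects is actually present.
    """
    root = Path(split_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{split_dir} is not a directory")
    return sorted(entry.name for entry in root.iterdir())
```

Running this on the -split directory and comparing against the path in the FileNotFoundError makes the mismatch between what was saved and what from_pretrained expects easy to spot.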
@liechtym Sorry for the inconvenience. We have a fix for this in the transformers-neuronx GitHub repo, which was updated today. Can you please check with the latest version?
@shebbur-aws Yes I'll check with the latest and update you soon.
@shebbur-aws This issue seems to be resolved after reinstalling from the GitHub repo.
However, I am now getting the following error while running meta-llama-2-13b-sampling.ipynb with the modifications I described in the previous comment. Let me know if you'd like me to create a new issue for this.
2024-01-04 14:33:59.000295: 4197 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000383: 4198 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000471: 4199 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000492: 4197 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_9e281341e7845ee2287f+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000563: 4200 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000601: 4198 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_a4faa198082ac5b8d787+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000623: 4201 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000703: 4202 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000754: 4203 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000755: 4199 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d5006487226e226573ea+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000756: 4204 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000790: 4205 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000862: 4206 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:34:00.000087: 4200 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1bf56f238691e0fd88c8+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4202 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_70d1a1ce4d52a869b9e6+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4201 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_c46e110ea38cea049c6d+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4203 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_b9a15c837cee1bf59e24+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4204 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1f6eaa498df4dc58af20+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4205 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d750f56f8d6a41f0372e+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4206 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_e22db4da23e4fde86dd1+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-Jan-04 14:34:00.727597 4120:4181 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727647 4120:4181 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
2024-Jan-04 14:34:00.727686 4120:4182 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727716 4120:4182 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
Traceback (most recent call last):
File "run.py", line 12, in <module>
neuron_model.to_neuron()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 72, in to_neuron
self.setup()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 63, in setup
nbs.setup()
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 335, in setup
self.program.setup(self.layers, self.pre_layer_parameters, self.ln_lm_head_params)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1449, in setup
super().setup(layers, pre_layer_params, ln_lm_head_params)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1325, in setup
kernel.load(io_ring_cache_size)
File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 454, in load
self.model.load()
RuntimeError: nrt_load_collectives status=10
@liechtym Looks like there is a mismatch between the compiler and runtime/tools versions you are using. Can you please upgrade your runtime packages to 2.16 as well? That should fix the issue you are seeing.
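The NEFF errors in the log are consistent with such a mismatch: the runtime advertises a bitmask of supported features (0x80000000000000ff) and refuses to load a NEFF that requests a bit outside it (0x100). A rough sketch of that kind of compatibility check (the masks are copied from the log; the function is illustrative, not the actual aws-neuronx-runtime code):

```python
# Feature masks copied from the NEFF error messages in the log.
REQUIRED_FEATURES = 0x100                # requested by the compiled NEFF
SUPPORTED_FEATURES = 0x80000000000000ff  # advertised by the installed runtime

def neff_features_supported(required: int, supported: int) -> bool:
    """True if every feature bit the NEFF needs is in the runtime's mask.

    Illustrative sketch of a bitmask compatibility check, not the
    actual runtime implementation.
    """
    return required & ~supported == 0

# Bit 8 (0x100) lies outside the supported mask, so the load is refused,
# matching the nrt_load_collectives status=10 failure in the traceback.
assert not neff_features_supported(REQUIRED_FEATURES, SUPPORTED_FEATURES)
```

A newer compiler emits NEFFs using feature bits an older runtime does not advertise, which is why upgrading the runtime packages to match the 2.16 compiler resolves it.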
Thanks @shebbur-aws. I will try this out and report back soon.
It's working great, thanks! If I run into any additional problems I'll file a new issue. Thanks again.