Comments (11)

mrnikwaws commented on July 22, 2024

Hi @liechtym

Thanks for reporting this. We've reproduced the problem and have a fix in an upcoming release. We'll respond here and close this issue once the release is out.

liechtym commented on July 22, 2024

@mrnikwaws Thank you very much! I appreciate the quick response and look forward to the release.

mrnikwaws commented on July 22, 2024

2.16 is now released and should address your issue. Please respond on this ticket if the issue is not resolved. If we don't hear back, we'll close the issue.

liechtym commented on July 22, 2024

Thank you very much!

liechtym commented on July 22, 2024

@mrnikwaws I just tried with the following demo code and I'm still getting the same error.

https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb

I verified my installation from the latest commit in the repo with pip freeze:
transformers-neuronx @ git+https://github.com/aws-neuron/transformers-neuronx.git@426629648481095dfbb4f6bd993f25b88a87b505

I only changed a couple of things from the demo. Instead of 'llama-2-13b', I used 'meta-llama/Llama-2-7b-chat-hf' in LlamaForCausalLM.from_pretrained(). The only other change was tp_degree=2 in LlamaForSampling.from_pretrained().
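
For clarity, here is a minimal sketch of the flow I'm running, assuming the notebook's usual split-save/load pattern; the save_pretrained_split step and the generation call at the end follow the notebook, and the prompt and sequence_length are only illustrative:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx.module import save_pretrained_split

# Load the Hugging Face checkpoint (changed from 'llama-2-13b' in the demo)
# and save it in the split format that transformers-neuronx loads from.
model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
save_pretrained_split(model, './Llama-2-7b-chat-hf-split')

# Create the Neuron model; tp_degree=2 is the other change from the demo.
neuron_model = LlamaForSampling.from_pretrained(
    './Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16'
)
neuron_model.to_neuron()  # compile and load onto NeuronCores

# Illustrative generation call.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
input_ids = tokenizer('Hello, I am a language model,', return_tensors='pt').input_ids
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=256, top_k=50)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))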

Traceback:

Traceback (most recent call last):
  File "run.py", line 11, in <module>
    neuron_model = LlamaForSampling.from_pretrained('./Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/module.py", line 148, in from_pretrained
    state_dict = torch.load(state_dict_path)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Llama-2-7b-chat-hf-split/pytorch_model.bin'

Again, I'm on the same instance, AMI, and setup as before.

shebbur-aws commented on July 22, 2024

@liechtym Sorry for the inconvenience. We have a fix for this in the transformers-neuronx GitHub repo, which was updated today. Can you please check with the latest version?

liechtym commented on July 22, 2024

@shebbur-aws Yes I'll check with the latest and update you soon.

liechtym commented on July 22, 2024

@shebbur-aws This issue seems to be resolved after reinstalling from the GitHub repo.

However, I am now getting the following error while running meta-llama-2-13b-sampling.ipynb with the modifications I described in the previous comment. Let me know if you'd like me to create a new issue for this.

2024-01-04 14:33:59.000295:  4197  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000383:  4198  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000471:  4199  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000492:  4197  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_9e281341e7845ee2287f+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000563:  4200  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000601:  4198  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_a4faa198082ac5b8d787+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000623:  4201  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000703:  4202  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000754:  4203  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000755:  4199  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d5006487226e226573ea+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000756:  4204  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000790:  4205  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000862:  4206  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:34:00.000087:  4200  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1bf56f238691e0fd88c8+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440:  4202  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_70d1a1ce4d52a869b9e6+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440:  4201  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_c46e110ea38cea049c6d+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464:  4203  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_b9a15c837cee1bf59e24+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464:  4204  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1f6eaa498df4dc58af20+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465:  4205  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d750f56f8d6a41f0372e+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465:  4206  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_e22db4da23e4fde86dd1+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-Jan-04 14:34:00.727597  4120:4181  ERROR  NEFF:neff_parse                              NEFF version: 2.0, features: 0x100 are not supported.  Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727647  4120:4181  ERROR  NMGR:kmgr_load_nn_post_metrics               Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
2024-Jan-04 14:34:00.727686  4120:4182  ERROR  NEFF:neff_parse                              NEFF version: 2.0, features: 0x100 are not supported.  Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727716  4120:4182  ERROR  NMGR:kmgr_load_nn_post_metrics               Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
Traceback (most recent call last):
  File "run.py", line 12, in <module>
    neuron_model.to_neuron()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 72, in to_neuron
    self.setup()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 63, in setup
    nbs.setup()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 335, in setup
    self.program.setup(self.layers, self.pre_layer_parameters, self.ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1449, in setup
    super().setup(layers, pre_layer_params, ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1325, in setup
    kernel.load(io_ring_cache_size)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 454, in load
    self.model.load()
RuntimeError: nrt_load_collectives status=10

shebbur-aws commented on July 22, 2024

@liechtym Looks like there is a mismatch between the compiler and the runtime/tools versions you are using. Can you please upgrade your runtime packages to version 2.16 as well? That should fix the issue you are seeing.

liechtym commented on July 22, 2024

Thanks @shebbur-aws. I will try this out and report back soon.

liechtym commented on July 22, 2024

It's working great! If I run into any additional issues, I'll file a separate issue. Thanks again.
