Comments (10)
Try using huggingface-cli to download the model first, something like:
from IPython.display import clear_output
from transformers import AutoConfig

!huggingface-cli download --resume-download lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo --local-dir Mixtral-8x7B-Instruct-v0.1-offloading-demo
clear_output()
!date

# then load the model from local-dir instead of the Hub:
# config = AutoConfig.from_pretrained(quantized_model_name)
# state_path = snapshot_download(quantized_model_name)
state_path = "Mixtral-8x7B-Instruct-v0.1-offloading-demo"
config = AutoConfig.from_pretrained(state_path)
Maybe snapshot_download can't handle that many files. huggingface-cli download is quite fast: 17 GB in 2-3 minutes. Note that config = AutoConfig.from_pretrained(quantized_model_name) also seems to hang.
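If the CLI download also seems to stall, a quick sanity check (just a sketch, not from the notebook) is to total up the files on disk before loading; per the timing above, the snapshot should come to roughly 17 GB.

import os

state_path = "Mixtral-8x7B-Instruct-v0.1-offloading-demo"
total_bytes = 0
for root, _, files in os.walk(state_path):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))
print(f"{total_bytes / 1e9:.1f} GB on disk")  # expect roughly 17 GB when complete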
Despite the generation speed being slow, it works like a charm.
So, I wanted to thank you, @dvmazur, for the amazing job you have done, really!
That said, I have some questions:
- What does it take to run the same notebook locally on one's PC? What GPU is necessary? Regarding RAM, it seems to be at least 16 GB, right?
- Is there a way to download the Mixtral model embeddings?
==> OK, I had a look at the paper, which seems to clarify my questions.
Again, sincere congratulations to all the contributors!
Hey, @oltipreka, thanks for the kind words. This was a collaborative effort, so please shout out @lavawolfiee for making it happen.
As for the generation speed, we are still working on making it faster, but we've slowed down a bit due to the holidays :)
Regarding your questions,
- You'll need about 27 GB of combined GPU and CPU memory. The proportion of GPU to CPU memory affects generation speed, since lower GPU memory means offloading more experts. You can find some example setups in our tech report.
- You can download the original embedding layer weights from Mixtral's repo on the HF Hub; see the sketch below.
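Here's a rough sketch of one way to pull just the embedding weights. The repo id and tensor name follow the standard Mixtral checkpoint layout, but treat them as assumptions to double-check (and note you may need to log in first if the repo is gated):

import json
from huggingface_hub import hf_hub_download
from safetensors import safe_open

repo = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # assumed repo id
index_path = hf_hub_download(repo, "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# "model.embed_tokens.weight" is the usual embedding name in Llama-style checkpoints
shard_path = hf_hub_download(repo, weight_map["model.embed_tokens.weight"])
with safe_open(shard_path, framework="pt") as f:
    embeddings = f.get_tensor("model.embed_tokens.weight")
print(embeddings.shape)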
I'm closing this issue due to it being resolved.
Thanks for the clarifications, extremely useful.
Yeah, you are absolutely right, the entire team deserves credit for this, including @lavawolfiee.
Thank you, folks, and keep going!
Hi everyone. I have the same problem. Curiously, in Colab, if I don't have the Hugging Face token, the code fails on line 5, but when I add the token to Colab secrets, it fails on line 4.
Maybe there is a compatibility issue between Colab and Hugging Face, or a connection-related problem.
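If it turns out to be a token issue, logging in explicitly before the download might help. This is just a sketch; HF_TOKEN is whatever name you gave the secret in Colab.

from google.colab import userdata
from huggingface_hub import login

# "HF_TOKEN" is an assumed secret name; use the name you chose in Colab secrets
login(token=userdata.get("HF_TOKEN"))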
Same issue, the execution goes on forever and nothing gets downloaded.
Hey, @SanskarX10, could you provide more info?
Just tried running the notebook myself; it appears to be stuck downloading the model snapshot from the model hub. Could be an issue on HF's side.
Oh! That could be the case. Thanks for the quick reply.
@ffreemt, thanks for the tip! I just published the new notebook.
We'll implement a more permanent solution over the weekend.
The recent notebook works. However, generation speed is slow: answering the query "write a poem about python" took 4 minutes.
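For anyone who wants to put a number on it, a rough throughput measurement might look like this; it assumes the model and tokenizer objects built earlier in the notebook are in scope.

import time
import torch

# assumes `model` and `tokenizer` from the notebook are already built
inputs = tokenizer("write a poem about python", return_tensors="pt").to("cuda:0")
start = time.time()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=True)
elapsed = time.time() - start
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.2f} tokens/s")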