Running into issues when serving Mixtral 8x7B on 4 x H100 (TP=4) with deepspeed-mii v0

[BUG] Issue serving Mixtral 8x7B on H100 about deepspeed-mii HOT 7 OPEN

Rogerwyf commented on June 20, 2024

[BUG] Issue serving Mixtral 8x7B on H100

from deepspeed-mii.

Comments (7)

mrwyattii commented on June 20, 2024

Thanks for reporting this. It seems there was a bug introduced in the latest release when we added FP6 quantization support. I will investigate and fix the bug. Thank you!


JamesTheZ commented on June 20, 2024

@JamesTheZ may know about this.

This seems to be because the current implementation only compiles cuda_linear_kernels.cpp on Ampere: https://github.com/microsoft/DeepSpeed/blob/330d36bb39b8dd33b5603ee0024705db38aab534/op_builder/inference_core_ops.py#L75-L81
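The GPUs reported in this thread line up with that explanation: V100 is compute capability 7.0 (Volta), A100 is 8.0 (Ampere), and H100 is 9.0 (Hopper). Below is a minimal sketch of such a check, assuming the build gate amounts to "major compute capability equals 8". The helper names `fp6_kernels_expected` and `describe` are hypothetical and not part of DeepSpeed; the linked builder code is the authority on the real condition.

```python
# Hypothetical helpers, not DeepSpeed API. They mirror an ASSUMED gate of
# "major compute capability == 8"; check the linked op_builder source for
# the actual condition.
ARCH_BY_MAJOR = {7: "Volta/Turing", 8: "Ampere/Ada", 9: "Hopper"}


def fp6_kernels_expected(capability):
    """Return True if a (major, minor) capability passes the assumed gate."""
    major, _minor = capability
    return major == 8


def describe(capability):
    """Build a one-line summary for a (major, minor) compute capability."""
    major, minor = capability
    arch = ARCH_BY_MAJOR.get(major, "unknown")
    status = "expected" if fp6_kernels_expected(capability) else "not expected"
    return f"sm_{major}{minor} ({arch}): FP6 linear kernels {status}"


# The three GPUs mentioned in this thread.
for name, capability in [("V100", (7, 0)), ("A100", (8, 0)), ("H100", (9, 0))]:
    print(name, "->", describe(capability))
```

On a machine with PyTorch and a visible GPU, `describe(torch.cuda.get_device_capability(0))` would report the local device in the same way.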


sidagarwal2805 commented on June 20, 2024

I ran into the same problem on V100s, with exactly the same error output. It was fixed when I switched to A100s.


Rogerwyf commented on June 20, 2024

I ran into the same problem with V100s with exact same error output. Was fixed when I switched to A100

Yes, I can confirm this works on A100 but not on H100.


xiaoxiawu-microsoft commented on June 20, 2024

@JamesTheZ may know about this.


Taishi-N324 commented on June 20, 2024

I'm encountering an issue with meta-llama/Llama-2-7b-chat-hf on an H100: it fails with an undefined symbol, _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. I see the same problem with mistralai/Mistral-7B-v0.1. Neither model works in my setup.

I've tried multiple versions of deepspeed-mii (0.2.1, 0.2.2, and 0.2.3) and of PyTorch (2.2.1, 2.1.2, and 2.1.0), but none of these combinations works. I also tried compiling directly from source, without success.

Is anyone else experiencing the same issue, or does anyone have suggestions on how to resolve it?

import mii
pipe = mii.pipeline("meta-llama/Llama-2-7b-chat-hf")
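The undefined symbol quoted above is an Itanium-ABI mangled C++ name, so the function it refers to can be read off directly. The sketch below handles only the simple `_Z<length><name>...` form that this symbol uses; `mangled_function_name` is a hypothetical helper written for this illustration, not a full demangler (a tool such as `c++filt` covers the general case).

```python
def mangled_function_name(symbol):
    """Extract <name> from a simple Itanium-mangled symbol _Z<len><name>...

    Only unscoped names are handled; nested names (_ZN...) and other
    encodings are out of scope for this sketch.
    """
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-mangled symbol")
    rest = symbol[2:]

    # Read the decimal length prefix that precedes the identifier.
    digits = ""
    for ch in rest:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError("unsupported mangling: no length prefix")

    length = int(digits)
    name = rest[len(digits):len(digits) + length]
    if len(name) != length:
        raise ValueError("symbol is shorter than its length prefix")
    return name


SYMBOL = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"
print(mangled_function_name(SYMBOL))  # cuda_wf6af16_linear
```

The missing function is `cuda_wf6af16_linear`, which is consistent with the earlier comments pointing at the FP6 linear kernels not being compiled on this GPU.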

NVIDIA H100 80GB
Driver Version: 535.104.12

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 999.98 GB

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.6 LTS
Release:	20.04
Codename:	focal


deroholic commented on June 20, 2024

Downgrading to these versions works:
deepspeed 0.13.5
deepspeed-mii 0.2.2
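To confirm that an environment actually matches those two versions, a small check along these lines may help. It is a sketch: `PINS` and `check_pins` are hypothetical names introduced here, and the only library call used is `importlib.metadata.version` from the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in the comment above.
PINS = {"deepspeed": "0.13.5", "deepspeed-mii": "0.2.2"}


def check_pins(pins, lookup=version):
    """Return {package: (installed_or_None, wanted)} for every mismatch.

    `lookup` is injectable so the function can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for package, (installed, wanted) in problems.items():
        print(f"{package}: installed {installed}, expected {wanted}")
    if not problems:
        print("Environment matches the reported working versions.")
```

An empty result means both packages are installed at exactly the versions listed above.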

