Comments (7)
Thanks for reporting this. It seems there was a bug introduced in the latest release when we added FP6 quantization support. I will investigate and fix the bug. Thank you!
from deepspeed-mii.
@JamesTheZ may know about this.
This seems to be because the current implementation only compiles cuda_linear_kernels.cpp on Ampere: https://github.com/microsoft/DeepSpeed/blob/330d36bb39b8dd33b5603ee0024705db38aab534/op_builder/inference_core_ops.py#L75-L81
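To illustrate the kind of gating described here, below is a minimal sketch. It is not the actual DeepSpeed op builder code: `select_sources`, `kernel_is_compiled`, and `bindings.cpp` are hypothetical names. It assumes the kernel file is only added when the GPU's compute capability major version is 8 (Ampere), while the bindings that reference the kernel are always built. Under that assumption, the symbol would be missing at load time on any non-Ampere GPU.

```python
# Hypothetical illustration; none of these names are DeepSpeed APIs.
BASE_SOURCES = ["bindings.cpp"]  # assumed to always reference the FP6 kernel
AMPERE_ONLY_SOURCES = ["cuda_linear_kernels.cpp"]  # file named in this comment


def select_sources(compute_capability):
    """Return the source files that would be compiled for a given GPU."""
    major, _minor = compute_capability
    sources = list(BASE_SOURCES)
    if major == 8:  # Ampere, e.g. A100 is compute capability 8.0
        sources += AMPERE_ONLY_SOURCES
    return sources


def kernel_is_compiled(compute_capability):
    """True if the FP6 kernel file is part of the build for this GPU."""
    return "cuda_linear_kernels.cpp" in select_sources(compute_capability)


# Compute capabilities: V100 = 7.0, A100 = 8.0, H100 = 9.0
for name, cc in [("V100", (7, 0)), ("A100", (8, 0)), ("H100", (9, 0))]:
    print(name, "kernel compiled:", kernel_is_compiled(cc))
```

With this gating only the A100 build contains the kernel, which is consistent with the reports in this thread (works on A100, fails on V100 and H100).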
from deepspeed-mii.
I ran into the same problem on V100s, with the exact same error output. It was fixed when I switched to A100s.
from deepspeed-mii.
I ran into the same problem with V100s with exact same error output. Was fixed when I switched to A100
Yeah, I can confirm this works on A100, but not on H100.
from deepspeed-mii.
@JamesTheZ may know about this.
from deepspeed-mii.
I'm encountering an issue with meta-llama/Llama-2-7b-chat-hf on an H100: loading fails due to an undefined symbol, _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii. I've also faced the same problem with mistralai/Mistral-7B-v0.1. Neither of these models works in my setup.
I've tried multiple versions of deepspeed-mii (0.2.1, 0.2.2, and 0.2.3) and of PyTorch (2.2.1, 2.1.2, and 2.1.0), but none of these combinations works. I even compiled directly from source, without success.
Is anyone else experiencing the same issue or has any suggestions on how to resolve it?
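As an aside for readers, the mangled symbol in the error can be decoded to see which function is missing. The sketch below assumes the `c++filt` tool from GNU binutils is on the PATH; the helper name `demangle` is hypothetical. The mangled name decodes to a function `cuda_wf6af16_linear` taking six `at::Tensor&` arguments and four `int` arguments.

```python
import shutil
import subprocess

# Symbol copied from the error message reported in this comment.
SYMBOL = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"


def demangle(symbol):
    """Demangle a C++ symbol with c++filt; return None if the tool is missing."""
    tool = shutil.which("c++filt")
    if tool is None:
        return None  # assumption: binutils may not be installed everywhere
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


print(demangle(SYMBOL))
```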
import mii
pipe = mii.pipeline("meta-llama/Llama-2-7b-chat-hf")
NVIDIA H100 80GB
Driver Version: 535.104.12
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/taishi/workplace/mii/venv/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.0, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 999.98 GB
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
from deepspeed-mii.
Downgrading to these versions works:
deepspeed 0.13.5
deepspeed-mii 0.2.2
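A quick way to check whether an environment matches the versions listed here is sketched below. It uses only the standard library; the helper names `installed_version` and `compare_with_reported` are hypothetical, and the distribution names are assumed to be `deepspeed` and `deepspeed-mii` as written in this comment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this comment.
REPORTED_WORKING = {"deepspeed": "0.13.5", "deepspeed-mii": "0.2.2"}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_with_reported(get_version=installed_version):
    """Map each package to (installed version, reported-working version)."""
    return {
        pkg: (get_version(pkg), wanted)
        for pkg, wanted in REPORTED_WORKING.items()
    }


for pkg, (found, wanted) in compare_with_reported().items():
    status = "matches" if found == wanted else "differs"
    print(f"{pkg}: installed={found} reported-working={wanted} ({status})")
```

If the versions differ, pinning them with `pip install deepspeed==0.13.5 deepspeed-mii==0.2.2` is one way to apply the downgrade.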
from deepspeed-mii.