Comments (10)
Sure, I'm happy to help. Let me try out the new version and I'll let you know if it works for me.
from tutel.
Your CUDA environment seems not to be installed in the default location (e.g. /usr/local/cuda/include). Can you print the value of CUDA_HOME? BTW, you can also try whether
export USE_NVRTC=0
will help.
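For readers hitting the same problem, here is a minimal sketch of setting these variables from Python before tutel's JIT compilation is triggered. This is not tutel's own mechanism, and the CUDA path is only an example of a non-default install like the one reported later in this thread.

import os

# Point the build at a non-default CUDA install and disable NVRTC before
# anything tries to JIT-compile the kernels (example path, adjust as needed).
os.environ.setdefault("CUDA_HOME", "/public/apps/cuda/11.3")
os.environ["USE_NVRTC"] = "0"

import torch
print(torch.version.cuda, torch.cuda.is_available())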
Thank you for your prompt reply! Yes, my CUDA environment is not installed in the default location because I'm using a shared compute cluster. Is there a parameter I can set to make sure the compiler finds the correct CUDA? I will also try export USE_NVRTC=0.
$ echo $CUDA_HOME
/public/apps/cuda/11.3
We just merged a PR that parses CUDA_HOME from the environment variable. Can you try whether it works for you?
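For context, a rough sketch of what parsing CUDA_HOME from the environment usually looks like; this illustrates the general pattern, not the actual code in the merged PR, and the CUDA_PATH fallback is only a common alternative, not necessarily something tutel checks.

import os

def guess_cuda_home():
    # Prefer an explicitly exported CUDA_HOME (or CUDA_PATH) and fall back
    # to the default location mentioned earlier in the thread.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home and os.path.isdir(os.path.join(cuda_home, "include")):
        return cuda_home
    return "/usr/local/cuda"

print(guess_cuda_home())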
Thank you! The fix works and CUDA_HOME can now be found correctly. I can successfully run hello_world.py and hello_world_ddp.py under the examples folder without any error. However, when I tried to use it with fairseq (the use case is here), I got the following two errors:
- Compilation error (it seems the CUDA compiler worked, otherwise there would not be the second error):
[W custom_kernel.cpp:158] nvrtc: error: unrecognized option --includ`��.�U found
Failed to use NVRTC for JIT compilation in this Pytorch version, try another approach using CUDA compiler.. (To always disable NVRTC, please: export USE_NVRTC=0)
- RuntimeError:
  File "/private/home/hyhuang/.conda/envs/newnllb/lib/python3.9/site-packages/fairseq-1.0.0a0+b1b3eda-py3.9-linux-x86_64.egg/fairseq/modules/moe/top2gate.py", line 234, in top2gating
    locations1 = fused_cumsum_sub_one(mask1)
  File "/private/home/hyhuang/.local/lib/python3.9/site-packages/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
    return torch.ops.tutel_ops.cumsum(data)
RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-djl73tcc/tutel/custom/custom_kernel.cpp":214, please report a bug to PyTorch. CHECK_EQ fails.
Would you be able to provide any suggestions on these two errors? I am quite confused, because this is the same environment I used to run the hello_world.py scripts.
You need to run unset USE_NVRTC, since you may have explicitly set that variable before.
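If the variable is set by a job script and unsetting it in the shell is inconvenient, removing it from os.environ before the first tutel op is compiled may have the same effect; this is an assumption, not something verified in this thread.

import os

# Drop the stale setting so the JIT path is chosen automatically again
# (equivalent in spirit to running `unset USE_NVRTC` in the shell).
os.environ.pop("USE_NVRTC", None)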
Thank you! That completely resolves this problem. Closing the issue.
@hyhuang00 Can you help us test whether the latest version (#170) still works in your environment? We removed the approach of detecting a manually set CUDA_HOME environment variable, but the new way should be compatible with different environments more robustly.
The new version works on my machine without any error. I installed the package via
$ python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main
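A small sanity check after installing, based on the op that failed earlier in this thread; the import path comes from the traceback above, while the int32 input and the exact "cumsum along dim 0 minus one" semantics are assumptions.

import torch
from tutel.jit_kernels.gating import fast_cumsum_sub_one

# Compare the fused kernel against a plain PyTorch cumsum on a random 0/1 mask.
mask = torch.randint(0, 2, (8, 4), dtype=torch.int32, device="cuda")
out = fast_cumsum_sub_one(mask)
expected = torch.cumsum(mask.to(torch.int64), dim=0) - 1
print(torch.equal(out.to(torch.int64), expected))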
Thanks!