Comments (5)
Having similar issues on Ubuntu 20.04.
nvidia-smi:
Driver Version: 510.39.01 CUDA Version: 11.6
nvcc --version:
Build cuda_11.3.r11.3/compiler.29920130_0
(Note: the CUDA version shown by nvidia-smi is the highest version the driver supports, not the version of the installed toolkit.)
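A quick way to compare the two numbers is to parse both outputs. The following is a minimal sketch, assuming the output formats quoted above; the function names `driver_max_cuda` and `toolkit_cuda` are made up for illustration and are not part of any tool.

```python
import re


def driver_max_cuda(nvidia_smi_text):
    """Return (major, minor) of the CUDA version in nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda(nvcc_text):
    """Return (major, minor) of the toolkit from nvcc's 'Build cuda_X.Y...' line, or None."""
    match = re.search(r"cuda_(\d+)\.(\d+)", nvcc_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Strings copied from the outputs quoted in this comment.
smi_output = "Driver Version: 510.39.01 CUDA Version: 11.6"
nvcc_output = "Build cuda_11.3.r11.3/compiler.29920130_0"

driver = driver_max_cuda(smi_output)
toolkit = toolkit_cuda(nvcc_output)

# The installed toolkit should not be newer than what the driver supports.
print("driver max:", driver, "toolkit:", toolkit, "ok:", toolkit <= driver)
```

With the values above, the toolkit (11.3) is within what the driver accepts (11.6), so this particular mismatch is probably not the cause.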
nestedtensor did eventually build for me, however. pytorch-lightning is also missing if I don't install maua/audio/requirements.txt.
Regardless, I get Segmentation fault (core dumped) when trying to run maua, so I started digging:
$ gdb --args python -m maua
(gdb) run
Starting program: /.../maua/envs/bin/python -m maua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 32595]
[New Thread 0x7fff0c110700 (LWP 32597)]
...etc...
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffea6a901e1 in google::protobuf::internal::ReflectionOps::FindInitializationErrors(google::protobuf::Message const&, std::string const&, std::vector<std::string, std::allocator<std::string> >*) ()
from /.../maua/envs/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
(gdb)
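As a complement to gdb, Python's standard `faulthandler` module can print the Python-level stack at the moment of the crash, which may show which import triggers the protobuf call. This is a minimal sketch, not something from the maua repo; the log file name is an arbitrary choice.

```python
import faulthandler
import os
import tempfile


def enable_crash_traces(log_path):
    """Write Python tracebacks for all threads to log_path on SIGSEGV and similar signals."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file  # the caller must keep this open for the handler to work


# Arbitrary log location chosen for this example.
log_path = os.path.join(tempfile.gettempdir(), "maua_segfault_trace.log")
log_file = enable_crash_traces(log_path)

# After this point, import the package that crashes (for example `import maua`).
# If the process segfaults, the Python stack is written to log_path.
print("crash traces will be written to", log_path)
```

The same handler can be switched on without editing any code by running `python -X faulthandler -m maua`, which prints the traceback to stderr instead.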
That doesn't make a lot of sense to me, but I thought I'd try Python 3.9, and that did seem to help. I got a few more odd errors, but was able to fix them with a couple of hacks:
$ python -m maua --help
Traceback (most recent call last):
...
File "/.../maua/maua/style/image.py", line 17, in <module>
from maua.optimizers import load_optimizer, OPTIMIZERS
File "/.../maua/maua/optimizers.py", line 4, in <module>
import torch_optimizer as more_optim
ModuleNotFoundError: No module named 'torch_optimizer'
$ pip install torch-optimizer
$ python -m maua --help
Traceback (most recent call last):
...
File "/.../maua/maua/style/image.py", line 17, in <module>
from maua.optimizers import load_optimizer, OPTIMIZERS
File "/.../maua/maua/optimizers.py", line 30, in <module>
"NovoGrad": timm_optim.NovoGrad,
AttributeError: module 'timm.optim' has no attribute 'NovoGrad'
I could fix the last one by commenting out NovoGrad in the optimizer list.
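Instead of commenting entries out by hand, the optimizer table could skip names that the installed library does not provide. This is a minimal sketch under assumptions: `collect_optimizers` is a hypothetical helper, and `fake_optim` with its placeholder classes stands in for `timm.optim` so the example runs without timm installed. It is not how maua/optimizers.py is actually written.

```python
from types import SimpleNamespace


def collect_optimizers(module, names):
    """Return ({name: class}, [missing names]) for the attributes found on module."""
    found, missing = {}, []
    for name in names:
        optimizer_cls = getattr(module, name, None)
        if optimizer_cls is None:
            missing.append(name)  # not provided by this version of the library
        else:
            found[name] = optimizer_cls
    return found, missing


# Placeholder classes; they only exist so the lookup has something to find.
class OptimizerA:
    pass


class OptimizerB:
    pass


# Stand-in for a library module that lacks NovoGrad, as in the traceback above.
fake_optim = SimpleNamespace(OptimizerA=OptimizerA, OptimizerB=OptimizerB)

OPTIMIZERS, skipped = collect_optimizers(
    fake_optim, ["OptimizerA", "OptimizerB", "NovoGrad"]
)
print("available:", sorted(OPTIMIZERS), "skipped:", skipped)
```

With this approach a missing optimizer turns into a skipped entry rather than an AttributeError at import time.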
I'm not sure why Python 3.9 helps or why those other errors are happening, but maybe this helps identify the real problem?
Also note that I haven't done the pip install cupy-cuda113==9.6 step of the install yet, as I was trying to limit the potential causes.
from maua.
Hello @ashwindcruz and @RKelln thanks for raising the issue!
I think this is mainly a case of the README being a little out of sync with the repo structure. I've added the missing dependencies to the requirements.txt and updated the commands/paths in the README.
I believe the segfault is related to the cupy version that gets found by conda not being compatible with the cudatoolkit version (at least I remember getting segfaults and ended up adding the extra cupy-cuda113 install). I haven't been able to reproduce the segfaults on my machine now though (either when uninstalling cupy-cuda113 or reinstalling from scratch without it).
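To check whether more than one cupy build ended up in the environment, the installed distributions can be listed with the standard library. This is a minimal sketch; `installed_matching` is a hypothetical helper name, and the cupy theory itself is only a suspicion at this point.

```python
from importlib import metadata


def installed_matching(prefix):
    """Return sorted (name, version) pairs for installed distributions whose name starts with prefix."""
    hits = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "") or ""
        if name.lower().startswith(prefix.lower()):
            version = dist.metadata.get("Version", "unknown") or "unknown"
            hits.append((name, version))
    return sorted(hits)


# More than one entry here (e.g. a conda cupy plus a pip cupy-cuda113)
# would suggest two builds are competing in the same environment.
for name, version in installed_matching("cupy"):
    print(name, version)
```

An empty result just means no cupy distribution is visible to this interpreter.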
Could you try reinstalling the repo with the updated commands in the README? Alternatively, just continue with Python 3.9 (although I believe this gave some dependency issues in the audio package).
@ashwindcruz Are you getting an error when installing nestedtensor? Building the wheel can take a long time (~5 min on my machine, but maybe longer with fewer CPU cores).
I don't think it's needed for upscaling, but as of now the CLI imports the full tree of files on every execution (that's also why each command is so slow to start at the moment). I need to restructure things so that running a given command only imports the parts it actually needs, but I haven't thought of a good way to do that yet...
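One common way to import only what a command needs is a dispatch table of module paths that are resolved on demand. This is a minimal sketch of that idea, not a proposal for the actual CLI layout: the command names are hypothetical, and standard-library modules stand in for the real subpackages so the example runs anywhere.

```python
import importlib

# Hypothetical mapping from command name to the module that implements it.
# Stdlib modules are used as stand-ins for the project's own subpackages.
COMMANDS = {
    "encode": "json",
    "stats": "statistics",
}


def load_command(name):
    """Import and return only the module registered for the given command."""
    try:
        module_path = COMMANDS[name]
    except KeyError:
        raise SystemExit(f"unknown command: {name}")
    # The import happens here, at dispatch time, not at CLI startup.
    return importlib.import_module(module_path)


module = load_command("stats")
print("loaded", module.__name__, "for command 'stats'")
```

Because nothing is imported until `load_command` runs, `--help` and unrelated commands would not pay the import cost of heavy dependencies.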
from maua.
I was able to try an install using the new instructions and requirements. However, I ran into an issue with torchvision:
/.../lib/python3.8/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
warn(f"Failed to load image Python extension: {e}")
Segmentation fault (core dumped)
I noticed that it had installed pytorch 1.10.2 from conda, but then uninstalled that and installed 1.10.1 using pip. I tried installing 1.10.2 using pip, but that didn't help, so I then tried a reinstall without the requirements.txt file version lock. With that I get just Segmentation fault (core dumped), as I used to with the previous install instructions. So I tried locking the conda environment to the 1.10.1 versions... still segfaults. My Python 3.9 conda environment still works fine. The only difference seems to be the Python version?
Output from PyTorch's collect_env for each environment:
Working python 3.9 env:
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:40:17) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.1
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] cudatoolkit-dev 11.3.1 py39h3811e60_0 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h8d4b97c_729 conda-forge
[conda] mkl-service 2.4.0 py39h7e14d7c_0 conda-forge
[conda] mkl_fft 1.3.1 py39h0c7bc48_1 conda-forge
[conda] mkl_random 1.2.2 py39hde0f152_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.20.1 pypi_0 pypi
[conda] pytorch 1.10.1 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.9 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 0.10.1 py39_cu113 pytorch
[conda] torchcrepe 0.0.15 pypi_0 pypi
[conda] torchmetrics 0.7.0 pypi_0 pypi
[conda] torchvision 0.11.2 py39_cu113 pytorch
Broken 3.8 reinstall using 1.10.2:
Collecting environment information...
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.2
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.2
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.3
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] cudatoolkit-dev 11.3.1 py38h497a2fe_0 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h8d4b97c_729 conda-forge
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.20.1 pypi_0 pypi
[conda] pytorch 1.10.2 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.9 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 0.10.2 py38_cu113 pytorch
[conda] torchcrepe 0.0.15 pypi_0 pypi
[conda] torchmetrics 0.7.0 pypi_0 pypi
[conda] torchvision 0.11.3 py38_cu113 pytorch
Broken 1.10.1 reinstall:
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.1
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 ha36c431_9 nvidia
[conda] cudatoolkit-dev 11.3.1 py38h497a2fe_0 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h8d4b97c_729 conda-forge
[conda] mkl-service 2.4.0 py38h95df7f1_0 conda-forge
[conda] mkl_fft 1.3.1 py38h8666266_1 conda-forge
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] numpy 1.20.1 pypi_0 pypi
[conda] pytorch 1.10.1 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.5.9 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchaudio 0.10.1 py38_cu113 pytorch
[conda] torchcrepe 0.0.15 pypi_0 pypi
[conda] torchmetrics 0.7.0 pypi_0 pypi
[conda] torchvision 0.11.2 py38_cu113 pytorch
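Comparing these dumps by eye is error-prone, so a small script can list only the fields that differ. This is a minimal sketch; `parse_env` and `diff_env` are hypothetical helpers, and the two strings are abbreviated excerpts of the working 3.9 and broken 1.10.1 dumps above (the Python version lines are shortened to the version number).

```python
def parse_env(text):
    """Parse 'Key: value' lines from a collect_env dump into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            env[key.strip()] = value.strip()
    return env


def diff_env(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated excerpts from the dumps above.
working = """PyTorch version: 1.10.1
Python version: 3.9.9
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.31
CUDA runtime version: 11.3.109"""

broken = """PyTorch version: 1.10.1
Python version: 3.8.12
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
CUDA runtime version: 11.3.109"""

differences = diff_env(parse_env(working), parse_env(broken))
for key, (good, bad) in differences.items():
    print(f"{key}: working={good!r} broken={bad!r}")
```

On these excerpts the only differing fields are the Python version and the platform string, including its glibc suffix.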
from maua.
Do either of you run into similar issues if you re-install using the current instructions in a clean environment?
I've streamlined quite a bit of the installation to essentially run with just pip. Maybe that helps avoid these segfaults?
from maua.
Alright going to close this for now as I think it's stale. Feel free to open up a new issue (or re-open this one) if you still run into problems!
from maua.