Comments (5)

RKelln commented on May 27, 2024

Having similar issues on Ubuntu 20.04.

nvidia-smi: 
Driver Version: 510.39.01    CUDA Version: 11.6
nvcc --version:
Build cuda_11.3.r11.3/compiler.29920130_0

(Note: the CUDA version reported by nvidia-smi is the highest version the driver supports, not the version of the installed toolkit.)
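
A minimal sketch of that compatibility check, using the two version strings from the output above. The helper names `parse_version` and `toolkit_supported` are made up for illustration; the only rule encoded is that the installed toolkit should not be newer than the maximum the driver reports.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def toolkit_supported(driver_max, toolkit):
    """Return True if the installed toolkit is not newer than the driver's maximum."""
    return parse_version(toolkit) <= parse_version(driver_max)


# Values copied from the nvidia-smi and nvcc output above.
print(toolkit_supported("11.6", "11.3"))  # True
```

Comparing tuples of integers rather than raw strings avoids mis-ordering versions such as 11.10 and 11.9.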

nestedtensor did eventually build for me, however.

pytorch-lightning is also missing unless I install maua/audio/requirements.txt.

Regardless, I'm getting Segmentation fault (core dumped) when trying to run maua.

So I started digging:

$ gdb --args python -m maua

(gdb) run
Starting program: /.../maua/envs/bin/python -m maua
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 32595]
[New Thread 0x7fff0c110700 (LWP 32597)]
...etc...

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffea6a901e1 in google::protobuf::internal::ReflectionOps::FindInitializationErrors(google::protobuf::Message const&, std::string const&, std::vector<std::string, std::allocator<std::string> >*) ()
   from /.../maua/envs/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so
(gdb) 

That doesn't make a lot of sense to me, but I thought I'd try Python 3.9, and that did seem to help. I got a few more odd errors, but was able to fix them with a couple of hacks:

$ python -m maua --help
Traceback (most recent call last):
...
  File "/.../maua/maua/style/image.py", line 17, in <module>
    from maua.optimizers import load_optimizer, OPTIMIZERS
  File "/.../maua/maua/optimizers.py", line 4, in <module>
    import torch_optimizer as more_optim
ModuleNotFoundError: No module named 'torch_optimizer'

$ pip install torch-optimizer

$ python -m maua --help
Traceback (most recent call last):
...
  File "/.../maua/maua/style/image.py", line 17, in <module>
    from maua.optimizers import load_optimizer, OPTIMIZERS
  File "/.../maua/maua/optimizers.py", line 30, in <module>
    "NovoGrad": timm_optim.NovoGrad,
AttributeError: module 'timm.optim' has no attribute 'NovoGrad'

I could fix the last one by commenting out NovoGrad in the optimizer list.
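
As an alternative to commenting entries out by hand, a registry could be built defensively so that names missing from the installed package are skipped. This is only a sketch: `collect_available` is a hypothetical helper, and `fake_optim` with its `OptA`/`OptB` attributes is a stand-in object, not the real `timm.optim` module.

```python
import types


def collect_available(module, names):
    """Map each name to the attribute on `module`, skipping names it lacks."""
    found = {}
    missing = []
    for name in names:
        attr = getattr(module, name, None)
        if attr is None:
            missing.append(name)
        else:
            found[name] = attr
    return found, missing


# Stand-in for an installed package that does not export NovoGrad.
# OptA and OptB are placeholder names, not real optimizers.
fake_optim = types.SimpleNamespace(OptA=object(), OptB=object())

optimizers, skipped = collect_available(fake_optim, ["OptA", "NovoGrad", "OptB"])
print(sorted(optimizers))  # ['OptA', 'OptB']
print(skipped)             # ['NovoGrad']
```

Reporting the skipped names keeps the failure visible without aborting the import of the whole module.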

I'm not sure why Python 3.9 helps or why those other errors are happening, but maybe this helps identify the real problem?

Also note that I haven't done the pip install cupy-cuda113==9.6 step of the install yet as I was trying to limit the potential causes.
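
One way to narrow down which package triggers the crash is to import each suspect in a fresh interpreter and look at the exit status. The sketch below assumes a POSIX system, where a child killed by a signal such as SIGSEGV reports a negative return code; `try_import` is a hypothetical helper and the module names in the loop are only examples.

```python
import subprocess
import sys


def try_import(module_name):
    """Import `module_name` in a fresh interpreter and report how it went.

    Returns 'ok', 'error' (an ordinary Python exception), or 'crash' (the
    child was killed by a signal, which shows up as a negative return code
    on POSIX systems).
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        return "crash"
    return "error"


# Example names only; substitute the packages under suspicion,
# for instance google.protobuf or torch.
for name in ["json", "module_that_does_not_exist"]:
    print(name, try_import(name))
```

Running the child with `-X faulthandler` also makes Python dump its own traceback to stderr on a fatal signal, which can complement the gdb backtrace.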

from maua.

JCBrouwer commented on May 27, 2024

Hello @ashwindcruz and @RKelln, thanks for raising the issue!

I think this is mainly a case of the README being a little out of sync with the repo structure. I've added the missing dependencies to the requirements.txt and updated the commands/paths in the README.

I believe the segfault is related to the cupy version that gets found by conda not being compatible with the cudatoolkit version (at least I remember getting segfaults and ended up adding the extra cupy-cuda113 install). I haven't been able to reproduce the segfaults on my machine now though (either when uninstalling cupy-cuda113 or reinstalling from scratch without it).

Could you try reinstalling the repo with the updated commands in the README? Alternatively, just continue with Python 3.9 (although I believe this gave some dependency issues in the audio package).

@ashwindcruz Are you getting an error when installing nestedtensor? Building the wheel can take a long time (~5 min on my machine, but maybe longer with fewer CPU cores).

I don't think it's needed for upscaling, but as of now the CLI imports the full tree of files on every execution (that's also why each command is so slow to start at the moment). I need to restructure things so that running a given command only imports the parts it actually needs, but I haven't thought of a good way to do that yet...
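
One possible shape for that restructuring is a registry that maps each command name to a module path and only imports the module when the command is requested. This is a sketch, not the project's actual layout: `run_command` and `REGISTRY` are hypothetical, and standard-library modules stand in for the real submodules so the example runs anywhere.

```python
import importlib


def run_command(registry, command, *args):
    """Import only the module that backs `command`, then call its entry point.

    `registry` maps a command name to a (module path, function name) pair.
    Nothing is imported until the command is actually requested.
    """
    if command not in registry:
        raise SystemExit(f"unknown command: {command}")
    module_path, func_name = registry[command]
    module = importlib.import_module(module_path)
    return getattr(module, func_name)(*args)


# Stand-in registry using standard-library modules.
# A real registry would point at the project's own submodules instead.
REGISTRY = {
    "sqrt": ("math", "sqrt"),
    "dumps": ("json", "dumps"),
}

print(run_command(REGISTRY, "sqrt", 9.0))      # 3.0
print(run_command(REGISTRY, "dumps", [1, 2]))  # [1, 2]
```

With this layout, a heavy or broken dependency in one subcommand only affects users of that subcommand, and startup time depends on the command chosen rather than on the whole tree.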

RKelln commented on May 27, 2024

I was able to try an install using the new instructions and requirements. However, I ran into an issue with torchvision:

/.../lib/python3.8/site-packages/torchvision/io/image.py:11: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
Segmentation fault (core dumped)

I noticed that it had installed pytorch 1.10.2 from conda, but then uninstalled that and installed 1.10.1 using pip. I tried installing 1.10.2 using pip, but that didn't help, so I then tried a reinstall without the requirements.txt version lock. With that I get just Segmentation fault (core dumped), as I did with the previous install instructions. So I tried locking the conda environment to the 1.10.1 versions... still segfaults. My Python 3.9 conda environment still works fine. The only difference seems to be the Python version?
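
Because importing the packages is what crashes, it can help to read the installed versions from package metadata instead. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper, and it assumes the packages expose standard distribution metadata under these names, which may not hold for every conda build.

```python
from importlib import metadata


def installed_versions(names):
    """Look up installed distribution versions without importing the packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["torch", "torchvision", "torchaudio"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this before and after the pip step would show whether pip replaced the conda-installed build with a different version.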

Using pytorch collect_env:

Working python 3.9 env:

Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:40:17)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.1
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.2
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] cudatoolkit-dev           11.3.1           py39h3811e60_0    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h8d4b97c_729    conda-forge
[conda] mkl-service               2.4.0            py39h7e14d7c_0    conda-forge
[conda] mkl_fft                   1.3.1            py39h0c7bc48_1    conda-forge
[conda] mkl_random                1.2.2            py39hde0f152_0    conda-forge
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.1                   pypi_0    pypi
[conda] pytorch                   1.10.1          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-lightning         1.5.9                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                0.10.1               py39_cu113    pytorch
[conda] torchcrepe                0.0.15                   pypi_0    pypi
[conda] torchmetrics              0.7.0                    pypi_0    pypi
[conda] torchvision               0.11.2               py39_cu113    pytorch

Broken 3.8 reinstall using 1.10.2:

Collecting environment information...
PyTorch version: 1.10.2
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.2
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.2
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.3
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] cudatoolkit-dev           11.3.1           py38h497a2fe_0    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h8d4b97c_729    conda-forge
[conda] mkl-service               2.4.0            py38h95df7f1_0    conda-forge
[conda] mkl_fft                   1.3.1            py38h8666266_1    conda-forge
[conda] mkl_random                1.2.2            py38h1abd341_0    conda-forge
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.1                   pypi_0    pypi
[conda] pytorch                   1.10.2          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-lightning         1.5.9                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                0.10.2               py38_cu113    pytorch
[conda] torchcrepe                0.0.15                   pypi_0    pypi
[conda] torchmetrics              0.7.0                    pypi_0    pypi
[conda] torchvision               0.11.3               py38_cu113    pytorch

Broken 1.10.1 reinstall:

Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] pytorch-lightning==1.5.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.10.1
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==0.10.1
[pip3] torchcrepe==0.0.15
[pip3] torchmetrics==0.7.0
[pip3] torchvision==0.11.2
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
[conda] cudatoolkit-dev           11.3.1           py38h497a2fe_0    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h8d4b97c_729    conda-forge
[conda] mkl-service               2.4.0            py38h95df7f1_0    conda-forge
[conda] mkl_fft                   1.3.1            py38h8666266_1    conda-forge
[conda] mkl_random                1.2.2            py38h1abd341_0    conda-forge
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.1                   pypi_0    pypi
[conda] pytorch                   1.10.1          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-lightning         1.5.9                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch-ranger            0.1.1                    pypi_0    pypi
[conda] torch-optimizer           0.3.0                    pypi_0    pypi
[conda] torchaudio                0.10.1               py38_cu113    pytorch
[conda] torchcrepe                0.0.15                   pypi_0    pypi
[conda] torchmetrics              0.7.0                    pypi_0    pypi
[conda] torchvision               0.11.2               py38_cu113    pytorch
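
To check these reports field by field rather than by eye, a small diff over the `key: value` lines is enough. The sketch below uses excerpts copied from the working 3.9 report and the broken 3.8 (1.10.1) report above; `parse_report` and `differing_keys` are hypothetical helpers.

```python
def parse_report(text):
    """Parse 'key: value' lines from a collect_env-style report into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def differing_keys(left, right):
    """Return the keys whose values differ between two parsed reports."""
    return sorted(k for k in left.keys() | right.keys() if left.get(k) != right.get(k))


# Excerpts copied from the reports above.
working = """PyTorch version: 1.10.1
Libc version: glibc-2.31
Python version: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:40:17)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.31
CUDA runtime version: 11.3.109"""

broken = """PyTorch version: 1.10.1
Libc version: glibc-2.31
Python version: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:53:36)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.13.0-28-generic-x86_64-with-glibc2.10
CUDA runtime version: 11.3.109"""

print(differing_keys(parse_report(working), parse_report(broken)))
# ['Python platform', 'Python version']
```

In these excerpts the platform string differs as well (glibc2.31 versus glibc2.10), not just the interpreter version. Whether that is related to the segfault is not established here.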

JCBrouwer commented on May 27, 2024

Do either of you run into similar issues if you re-install using the current instructions in a clean environment?

I've streamlined quite a bit of the installation to essentially run with just pip. Maybe that helps avoid these segfaults?

JCBrouwer commented on May 27, 2024

Alright, I'm going to close this for now as I think it's stale. Feel free to open a new issue (or re-open this one) if you still run into problems!
