If I start an experiment with the following requirements defined in the UI:

torch version inference logic broken when torchvision is specified about clearml-agent HOT 5 OPEN

allegroai commented on June 14, 2024 1

torch version inference logic broken when torchvision is specified

from clearml-agent.

Comments (5)

bmartinn commented on June 14, 2024 1

Yes, you are correct. I'll make sure the error message is corrected in the next RC.

Regarding using PyPI with torch, the problem is that this is unstable. For example, there is no way of knowing whether the torchvision on PyPI is the CPU or the GPU version.
Also, for the GPU version, the CUDA version changes from one torch version to another, so you end up with a driver mismatch for no good reason.

With all that said, if you know the correct version for your setup, you can simply replace torchvision==0.2.1 with a direct https link to the wheel:
https://files.pythonhosted.org/packages/ca/0d/f00b2885711e08bd71242ebe7b96561e6f6d01fdb4b9dcf4d37e2e13c5e1/torchvision-0.2.1-py2.py3-none-any.whl
This will work as long as it matches the CPU/CUDA version you are running.
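As a rough illustration of the workaround above, here is a minimal Python sketch that swaps a pinned requirement line for a direct wheel URL (pip accepts direct URLs as requirement lines). The helper name `pin_to_wheel` is hypothetical and is not part of trains-agent; in practice you would simply edit the requirements in the UI by hand.

```python
def pin_to_wheel(requirements_text: str, package: str, wheel_url: str) -> str:
    """Replace the 'package==<version>' line with a direct wheel URL.

    Hypothetical helper for illustration only, not a trains-agent API.
    """
    out = []
    for line in requirements_text.splitlines():
        # Match "torchvision==..." exactly, so "torch==..." is left untouched.
        if line.strip().startswith(package + "=="):
            out.append(wheel_url)
        else:
            out.append(line)
    return "\n".join(out)


# URL copied from the comment above.
WHEEL_URL = "https://files.pythonhosted.org/packages/ca/0d/f00b2885711e08bd71242ebe7b96561e6f6d01fdb4b9dcf4d37e2e13c5e1/torchvision-0.2.1-py2.py3-none-any.whl"

requirements = "torch==1.3.1\ntorchvision==0.2.1"
print(pin_to_wheel(requirements, "torchvision", WHEEL_URL))
```

The torch line is kept as a normal pin, so only torchvision is taken out of the version inference.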

H4dr1en commented on June 14, 2024 1

Regarding using PyPI with torch, the problem is that this is unstable. For example, there is no way of knowing whether the torchvision on PyPI is the CPU or the GPU version.
Also, for the GPU version, the CUDA version changes from one torch version to another, so you end up with a driver mismatch for no good reason.

Thank you for pointing that out, this definitely makes sense!

With all that said, if you know the correct version for your setup, you can simply replace torchvision==0.2.1 with a direct https link to the wheel:

Thanks for the workaround! I'll close as soon as the error is more explicit 👍

EDIT:
@H4dr1en, what trains-agent version are you using?
Which package manager is trains-agent using? See example here
What is the pip version limit configured in trains.conf? See example here

trains-agent==0.14.2rc2
package manager = pip
pip version = 0.21

bmartinn commented on June 14, 2024

Hi @H4dr1en
Torch is a special case for trains-agent. Since the good people of PyTorch maintain separate packages for different CUDA versions, trains-agent automatically selects the correct package based on the installed CUDA version.

Specifically, it seems that you are running without a GPU, so the CUDA version is 0.
It seems to find the correct package for torch==1.3.1 but fails on torchvision. The thing is, it tries to download "torch", not "torchvision". Let me see if I can reproduce this behavior.

EDIT:
@H4dr1en, what trains-agent version are you using?
Which package manager is trains-agent using? See example here
What is the pip version limit configured in trains.conf? See example here
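To make the symptom described above concrete, here is a small Python sketch of how a CUDA-specific wheel URL could be assembled. It only mirrors the URL pattern visible in the failing log in this thread; the function `build_wheel_url` and its parameters are assumptions for illustration and are not trains-agent's actual implementation.

```python
def build_wheel_url(name, version, cuda_tag="cu0",
                    py_tag="cp37", abi_tag="cp37m",
                    platform="linux_x86_64"):
    """Assemble a download URL following the pattern seen in the log.

    Hypothetical sketch: the real agent logic may differ.
    """
    filename = f"{name}-{version}-{py_tag}-{abi_tag}-{platform}.whl"
    return f"http://download.pytorch.org/whl/{cuda_tag}/{filename}"


# The reported symptom: torchvision's version (0.2.1) paired with the name "torch".
buggy = build_wheel_url("torch", "0.2.1")
# What one would expect the lookup to target instead (existence not implied).
expected = build_wheel_url("torchvision", "0.2.1")

print(buggy)
print(expected)
```

The first URL is the one that returns HTTP 403 in the log, which is why the error message mentions torch even though the requirement in question is torchvision.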

bmartinn commented on June 14, 2024

Hi @H4dr1en
Could you test with trains-agent 0.14.2rc2?

pip install trains-agent==0.14.2rc2

I think the problem is that there is no package for torchvision==0.2.0
You can see in the full list here: https://download.pytorch.org/whl/cpu/torch_stable.html

Notice that you can just reset the experiment and edit the requirements to the correct torchvision version :)
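If you want to check which versions an index page lists before editing the requirements, a sketch along these lines could help. It parses wheel links out of HTML with the standard library; the `SAMPLE` markup below is made up for illustration and is not a copy of the real torch_stable.html page, and `available_versions` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(unquote(value))  # "%2B" -> "+"


def available_versions(index_html, package):
    """Return the set of versions of `package` that have a wheel link."""
    parser = _LinkCollector()
    parser.feed(index_html)
    versions = set()
    for link in parser.links:
        filename = link.rsplit("/", 1)[-1]
        parts = filename.split("-")
        if filename.endswith(".whl") and len(parts) >= 2 and parts[0] == package:
            versions.add(parts[1].split("+")[0])  # drop "+cpu" style suffix
    return versions


# Made-up sample markup, assumed to resemble a wheel index page.
SAMPLE = """
<a href="cpu/torch-1.3.1%2Bcpu-cp37-cp37m-linux_x86_64.whl">torch</a><br>
<a href="cpu/torchvision-0.4.2%2Bcpu-cp37-cp37m-linux_x86_64.whl">torchvision</a><br>
"""

print(available_versions(SAMPLE, "torchvision"))
print("0.2.1" in available_versions(SAMPLE, "torchvision"))
```

To use it against the real list, you would download the page linked above and pass its text as `index_html`.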

H4dr1en commented on June 14, 2024

With trains-agent==0.14.2rc2 it also fails:

Collecting Cython
  Using cached Cython-0.29.17-cp37-cp37m-manylinux1_x86_64.whl (2.1 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.17
Collecting torch==1.3.1+cpu
  File was already downloaded /home/H4dr1en/.trains/pip-download-cache/cu0/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl
Successfully downloaded torch
Collecting torch==0.2.1
  ERROR: HTTP error 403 while getting http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
  ERROR: Could not install requirement torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl because of error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install requirement torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl because of HTTP error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl for URL http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
trains_agent: ERROR: Could not download wheel name of "http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl"
ERROR: Double requirement given: torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl (from -r /tmp/cached-reqsx0eu_ber.txt (line 2)) (already in torch==1.5.0+cpu from file:///home/H4dr1en/.trains/pip-download-cache/cu0/torch-1.5.0%2Bcpu-cp37-cp37m-linux_x86_64.whl (from -r /tmp/cached-reqsx0eu_ber.txt (line 1)), name='torch')
trains_agent: ERROR: Could not install task requirements!
Command '['/home/H4dr1en/.trains/venvs-builds/3.7/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsx0eu_ber.txt']' returned non-zero exit status 1.
DONE: Running task '63d740ab6fbd4178ad55243df1c4cf07', exit status 1

I think the problem is that there is no package for torchvision==0.2.0

Would it be reasonable to install torchvision (and torch) from the PyPI repo as a fallback when trains-agent cannot infer the package from the CUDA version and the torch/torchvision version?

In any case, the error should be more meaningful. It is currently misleading, since it reports trying to install torch (not torchvision) with the version provided for torchvision.
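The fallback proposed above could look roughly like the following Python sketch. This is only an illustration of the suggestion, not how trains-agent behaves; `resolve_requirement`, `known_wheels`, and `RequirementResolutionError` are hypothetical names.

```python
class RequirementResolutionError(Exception):
    """Raised when no wheel is known and the PyPI fallback is disabled."""


def resolve_requirement(name, version, known_wheels, allow_pypi_fallback=True):
    """Return a pip requirement line for name==version.

    known_wheels maps (name, version) to a direct wheel URL (assumed input).
    """
    url = known_wheels.get((name, version))
    if url is not None:
        return url  # a CUDA/CPU-specific wheel was found
    if allow_pypi_fallback:
        return f"{name}=={version}"  # let pip resolve it from PyPI
    # Name the package that actually failed, so the error is not misleading.
    raise RequirementResolutionError(
        f"No prebuilt wheel found for {name}=={version}"
    )


wheels = {
    ("torch", "1.3.1"): "http://download.pytorch.org/whl/cu0/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl",
}
print(resolve_requirement("torch", "1.3.1", wheels))
print(resolve_requirement("torchvision", "0.2.1", wheels))
```

With the fallback disabled, the raised error at least names torchvision rather than torch, which addresses the misleading message even if the fallback itself is rejected.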
