Comments (5)
Yes, you are correct. I'll make sure the error message is corrected in the next RC.
Regarding using PyPI with torch: the problem is that this is unstable. For example, there is no way of knowing whether the torchvision on PyPI is the CPU or the GPU version...
Also, for the GPU version, the CUDA version changes from one torch version to another, so you end up with a driver mismatch for no good reason.
With all that said, if you know the correct version for your setup, you can simply replace torchvision==0.2.1 with a direct https link to the wheel:
https://files.pythonhosted.org/packages/ca/0d/f00b2885711e08bd71242ebe7b96561e6f6d01fdb4b9dcf4d37e2e13c5e1/torchvision-0.2.1-py2.py3-none-any.whl
This will work as long as it matches the CPU/CUDA version you are running.
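A minimal sketch of that workaround, assuming the requirements are available as a list of strings. The helper name `pin_to_wheel` is hypothetical and not part of trains-agent; it only illustrates swapping the pinned entry for the wheel URL, which pip accepts as a requirement line.

```python
def pin_to_wheel(requirements, package, wheel_url):
    """Return a copy of `requirements` with the pinned `package` line replaced by `wheel_url`.

    Hypothetical helper for illustration only; not a trains-agent API.
    """
    result = []
    for line in requirements:
        # "torchvision==0.2.1" -> "torchvision"
        name = line.split("==", 1)[0].strip().lower()
        result.append(wheel_url if name == package.lower() else line)
    return result


WHEEL = (
    "https://files.pythonhosted.org/packages/ca/0d/"
    "f00b2885711e08bd71242ebe7b96561e6f6d01fdb4b9dcf4d37e2e13c5e1/"
    "torchvision-0.2.1-py2.py3-none-any.whl"
)

reqs = ["torch==1.3.1", "torchvision==0.2.1"]
for line in pin_to_wheel(reqs, "torchvision", WHEEL):
    print(line)
```

The torch line is left untouched, so the agent can still resolve it the usual way.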
from clearml-agent.
Regarding using PyPI with torch: the problem is that this is unstable. For example, there is no way of knowing whether the torchvision on PyPI is the CPU or the GPU version...
Also, for the GPU version, the CUDA version changes from one torch version to another, so you end up with a driver mismatch for no good reason.
Thank you for pointing that out, this definitely makes sense!
With all that said, if you know the correct version for your setup, you can simply replace torchvision==0.2.1 with a direct https link to the wheel:
Thanks for the workaround! I'll close as soon as the error is more explicit 👍
EDIT:
@H4dr1en, What is the trains-agent version you are using?
What is the package manager trains-agent is using? see example here
What is the pip version limit configured in trains.conf? see example here
trains-agent==0.14.2rc2
package manager = pip
pip version = 0.21
from clearml-agent.
Hi @H4dr1en
Torch is a special case for trains-agent: since the good people of PyTorch maintain packages for different CUDA versions, trains-agent automatically selects the correct package based on the installed CUDA.
Specifically, it seems that you are running without a GPU, so the CUDA version is 0.
It seems to find the correct package for torch==1.3.1 but fails on torchvision; the thing is, it tries to download "torch", not "torchvision"... Let me see if I can reproduce this behavior.
EDIT:
@H4dr1en, What is the trains-agent version you are using?
What is the package manager trains-agent is using? see example here
What is the pip version limit configured in trains.conf? see example here
from clearml-agent.
Hi @H4dr1en
Could you test with trains-agent 0.14.2rc2?
pip install trains-agent==0.14.2rc2
I think the problem is that there is no package for torchvision==0.2.0
You can see in the full list here: https://download.pytorch.org/whl/cpu/torch_stable.html
Notice that you can just reset the experiment and edit the requirements to the correct torchvision version :)
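To check whether a given version is listed, one option is to scan the index page for wheel file names. The sketch below assumes the page lists wheels as links with `%2B`-encoded file names; `listed_versions` is a hypothetical helper, and the sample HTML is a made-up two-line excerpt, not the real page content.

```python
import re
from urllib.parse import unquote


def listed_versions(index_html, package):
    """Collect the versions of `package` that appear as wheel file names in an index page.

    Hypothetical helper for illustration; it only looks at file names.
    """
    pattern = re.compile(
        r"(?<![\w.-])" + re.escape(package) + r"-([^-/\"]+)-[^\"/]*\.whl"
    )
    # unquote turns "%2B" back into "+", e.g. "1.3.1%2Bcpu" -> "1.3.1+cpu"
    return sorted({m.group(1) for m in pattern.finditer(unquote(index_html))})


# Made-up excerpt in the assumed format of the index page
sample = """
<a href="cpu/torch-1.3.1%2Bcpu-cp37-cp37m-linux_x86_64.whl">torch</a>
<a href="cpu/torchvision-0.4.2%2Bcpu-cp37-cp37m-linux_x86_64.whl">torchvision</a>
"""

versions = listed_versions(sample, "torchvision")
print(versions)
print("0.2.0 listed:", any(v.split("+")[0] == "0.2.0" for v in versions))
```

In practice the page text would be fetched from the URL above and passed in as `index_html`.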
from clearml-agent.
With trains-agent==0.14.2rc2 it also fails:
Collecting Cython
Using cached Cython-0.29.17-cp37-cp37m-manylinux1_x86_64.whl (2.1 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.17
Collecting torch==1.3.1+cpu
File was already downloaded /home/H4dr1en/.trains/pip-download-cache/cu0/torch-1.3.1+cpu-cp37-cp37m-linux_x86_64.whl
Successfully downloaded torch
Collecting torch==0.2.1
ERROR: HTTP error 403 while getting http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install requirement torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl because of error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install requirement torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl because of HTTP error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl for URL http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl
trains_agent: ERROR: Could not download wheel name of "http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl"
ERROR: Double requirement given: torch==0.2.1 from http://download.pytorch.org/whl/cu0/torch-0.2.1-cp37-cp37m-linux_x86_64.whl (from -r /tmp/cached-reqsx0eu_ber.txt (line 2)) (already in torch==1.5.0+cpu from file:///home/H4dr1en/.trains/pip-download-cache/cu0/torch-1.5.0%2Bcpu-cp37-cp37m-linux_x86_64.whl (from -r /tmp/cached-reqsx0eu_ber.txt (line 1)), name='torch')
trains_agent: ERROR: Could not install task requirements!
Command '['/home/H4dr1en/.trains/venvs-builds/3.7/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqsx0eu_ber.txt']' returned non-zero exit status 1.
DONE: Running task '63d740ab6fbd4178ad55243df1c4cf07', exit status 1
I think the problem is that there is no package for torchvision==0.2.0
Would it be reasonable to install torchvision (and torch) from the PyPI repo as a fallback when trains-agent cannot infer the package based on the versions of CUDA and torch/torchvision?
In any case, the error should be more meaningful (it is currently misleading, since it tries to install torch, not torchvision, with the version provided for torchvision).
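The suggested fallback could look roughly like the sketch below. This is not how trains-agent resolves packages; `resolve` and the wheel list are hypothetical and only illustrate the idea: use a matching wheel from the PyTorch index when one exists, otherwise keep the plain pin so pip resolves it from PyPI under the package's own name.

```python
from urllib.parse import unquote


def resolve(name, version, index_wheels):
    """Return a matching wheel URL from `index_wheels`, or a plain `name==version` pin.

    Hypothetical sketch of a PyPI fallback; not the trains-agent implementation.
    """
    prefix = f"{name}-{version}"
    for url in index_wheels:
        filename = unquote(url.rsplit("/", 1)[-1])
        # Accept "torch-1.3.1-..." as well as local versions such as "torch-1.3.1+cpu-..."
        if filename.startswith(prefix + "-") or filename.startswith(prefix + "+"):
            return url
    # No wheel found for this package and version: fall back to PyPI
    return f"{name}=={version}"


# Assumed index content, for illustration only
wheels = [
    "https://download.pytorch.org/whl/cpu/torch-1.3.1%2Bcpu-cp37-cp37m-linux_x86_64.whl",
]

print(resolve("torch", "1.3.1", wheels))
print(resolve("torchvision", "0.2.1", wheels))
```

Because the lookup is keyed on the package name, a missing torchvision wheel ends up as a torchvision requirement rather than a torch one.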
from clearml-agent.