Hi @212792736
The error itself is odd, but I think the problem is the argument passed: there is no --default argument for the trains-agent daemon command.
You can check the full help:
trains-agent --help
and the help for each command:
trains-agent daemon --help
from clearml-agent.
Hi @212792736 ,
Strange... When I use a clean environment with trains_agent==0.16.0 and trains==0.16.1, and run trains-agent daemon --default, the response is:
usage: trains-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
trains-agent: error: unrecognized arguments: --default
If I run trains-agent daemon, I get a normal response and the agent starts running as expected.
Can you verify the agent you're running is the correct version (trains-agent --version)? It might be that the installed trains-agent command somehow resolves to a different agent installation.
To make sure you're running the installed 0.16.0 agent, try using python -m trains_agent --version
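One way to cross-check which installation the command resolves to is a short Python sketch like the one below. It uses only the standard library (Python 3.8+). The helper names installed_version and describe_install are made up for illustration, and the distribution names trains-agent and trains are assumed to match what pip installed.

```python
import shutil
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def describe_install(command="trains-agent", dists=("trains-agent", "trains")):
    """Collect where the command resolves and which versions are installed."""
    return {
        # Interpreter running this script (should be the env you expect).
        "python": sys.executable,
        # First match for the command on PATH, or None if it is not found.
        "command_path": shutil.which(command),
        "versions": {name: installed_version(name) for name in dists},
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(key, "=", value)
```

If command_path points outside the environment that python belongs to, the trains-agent command is probably coming from a different installation.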
Thanks for the swift reply, here are the commands you requested:
(trains) C:>trains-agent --version
TRAINS-AGENT version 0.16.0
(trains) C:>trains-agent --help
TRAINS-AGENT Deep Learning DevOps
usage: trains-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
positional arguments:
{execute,build,list,daemon,config,init}
execute Build & Execute a selected experiment
build Build selected experiment environment (including pip packages, cloned code
and git diff) Used mostly for debugging purposes
list List all worker machines and status
daemon Start Trains-Agent daemon worker
config Check daemon configuration and print it
init Trains-Agent configuration wizard
optional arguments:
-h Displays summary of all commands
--help Detailed help of command line interface
--version TRAINS-AGENT version number
--config-file CONFIG_FILE Use a different configuration file (default:
"C:\trains.conf")
--debug, -d print debug information
(trains) C:>trains-agent daemon --help
usage: trains-agent daemon [-h] [--foreground] [--queue QUEUES [QUEUES ...]]
[--order-fairness] [--standalone-mode]
[--services-mode] [--create-queue] [--detached]
[--stop] [-O] [--git-user GIT_USER]
[--git-pass GIT_PASS]
[--log-level {DEBUG,INFO,WARN,WARNING,ERROR,CRITICAL}]
[--gpus GPUS] [--cpu-only]
[--docker [DOCKER [DOCKER ...]]]
[--force-current-version]
optional arguments:
-h, --help show this help message and exit
--foreground Pipe full log to stdout/stderr, should not be used if
running in background
--queue QUEUES [QUEUES ...]
Queue ID(s)/Name(s) to pull tasks from ('default'
queue)
--order-fairness Pull from each queue in a round-robin order, instead
of priority order.
--standalone-mode Do not use any network connects, assume everything is
pre-installed
--services-mode Launch multiple long-term docker services. Implies
docker & cpu-only flags.
--create-queue Create requested queue if it does not exist already.
--detached, -d Detached mode, run agent in the background
--stop Stop the running agent (based on the same set of
arguments)
-O Compile optimized pyc code (see python documentation).
Repeat for more optimization.
--git-user GIT_USER git username for repository access
--git-pass GIT_PASS git password for repository access
--log-level {DEBUG,INFO,WARN,WARNING,ERROR,CRITICAL}
SDK log level
--gpus GPUS Specify active GPUs for the daemon to use (docker /
virtual environment), Equivalent to setting
NVIDIA_VISIBLE_DEVICES Examples: --gpus 0 or --gpu
0,1,2 or --gpus all
--cpu-only Disable GPU access for the daemon, only use CPU in
either docker or virtual environment
Docker support:
--docker [DOCKER [DOCKER ...]]
Run execution task inside a docker (v19.03 and above).
Optional args <image> <arguments> or specify default
docker image in agent.default_docker.image /
agent.default_docker.argumentsuse --gpus/--cpu-only
(or set NVIDIA_VISIBLE_DEVICES) to limit gpu
visibility for docker
--force-current-version
Force trains-agent to use the current trains-agent
version when running in the docker
(trains) C:>python -m trains_agent --version
TRAINS-AGENT version 0.16.0
I also have a version of trains installed:
trains 0.16.1
I'll check whether I have proxy issues; maybe that is causing the ambiguous error message.
Hi @212792736,
I'm still confused by the --default command line switch you used (which is not supported).
What happens when you try simply running trains-agent --debug daemon?
Hi @jkhenning
First, my mistake for creating the confusion by running trains-agent daemon --default instead of trains-agent daemon --queue default.
But regardless, that is not what is causing the problem.
As you requested, here is the output of trains-agent --debug daemon:
(trains) C:\WINDOWS\system32>trains-agent --debug daemon
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pitc-zscaler-emea-amsterdam3pr.proxy.corporate.ge.com:80
DEBUG:urllib3.connectionpool:PROXY_LINK_THAT_i_HID:80 "GET http://10.136.16.173:8008/auth.login HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): GATEWAY_PROXY_LINK:443
DEBUG:urllib3.connectionpool:https://GATEWAY_PROXY_LINK:443 "GET /_sm_ccik?_sm_rid=q0tnWZknQnsRqt3RtWZPv5D7tMfStnWZZ626rkqfStnWZZ626rkq&_orig_url=http://10.136.16.173:8008/auth.login HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:PROXY_LINK_THAT_i_HID:80 "GET http://10.136.16.173:8008/auth.login?_sm_nck=1 HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): captcha.corp_notification.com:443
DEBUG:urllib3.connectionpool:https://captcha.corp_notification.com:443 "GET /pitc/?url=http%3A%2F%2F10.136.16.173%CANNOT_ACCESS_LINK_DESCRIPTION HTTP/1.1" 200 None
Traceback (most recent call last):
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 545, in _do_refresh_token
return resp["data"]["token"]
KeyError: 'data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\USER\.conda\envs\trains\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\USER\.conda\envs\trains\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\USER\.conda\envs\trains\Scripts\trains-agent.exe\__main__.py", line 7, in <module>
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\__main__.py", line 81, in main
return run_command(parser, args, command_name)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\__main__.py", line 42, in run_command
command = command_class(**vars(args))
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\helper\base.py", line 238, in __call__
cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\worker.py", line 345, in __init__
super(Worker, self).__init__(*args, **kwargs)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\base.py", line 98, in __init__
self._session = self._get_session(*args, **kwargs)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\base.py", line 113, in _get_session
return Session(*args, **kwargs)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\session.py", line 94, in __init__
super(Session, self).__init__(*args, **kwargs)
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 152, in __init__
self.refresh_token()
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\token_manager.py", line 95, in refresh_token
self._set_token(self._do_refresh_token(self.__token, exp=self.req_token_expiration_sec))
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 552, in _do_refresh_token
'Is this the TRAINS API server {} ?'.format(self.get_api_server_host()))
File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 437, in get_api_server_host
from ...config import config_obj
ImportError: cannot import name 'config_obj' from 'trains_agent.config' (c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\config.py)
I removed some of the links from the printout because they are corporate links, but I think the problem is that I do not get an HTTP 200 response from the server, which is caused by a proxy.
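The KeyError: 'data' in the traceback above is what one would expect when the login reply is not the JSON the client is looking for, for example when a proxy answers with a block or captcha page. The following is a minimal sketch of a more defensive token lookup, not the agent's actual code; extract_token and its api_server parameter are hypothetical names used only for illustration.

```python
import json


def extract_token(body_text, api_server="<api_server>"):
    """Return data.token from a JSON login reply, or raise a clear error."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        # A proxy block page or captcha is usually HTML, not JSON.
        raise ValueError(
            "Non-JSON reply when logging in to {}; is a proxy intercepting "
            "the request?".format(api_server)
        )
    try:
        return payload["data"]["token"]
    except (KeyError, TypeError):
        # Valid JSON, but not shaped like a login reply.
        raise ValueError(
            "Login reply from {} has no data.token field".format(api_server)
        )
```

Calling extract_token on an HTML page raises a ValueError that names the likely cause, instead of a bare KeyError.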
Hi @212792736 ,
Now it makes sense 🙂
Please try upgrading to the latest Trains Agent v0.16.1 and let me know if the problem still persists:
pip install -U trains-agent
Hi again @jkhenning
After updating, the error message has changed, which is good:
ValueError: It seems *api_server* is misconfigured. Is this the TRAINS API server None ?
We are currently moving our processing server with trains on it, so once that is completed I will check whether the problem has been fully resolved.
Hi again,
I think we figured out what the problem is: it's the proxy settings, which prevent Python's urllib from communicating with the server. If I run the trains-agent on the local server, no problem occurs at the moment.
I'll have to investigate whether this can be resolved further, but I think the initial problem has been resolved. I'll close the issue after the investigation.
Hi @212792736 ,
Seems this issue got lost on us :(
"I think we figured out what the problem is, it's the proxy settings which are having issues with the python urllib to communicate with it"
Yes, this sounds like it...
Since, as you pointed out, it's urllib doing all the communication under the hood, we just need to configure it to pass through your proxy.
All urllib sessions are created here. Based on this urllib3 documentation, it seems we need to replace the PoolManager with ProxyManager; I think once that is done it should work.
What do you think?
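A rough sketch of the PoolManager to ProxyManager swap suggested above, assuming urllib3 is installed. This is not the agent's code: pick_proxy_url and make_http_manager are hypothetical helpers, and reading the proxy from the http_proxy / HTTP_PROXY environment variables is an assumption about how it would be configured.

```python
import os


def pick_proxy_url(scheme, environ=None):
    """Return the proxy URL configured for `scheme` in an env-like mapping."""
    environ = os.environ if environ is None else environ
    # Check both spellings, since tools differ on upper vs lower case.
    for key in ("{}_proxy".format(scheme), "{}_PROXY".format(scheme.upper())):
        value = environ.get(key)
        if value:
            return value
    return None


def make_http_manager(scheme="http", environ=None):
    """Build a urllib3 manager that goes through the proxy when one is set."""
    import urllib3  # third-party; imported lazily so the helper above stays stdlib-only

    proxy_url = pick_proxy_url(scheme, environ)
    if proxy_url:
        # ProxyManager sends every request through the given proxy.
        return urllib3.ProxyManager(proxy_url)
    # No proxy configured: connect directly, as PoolManager does.
    return urllib3.PoolManager()
```

Note that this sends all traffic through the proxy once one is configured; it does not by itself exclude an internal server address from proxying.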
Hi @bmartinn
Thanks for looking into solutions. This helped me realise a nice trick: setting the NO_PROXY variable (import os; os.environ['NO_PROXY'] = 'IP_OF_THE_TRAINS_SERVER') in the file you mentioned.
That made me realise that all I actually need is to "set no_proxy=IP_of_the_server", the same as described here.
Now I can run the trains-agent and I get these responses (after the config file was printed out; I just hid the IP addresses):
(...)
DEBUG:trains_agent.commands.worker:starting resource monitor thread
Worker "G4H3Q5S2E:0" - Listening to queues:
+----------------------------------+---------+-------+
| id | name | tags |
+----------------------------------+---------+-------+
| 44eab786974744cb91a9fe5796fa305e | default | |
+----------------------------------+---------+-------+
DEBUG:urllib3.connectionpool:Resetting dropped connection: server
DEBUG:urllib3.connectionpool:http://server:8008 "GET /workers.register HTTP/1.1" 200 261
Running TRAINS-AGENT daemon in background mode, writing stdout/stderr to C:\Users\212792~1\AppData\Local\Temp\.trains_agent_daemon_outea7jspr8.txt
DEBUG:urllib3.connectionpool:Resetting dropped connection: 10.136.16.82
DEBUG:urllib3.connectionpool:http://server:8008 "GET /v2.5/queues.get_all HTTP/1.1" 200 321
DEBUG:urllib3.connectionpool:Resetting dropped connection: 10.136.16.82
DEBUG:urllib3.connectionpool:http://server::8008 "GET /v2.5/queues.get_next_task HTTP/1.1" 200 265
That would mean the problem is solved, I believe. Thank you for your support and great work!
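The NO_PROXY setting described above can be sanity-checked with a small Python sketch. It relies on urllib.request.proxy_bypass_environment from the standard library, which may not match exactly how the agent's HTTP stack interprets the variable, so treat the result as an approximation. The function name bypasses_proxy is made up for illustration, and the address is the API server IP that appears in the debug log earlier in the thread.

```python
import urllib.request


def bypasses_proxy(host, no_proxy_value):
    """Check whether `host` matches a NO_PROXY-style comma-separated list."""
    # The stdlib helper looks up the exclusion list under the "no" key,
    # which is how it represents the NO_PROXY / no_proxy variable.
    proxies = {"no": no_proxy_value}
    return bool(urllib.request.proxy_bypass_environment(host, proxies))


if __name__ == "__main__":
    no_proxy = "10.136.16.173"  # example value taken from the log above
    print(bypasses_proxy("10.136.16.173:8008", no_proxy))
    print(bypasses_proxy("captcha.example.com", no_proxy))
```

Passing the list explicitly keeps the check from touching the real environment, so it can be run before changing any system settings.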