

bmartinn commented on June 14, 2024

Hi @212792736
The error itself is odd, but I think the problem is the argument you passed.
There is no --default argument for the trains-agent daemon command.
You can check the full help:

trains-agent --help

and the help for each command:

trains-agent daemon --help

from clearml-agent.

jkhenning commented on June 14, 2024

Hi @212792736 ,

Strange... When I use a clean environment with trains_agent==0.16.0 and trains==0.16.1, and run trains-agent daemon --default the response is:

usage: trains-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
                    {execute,build,list,daemon,config,init} ...
trains-agent: error: unrecognized arguments: --default
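For context, "error: unrecognized arguments" is the standard message Python's argparse prints when a subcommand receives an option it does not define. The following is a hypothetical, heavily simplified parser that only mimics the daemon subcommand's --queue option (it is not the real trains-agent parser); it shows why --default is rejected while --queue default is accepted:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI: one "daemon" subcommand with one option.
    parser = argparse.ArgumentParser(prog="trains-agent")
    subparsers = parser.add_subparsers(dest="command")
    daemon = subparsers.add_parser("daemon")
    # --queue takes one or more queue names/IDs, as in the daemon help text.
    daemon.add_argument("--queue", dest="queues", nargs="+")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints "trains-agent: error: unrecognized arguments: ..." and exits with code 2.
        return None


if __name__ == "__main__":
    print(try_parse(["daemon", "--queue", "default"]))  # accepted
    print(try_parse(["daemon", "--default"]))           # rejected
```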

If I run trains-agent daemon I get a normal response and the agent starts running as expected.

Can you verify that the agent you're running is the correct version (trains-agent --version)? It might be that the installed trains-agent command somehow points to a different agent installation.

To make sure you're running the installed 0.16.0 agent, try using python -m trains_agent --version.
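One way to check whether the trains-agent command and the trains_agent module come from the same installation is to compare where each one resolves. A minimal sketch with a hypothetical locate() helper (not part of trains-agent), using only importlib and shutil from the standard library:

```python
import importlib.util
import shutil


def locate(module_name: str, command: str) -> dict:
    """Report where a module is imported from and where a console command resolves on PATH."""
    spec = importlib.util.find_spec(module_name)
    return {
        # File the current interpreter would import for module_name (None if not installed).
        "module_file": spec.origin if spec is not None else None,
        # Executable the shell would run for command (None if not found on PATH).
        "command_path": shutil.which(command),
    }


if __name__ == "__main__":
    print(locate("trains_agent", "trains-agent"))
```

Run it with the same interpreter as python -m trains_agent; if module_file and command_path point into different environments, the command is picking up another installation.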


212792736 commented on June 14, 2024

Thanks for the swift reply, here are the commands you requested:

(trains) C:>trains-agent --version
TRAINS-AGENT version 0.16.0
(trains) C:>trains-agent --help
TRAINS-AGENT Deep Learning DevOps

usage: trains-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
                    {execute,build,list,daemon,config,init} ...

positional arguments:
  {execute,build,list,daemon,config,init}
    execute                                Build & Execute a selected experiment
    build                                  Build selected experiment environment (including pip packages, cloned code
                                           and git diff) Used mostly for debugging purposes
    list                                   List all worker machines and status
    daemon                                 Start Trains-Agent daemon worker
    config                                 Check daemon configuration and print it
    init                                   Trains-Agent configuration wizard

optional arguments:
  -h                                       Displays summary of all commands
  --help                                   Detailed help of command line interface
  --version                                TRAINS-AGENT version number
  --config-file CONFIG_FILE                Use a different configuration file (default:
                                           "C:\trains.conf")
  --debug, -d                              print debug information
(trains) C:>trains-agent daemon --help
usage: trains-agent daemon [-h] [--foreground] [--queue QUEUES [QUEUES ...]]
                           [--order-fairness] [--standalone-mode]
                           [--services-mode] [--create-queue] [--detached]
                           [--stop] [-O] [--git-user GIT_USER]
                           [--git-pass GIT_PASS]
                           [--log-level {DEBUG,INFO,WARN,WARNING,ERROR,CRITICAL}]
                           [--gpus GPUS] [--cpu-only]
                           [--docker [DOCKER [DOCKER ...]]]
                           [--force-current-version]

optional arguments:
  -h, --help            show this help message and exit
  --foreground          Pipe full log to stdout/stderr, should not be used if
                        running in background
  --queue QUEUES [QUEUES ...]
                        Queue ID(s)/Name(s) to pull tasks from ('default'
                        queue)
  --order-fairness      Pull from each queue in a round-robin order, instead
                        of priority order.
  --standalone-mode     Do not use any network connects, assume everything is
                        pre-installed
  --services-mode       Launch multiple long-term docker services. Implies
                        docker & cpu-only flags.
  --create-queue        Create requested queue if it does not exist already.
  --detached, -d        Detached mode, run agent in the background
  --stop                Stop the running agent (based on the same set of
                        arguments)
  -O                    Compile optimized pyc code (see python documentation).
                        Repeat for more optimization.
  --git-user GIT_USER   git username for repository access
  --git-pass GIT_PASS   git password for repository access
  --log-level {DEBUG,INFO,WARN,WARNING,ERROR,CRITICAL}
                        SDK log level
  --gpus GPUS           Specify active GPUs for the daemon to use (docker /
                        virtual environment), Equivalent to setting
                        NVIDIA_VISIBLE_DEVICES Examples: --gpus 0 or --gpu
                        0,1,2 or --gpus all
  --cpu-only            Disable GPU access for the daemon, only use CPU in
                        either docker or virtual environment

Docker support:
  --docker [DOCKER [DOCKER ...]]
                        Run execution task inside a docker (v19.03 and above).
                        Optional args <image> <arguments> or specify default
                        docker image in agent.default_docker.image /
                        agent.default_docker.argumentsuse --gpus/--cpu-only
                        (or set NVIDIA_VISIBLE_DEVICES) to limit gpu
                        visibility for docker
  --force-current-version
                        Force trains-agent to use the current trains-agent
                        version when running in the docker

(trains) C:>python -m trains_agent --version
TRAINS-AGENT version 0.16.0

I also have trains installed:
trains 0.16.1

I'll check whether I have proxy issues; maybe that is what's causing the ambiguous error message.


jkhenning commented on June 14, 2024

Hi @212792736,

I'm still confused by the --default command line switch you used (which is not supported).

What happens when you try simply running trains-agent --debug daemon?
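The DEBUG:urllib3.connectionpool:... lines that --debug prints follow the same LEVEL:logger:message layout as Python logging's default basicConfig format, so a similar connection trace can be reproduced in a standalone script by enabling DEBUG logging before making HTTP requests. The sketch below only demonstrates that format; capture_debug() is a hypothetical helper that logs one message into a string:

```python
import io
import logging


def capture_debug(logger_name: str, message: str) -> str:
    """Log message at DEBUG level and return the text produced by logging's default format."""
    stream = io.StringIO()
    # basicConfig's default format is "%(levelname)s:%(name)s:%(message)s".
    # force=True (Python 3.8+) replaces any handlers already attached to the root logger.
    logging.basicConfig(level=logging.DEBUG, stream=stream, force=True)
    logging.getLogger(logger_name).debug(message)
    return stream.getvalue()


if __name__ == "__main__":
    print(capture_debug("urllib3.connectionpool", "Starting new HTTP connection (1): server:8008"), end="")
```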


212792736 commented on June 14, 2024

Hi @jkhenning

First, sorry for the confusion: I ran trains-agent daemon --default
instead of
trains-agent daemon --queue default

Regardless, that is not what is causing the problem.
As requested, here is the output of:

(trains) C:\WINDOWS\system32>trains-agent --debug daemon
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): pitc-zscaler-emea-amsterdam3pr.proxy.corporate.ge.com:80
DEBUG:urllib3.connectionpool:PROXY_LINK_THAT_i_HID:80 "GET http://10.136.16.173:8008/auth.login HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): GATEWAY_PROXY_LINK:443
DEBUG:urllib3.connectionpool:https://GATEWAY_PROXY_LINK:443 "GET /_sm_ccik?_sm_rid=q0tnWZknQnsRqt3RtWZPv5D7tMfStnWZZ626rkqfStnWZZ626rkq&_orig_url=http://10.136.16.173:8008/auth.login HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:PROXY_LINK_THAT_i_HID:80 "GET http://10.136.16.173:8008/auth.login?_sm_nck=1 HTTP/1.1" 307 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): captcha.corp_notification.com:443
DEBUG:urllib3.connectionpool:https://captcha.corp_notification.com:443 "GET /pitc/?url=http%3A%2F%2F10.136.16.173%CANNOT_ACCESS_LINK_DESCRIPTION HTTP/1.1" 200 None
Traceback (most recent call last):
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 545, in _do_refresh_token
    return resp["data"]["token"]
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\USER\.conda\envs\trains\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\USER\.conda\envs\trains\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\USER\.conda\envs\trains\Scripts\trains-agent.exe\__main__.py", line 7, in <module>
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\__main__.py", line 81, in main
    return run_command(parser, args, command_name)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\__main__.py", line 42, in run_command
    command = command_class(**vars(args))
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\helper\base.py", line 238, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\worker.py", line 345, in __init__
    super(Worker, self).__init__(*args, **kwargs)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\base.py", line 98, in __init__
    self._session = self._get_session(*args, **kwargs)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\commands\base.py", line 113, in _get_session
    return Session(*args, **kwargs)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\session.py", line 94, in __init__
    super(Session, self).__init__(*args, **kwargs)
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 152, in __init__
    self.refresh_token()
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\token_manager.py", line 95, in refresh_token
    self._set_token(self._do_refresh_token(self.__token, exp=self.req_token_expiration_sec))
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 552, in _do_refresh_token
    'Is this the TRAINS API server {} ?'.format(self.get_api_server_host()))
  File "c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\backend_api\session\session.py", line 437, in get_api_server_host
    from ...config import config_obj
ImportError: cannot import name 'config_obj' from 'trains_agent.config' (c:\users\USER\.conda\envs\trains\lib\site-packages\trains_agent\config.py)

I removed some of the links from the printout because they are corporate links, but I think the problem is that I do not get an HTTP 200 response from the server, which is caused by the proxy.
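The first traceback is the informative one: resp["data"]["token"] raised KeyError: 'data', meaning the login response had no data field. That is consistent with the log, where the request is redirected through the proxy and ends at a captcha page rather than at the API server; the ImportError that follows only masked the intended "Is this the TRAINS API server" message. A minimal sketch of a more defensive lookup, using a hypothetical extract_token() helper (not the actual trains-agent code):

```python
from typing import Any, Mapping


def extract_token(resp: Mapping[str, Any], api_host: str) -> str:
    """Return resp["data"]["token"], or raise a descriptive error if it is missing."""
    data = resp.get("data")
    if not isinstance(data, Mapping) or "token" not in data:
        # Typical cause: something other than the API server (e.g. a proxy) answered the request.
        raise ValueError(
            f"Unexpected login response from {api_host!r} (keys: {sorted(resp)}); "
            "check api_server and any proxy settings"
        )
    return data["token"]
```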


jkhenning commented on June 14, 2024

Hi @212792736 ,

Now it makes sense 🙂

Please try upgrading to the latest Trains Agent v0.16.1 and let me know if the problem still persists:
pip install -U trains-agent
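After upgrading, it can be worth confirming that the environment actually picked up the new version before retrying. A hypothetical helper using importlib.metadata (Python 3.8+); it assumes plain numeric versions such as 0.16.1 and would need a proper version parser for pre-release tags:

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.16.1' into (0, 16, 1); assumes plain numeric, dot-separated versions."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(package: str, minimum: str) -> Optional[bool]:
    """True if package is installed but older than minimum; None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    print(needs_upgrade("trains-agent", "0.16.1"))
```

With 0.16.0 still installed, needs_upgrade("trains-agent", "0.16.1") returns True.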


212792736 commented on June 14, 2024

Hi again @jkhenning

After updating, the error message has changed, which is good:

ValueError: It seems *api_server* is misconfigured. Is this the TRAINS API server None ?

We are currently moving our processing server (the one running trains), so once that is completed I will check whether the problem has been fully resolved.


212792736 commented on June 14, 2024

Hi again,

I think we figured out what the problem is: the proxy settings prevent Python's urllib from communicating with the server. If I run trains-agent on the local server, no problem occurs at the moment.

I'll have to investigate further whether this can be fully resolved, but I think the initial problem has been solved. I'll close the issue after the investigation.


bmartinn commented on June 14, 2024

Hi @212792736 ,
Seems this issue got lost on us :(

> I think we figured out what the problem is, it's the proxy settings which are having issues with the python urllib to communicate with it

Yes, this sounds like it...

As you pointed out, urllib is doing all the communication under the hood, so we just need to configure it to pass through your proxy.
All urllib sessions are created here. Based on this urllib3 documentation, it seems we need to replace the PoolManager with a ProxyManager; I think once that is done it should work.
What do you think?
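In urllib3, ProxyManager is the counterpart of PoolManager that sends every request through a given proxy. Since urllib3 is not in the standard library, here is a stdlib-only illustration of the same choice (explicit proxy versus direct connection) using urllib.request.ProxyHandler. This is a swapped-in analog, not the trains-agent session code; make_opener(), configured_proxies() and the proxy URL are hypothetical placeholders:

```python
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str]) -> urllib.request.OpenerDirector:
    """Return an opener that routes http/https through proxy_url, or uses no proxy if None."""
    if proxy_url:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    else:
        # An empty mapping disables the proxies urllib would otherwise pick up from the environment.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


def configured_proxies(opener: urllib.request.OpenerDirector) -> dict:
    """Return the proxy mapping of the opener's ProxyHandler (inspection helper for this sketch)."""
    for h in opener.handlers:
        if isinstance(h, urllib.request.ProxyHandler):
            return dict(h.proxies)
    return {}
```

make_opener(None).open(url) would then talk to the server directly, while make_opener("http://proxy.example:80") forces the proxy route, which is the same decision the PoolManager/ProxyManager swap makes in urllib3.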


212792736 commented on June 14, 2024

Hi @bmartinn

Thanks for looking into solutions. This helped me realise a nice trick: setting the NO_PROXY variable (import os; os.environ['NO_PROXY'] = 'IP_OF_THE_TRAINS_SERVER') in the file you mentioned.
That made me realise that all I actually need is to run "set no_proxy=IP_of_the_server", as described here.
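The no_proxy trick works because both urllib and requests consult the no_proxy/NO_PROXY environment variable before routing a request through a proxy. A minimal sketch that checks whether a host would be exempted, using the standard library's urllib.request.getproxies_environment() and proxy_bypass_environment() (the latter is an undocumented helper, so treat it as an implementation detail); bypasses_proxy() is hypothetical and the IP address from the log is only an example:

```python
import os
import urllib.request


def bypasses_proxy(host: str) -> bool:
    """True if the no_proxy/NO_PROXY environment variable exempts host (optionally host:port)."""
    # Reads *_proxy variables from os.environ only, not the Windows registry settings.
    proxies = urllib.request.getproxies_environment()
    return bool(urllib.request.proxy_bypass_environment(host, proxies))


if __name__ == "__main__":
    # Same effect, for this process, as running: set no_proxy=10.136.16.173
    os.environ["no_proxy"] = "10.136.16.173"
    print(bypasses_proxy("10.136.16.173:8008"))
```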
Now the agent can reach the trains-server, and I get these responses (after the config file was printed out; I just hid the IP addresses):

(...)
DEBUG:trains_agent.commands.worker:starting resource monitor thread
Worker "G4H3Q5S2E:0" - Listening to queues:
+----------------------------------+---------+-------+
| id                               | name    | tags  |
+----------------------------------+---------+-------+
| 44eab786974744cb91a9fe5796fa305e | default |       |
+----------------------------------+---------+-------+

DEBUG:urllib3.connectionpool:Resetting dropped connection: server
DEBUG:urllib3.connectionpool:http://server:8008 "GET /workers.register HTTP/1.1" 200 261
Running TRAINS-AGENT daemon in background mode, writing stdout/stderr to C:\Users\212792~1\AppData\Local\Temp\.trains_agent_daemon_outea7jspr8.txt
DEBUG:urllib3.connectionpool:Resetting dropped connection: 10.136.16.82
DEBUG:urllib3.connectionpool:http://server:8008 "GET /v2.5/queues.get_all HTTP/1.1" 200 321
DEBUG:urllib3.connectionpool:Resetting dropped connection: 10.136.16.82
DEBUG:urllib3.connectionpool:http://server:8008 "GET /v2.5/queues.get_next_task HTTP/1.1" 200 265

I believe this means the problem is solved. Thank you for your support and great work!

