Comments (5)

jkhenning commented on June 14, 2024

Hi @bladesaber,

Are you sure the IP you provided is reachable (from the same machine)? Can you simply try:

api {
  api_server: http://localhost:8008
  web_server: http://localhost:8080
  files_server: http://localhost:8081
  ...
}
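
A quick way to answer the reachability question is to attempt a plain TCP connection to each configured address from the agent's machine. The sketch below is only an illustration: the helper name `is_reachable` is made up for this example, and a successful TCP connection only shows that something is listening on that port, not that it is the Trains server.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Addresses taken from the api section above; adjust to your setup.
    servers = {
        "api_server": "http://localhost:8008",
        "web_server": "http://localhost:8080",
        "files_server": "http://localhost:8081",
    }
    for name, url in servers.items():
        status = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {url} is {status}")
```

Run it on the same machine as the agent. If an address is reported as not reachable here, the agent will not be able to reach it either.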

from clearml-agent.

bladesaber commented on June 14, 2024

Thanks for your reply jkhenning,

1. After I added agent.git_user="" and agent.git_pass="" to trains.conf, the agent seems able to connect to the TRAINS API server. So is it necessary to fill in the git username and password in the trains.conf file? Because I work in a team environment, I am not sure whether it is safe to expose this private information.

2. After that I ran into another problem. This is my log:
Successfully installed Markdown-3.2.2 Pillow-7.2.0 PyJWT-1.7.1 PyYAML-5.3.1 Werkzeug-1.0.1 absl-py-0.10.0 ...
Running task id [f8a9275a03e64d7a82c3387aaa31c602]:
[allegro-ai]$ /home/bladesaber/.trains/venvs-builds.1/3.8/bin/python -u test-allegro.py
Summary - installed python packages:
pip:

  • absl-py==0.10.0
  • actionlib==1.12.0
  • angles==1.9.12
  • attrs==20.2.0
  • bondpy==1.8.5
  • boto3==1.15.0
  • botocore==1.18.2
  • cachetools==4.1.1
  • camera-calibration==1.15.0
    ......
  • urdfdom-py==0.4.3
  • urllib3==1.25.10
  • Werkzeug==1.0.1
  • xacro==1.13.6
    Environment setup completed successfully
    Starting Task Execution:
    Leaving process id 342216

It seems that something goes wrong when executing the command: python -u test-allegro.py

thanks


jkhenning commented on June 14, 2024

Hi @bladesaber,

1. After I added agent.git_user="" and agent.git_pass="" to trains.conf, the agent seems able to connect to the TRAINS API server

This is very strange: the agent.git_user and agent.git_pass settings are completely unrelated to how the agent connects to the Trains Server. Are you sure these changes were not made together with the change to the api.api_server setting that I suggested?
In any case, you do not need to fill in the git username and password in trains.conf unless the agent has to download the experiment code from a password-protected git repository.

2. After that I ran into another problem. This is my log

I'm not sure I understand the problem. Do you mean that "Leaving process id 342216" is printed immediately after "Starting Task Execution:"? If so, can you share at least a skeleton of your code?
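
To see why the process leaves right away, it may help to rerun by hand the command the agent printed and capture its exit code and stderr. The sketch below is a generic illustration using Python's standard `subprocess` module; `run_and_capture` is a hypothetical helper name, not part of trains-agent, and the demo command is a stand-in child process rather than the real script.

```python
import subprocess
import sys


def run_and_capture(cmd, cwd=None):
    """Run cmd and return (exit_code, stdout, stderr) as text."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in child process that fails on purpose, to show what is captured.
    demo_cmd = [
        sys.executable, "-u", "-c",
        "import sys; sys.stderr.write('demo failure\\n'); sys.exit(3)",
    ]
    code, out, err = run_and_capture(demo_cmd)
    print("exit code:", code)
    print("stderr:", err.strip())
```

For the real case, replace `demo_cmd` with the interpreter and script from the log (the virtualenv's python, `-u`, `test-allegro.py`) and pass the checked-out working directory as `cwd`. Any traceback the script prints before exiting then ends up in `err`.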


bladesaber commented on June 14, 2024

Hi @jkhenning,
Many thanks for your support :)

1. The first question was my mistake: I had misconfigured it. ^_^
2. The code I use for testing comes from https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_mnist.py, and I changed nothing. At first I debugged it in PyCharm with my local Python interpreter, and everything worked. The error appears when I enqueue the task through the TRAINS server UI. The complete log is below.

(base) ***@bladesaber-MS-7C02:~/Desktop$ trains-agent daemon --queue default --foreground
Current configuration (trains_agent v0.16.0, location: /home/bladesaber/trains.conf):

api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://10.53.9.37:8008
api.web_server = http://localhost:8080
api.files_server = http://localhost:8081
api.credentials.access_key = 2W4HYIS2MY0Z03H2E4YV
api.host = http://10.53.9.37:8008
agent.worker_id =
agent.worker_name = bladesaber-MS-7C02
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = defaults
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = pytorch
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/bladesaber/.trains/venvs-builds
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/bladesaber/.trains/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/bladesaber/.trains/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/bladesaber/.trains/pip-cache
agent.docker_apt_cache = /home/bladesaber/.trains/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.0-base
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 100
agent.cudnn_version = 76
sdk.storage.cache.default_base_dir = ~/.trains/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false

Worker "bladesaber-MS-7C02:0" - Listening to queues:
+----------------------------------+---------+-------+
| id | name | tags |
+----------------------------------+---------+-------+
| b80b1a1f86e94a11a75e43a1f2c315f3 | default | |
+----------------------------------+---------+-------+

No tasks in queue b80b1a1f86e94a11a75e43a1f2c315f3
No tasks in Queues, sleeping for 5.0 seconds
task 113333b6379d4a82a8627103163c5c1a pulled from b80b1a1f86e94a11a75e43a1f2c315f3 by worker bladesaber-MS-7C02:0
Running task '113333b6379d4a82a8627103163c5c1a'
Storing stdout and stderr log to '/tmp/.trains_agent_out.2qmpc2ow.txt', '/tmp/.trains_agent_out.2qmpc2ow.txt'
Current configuration (trains_agent v0.16.0, location: /tmp/.trains_agent.s_u8xgmi.cfg):

api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://10.53.9.37:8008
api.web_server = http://localhost:8080
api.files_server = http://localhost:8081
api.credentials.access_key = 2W4HYIS2MY0Z03H2E4YV
api.host = http://10.53.9.37:8008
agent.worker_id = bladesaber-MS-7C02:0
agent.worker_name = bladesaber-MS-7C02
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = defaults
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = pytorch
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/bladesaber/.trains/venvs-builds
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/bladesaber/.trains/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/bladesaber/.trains/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/bladesaber/.trains/pip-cache
agent.docker_apt_cache = /home/bladesaber/.trains/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.0-base
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 100
agent.cudnn_version = 76
sdk.storage.cache.default_base_dir = ~/.trains/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
Executing task id [113333b6379d4a82a8627103163c5c1a]:
repository = https://github.com/bladesaber/Detection_Library.git
branch = master
version_num = b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29
tag =
entry_point = test-allegro.py
working_dir = allegro-ai
Using base prefix '/home/bladesaber/anaconda3'
New python executable in /home/bladesaber/.trains/venvs-builds/3.8/bin/python3.8
Also creating executable in /home/bladesaber/.trains/venvs-builds/3.8/bin/python
Installing setuptools, pip, wheel...
done.
Using cached repository in "/home/bladesaber/.trains/vcs-cache/Detection_Library.git.d4fb6d935049fbfa2eee92d1d7386a90/Detection_Library.git"
Note: checking out 'b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at b5ba1f3 2020-9-21
type: git
url: https://github.com/bladesaber/Detection_Library.git
branch: HEAD
commit: b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29
root: /home/bladesaber/.trains/venvs-builds/3.8/task_repository/Detection_Library.git
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.2.3
Uninstalling pip-20.2.3:
Successfully uninstalled pip-20.2.3
Successfully installed pip-20.1.1
Collecting Cython
Using cached Cython-0.29.21-cp38-cp38-manylinux1_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.21
Collecting torch==1.4.0+cu100
... (package installation log omitted)
Collecting pytz
Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting googleapis-common-protos<2.0dev,>=1.6.0
Installing collected packages: six, pycparser, cffi, cryptography, isodate, certifi, chardet, idna, urllib3, requests, oauthlib, requests-oauthlib, msrest, azure-core, azure-storage-blob, jmespath, python-dateutil, botocore, s3transfer, boto3, google-crc32c,

Running task id [113333b6379d4a82a8627103163c5c1a]:
[allegro-ai]$ /home/bladesaber/.trains/venvs-builds/3.8/bin/python -u test-allegro.py
Summary - installed python packages:
pip:

  • absl-py==0.10.0
  • actionlib==1.12.0
  • angles==1.9.12
    ... (unimportant output omitted)
  • urllib3==1.25.10
  • Werkzeug==1.0.1
  • xacro==1.13.6
    Environment setup completed successfully
    Starting Task Execution:
    Leaving process id 3790
    DONE: Running task '113333b6379d4a82a8627103163c5c1a', exit status 255
    No tasks in queue b80b1a1f86e94a11a75e43a1f2c315f3
    No tasks in Queues, sleeping for 5.0 seconds

Kind regards


bladesaber commented on June 14, 2024

Yes, "Leaving process id 342216" appears immediately after "Starting Task Execution:".

