Comments (5)
Hi @bladesaber,
Are you sure the IP you provided is reachable (from the same machine)? Can you simply try:
api {
api_server: http://localhost:8008
web_server: http://localhost:8080
files_server: http://localhost:8081
...
}
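To check reachability from the agent machine before editing the configuration further, a small script along these lines may help. It is only a sketch: the helper names `host_port` and `is_reachable` are made up for illustration, and a successful TCP connection shows only that something is listening on the port, not that it is a healthy Trains server.

```python
import socket
from urllib.parse import urlparse


def host_port(url):
    """Split a server URL into (host, port), defaulting the port by scheme."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # The three URLs from the api section suggested above.
    servers = [
        ("api_server", "http://localhost:8008"),
        ("web_server", "http://localhost:8080"),
        ("files_server", "http://localhost:8081"),
    ]
    for name, url in servers:
        status = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {url} is {status}")
```

Run it on the same machine as the agent, and replace the URLs with whatever your trains.conf actually contains.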
from clearml-agent.
Thanks for your reply, jkhenning.
1. After I added agent.git_user="" and agent.git_pass="" to trains.conf, the agent seems able to connect to the TRAINS API server. Is it necessary to fill in the git username and password in trains.conf? I work in a team, so I am not sure whether it is safe to expose this private information.
2. After that I ran into another problem. This is my log:
Successfully installed Markdown-3.2.2 Pillow-7.2.0 PyJWT-1.7.1 PyYAML-5.3.1 Werkzeug-1.0.1 absl-py-0.10.0 ...
Running task id [f8a9275a03e64d7a82c3387aaa31c602]:
[allegro-ai]$ /home/bladesaber/.trains/venvs-builds.1/3.8/bin/python -u test-allegro.py
Summary - installed python packages:
pip:
- absl-py==0.10.0
- actionlib==1.12.0
- angles==1.9.12
- attrs==20.2.0
- bondpy==1.8.5
- boto3==1.15.0
- botocore==1.18.2
- cachetools==4.1.1
- camera-calibration==1.15.0
......
- urdfdom-py==0.4.3
- urllib3==1.25.10
- Werkzeug==1.0.1
- xacro==1.13.6
Environment setup completed successfully
Starting Task Execution:
Leaving process id 342216
It seems that something goes wrong when executing the command: python -u test-allegro.py
Thanks
Hi @bladesaber,
1. After I added agent.git_user="" and agent.git_pass="" to trains.conf, the agent seems able to connect to the TRAINS API server
This is very strange: the agent.git_user and agent.git_pass settings are completely unrelated to how the agent connects to the Trains Server. Are you sure these changes were not made together with changing the api.api_server settings as I suggested?
In any case, you do not need to fill in the git username and password in trains.conf unless you require the agent to download the experiment code from a password-protected git repository.
2. After that I ran into another problem. This is my log
I'm not sure I understand the problem. Do you mean that "Leaving process id 342216" appears immediately after "Starting Task Execution:"? If so, can you share at least a skeleton of your code?
Hi @jkhenning,
Thanks a lot for your support :)
1. The first issue was my fault: I had misconfigured it. ^_^
2. The code I use for testing comes from https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_mnist.py, and I changed nothing in it. At first I debugged it in PyCharm with my local Python interpreter, and everything worked. The error occurs when I enqueue the task from the TRAINS server UI. The complete log is below:
(base) ***@bladesaber-MS-7C02:~/Desktop$ trains-agent daemon --queue default --foreground
Current configuration (trains_agent v0.16.0, location: /home/bladesaber/trains.conf):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://10.53.9.37:8008
api.web_server = http://localhost:8080
api.files_server = http://localhost:8081
api.credentials.access_key = 2W4HYIS2MY0Z03H2E4YV
api.host = http://10.53.9.37:8008
agent.worker_id =
agent.worker_name = bladesaber-MS-7C02
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = defaults
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = pytorch
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/bladesaber/.trains/venvs-builds
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/bladesaber/.trains/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/bladesaber/.trains/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/bladesaber/.trains/pip-cache
agent.docker_apt_cache = /home/bladesaber/.trains/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.0-base
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 100
agent.cudnn_version = 76
sdk.storage.cache.default_base_dir = ~/.trains/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
Worker "bladesaber-MS-7C02:0" - Listening to queues:
+----------------------------------+---------+-------+
| id | name | tags |
+----------------------------------+---------+-------+
| b80b1a1f86e94a11a75e43a1f2c315f3 | default | |
+----------------------------------+---------+-------+
No tasks in queue b80b1a1f86e94a11a75e43a1f2c315f3
No tasks in Queues, sleeping for 5.0 seconds
task 113333b6379d4a82a8627103163c5c1a pulled from b80b1a1f86e94a11a75e43a1f2c315f3 by worker bladesaber-MS-7C02:0
Running task '113333b6379d4a82a8627103163c5c1a'
Storing stdout and stderr log to '/tmp/.trains_agent_out.2qmpc2ow.txt', '/tmp/.trains_agent_out.2qmpc2ow.txt'
Current configuration (trains_agent v0.16.0, location: /tmp/.trains_agent.s_u8xgmi.cfg):
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://10.53.9.37:8008
api.web_server = http://localhost:8080
api.files_server = http://localhost:8081
api.credentials.access_key = 2W4HYIS2MY0Z03H2E4YV
api.host = http://10.53.9.37:8008
agent.worker_id = bladesaber-MS-7C02:0
agent.worker_name = bladesaber-MS-7C02
agent.force_git_ssh_protocol = false
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = defaults
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = pytorch
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/bladesaber/.trains/venvs-builds
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/bladesaber/.trains/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/bladesaber/.trains/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/bladesaber/.trains/pip-cache
agent.docker_apt_cache = /home/bladesaber/.trains/apt-cache
agent.docker_force_pull = false
agent.default_docker.image = nvidia/cuda:10.0-base
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 100
agent.cudnn_version = 76
sdk.storage.cache.default_base_dir = ~/.trains/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
Executing task id [113333b6379d4a82a8627103163c5c1a]:
repository = https://github.com/bladesaber/Detection_Library.git
branch = master
version_num = b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29
tag =
entry_point = test-allegro.py
working_dir = allegro-ai
Using base prefix '/home/bladesaber/anaconda3'
New python executable in /home/bladesaber/.trains/venvs-builds/3.8/bin/python3.8
Also creating executable in /home/bladesaber/.trains/venvs-builds/3.8/bin/python
Installing setuptools, pip, wheel...
done.
Using cached repository in "/home/bladesaber/.trains/vcs-cache/Detection_Library.git.d4fb6d935049fbfa2eee92d1d7386a90/Detection_Library.git"
Note: checking out 'b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b
HEAD is now at b5ba1f3 2020-9-21
type: git
url: https://github.com/bladesaber/Detection_Library.git
branch: HEAD
commit: b5ba1f32bd1cb63b1ed3933f8b915d1567b2ba29
root: /home/bladesaber/.trains/venvs-builds/3.8/task_repository/Detection_Library.git
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.2.3
Uninstalling pip-20.2.3:
Successfully uninstalled pip-20.2.3
Successfully installed pip-20.1.1
Collecting Cython
Using cached Cython-0.29.21-cp38-cp38-manylinux1_x86_64.whl (1.9 MB)
Installing collected packages: Cython
Successfully installed Cython-0.29.21
Collecting torch==1.4.0+cu100
... (some package log lines omitted)
Collecting pytz
Using cached pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting googleapis-common-protos<2.0dev,>=1.6.0
Installing collected packages: six, pycparser, cffi, cryptography, isodate, certifi, chardet, idna, urllib3, requests, oauthlib, requests-oauthlib, msrest, azure-core, azure-storage-blob, jmespath, python-dateutil, botocore, s3transfer, boto3, google-crc32c,
Running task id [113333b6379d4a82a8627103163c5c1a]:
[allegro-ai]$ /home/bladesaber/.trains/venvs-builds/3.8/bin/python -u test-allegro.py
Summary - installed python packages:
pip:
- absl-py==0.10.0
- actionlib==1.12.0
- angles==1.9.12
... (some unimportant lines omitted)
- urllib3==1.25.10
- Werkzeug==1.0.1
- xacro==1.13.6
Environment setup completed successfully
Starting Task Execution:
Leaving process id 3790
DONE: Running task '113333b6379d4a82a8627103163c5c1a', exit status 255
No tasks in queue b80b1a1f86e94a11a75e43a1f2c315f3
No tasks in Queues, sleeping for 5.0 seconds
Kind regards
Yes, it prints "Leaving process id 342216" immediately after "Starting Task Execution".
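Since the agent log shows only the exit status and no traceback, one way to see the actual error might be to re-run the same command by hand and capture its stderr. The following is a minimal sketch, not something the agent itself provides; `run_and_report` is a hypothetical helper name, and the demo command is a stand-in for the real script.

```python
import subprocess
import sys


def run_and_report(cmd, cwd=None):
    """Run a command, echo its stdout and stderr, and return the exit code."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        # A Python traceback, if any, ends up here.
        print(result.stderr, end="", file=sys.stderr)
    print(f"exit status: {result.returncode}")
    return result.returncode


if __name__ == "__main__":
    # Stand-in command that simply exits with status 255, like the task above.
    run_and_report([sys.executable, "-u", "-c", "raise SystemExit(255)"])
```

To reproduce the failing task, the command would instead be the interpreter and script from the log (/home/bladesaber/.trains/venvs-builds/3.8/bin/python -u test-allegro.py), with cwd pointing at the allegro-ai directory of the checked-out repository. This assumes the venv still exists on disk.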