Comments (13)
ClearML is configured with the correct credentials. It is successfully sending the data to the server but is not sending the generated reports.
from clearml-server.
@sriram-dsl which versions of clearml and clearml-server are you using?
The results should be logged by clearml AS WELL AS stored in your save_dir. Different results are stored in different places: Metrics are shown under 'Scalars', summary images and curves under 'Plots' and training samples under 'Debug samples'.
Which are you missing?
BTW - you might have a typo in the invocation you provided: --data/data.yaml does not seem right.
If I launch a training run, it should appear under the Projects tab, right? That is where I expect to see how many runs are in progress, which have completed, and the plots, graphs and training results.
I get all of those in a normal environment, but inside the container I am running into this issue.
That was a typo on my part; the actual argument is --data data/data.yaml
Is the logging issue related to a specific commit ID? Need clarification on this
@sriram-dsl we're asking for ClearML versions in order to try to validate ClearML behaviour as closely as possible to your environment.
From your description so far, it sounds like there might be a connectivity issue between your container environment and your ClearML server? How are you setting up the container? How are you running your training inside the container? Can you provide some logs for the training?
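One quick way to test the connectivity theory is a plain TCP check run from inside the container. The following is a minimal sketch, not an official ClearML tool: the helper names `can_connect` and `check_clearml_ports` are made up for illustration, and the ports in `DEFAULT_PORTS` are only the usual ClearML server defaults, so replace them with whatever your own clearml.conf lists.

```python
import socket

# Assumption: the usual ClearML server defaults. Replace with the ports
# from the api section of your own clearml.conf if they differ.
DEFAULT_PORTS = {"api_server": 8008, "web_server": 8080, "files_server": 8081}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_clearml_ports(host: str, ports: dict = DEFAULT_PORTS) -> dict:
    """Map each service name to whether its port is reachable from here."""
    return {name: can_connect(host, port) for name, port in ports.items()}
```

Calling `check_clearml_ports("<your server address>")` inside the container and again on the host shows whether any of the three services is unreachable only from the container. A successful TCP connection does not prove the credentials are valid, only that the network path is open.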
ClearML version: 1.13.2
I have set up an Ubuntu 20.04 container and cloned a YOLOv5 repository inside it to conduct training. Now, I aim to send the logs generated from this container to a server.
root@104ab3100c7d:~/yolov5# python3 train.py --img 320 --batch 2 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache --project container --name testing
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=3, batch_size=2, imgsz=320, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=container, name=testing, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: ⚠️ YOLOv5 is out of date by 743 commits. Use `git pull` or `git clone https://github.com/ultralytics/yolov5` to update.
YOLOv5 🚀 v6.1-211-gcee5959c Python-3.8.10 torch-1.11.0+cu102 CPU
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir container', view at http://localhost:6006/
Dataset not found ⚠, missing paths ['/root/datasets/coco128/images/train2017']
Downloading https://ultralytics.com/assets/coco128.zip to coco128.zip...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.66M/6.66M [00:00<00:00, 7.08MB/s]
Dataset download success ✅ (3.7s), saved to /root/datasets
.........
Model summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs
Transferred 349/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
train: Scanning '/root/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128 [00:0
train: New cache created: /root/datasets/coco128/labels/train2017.cache
train: Caching images (0.0GB ram): 100%|██████████| 128/128 [00:00<00:00, 2060.30it/s]
val: Scanning '/root/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128
val: Caching images (0.0GB ram): 100%|██████████| 128/128 [00:00<00:00, 1655.63it/s]
Plotting labels to container/testing3/labels.jpg...
AutoAnchor: 3.96 anchors/target, 0.957 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: WARNING: Extremely small objects found: 35 of 929 labels are < 3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 927 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.6699: 100%|██████████| 1000/1000 [00:00<00:00, 4459.00it/s]
AutoAnchor: thr=0.25: 0.9935 best possible recall, 3.75 anchors past thr
AutoAnchor: n=9, img_size=320, metric_all=0.263/0.670-mean/best, past_thr=0.477-mean: 5,7, 12,11, 16,27, 43,36, 54,72, 62,145, 146,107, 166,191, 298,218
AutoAnchor: Done ✅ (optional: update model *.yaml to use these anchors in the future)
Image sizes 320 train, 320 val
Using 2 dataloader workers
Logging results to container/testing3
Starting training for 3 epochs...
..........
3 epochs completed in 0.013 hours.
Optimizer stripped from container/testing3/weights/last.pt, 14.7MB
Optimizer stripped from container/testing3/weights/best.pt, 14.7MB
Validating container/testing3/weights/best.pt...
Fusing layers...
Model summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Class Images Labels P R mAP@.5 mAP@.5:.95: 0%| | 0/32 [00:00<?, ?it/s]
..........
Results saved to container/testing3
Did you set up clearml.conf inside your container?
Yes, I've installed ClearML and configured the ClearML client using credentials generated from the self-hosting server.
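To double-check what the training process can actually see, something like the sketch below can be run inside the container. It is only an illustration: `describe_clearml_setup` is a hypothetical helper, and it assumes the client reads `~/clearml.conf` unless the `CLEARML_CONFIG_FILE` environment variable points elsewhere. The two credential variables are the ones named in the clearml.conf comments.

```python
import os
from pathlib import Path


def describe_clearml_setup(env=None, home=None) -> dict:
    """Report which ClearML client settings are visible to this process."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Assumption: CLEARML_CONFIG_FILE, when set, overrides ~/clearml.conf.
    override = env.get("CLEARML_CONFIG_FILE")
    config_path = Path(override).expanduser() if override else home / "clearml.conf"
    return {
        "config_path": str(config_path),
        "config_exists": config_path.is_file(),
        "access_key_in_env": bool(env.get("CLEARML_API_ACCESS_KEY")),
        "secret_key_in_env": bool(env.get("CLEARML_API_SECRET_KEY")),
    }
```

Inside a container the process often runs as a different user than on the host (root in the log above), so the home directory, and with it the default config path, may not be the one where the file was created.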
Hi @sriram-dsl ! It is possible that this may solve your problem:
If you init the task manually: can you please try initializing your clearml task using output_uri=True? You can set it to the location you upload the model to, or set sdk.development.default_output_uri (or even the CLEARML_DEFAULT_OUTPUT_URI env var) to the file server you want the model to be uploaded to. It can be the same as the file server used under the api section in clearml.conf.
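As a rough illustration of the two options in this reply, the sketch below sets the environment variable and initializes a task with `output_uri=True`. The function names, and the project and task names passed to them, are placeholders rather than anything taken from YOLOv5 or from this thread, and the sketch assumes you add the `Task.init` call to the training script yourself.

```python
import os


def set_default_output_uri(files_server: str) -> None:
    """Set CLEARML_DEFAULT_OUTPUT_URI for this process.

    Assumption: this runs before the ClearML task is initialized, so the
    client picks the value up.
    """
    os.environ["CLEARML_DEFAULT_OUTPUT_URI"] = files_server


def init_task_with_upload(project_name: str, task_name: str):
    """Create a ClearML task whose output models are uploaded."""
    # Imported lazily so the helper above works without clearml installed.
    from clearml import Task

    # output_uri=True asks ClearML to upload to the configured files_server;
    # a URI string can be passed instead to pick another destination.
    return Task.init(project_name=project_name, task_name=task_name, output_uri=True)
```

For example, `init_task_with_upload("container", "testing")` would mirror the `--project` and `--name` values used in the training command above. Only one of the two mechanisms is needed.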
this is the api and sdk section in the clearml.conf
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://<public_ip>:8008
    web_server: http://<public_ip>:8082
    files_server: http://<public_ip>:8081
    # Credentials are generated using the webapp, http://<public_ip>:8082/settings
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "OTVT8HO*********5RXU", "secret_key": "OZClLTtposYbKC5UVA*****************4LvM7yhiaFm06"}
}
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to <system_temp_folder>/clearml_cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_type.
            { url: "file://*" } # file-urls are always directly referenced
        ]
    }
Do I need to add any key:value pairs to it so that results are logged when I launch YOLOv5 training?
I am starting the training with the basic YOLOv5 training command mentioned in #225 (comment).
@eugen-ajechiloae-clearml, @jkhenning, @ainoam
Something similar to this might solve my problem: allegroai/clearml#363
In other words, should I make changes inside yolov5/train.py?
My requirement is to log the experiment results generated during training to the ClearML server. The training is conducted inside a container. I have set up an Ubuntu 20.04 container, cloned the YOLOv5 repository inside the Ubuntu container, and started training using that YOLOv5 Git repository.
Hi @sriram-dsl ! If you wish to modify train.py, you could use a dataset, or you could use the OutputModel class (https://clear.ml/docs/latest/docs/references/sdk/model_outputmodel) to upload your models if it fits your case.
Usually, datasets are used to store more than just models (like training datasets).
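A minimal sketch of the OutputModel route, under a few assumptions: `find_weight_files` and `upload_weights` are hypothetical helpers, the run directory follows the `<project>/<name>/weights` layout seen in the training log, and the task already has an upload destination (for example via output_uri).

```python
from pathlib import Path


def find_weight_files(run_dir: str) -> list:
    """Return the .pt files under <run_dir>/weights, sorted by name."""
    return sorted(str(p) for p in (Path(run_dir) / "weights").glob("*.pt"))


def upload_weights(task, weights_path: str, name: str = "yolov5-weights"):
    """Register one weights file as an output model of the given task."""
    # Imported lazily so find_weight_files works without clearml installed.
    from clearml import OutputModel

    model = OutputModel(task=task, name=name, framework="PyTorch")
    # Assumption: the task has an output destination configured, so the
    # file has somewhere to be uploaded to.
    model.update_weights(weights_filename=weights_path)
    return model
```

With the run shown in the log, `find_weight_files("container/testing3")` would be expected to pick up best.pt and last.pt, and each path could then be passed to `upload_weights` together with the task object.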