I have set up a ClearML server. Currently, I am training a YOLO model using the YOLOv5

generating clearml-reports (clearml-server issue, open)

sriram-dsl commented on June 24, 2024

generating clearml-reports

from clearml-server.

Comments (13)

sriram-dsl commented on June 24, 2024

ClearML is configured with the correct credentials. It is successfully sending the data to the server but is not sending the generated reports.

ainoam commented on June 24, 2024

@sriram-dsl which versions of clearml and clearml-server are you using?

The results should be logged by clearml AS WELL AS stored in your save_dir. Different results are stored in different places: Metrics are shown under 'Scalars', summary images and curves under 'Plots' and training samples under 'Debug samples'.

Which are you missing?

BTW - you might have a typo in the invocation you provided: --data/data.yaml does not seem right.
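As a local cross-check of the save_dir side of this answer, the sketch below lists what a YOLOv5 run left on disk so it can be compared with what shows up under Scalars, Plots and Debug samples. It assumes the usual YOLOv5 layout (a results.csv with one row per epoch, checkpoints under weights/, plot images as .png/.jpg). summarize_save_dir is a hypothetical helper, not part of YOLOv5 or ClearML.

```python
import csv
from pathlib import Path


def summarize_save_dir(save_dir):
    """List common YOLOv5 outputs found in save_dir (hypothetical helper).

    Assumes the usual YOLOv5 layout: results.csv with one row per epoch,
    checkpoints under weights/, and plot images saved as .png/.jpg files.
    """
    save_dir = Path(save_dir)
    summary = {"weights": [], "images": [], "metric_columns": [], "epochs_logged": 0}
    if not save_dir.is_dir():
        return summary

    weights_dir = save_dir / "weights"
    if weights_dir.is_dir():
        summary["weights"] = sorted(p.name for p in weights_dir.glob("*.pt"))
    summary["images"] = sorted(
        p.name for p in save_dir.iterdir() if p.suffix in (".png", ".jpg")
    )

    results = save_dir / "results.csv"
    if results.is_file():
        with results.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader, [])
            # YOLOv5 pads its column names with spaces, so strip them
            summary["metric_columns"] = [h.strip() for h in header]
            summary["epochs_logged"] = sum(1 for row in reader if row)
    return summary


if __name__ == "__main__":
    # Example path taken from the training log in this thread
    print(summarize_save_dir("container/testing3"))
```

If the files are all present locally but nothing appears in the web UI, the problem is on the ClearML logging side rather than in training itself.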

sriram-dsl commented on June 24, 2024

if i launch a training it should appear under Projects tab right?
how many are running, which is completed, plots, graps and training results
i got all those in a normal environment
but inside the container, iam getting issue

sriram-dsl commented on June 24, 2024

That was a typo: it should be --data data/data.yaml

sriram-dsl commented on June 24, 2024

Is the logging issue related to a specific commit ID? I need clarification on this.

ainoam commented on June 24, 2024

@sriram-dsl we're asking for ClearML versions in order to try to validate ClearML behaviour as closely as possible to your environment.
From your description so far, it sounds like there might be a connectivity issue between your container environment and your ClearML server? How are you setting up the container? How are you running your training inside the container? Can you provide some logs for the training?
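One way to test the connectivity hypothesis from inside the container is a plain TCP check against each server port. This is a minimal sketch: the ports 8008 (API), 8082 (web) and 8081 (files) come from the clearml.conf posted later in this thread, while the helper name check_clearml_ports and the 3-second timeout are illustrative assumptions. A successful connect only shows that the port is reachable, not that the credentials are valid.

```python
import socket
from urllib.parse import urlparse


def check_clearml_ports(urls, timeout=3.0):
    """Return {url: bool} telling whether a TCP connection to each URL succeeds.

    Hypothetical helper: it does not speak the ClearML API, it only checks
    that something is listening at host:port.
    """
    results = {}
    for url in urls:
        parsed = urlparse(url)
        # Fall back to the scheme's default port if the URL has none
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        try:
            with socket.create_connection((parsed.hostname, port), timeout=timeout):
                results[url] = True
        except OSError:  # refused, timed out, or the name did not resolve
            results[url] = False
    return results
```

For example, calling it inside the container with the api_server, web_server and files_server URLs from clearml.conf and getting False for port 8008 would point to a networking problem between the container and the server rather than to YOLOv5.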

sriram-dsl commented on June 24, 2024

ClearML version: 1.13.2
I have set up an Ubuntu 20.04 container and cloned a YOLOv5 repository inside it to conduct training. Now, I aim to send the logs generated from this container to a server.

root@104ab3100c7d:~/yolov5# python3 train.py --img 320 --batch 2 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache --project container --name testing
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=3, batch_size=2, imgsz=320, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=ram, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=container, name=testing, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: ⚠️ YOLOv5 is out of date by 743 commits. Use `git pull` or `git clone https://github.com/ultralytics/yolov5` to update.
YOLOv5 🚀 v6.1-211-gcee5959c Python-3.8.10 torch-1.11.0+cu102 CPU

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir container', view at http://localhost:6006/

Dataset not found ⚠, missing paths ['/root/datasets/coco128/images/train2017']
Downloading https://ultralytics.com/assets/coco128.zip to coco128.zip...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.66M/6.66M [00:00<00:00, 7.08MB/s]
Dataset download success ✅ (3.7s), saved to /root/datasets

.........

Model summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

Transferred 349/349 items from yolov5s.pt
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
train: Scanning '/root/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128 [00:0
train: New cache created: /root/datasets/coco128/labels/train2017.cache
train: Caching images (0.0GB ram): 100%|██████████| 128/128 [00:00<00:00, 2060.30it/s]                                                                
val: Scanning '/root/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128 
val: Caching images (0.0GB ram): 100%|██████████| 128/128 [00:00<00:00, 1655.63it/s]                                                                  
Plotting labels to container/testing3/labels.jpg... 

AutoAnchor: 3.96 anchors/target, 0.957 Best Possible Recall (BPR). Anchors are a poor fit to dataset ⚠️, attempting to improve...
AutoAnchor: WARNING: Extremely small objects found: 35 of 929 labels are < 3 pixels in size
AutoAnchor: Running kmeans for 9 anchors on 927 points...
AutoAnchor: Evolving anchors with Genetic Algorithm: fitness = 0.6699: 100%|██████████| 1000/1000 [00:00<00:00, 4459.00it/s]                          
AutoAnchor: thr=0.25: 0.9935 best possible recall, 3.75 anchors past thr
AutoAnchor: n=9, img_size=320, metric_all=0.263/0.670-mean/best, past_thr=0.477-mean: 5,7, 12,11, 16,27, 43,36, 54,72, 62,145, 146,107, 166,191, 298,218
AutoAnchor: Done ✅ (optional: update model *.yaml to use these anchors in the future)
Image sizes 320 train, 320 val
Using 2 dataloader workers
Logging results to container/testing3
Starting training for 3 epochs...

..........

3 epochs completed in 0.013 hours.
Optimizer stripped from container/testing3/weights/last.pt, 14.7MB
Optimizer stripped from container/testing3/weights/best.pt, 14.7MB

Validating container/testing3/weights/best.pt...
Fusing layers... 
Model summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95:   0%|          | 0/32 [00:00<?, ?it/s]

..........
Results saved to container/testing3

jkhenning commented on June 24, 2024

Did you set up clearml.conf inside your container?

sriram-dsl commented on June 24, 2024

Yes, I've installed ClearML and configured the ClearML client using credentials generated from the self-hosted server.
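To confirm that the configuration is actually visible to the process that runs train.py, a hedged sketch like this reports where the SDK would look. It assumes the standard ClearML locations: the CLEARML_CONFIG_FILE environment variable if set, otherwise ~/clearml.conf, plus the CLEARML_API_HOST / CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY environment overrides. find_clearml_config is a hypothetical helper.

```python
import os
from pathlib import Path


def find_clearml_config(env=None, home=None):
    """Describe where ClearML settings would come from (hypothetical helper).

    Assumes CLEARML_CONFIG_FILE overrides the default ~/clearml.conf location,
    and that the CLEARML_API_* variables can supply server and credentials.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    config_path = Path(env.get("CLEARML_CONFIG_FILE", home / "clearml.conf"))
    env_keys = ["CLEARML_API_HOST", "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY"]
    return {
        "config_file": str(config_path),
        "config_file_exists": config_path.is_file(),
        # Only report variables that are set to a non-empty value
        "env_overrides": [k for k in env_keys if env.get(k)],
    }
```

Running it as the same user that launches training (root, judging by the prompt in the training log above) matters, because ~ resolves per user.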

eugen-ajechiloae-clearml commented on June 24, 2024

Hi @sriram-dsl! This may solve your problem:
If you initialize the task manually, can you please try initializing your ClearML task with output_uri=True? You can also set output_uri to the location you upload the model to, or set sdk.development.default_output_uri (or the CLEARML_DEFAULT_OUTPUT_URI env var) to the file server you want the model uploaded to. It can be the same file server configured under the api section in clearml.conf.
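A minimal sketch of both options, assuming the files server address from the api section (http://<public_ip>:8081 in the configuration posted later in this thread). Setting CLEARML_DEFAULT_OUTPUT_URI before clearml is imported stands in for sdk.development.default_output_uri; alternatively, output_uri=True in Task.init uploads to the default files server. set_default_output_uri and init_task_with_uploads are hypothetical helper names, and the clearml import is deferred so the first helper works without the package installed.

```python
import os


def set_default_output_uri(files_server, env=None):
    """Point CLEARML_DEFAULT_OUTPUT_URI at the files server unless already set.

    The variable is the environment override for sdk.development.default_output_uri;
    set it before clearml is imported so the SDK picks it up.
    """
    env = os.environ if env is None else env
    env.setdefault("CLEARML_DEFAULT_OUTPUT_URI", files_server.rstrip("/"))
    return env["CLEARML_DEFAULT_OUTPUT_URI"]


def init_task_with_uploads(project_name, task_name):
    # Deferred import: requires `pip install clearml` and a valid clearml.conf
    from clearml import Task

    # output_uri=True uploads models/artifacts to the default output destination
    return Task.init(project_name=project_name, task_name=task_name, output_uri=True)
```

Using setdefault means an explicit CLEARML_DEFAULT_OUTPUT_URI already passed to the container (for example with docker run -e) is left untouched.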

sriram-dsl commented on June 24, 2024

These are the api and sdk sections of my clearml.conf:

# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://<public_ip>:8008
    web_server: http://<public_ip>:8082
    files_server: http://<public_ip>:8081
    # Credentials are generated using the webapp, http://<public_ip>:8082/settings
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "OTVT8HO*********5RXU", "secret_key": "OZClLTtposYbKC5UVA*****************4LvM7yhiaFm06"}
}
sdk {
    # ClearML - default SDK configuration

    storage {
        cache {
            # Defaults to <system_temp_folder>/clearml_cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
        }

        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_type.
            { url: "file://*" }  # file-urls are always directly referenced
        ]
    }
    # (remaining sdk settings not shown)
}

Do I need to add any key: value pairs to it so that results are logged when I launch YOLOv5 training?
I am starting the training with the basic YOLOv5 training command mentioned here: #225 (comment)

sriram-dsl commented on June 24, 2024

@eugen-ajechiloae-clearml, @jkhenning, @ainoam

Something similar to this might solve my problem: allegroai/clearml#363
In other words, should I make changes inside yolov5/train.py?

My requirement is to log the experiment results generated during training to the ClearML server. The training is conducted inside a container. I have set up an Ubuntu 20.04 container, cloned the YOLOv5 repository inside the Ubuntu container, and started training using that YOLOv5 Git repository.

eugen-ajechiloae-clearml commented on June 24, 2024

Hi @sriram-dsl! If you wish to modify train.py, you could use a dataset, or, if it fits your case, the OutputModel class (https://clear.ml/docs/latest/docs/references/sdk/model_outputmodel) to upload your models.
Datasets are usually used to store more than just models (for example, training data).
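If you take the OutputModel route, a hedged sketch of the upload step after training might look like this. It assumes YOLOv5 wrote its checkpoints to <save_dir>/weights (container/testing3/weights in the log above) and that clearml is installed and configured in the container. find_weights and upload_weights are hypothetical helpers; auto_delete_file=False is passed so the local .pt files are kept.

```python
from pathlib import Path


def find_weights(save_dir):
    """Return the YOLOv5 checkpoints in <save_dir>/weights, best.pt first."""
    weights_dir = Path(save_dir) / "weights"
    if not weights_dir.is_dir():
        return []
    order = {"best.pt": 0, "last.pt": 1}
    found = [p for p in weights_dir.glob("*.pt") if p.is_file()]
    # best.pt, then last.pt, then any other checkpoints alphabetically
    return sorted(found, key=lambda p: (order.get(p.name, 2), p.name))


def upload_weights(save_dir, project_name, task_name):
    # Deferred import: requires `pip install clearml` and a working clearml.conf
    from clearml import OutputModel, Task

    task = Task.init(project_name=project_name, task_name=task_name, output_uri=True)
    for weights in find_weights(save_dir):
        model = OutputModel(task=task, name=weights.stem, framework="PyTorch")
        # auto_delete_file=False keeps the local checkpoint after the upload
        model.update_weights(weights_filename=str(weights), auto_delete_file=False)
    return task
```

Calling upload_weights once at the end of training keeps the change to train.py small; the same helper could also be run as a separate script after train.py finishes.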
