Dear author: When I trained the model, I found that the Volatile GPU-Util stayed at 0%.


The Volatile GPU-Util 0% about db (CLOSED)

mhliao commented on June 10, 2024

The Volatile GPU-Util 0%


Comments (5)

MhLiao commented on June 10, 2024

@zjuzxm Is the GPU-Util always 0%? How many images does your training data contain? Check whether training is blocked in the data loader. The data loader prepares the data at the beginning, so the preparation time grows with the size of your training set.
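One way to check whether training is blocked in the data loader is to time how long each iteration waits for the next batch versus how long the training step itself takes. The sketch below is a generic, framework-agnostic helper; `profile_loader` and its parameters are hypothetical names, not part of the DB code base. If most of the time is spent waiting on `next()`, the GPU is idle and 0% utilization is expected.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Tuple


def profile_loader(batches: Iterable[Any], step: Callable[[Any], None],
                   max_steps: int = 50) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent inside step)."""
    wait_time = 0.0
    step_time = 0.0
    it = iter(batches)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is time the GPU sits idle
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # forward / backward / optimizer step would go here
        t2 = time.perf_counter()
        wait_time += t1 - t0
        step_time += t2 - t1
    return wait_time, step_time


def slow_loader(n: int) -> Iterator[int]:
    """Hypothetical loader that simulates expensive per-batch preprocessing."""
    for i in range(n):
        time.sleep(0.01)
        yield i


if __name__ == "__main__":
    wait, compute = profile_loader(slow_loader(10), lambda batch: None, max_steps=10)
    print(f"waiting on data: {wait:.3f}s, computing: {compute:.3f}s")
```

With a real GPU step, CUDA kernels run asynchronously, so the step timing is only meaningful if the step synchronizes the device (for example with `torch.cuda.synchronize()` in PyTorch) before returning.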


zjuzxm commented on June 10, 2024

I checked the log, and it seems something is wrong with the training. I used 4 Titan GPUs to train on Total-Text with the command
python train.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --num_gpus 4
but the GPU-Util is always 0%. The log is as follows:
[INFO] [2019-12-10 07:59:42,832] Training epoch 0
[INFO] [2019-12-10 07:59:47,038] Training epoch 0
[INFO] [2019-12-10 07:59:59,624] Training epoch 0
[INFO] [2019-12-10 07:59:59,747] Training epoch 0
[INFO] [2019-12-10 07:59:59,867] Training epoch 0
[INFO] [2019-12-10 08:00:11,334] Training epoch 0
[INFO] [2019-12-10 08:00:16,321] Training epoch 0
[INFO] [2019-12-10 08:02:45,156] Training epoch 0
[INFO] [2019-12-10 08:18:24,331] step: 0, epoch: 0, loss: 40.107483, lr: 0.007000
[INFO] [2019-12-10 08:18:24,347] bce_loss: 7.067471
[INFO] [2019-12-10 08:18:24,348] thresh_loss: 0.953832
[INFO] [2019-12-10 08:18:24,349] l1_loss: 0.381630
[INFO] [2019-12-10 08:37:19,139] Training epoch 1
[INFO] [2019-12-10 09:20:08,563] Training epoch 2
[INFO] [2019-12-10 10:05:06,779] Training epoch 3
[INFO] [2019-12-10 10:54:39,408] Training epoch 4
[INFO] [2019-12-10 11:32:45,776] Training epoch 5
[INFO] [2019-12-10 11:49:39,380] step: 450, epoch: 5, loss: 3.489134, lr: 0.006974
[INFO] [2019-12-10 11:49:39,422] bce_loss: 0.450540
[INFO] [2019-12-10 11:49:39,426] thresh_loss: 0.402760
[INFO] [2019-12-10 11:49:39,439] l1_loss: 0.083367
[INFO] [2019-12-10 11:50:02,397] Training epoch 6


MhLiao commented on June 10, 2024

@zjuzxm Your training is far too slow. Training on the Total-Text dataset takes me only about 1 to 2 minutes per epoch.
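To put numbers on that comparison, the timestamps of the "Training epoch N" lines in the log posted earlier in this thread can be subtracted to get minutes per epoch. The sketch below is a hypothetical helper (`epoch_durations` is not part of the DB code base); it assumes the log format shown in this thread and ignores the millisecond field. Because each GPU process prints "Training epoch 0", only the first occurrence of each epoch number is kept.

```python
import re
from datetime import datetime
from typing import Dict

# Matches lines like "[INFO] [2019-12-10 08:37:19,139] Training epoch 1"
LOG_PATTERN = re.compile(
    r"\[INFO\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] Training epoch (\d+)"
)


def epoch_durations(log_text: str) -> Dict[int, float]:
    """Minutes between the first 'Training epoch N' and 'Training epoch N+1' lines."""
    starts: Dict[int, datetime] = {}
    for stamp, epoch in LOG_PATTERN.findall(log_text):
        epoch_num = int(epoch)
        # Several processes log the same epoch; keep only the first occurrence.
        if epoch_num not in starts:
            starts[epoch_num] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    durations: Dict[int, float] = {}
    for epoch_num in sorted(starts):
        if epoch_num + 1 in starts:
            delta = starts[epoch_num + 1] - starts[epoch_num]
            durations[epoch_num] = delta.total_seconds() / 60
    return durations


if __name__ == "__main__":
    sample = """
    [INFO] [2019-12-10 07:59:42,832] Training epoch 0
    [INFO] [2019-12-10 07:59:47,038] Training epoch 0
    [INFO] [2019-12-10 08:37:19,139] Training epoch 1
    [INFO] [2019-12-10 09:20:08,563] Training epoch 2
    """
    for epoch, minutes in epoch_durations(sample).items():
        print(f"epoch {epoch}: {minutes:.1f} min")
```

Applied to the full log, every epoch takes roughly 17 to 50 minutes, an order of magnitude more than the 1 to 2 minutes per epoch reported above.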


rogerxsj commented on June 10, 2024

@MhLiao Dear author:
I am trying to pretrain DBNet on SynthText (800k images). Could you provide the *.yaml file as a reference? Thanks a lot!


MhLiao commented on June 10, 2024

@rogerxsj The only difference is that both epochs parameters in the YAML file are set to 400.
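If you prefer to make that change programmatically rather than by hand, a minimal sketch is to walk the loaded configuration and overwrite every `epochs` key. This assumes the YAML file can be loaded as plain nested mappings and lists (for example with PyYAML's `yaml.safe_load`); the helper `set_all_epochs` and the example layout below are hypothetical, with placeholder values not taken from the repository.

```python
from typing import Any


def set_all_epochs(config: Any, value: int) -> int:
    """Recursively set every scalar 'epochs' entry in a nested dict/list; return the count changed."""
    changed = 0
    if isinstance(config, dict):
        for key, item in config.items():
            if key == "epochs" and not isinstance(item, (dict, list)):
                config[key] = value  # replacing a value does not resize the dict
                changed += 1
            else:
                changed += set_all_epochs(item, value)
    elif isinstance(config, list):
        for item in config:
            changed += set_all_epochs(item, value)
    return changed


if __name__ == "__main__":
    # Hypothetical layout with two 'epochs' entries; placeholder values only.
    config = {
        "train": {"epochs": 100, "data_loader": {"batch_size": 16}},
        "optimizer": {"learning_rate": {"class": "DecayLearningRate", "epochs": 100}},
    }
    print(set_all_epochs(config, 400), config)
```

Counting the changed entries makes it easy to confirm that exactly two `epochs` parameters were updated before writing the file back out.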
