Comments (5)
@zjuzxm Is the GPU-Util always 0%? How many images does your training data contain? You can check whether training is blocked in the data loader. The data loader prepares the data at the beginning, and this preparation takes longer the more training data you have.
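One way to check whether the data loader is the bottleneck is to time how long each batch takes to arrive, with no model computation involved. The sketch below is only an illustration and is not code from this repository: `time_batches` is a hypothetical helper, and it assumes the training data loader can be passed to it like any other Python iterable.

```python
import time
from typing import Iterable, List


def time_batches(loader: Iterable, max_batches: int = 10) -> List[float]:
    """Return the seconds spent waiting for each of the first max_batches batches.

    Hypothetical helper: `loader` is any iterable, e.g. a PyTorch DataLoader.
    """
    waits = []
    iterator = iter(loader)
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            next(iterator)  # blocks until the next batch is ready
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
    return waits


if __name__ == "__main__":
    # Stand-in for a slow loader: each "batch" takes about 0.05 s to produce.
    def slow_loader():
        for i in range(3):
            time.sleep(0.05)
            yield i

    for i, wait in enumerate(time_batches(slow_loader())):
        print(f"batch {i}: waited {wait:.3f}s")
```

If the waits are long while GPU-Util stays at 0%, the GPUs are probably idle because they are waiting for data rather than computing.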
from db.
I checked the log, and it seems that something is wrong with the training. I used 4 Titan GPUs to train on total_text with the command
python train.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --num_gpus 4
But the GPU-Util is always 0%, and the log is as follows.
[INFO] [2019-12-10 07:59:42,832] Training epoch 0
[INFO] [2019-12-10 07:59:47,038] Training epoch 0
[INFO] [2019-12-10 07:59:59,624] Training epoch 0
[INFO] [2019-12-10 07:59:59,747] Training epoch 0
[INFO] [2019-12-10 07:59:59,867] Training epoch 0
[INFO] [2019-12-10 08:00:11,334] Training epoch 0
[INFO] [2019-12-10 08:00:16,321] Training epoch 0
[INFO] [2019-12-10 08:02:45,156] Training epoch 0
[INFO] [2019-12-10 08:18:24,331] step: 0, epoch: 0, loss: 40.107483, lr: 0.007000
[INFO] [2019-12-10 08:18:24,347] bce_loss: 7.067471
[INFO] [2019-12-10 08:18:24,348] thresh_loss: 0.953832
[INFO] [2019-12-10 08:18:24,349] l1_loss: 0.381630
[INFO] [2019-12-10 08:37:19,139] Training epoch 1
[INFO] [2019-12-10 09:20:08,563] Training epoch 2
[INFO] [2019-12-10 10:05:06,779] Training epoch 3
[INFO] [2019-12-10 10:54:39,408] Training epoch 4
[INFO] [2019-12-10 11:32:45,776] Training epoch 5
[INFO] [2019-12-10 11:49:39,380] step: 450, epoch: 5, loss: 3.489134, lr: 0.006974
[INFO] [2019-12-10 11:49:39,422] bce_loss: 0.450540
[INFO] [2019-12-10 11:49:39,426] thresh_loss: 0.402760
[INFO] [2019-12-10 11:49:39,439] l1_loss: 0.083367
[INFO] [2019-12-10 11:50:02,397] Training epoch 6
@zjuzxm Your training speed is too slow. It takes me only about 1~2 minutes per epoch to train on the Total-Text dataset.
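For comparison, the per-epoch time can be read off the timestamps in the log above. The sketch below is an illustration only: `epoch_durations` is a hypothetical helper, and it assumes the `[INFO] [timestamp] Training epoch N` line format shown in that log and that the first line logged for each epoch marks its start.

```python
import re
from datetime import datetime
from typing import Dict

# Matches "[INFO] [2019-12-10 08:37:19,139] Training epoch 1" (milliseconds are ignored).
EPOCH_LINE = re.compile(
    r"\[INFO\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\] Training epoch (\d+)"
)


def epoch_durations(log_text: str) -> Dict[int, float]:
    """Map each epoch to the minutes between its first log line and the next epoch's."""
    starts: Dict[int, datetime] = {}
    for stamp, epoch in EPOCH_LINE.findall(log_text):
        # Keep only the first timestamp seen for each epoch.
        starts.setdefault(int(epoch), datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    epochs = sorted(starts)
    return {
        e: (starts[nxt] - starts[e]).total_seconds() / 60
        for e, nxt in zip(epochs, epochs[1:])
    }


if __name__ == "__main__":
    # Three lines copied from the log in this thread.
    sample = (
        "[INFO] [2019-12-10 08:37:19,139] Training epoch 1 "
        "[INFO] [2019-12-10 09:20:08,563] Training epoch 2 "
        "[INFO] [2019-12-10 10:05:06,779] Training epoch 3"
    )
    for epoch, minutes in epoch_durations(sample).items():
        print(f"epoch {epoch}: {minutes:.1f} min")
```

On those three lines this gives roughly 43 and 45 minutes for epochs 1 and 2, against the 1~2 minutes per epoch mentioned here.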
@MhLiao Dear author:
I am trying to pretrain DBNet on SynthText-800k. Could you provide the *.yaml file as a reference? Thanks a lot!
@rogerxsj The only difference is that the two `epochs` parameters in the YAML file are both set to 400.