
Comments (4)

lsongx commented on July 17, 2024

@zhuhongyue I trained on two Titan X GPUs and finished in about 12 hours for 30 iterations. The GPU-Util in the info you posted is 0%, but the Memory-Usage is normal. I guess there is some problem with your dataloader, maybe because of the I/O of your disk. You may try watch -n 0.5 nvidia-smi to see whether the GPU-Util always stays around 0%.
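If the stall is in data loading, timing the loader in isolation should show it directly. A minimal sketch, assuming a standard PyTorch dataloader; the time_loader helper and the batch count are illustrative, not part of the repo:

    import time

    def time_loader(loader, num_batches=20):
        # Average seconds per batch; a large value here together with
        # ~0% GPU-Util points to a data-loading (disk I/O) bottleneck.
        it = iter(loader)
        start = time.time()
        for _ in range(num_batches):
            next(it)  # assumes the loader yields at least num_batches batches
        elapsed = time.time() - start
        print('avg %.3f s per batch over %d batches' % (elapsed / num_batches, num_batches))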


zhuhongyue commented on July 17, 2024

@LcDog
Thank you for your reply!
I noticed the issue with GPU-Util, which remains at 0% most of the time. Following your suggestion, I checked my disk state and found nothing strange, so I guess the problem is related to the dataloader.
I then ran your program step by step and noticed that:

    source_features, _ = extract_features(model, src_extfeat_loader, print_freq=args.print_freq)
    target_features, _ = extract_features(model, tgt_extfeat_loader, print_freq=args.print_freq)
    rerank_dist = re_ranking(source_features, target_features, lambda_value=args.lambda_value)

These three lines take around 30 minutes every iteration, which means this data preprocessing alone would account for about 15 hours over 30 iterations (see the timing sketch below).
I am wondering how much time this code takes in your environment?
In addition, because of limited GPU resources, I cannot measure how long the training part takes per iteration while another GPU program is running, so I will check the training time tomorrow.
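For reference, a simple way to split the 30 minutes between the two stages is to time each call separately. A minimal sketch; variable names follow the snippet above, and only the standard-library time module is added:

    import time

    t0 = time.time()
    source_features, _ = extract_features(model, src_extfeat_loader, print_freq=args.print_freq)
    target_features, _ = extract_features(model, tgt_extfeat_loader, print_freq=args.print_freq)
    t1 = time.time()
    rerank_dist = re_ranking(source_features, target_features, lambda_value=args.lambda_value)
    t2 = time.time()
    print('extract_features: %.1f s, re_ranking: %.1f s' % (t1 - t0, t2 - t1))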


lsongx commented on July 17, 2024

@zhuhongyue extract_features runs on the GPU, but re_ranking is written in NumPy, so that step runs on the CPU. Our CPU is an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, and re_ranking takes less than 5 minutes. Besides, extract_features copies CUDA variables to CPU memory, and this step also depends heavily on the hardware...

You can set lambda_value=0 for a quicker version, though the results may be more unstable with this setting.
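For context, in the standard k-reciprocal re-ranking formulation (Zhong et al., 2017), lambda_value blends the original pairwise distance with the Jaccard distance computed from k-reciprocal neighbors; it is an assumption here that this repo's re_ranking follows that convention. A sketch of the blend with hypothetical precomputed matrices:

    import numpy as np

    # Hypothetical distance matrices, for illustration only.
    original_dist = np.random.rand(4, 4).astype(np.float32)
    jaccard_dist = np.random.rand(4, 4).astype(np.float32)

    lambda_value = 0.0  # the quicker setting suggested above
    final_dist = jaccard_dist * (1 - lambda_value) + original_dist * lambda_value
    # With lambda_value = 0, the result is the Jaccard distance alone.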


zhuhongyue commented on July 17, 2024

When I set only one GPU visible, the whole training process took about 15 hours and the GPU utilization rate became normal, as expected.
I suppose this issue is solved.
By the way, my command is:

CUDA_VISIBLE_DEVICES=0 python2 selftraining.py --src_dataset dukemtmc \
    --tgt_dataset market1501 \
    --resume dukemtmc_trained.pth.tar \
    --data_dir data \
    --logs_dir log
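As a quick sanity check that only the intended GPU is visible inside the process, one can query PyTorch (an assumption here is that the repo is PyTorch-based, which the .pth.tar checkpoint suggests):

    import torch

    # With CUDA_VISIBLE_DEVICES=0, exactly one device should be visible,
    # and it is addressed as cuda:0 inside the process.
    print(torch.cuda.device_count())      # expected: 1
    print(torch.cuda.get_device_name(0))  # e.g. a Titan X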

