
Comments (4)

lsongx commented on July 17, 2024

@zhuhongyue I trained on two Titan X GPUs and finished in about 12 hours for 30 iterations. The GPU-Util in the info you posted is 0%, but the Memory-Usage is normal. I guess there is some problem with your dataloader, maybe because of the I/O of your disk. You may try watch -n 0.5 nvidia-smi to see whether the GPU-Util always stays around 0%.
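If the stall is in data loading, timing the loader in isolation should show it directly. A minimal sketch, assuming a standard PyTorch dataloader; the time_loader helper and the batch count are illustrative, not part of the repo:

    import time

    def time_loader(loader, num_batches=20):
        # Average seconds per batch; a large value here together with
        # ~0% GPU-Util points to a data-loading (disk I/O) bottleneck.
        it = iter(loader)
        start = time.time()
        for _ in range(num_batches):
            next(it)  # assumes the loader yields at least num_batches batches
        elapsed = time.time() - start
        print('avg %.3f s per batch over %d batches' % (elapsed / num_batches, num_batches))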


zhuhongyue commented on July 17, 2024

@LcDog
Thank you for your reply!
I noticed the issue with GPU-Util, which remains at 0% most of the time. Following your suggestion, I checked my disk state and found nothing strange, so I guess the problem is related to the dataloader.
I then ran your program step by step and noticed that:

    source_features, _ = extract_features(model, src_extfeat_loader, print_freq=args.print_freq)
    target_features, _ = extract_features(model, tgt_extfeat_loader, print_freq=args.print_freq)
    rerank_dist = re_ranking(source_features, target_features, lambda_value=args.lambda_value)

These three lines take around 30 minutes every iteration, which means this data preprocessing alone would account for about 15 hours over 30 iterations (see the timing sketch below).
I am wondering how much time this code takes in your environment?
In addition, because of limited GPU resources, I cannot measure how long the training part takes per iteration while another GPU program is running, so I will check the training time tomorrow.
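For reference, a simple way to split the 30 minutes between the two stages is to time each call separately. A minimal sketch; variable names follow the snippet above, and only the standard-library time module is added:

    import time

    t0 = time.time()
    source_features, _ = extract_features(model, src_extfeat_loader, print_freq=args.print_freq)
    target_features, _ = extract_features(model, tgt_extfeat_loader, print_freq=args.print_freq)
    t1 = time.time()
    rerank_dist = re_ranking(source_features, target_features, lambda_value=args.lambda_value)
    t2 = time.time()
    print('extract_features: %.1f s, re_ranking: %.1f s' % (t1 - t0, t2 - t1))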


lsongx commented on July 17, 2024

@zhuhongyue extract_features runs on the GPU, but re_ranking is written in NumPy, so that step runs on the CPU. Our CPU is an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, and re_ranking takes less than 5 minutes. Besides, extract_features copies CUDA variables to CPU memory, and this step also depends heavily on the hardware...

You can set lambda_value=0 for a quicker version, though the results may be more unstable with this setting.
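For context, in the standard k-reciprocal re-ranking formulation (Zhong et al., 2017), lambda_value blends the original pairwise distance with the Jaccard distance computed from k-reciprocal neighbors; it is an assumption here that this repo's re_ranking follows that convention. A sketch of the blend with hypothetical precomputed matrices:

    import numpy as np

    # Hypothetical distance matrices, for illustration only.
    original_dist = np.random.rand(4, 4).astype(np.float32)
    jaccard_dist = np.random.rand(4, 4).astype(np.float32)

    lambda_value = 0.0  # the quicker setting suggested above
    final_dist = jaccard_dist * (1 - lambda_value) + original_dist * lambda_value
    # With lambda_value = 0, the result is the Jaccard distance alone.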


zhuhongyue commented on July 17, 2024

When I set only one GPU visible, the whole training process took about 15 hours and the GPU utilization rate became normal, as expected.
I suppose this issue is solved.
By the way, my command is:

CUDA_VISIBLE_DEVICES=0 python2 selftraining.py --src_dataset dukemtmc \
    --tgt_dataset market1501 \
    --resume dukemtmc_trained.pth.tar \
    --data_dir data \
    --logs_dir log
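As a quick sanity check that only the intended GPU is visible inside the process, one can query PyTorch (an assumption here is that the repo is PyTorch-based, which the .pth.tar checkpoint suggests):

    import torch

    # With CUDA_VISIBLE_DEVICES=0, exactly one device should be visible,
    # and it is addressed as cuda:0 inside the process.
    print(torch.cuda.device_count())      # expected: 1
    print(torch.cuda.get_device_name(0))  # e.g. a Titan X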

