Comments (7)

moyans commented on August 27, 2024

Finally, the training loss ends up at about 4. Is that normal?

from refinedet.

sfzhang15 commented on August 27, 2024

@zxt881108 Hi, the full training log is too large to post, so here is an excerpt from the training log of RefineDet512_ResNet101_COCO:
I1009 15:11:17.230608 5518 solver.cpp:243] Iteration 0, loss = 77.6168
I1009 15:11:17.230693 5518 solver.cpp:259] Train net output #0: arm_loss = 30.3417 (* 1 = 30.3417 loss)
I1009 15:11:17.230715 5518 solver.cpp:259] Train net output #1: odm_loss = 47.2751 (* 1 = 47.2751 loss)
I1009 15:11:17.230823 5518 sgd_solver.cpp:138] Iteration 0, lr = 0.001
I1009 22:51:56.919220 13991 solver.cpp:243] Iteration 10000, loss = 8.28546
I1009 22:51:56.919309 13991 solver.cpp:259] Train net output #0: arm_loss = 4.22539 (* 1 = 4.22539 loss)
I1009 22:51:56.919328 13991 solver.cpp:259] Train net output #1: odm_loss = 5.07301 (* 1 = 5.07301 loss)
I1009 22:51:58.349028 13991 sgd_solver.cpp:138] Iteration 10000, lr = 0.001
I1010 06:42:09.443035 13991 solver.cpp:243] Iteration 20000, loss = 8.75266
I1010 06:42:09.443153 13991 solver.cpp:259] Train net output #0: arm_loss = 4.03777 (* 1 = 4.03777 loss)
I1010 06:42:09.443168 13991 solver.cpp:259] Train net output #1: odm_loss = 3.69968 (* 1 = 3.69968 loss)
I1010 06:42:10.775079 13991 sgd_solver.cpp:138] Iteration 20000, lr = 0.001
I1010 22:09:05.001247 13991 solver.cpp:243] Iteration 40000, loss = 8.14145
I1010 22:09:05.001334 13991 solver.cpp:259] Train net output #0: arm_loss = 3.60879 (* 1 = 3.60879 loss)
I1010 22:09:05.001350 13991 solver.cpp:259] Train net output #1: odm_loss = 3.71262 (* 1 = 3.71262 loss)
I1010 22:09:05.995534 13991 sgd_solver.cpp:138] Iteration 40000, lr = 0.001
I1012 05:23:05.806807 13991 solver.cpp:243] Iteration 80000, loss = 7.2509
I1012 05:23:05.806875 13991 solver.cpp:259] Train net output #0: arm_loss = 3.2211 (* 1 = 3.2211 loss)
I1012 05:23:05.806884 13991 solver.cpp:259] Train net output #1: odm_loss = 2.79722 (* 1 = 2.79722 loss)
I1012 05:23:06.111764 13991 sgd_solver.cpp:138] Iteration 80000, lr = 0.001
I1014 21:35:23.140607 13991 solver.cpp:243] Iteration 160000, loss = 6.43958
I1014 21:35:23.140681 13991 solver.cpp:259] Train net output #0: arm_loss = 3.72447 (* 1 = 3.72447 loss)
I1014 21:35:23.140689 13991 solver.cpp:259] Train net output #1: odm_loss = 2.4261 (* 1 = 2.4261 loss)
I1014 21:35:24.672145 13991 sgd_solver.cpp:138] Iteration 160000, lr = 0.001
I1019 00:20:20.329696 21651 solver.cpp:243] Iteration 280000, loss = 6.08943
I1019 00:20:20.329771 21651 solver.cpp:259] Train net output #0: arm_loss = 3.02973 (* 1 = 3.02973 loss)
I1019 00:20:20.329785 21651 solver.cpp:259] Train net output #1: odm_loss = 2.79212 (* 1 = 2.79212 loss)
I1019 00:20:21.572805 21651 sgd_solver.cpp:138] Iteration 280000, lr = 0.001
I1025 10:35:06.596961 22840 solver.cpp:243] Iteration 480000, loss = 5.46378
I1025 10:35:06.597018 22840 solver.cpp:259] Train net output #0: arm_loss = 3.11097 (* 1 = 3.11097 loss)
I1025 10:35:06.597024 22840 solver.cpp:259] Train net output #1: odm_loss = 3.3891 (* 1 = 3.3891 loss)
I1025 10:35:06.973990 22840 sgd_solver.cpp:138] Iteration 480000, lr = 1e-05
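To compare your own run against the excerpt above, it can help to pull the numbers out of the Caffe log programmatically. The sketch below assumes the stock Caffe log format shown above; `parse_caffe_log` and `LogRecord` are hypothetical names, not part of Caffe or RefineDet. One thing worth knowing when reading these logs: in stock Caffe (and presumably the fork RefineDet builds on), the `Iteration N, loss = ...` line reports a loss smoothed over the solver's `average_loss` iterations, while the `Train net output` lines are for the current iteration only. That is why at iteration 0 the terms add up exactly (30.3417 + 47.2751 = 77.6168) but later lines generally do not.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# A float such as 77.6168, 1e-05 or -3.2
_NUM = r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
# Solver summary line, e.g. "solver.cpp:243] Iteration 0, loss = 77.6168"
ITER_RE = re.compile(r"Iteration (\d+), loss = " + _NUM)
# Per-output line, e.g. "Train net output #0: arm_loss = 30.3417 (* 1 = ...)"
OUTPUT_RE = re.compile(r"Train net output #\d+: (\w+) = " + _NUM)


@dataclass
class LogRecord:
    iteration: int
    smoothed_loss: float  # value printed on the "Iteration N, loss = ..." line
    outputs: Dict[str, float] = field(default_factory=dict)  # e.g. arm_loss, odm_loss


def parse_caffe_log(lines: Iterable[str]) -> List[LogRecord]:
    """Group each solver summary line with the net outputs that follow it."""
    records: List[LogRecord] = []
    for line in lines:
        m = ITER_RE.search(line)
        if m:
            records.append(LogRecord(int(m.group(1)), float(m.group(2))))
            continue
        m = OUTPUT_RE.search(line)
        if m and records:
            records[-1].outputs[m.group(1)] = float(m.group(2))
    return records


if __name__ == "__main__":
    sample = [
        "I1009 15:11:17.230608 5518 solver.cpp:243] Iteration 0, loss = 77.6168",
        "I1009 15:11:17.230693 5518 solver.cpp:259] Train net output #0: arm_loss = 30.3417 (* 1 = 30.3417 loss)",
        "I1009 15:11:17.230715 5518 solver.cpp:259] Train net output #1: odm_loss = 47.2751 (* 1 = 47.2751 loss)",
        "I1009 15:11:17.230823 5518 sgd_solver.cpp:138] Iteration 0, lr = 0.001",
    ]
    for rec in parse_caffe_log(sample):
        total = sum(rec.outputs.values())
        print(rec.iteration, rec.smoothed_loss, rec.outputs, round(total, 4))
```

Reading a real log works the same way, e.g. `parse_caffe_log(open("train.log"))`, where the file name is only a placeholder.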

PS: If you train the RefineDet512_ResNet101_COCO model, each GPU must process more than 4 images per iteration (we used 5 during training) to keep the BN layers stable.
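The BN constraint above follows from how batch normalization estimates its mean and variance from the images on each GPU during training: with fewer images, those estimates are noisier. The toy simulation below is only an illustration under a simplifying assumption (each image contributes one independent N(0, 1) sample of a channel's activation; real convolutional BN also averages over spatial positions). It shows the spread of the per-batch mean shrinking roughly as 1/sqrt(n), e.g. 2 images versus the 5 used by the authors. `batch_mean_spread` is a hypothetical helper, not part of RefineDet.

```python
import random
import statistics


def batch_mean_spread(batch_size: int, trials: int = 10000, seed: int = 0) -> float:
    """Std. dev. of the mean of `batch_size` i.i.d. N(0, 1) draws.

    Toy model of the noise in a BatchNorm layer's per-batch mean estimate
    when it normalizes with statistics of the current per-GPU batch.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(batch_size)])
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    for n in (2, 5, 16):
        # Theory for this toy model: spread is about 1 / sqrt(n).
        print(f"images per GPU = {n:2d}: spread = {batch_mean_spread(n):.3f}, "
              f"1/sqrt(n) = {n ** -0.5:.3f}")
```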

zxt881108 commented on August 27, 2024

Thanks! Because of limited GPU memory, I set minibatch=2 for each GPU; maybe that is the main reason.

XiongweiWu commented on August 27, 2024

@sfzhang15 Hi, which GPU hardware and CUDA/cuDNN versions are you using to train ResNet101-512? I use P100 cards with 16 GB of memory, but each card can only hold at most 3 images.

sfzhang15 commented on August 27, 2024

@XiongweiWu Hi, as noted in footnote 7 of our paper, we use 4 M40 GPUs (24 GB each) with CUDA 8.0 and cuDNN 6.0.

sfzhang15 commented on August 27, 2024

@moyans Here is our log at the end of training:
[Screenshot of the final training log, not reproduced here]

moyans commented on August 27, 2024

@sfzhang15 thanks
