
Comments (14)

jeremy43 commented on September 15, 2024

Actually, the current frame can be randomly sampled during training; when that happens, the current frame is included in the aggregation.

from flow-guided-feature-aggregation.

zhengzhugithub commented on September 15, 2024

"The current frame can be randomly sampled in the training process." Is this in the current code? If so, where can I find it? Thank you.

jeremy43 commented on September 15, 2024

You can find it in the function "get_triple_image" (in lib/utils/image.py).

zhengzhugithub commented on September 15, 2024

But the function "get_triple_image" is not used anywhere in the project. Also, in 'train_end2end.py', aggregation is done over 2N frames (without the current frame), but over 2N+1 frames (including the current frame) at test time. Can you explain more? Thank you.

jeremy43 commented on September 15, 2024

For the first question, "get_triple_image is not used":
During training, data is loaded through "AnchorLoader" (instantiated in fgfa_rfcn/train_end2end.py, defined in fgfa_rfcn/core/loader.py). There, "self.get_batch_individual()" fills in provide_data and provide_label, and at line 349 it calls "self.parfetch". "parfetch" (fgfa_rfcn/core/loader.py, line 357) in turn calls "get_rpn_triple_batch" (lib/rpn/rpn.py, line 102), which finally calls "get_triple_image". In that function, the current frame can be randomly sampled.
For the second question, "no current frame is used in the training process":
The current frame can be randomly sampled, as done in "get_triple_image". :)
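The sampling described above can be sketched roughly as follows (a hedged illustration: only the location of "get_triple_image" comes from the thread; the function name, offset range, and parameters here are illustrative and may differ from the actual code):

```python
import random

def sample_triple_frames(cur_id, num_frames, max_offset=9):
    # Illustrative sketch of the sampling done in get_triple_image
    # (lib/utils/image.py): pick a "before" and an "after" reference
    # frame at random offsets around the current frame. Because an
    # offset of 0 is allowed, the current frame itself can be drawn
    # as a reference, so it can take part in the aggregation.
    before_id = max(cur_id - random.randint(0, max_offset), 0)
    after_id = min(cur_id + random.randint(0, max_offset), num_frames - 1)
    return before_id, cur_id, after_id
```

With offsets drawn from [0, max_offset], the "before"/"after" frames occasionally coincide with the current frame, which is the behavior being discussed here.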

zhengzhugithub commented on September 15, 2024

Thanks for your answer. Now I have another question, since I am not familiar with MXNet. The training network has three inputs: data, data_before, and data_after. But the batch size is 2 during training, so a batch cannot even hold 3 frames. Is there a mechanism in MXNet to train only part of the network? Thank you.

jeremy43 commented on September 15, 2024

Actually, during training one batch deals with only the current frame (the other two frames serve as references); the details are in the paper https://arxiv.org/abs/1703.10025. In other words, the combination of those three frames is handled within a single batch.

zhengzhugithub commented on September 15, 2024

You mean that 1 batch contains three frames, so it contains six frames when the batch size is 2?

einsiedler0408 commented on September 15, 2024

@zhengzhugithub For the training strategy, please refer to our paper. We use 4 GPUs during training, with each GPU holding one mini-batch. The loss function is applied as in Eq. 3. During training, temporal dropout is applied to avoid running out of memory, i.e. we randomly sample 2 frames for feature aggregation.
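The temporal-dropout sampling mentioned here can be sketched in a few lines (names are illustrative; the real sampling lives inside the project's data loader):

```python
import random

def temporal_dropout(window_frame_ids, num_keep=2):
    # Temporal dropout as described above: instead of aggregating over
    # the full temporal window at train time, randomly keep only
    # num_keep frames to reduce memory use. At test time the full
    # window is aggregated instead.
    return sorted(random.sample(window_frame_ids, num_keep))
```

For example, `temporal_dropout(list(range(-9, 10)))` would keep 2 of the 19 candidate frame offsets for a given training iteration.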

einsiedler0408 commented on September 15, 2024

@zhengzhugithub You can also refer to Table 3 in our paper, which validates the temporal dropout trick, i.e. randomly sampling 2 frames for training is enough.

zhengzhugithub commented on September 15, 2024

@einsiedler0408 When there is only 1 GPU, one mini-batch contains only 2 frames. But the training network needs at least 3 frames; how do you deal with that? Thank you.

einsiedler0408 commented on September 15, 2024

@zhengzhugithub Please refer to Eq. 2 and Eq. 4 in our paper. The adaptive weight applied to the warped feature f_(j->i) when aggregating into f_i is proportional to cosine(f_(j->i)^e, f_i^e).

einsiedler0408 commented on September 15, 2024

@zhengzhugithub So the answer is that we do not need three distinct frames during training, but we do need f_i (the features of the current frame) to compute the adaptive weight.
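That weight computation (cosine similarity per Eq. 2, softmax-normalized over frames per Eq. 4) can be sketched with NumPy; shapes and names here are illustrative, since the actual model computes this with an embedding sub-network inside MXNet:

```python
import numpy as np

def adaptive_weights(f_e_warped, f_e_cur, eps=1e-8):
    # Eq. 2: per-position cosine similarity between each warped embedding
    # f^e_{j->i}, stacked as shape (J, C, H, W), and the current frame's
    # embedding f^e_i of shape (C, H, W).
    num = (f_e_warped * f_e_cur[None]).sum(axis=1)                 # (J, H, W)
    den = (np.linalg.norm(f_e_warped, axis=1)
           * np.linalg.norm(f_e_cur, axis=0)[None] + eps)
    cos = num / den
    # Eq. 4: normalize the weights over the J frames with a softmax,
    # so they sum to 1 at every spatial position.
    w = np.exp(cos - cos.max(axis=0, keepdims=True))
    return w / w.sum(axis=0, keepdims=True)                        # (J, H, W)
```

Note that the current frame's own embedding always enters the similarity, which is why f_i is needed even when only 2 frames are aggregated.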

zhengzhugithub commented on September 15, 2024

Thank you. Problem solved.
