
Comments (20)

ZhihuaGao commented on September 15, 2024

I think whether the warping happens before or after the embedding doesn't matter, because the warping operation doesn't contain any learnable parameters.
That's just my personal opinion; I hope it helps.

from flow-guided-feature-aggregation.

FCInter commented on September 15, 2024

According to my test case, I'm afraid it really does matter: when I build the training network and load the test checkpoint, the model does not converge well. Moreover, although the warping operation contains no parameters, it does change the feature map. That is, warping first and then embedding yields a very different feature map than embedding first and then warping.
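This can be checked with a toy example: because the embedding is nonlinear, warping and embedding do not commute in general. Everything below is made up for illustration; the real embedding is a small conv subnetwork and the real warping is flow-guided bilinear resampling of 2-D feature maps.

```python
import numpy as np

def warp(f, shift=0.5):
    """Bilinear resampling of a 1-D signal by a constant flow `shift`."""
    idx = np.arange(len(f)) - shift
    lo = np.clip(np.floor(idx).astype(int), 0, len(f) - 1)
    hi = np.clip(lo + 1, 0, len(f) - 1)
    w = idx - np.floor(idx)
    return (1 - w) * f[lo] + w * f[hi]

def embed(f):
    """Toy nonlinear 'embedding': ReLU after a 2-tap filter."""
    return np.maximum(np.convolve(f, [1.0, -1.0], mode='same'), 0.0)

f = np.array([0.0, 1.0, 0.0, 2.0, 0.0])
a = embed(warp(f))   # warp first, then embed
b = warp(embed(f))   # embed first, then warp
print(np.allclose(a, b))  # prints False
```

So even with identical (parameter-free) warping, the two orders produce different aggregated features, which is consistent with the checkpoint not transferring cleanly between the two graph layouts.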


ZhihuaGao commented on September 15, 2024

Really? I have trained and tested the network and it works well.
Could you show your logs?


FCInter commented on September 15, 2024

@aresgao I have updated the issue and posted the logs printed during training. The problem is that I cannot get good results when I continue training from the demo checkpoint provided in the README. The demo checkpoint itself yields very good results, but when I continue training from it, the results become terrible. Though I only trained for 4k iterations, I would expect that, since the initial checkpoint is already good, I should not need many more iterations.
BTW, I'm curious why we are advised to train the model from the ResNet-101 and FlowNet checkpoints instead of training directly from the demo checkpoint.
I also tried training from the ResNet-101 and FlowNet checkpoints for 100k+ iterations, and the performance was even worse.

Thank you for your patience and kindness in helping me!


ZhihuaGao commented on September 15, 2024

That's really strange. I trained from the ResNet-101 and FlowNet checkpoints for 100k+ iterations and the performance was even better; here are the test results:
motion [0.0 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7648
motion [0.0 0.7], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.5727
motion [0.7 0.9], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7515
motion [0.9 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.8444


txf201604 commented on September 15, 2024

@aresgao Can you help me?
I have a problem with "sh ./init.sh":

    Traceback (most recent call last):
      File "setup_linux.py", line 63, in <module>
        CUDA = locate_cuda()
      File "setup_linux.py", line 58, in locate_cuda
        for k, v in cudaconfig.iteritems():
    AttributeError: 'dict' object has no attribute 'iteritems'

If you can reply in time, I will be very grateful.
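For what it's worth, this particular error comes from running a Python 2 script under Python 3: dict.iteritems() was removed in Python 3, while items() works in both. A minimal sketch of the fix (the cudaconfig contents here are an illustrative stand-in for what setup_linux.py actually builds):

```python
# dict.iteritems() was removed in Python 3; items() exists in both 2 and 3,
# so swapping it in keeps setup_linux.py working either way.
# `cudaconfig` is an illustrative stand-in for the dict the script builds.
cudaconfig = {'home': '/usr/local/cuda',
              'nvcc': '/usr/local/cuda/bin/nvcc'}

# was: for k, v in cudaconfig.iteritems():
for k, v in cudaconfig.items():
    print(k, v)
```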


FCInter commented on September 15, 2024

@aresgao What version of mxnet are you using? I was wondering if it's caused by the version, since I got a bug because of the wrong version I was using.


ZhihuaGao commented on September 15, 2024

I use the latest version of mxnet @FCInter


ZhihuaGao commented on September 15, 2024

@txf201604
The function locate_cuda() finds where your CUDA is installed; I suggest you check your CUDA location.

def locate_cuda():
    """Locate the CUDA environment on the system
    Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
    and values giving the absolute path to each directory.
    Starts by looking for the CUDAHOME env variable. If not found, everything
    is based on finding 'nvcc' in the PATH.
    """
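For reference, here is a self-contained sketch of what a function with that docstring typically does. It follows the behavior the docstring describes (prefer CUDAHOME, otherwise derive the install root from nvcc on the PATH); the error messages and helper name are illustrative, not the repository's exact code.

```python
import os
from os.path import abspath, dirname, exists, join

def find_in_path(name, path):
    """Return the absolute path of `name` in the first matching
    directory of the search `path`, or None if it is not found."""
    for directory in path.split(os.pathsep):
        candidate = join(directory, name)
        if exists(candidate):
            return abspath(candidate)
    return None

def locate_cuda():
    """Locate CUDA: prefer the CUDAHOME env variable, otherwise
    derive the install root from 'nvcc' found on the PATH."""
    if 'CUDAHOME' in os.environ:
        home = os.environ['CUDAHOME']
        nvcc = join(home, 'bin', 'nvcc')
    else:
        nvcc = find_in_path('nvcc', os.environ.get('PATH', ''))
        if nvcc is None:
            raise EnvironmentError('nvcc not found: set CUDAHOME or add nvcc to PATH')
        home = dirname(dirname(nvcc))
    cudaconfig = {'home': home,
                  'nvcc': nvcc,
                  'include': join(home, 'include'),
                  'lib64': join(home, 'lib64')}
    for k, v in cudaconfig.items():  # items(), not iteritems(), under Python 3
        if not exists(v):
            raise EnvironmentError('CUDA %s path %s does not exist' % (k, v))
    return cudaconfig
```

So if the traceback above fires inside this function, either CUDA is genuinely not where the script expects it, or (as in the iteritems case) the script is being run under the wrong Python version.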

FCInter commented on September 15, 2024

@aresgao Finally I got good results after training for 2 complete epochs!!!

I just have one last question. I find that when saving the checkpoint at the end of each epoch, the following code is used to create two new weights, rfcn_bbox_weight_test and rfcn_bbox_bias_test.

arg['rfcn_bbox_weight_test'] = weight * mx.nd.repeat(mx.nd.array(stds), repeats=repeat).reshape((bias.shape[0], 1, 1, 1))
arg['rfcn_bbox_bias_test'] = arg['rfcn_bbox_bias'] * mx.nd.repeat(mx.nd.array(stds), repeats=repeat) + mx.nd.repeat(mx.nd.array(means), repeats=repeat)

Why do we need to do this?
I have verified that if I do not do this, the checkpoint makes terrible predictions on the test data. This is also why my previous predictions were bad even though the training loss looked good.
Thank you!
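My guess at the rationale (an assumption based on common R-FCN / Faster R-CNN practice, not confirmed from this repository's docs): during training the bbox regression targets are normalized per coordinate as (target - mean) / std, so raw predictions must be de-normalized at test time via deltas = pred * std + mean. Rather than de-normalizing every prediction, the code above folds that affine transform into a "_test" copy of the last layer's weight and bias, so the saved test layer emits un-normalized deltas directly. A NumPy sketch with illustrative shapes and values:

```python
import numpy as np

# Illustrative: num_classes sets of 4 box coordinates per output vector.
num_classes, feat_dim = 2, 16
means = np.tile([0.0, 0.0, 0.0, 0.0], num_classes)   # per-output means
stds = np.tile([0.1, 0.1, 0.2, 0.2], num_classes)    # per-output stds

rng = np.random.default_rng(0)
W = rng.normal(size=(4 * num_classes, feat_dim))     # trained weight
b = rng.normal(size=4 * num_classes)                 # trained bias
x = rng.normal(size=feat_dim)                        # a feature vector

t_norm = W @ x + b                  # network output in normalized space
deltas = t_norm * stds + means      # de-normalized box deltas

W_test = W * stds[:, None]          # fold std into each weight row
b_test = b * stds + means           # fold std and mean into the bias
assert np.allclose(W_test @ x + b_test, deltas)
```

This would also explain the symptom: without the "_test" weights, the deployed network outputs normalized deltas, which decode to wildly wrong boxes even though the training loss is fine.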


samanthawyf commented on September 15, 2024

Hi @aresgao @FCInter @YuwenXiong, I tried training and inference with the code. I used 4 GPUs with all settings unchanged, and the final mAP is 75.78. I am confused about the mAP drop. Did you change any settings, or do you have any advice for my case?


Feywell commented on September 15, 2024

That's really strange. I trained from the ResNet-101 and FlowNet checkpoints for 100k+ iterations and the performance was even better; here are the test results:
motion [0.0 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7648
motion [0.0 0.7], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.5727
motion [0.7 0.9], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7515
motion [0.9 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.8444

@aresgao Hi! I have just one GPU (a 1080 Ti) and only get mAP = 0.7389 testing with the default settings.
How did you get a better mAP? Can you share your settings in detail, such as epochs, min_diff/max_diff, lr, GPUs, test key frame, and so on?
Thank you!


withinnoitatpmet commented on September 15, 2024

@Feywell Hi Feywell, I have tested the default setting with 2 GPUs and with 4 GPUs; the 4-GPU result is much better than the 2-GPU one. P.S. lr = 0.00025 is equivalent to the 0.001 described in the paper. You can find more details in their code.


Feywell commented on September 15, 2024

@withinnoitatpmet Thank you! So if I have just one GPU, would setting lr = 0.001 be better?


withinnoitatpmet commented on September 15, 2024

@Feywell I think the result could be even worse. Considering the usual relation between batch size and learning rate (I don't know whether it still holds for batch sizes this small), lr should stay at 0.00025.
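The batch-size/learning-rate relation mentioned here is presumably the linear scaling heuristic: keep lr proportional to the total batch size. A sketch, assuming a fixed per-GPU batch size and taking the 0.001 figure and the multi-GPU baseline from the comments above (both are this thread's numbers, not something from the paper I have verified):

```python
def scaled_lr(paper_lr, paper_gpus, my_gpus):
    """Linear-scaling heuristic: keep lr / total_batch_size constant,
    assuming a fixed per-GPU batch size."""
    return paper_lr * my_gpus / paper_gpus

# If the paper's 0.001 corresponds to a 4-GPU batch, one GPU gets 0.00025:
print(scaled_lr(0.001, 4, 1))  # prints 0.00025
```

Under that reading, raising lr to 0.001 on a single GPU would quadruple the effective learning rate rather than reproduce the paper's setting.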


jucaowei commented on September 15, 2024

That's really strange. I trained from the ResNet-101 and FlowNet checkpoints for 100k+ iterations and the performance was even better; here are the test results:
motion [0.0 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7648
motion [0.0 0.7], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.5727
motion [0.7 0.9], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.7515
motion [0.9 1.0], area [0.0 0.0 100000.0 100000.0], Mean AP@0.5 = 0.8444

@aresgao Hi, I want to know exactly how many epochs you trained the model for. I trained it for 2 epochs and got a result of about 73.16%. Also, why does the paper always talk about iterations rather than epochs?
I hope to hear from you, thank you.
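On iterations vs. epochs: the two are interchangeable once the dataset size and batch size are fixed, which may be why papers report the more precise iteration count. The conversion is just (numbers below are illustrative, not the actual ImageNet VID training-set size):

```python
def iterations(num_epochs, num_images, total_batch_size):
    """Convert epochs to iterations: one iteration = one mini-batch."""
    return num_epochs * num_images // total_batch_size

# Hypothetical example: 2 epochs over 100k sampled frames, batch size 4.
print(iterations(2, 100000, 4))  # prints 50000
```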


Feywell commented on September 15, 2024

@aresgao Finally I got good results after training for 2 complete epochs!!!

I just have one last question. I find that when saving the checkpoint at the end of each epoch, the following code is used to create two new weights, rfcn_bbox_weight_test and rfcn_bbox_bias_test.

arg['rfcn_bbox_weight_test'] = weight * mx.nd.repeat(mx.nd.array(stds), repeats=repeat).reshape((bias.shape[0], 1, 1, 1))
arg['rfcn_bbox_bias_test'] = arg['rfcn_bbox_bias'] * mx.nd.repeat(mx.nd.array(stds), repeats=repeat) + mx.nd.repeat(mx.nd.array(means), repeats=repeat)

Why do we need to do this?
I have verified that if I do not do this, the checkpoint makes terrible predictions on the test data. This is also why my previous predictions were bad even though the training loss looked good.
Thank you!

Hi @FCInter, do you know why arg['rfcn_bbox_weight_test'] is created here? I am trying to change the detection network to Light-Head, so I did not keep arg['rfcn_bbox_weight_test'], but I got a bad result. Do you know what arg['rfcn_bbox_weight_test'] means?

