fangruizhu commented on September 26, 2024

Thanks for your interest! The code will be released within this month.
Regarding training: (1) five previous frames are used as references, the same as in [7]; (2) yes, all reference frames are used to reconstruct the query at the same time: one large affinity matrix is computed over all of them, and softmax is applied afterwards.
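
A minimal NumPy sketch of the joint reconstruction described in (2), assuming a dot-product affinity and a hypothetical `temperature` parameter; the function names are illustrative and not taken from the released code. The affinities to all K reference frames are stacked into one matrix so that a single softmax normalises over every reference pixel at once:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct_query(query_feat, ref_feats, ref_labels, temperature=1.0):
    """Reconstruct the query frame from K reference frames jointly.

    query_feat: (C, H, W) features of the query (target) frame
    ref_feats:  (K, C, H, W) features of the K reference frames
    ref_labels: (K, D, H, W) values to copy from the references (e.g. colour channels)
    Returns (D, H, W).
    """
    K, C, H, W = ref_feats.shape
    D = ref_labels.shape[1]
    q = query_feat.reshape(C, H * W).T                          # (HW, C)
    r = ref_feats.transpose(1, 0, 2, 3).reshape(C, K * H * W)   # (C, K*HW)
    affinity = q @ r / temperature                              # (HW, K*HW): one big matrix
    weights = softmax(affinity, axis=1)                         # normalise over ALL reference pixels
    v = ref_labels.transpose(0, 2, 3, 1).reshape(K * H * W, D)  # (K*HW, D), same pixel order as r
    out = weights @ v                                           # (HW, D)
    return out.T.reshape(D, H, W)
```

Normalising jointly, rather than per reference frame and then averaging, lets the reference frames compete for each query pixel.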

from self_sup_semivos.

WeidiXie commented on September 26, 2024

@veizgyauzgyauz
If you want a rough idea in the meantime, please start with this code: https://github.com/zlai0/MAST

veizgyauzgyauz commented on September 26, 2024

Thanks! @fangruizhu @WeidiXie

I am just curious about how the momentum update applies in the VOS setting. It seems that, for each target frame, you choose K previous frames from the same video sequence as references and treat them as the frames in the memory bank. In a forward pass, the key encoder is first updated with the momentum update, and then the target frame is reconstructed. In the next iteration, the reference frames from the previous iteration are not used again; instead, previous frames from the current video sequence serve as the memory frames. In MoCo, however, the keys in the memory dictionary are independent (the images come from different scenes) and are used for several iterations until they are dequeued, whereas in VOS the keys in the memory bank are consistent with each other (they come from the same video) and are dequeued after a single iteration. Could this hurt the smoothness of the key encoder?
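
To make the contrast concrete, a toy simulation (hypothetical sizes, not from either codebase) of how long a key stays in a MoCo-style FIFO queue. With queue size Q and N keys enqueued per iteration, each key survives Q/N iterations; re-encoding the K reference frames for every target frame corresponds to Q = N, i.e. a lifetime of one iteration:

```python
from collections import deque

def moco_key_lifetimes(queue_size, batch_size, num_iters):
    """Return {key_id: iterations spent in the queue} for every dequeued key."""
    queue = deque()
    enqueued_at = {}
    lifetimes = {}
    next_id = 0
    for it in range(num_iters):
        # enqueue the keys produced by this iteration's batch
        for _ in range(batch_size):
            queue.append(next_id)
            enqueued_at[next_id] = it
            next_id += 1
        # dequeue the oldest keys once the queue exceeds its capacity
        while len(queue) > queue_size:
            old = queue.popleft()
            lifetimes[old] = it - enqueued_at[old]
    return lifetimes
```

For example, `moco_key_lifetimes(8, 2, 10)` gives every dequeued key a lifetime of 4 iterations, while `moco_key_lifetimes(5, 5, 10)` gives 1.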

I tried to add the momentum memory mechanism to MAST. Everything is the same as in MAST except that I apply the momentum update to the key encoder during finetuning. Pairwise training went smoothly and gave results similar to MAST's. At the finetuning stage, however, the training loss kept fluctuating. I also obtained 0.57 J&F-mean, 0.56 J-mean and 0.60 F-mean, which are lower than the scores before finetuning. I would really like to understand why.

fangruizhu commented on September 26, 2024

@veizgyauzgyauz Hi, our momentum update is quite different from MoCo's: in our case the frames in the memory bank change constantly and differ between iterations. Since our task aims not at learning instance discrimination but at learning the matching ability of the key and query encoders, it may not be disturbed by the changing memory. A relatively large momentum value also helps maintain smoothness. We use two encoders (key and query) during training: the parameters of the query encoder are always updated by backpropagation, and the key encoder by the momentum update. Besides, in this part https://github.com/zlai0/MAST/blob/master/models/colorizer.py we take all pixels to compute the similarity and apply softmax.
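
The two-encoder scheme described above can be sketched as follows, assuming parameters stored as dicts of NumPy arrays and a plain gradient step standing in for backpropagation plus an optimiser. The momentum value is not given in the thread; `m = 0.999` (MoCo's default) is only an assumed placeholder for a "relatively large" value, and the function names are hypothetical:

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    """EMA update of the key encoder: theta_k <- m * theta_k + (1 - m) * theta_q.

    Both arguments map parameter names to float NumPy arrays;
    key_params is modified in place.
    """
    for name, q in query_params.items():
        k = key_params[name]
        k *= m              # keep most of the old key weights
        k += (1.0 - m) * q  # move a small step towards the query weights

def train_step(query_params, key_params, grads, lr=0.1, m=0.999):
    """One hypothetical training step with the two encoders."""
    # 1) query encoder: ordinary gradient step (stand-in for backprop + optimiser)
    for name, g in grads.items():
        query_params[name] -= lr * g
    # 2) key encoder: receives no gradients, only the momentum (EMA) update
    momentum_update(key_params, query_params, m)
```

Because the key encoder moves only a fraction 1 - m of the way towards the query encoder per step, a larger m makes it change more slowly and smoothly.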

veizgyauzgyauz commented on September 26, 2024

Do you mean that you use global attention rather than restricted attention during training and inference? I think that would lead to high memory usage, and, not surprisingly, I hit `RuntimeError: CUDA out of memory.` during inference. Could you please tell me which GPU you use? I need to estimate whether my machine can handle it. Thanks! @fangruizhu
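
For comparison, a minimal NumPy sketch of restricted (local-window) attention for a single reference frame, assuming zero padding at the borders and an odd window size; the names are illustrative. Each query pixel scores only a window × window neighbourhood around its own location, so the affinity has H·W·window² entries instead of (H·W)² for global attention:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def restricted_attention(query_feat, ref_feat, ref_label, window=25):
    """query_feat, ref_feat: (C, H, W); ref_label: (D, H, W). Returns (D, H, W).

    Each query pixel attends to a window x window neighbourhood
    centred on the same location in the reference frame (window must be odd).
    """
    r = window // 2
    C, H, W = query_feat.shape
    pad = ((0, 0), (r, r), (r, r))
    f = np.pad(ref_feat, pad)                           # zero-padded reference features
    lab = np.pad(ref_label, pad)                        # zero-padded reference labels
    valid = np.pad(np.ones((H, W)), ((r, r), (r, r)))   # 1 inside the image, 0 in padding
    # neighbourhood views: (C, H, W, window, window), (D, H, W, ...), (H, W, ...)
    f_win = sliding_window_view(f, (window, window), axis=(1, 2))
    lab_win = sliding_window_view(lab, (window, window), axis=(1, 2))
    v_win = sliding_window_view(valid, (window, window))
    # local affinity: (H, W, window, window); padded positions get -inf
    aff = np.einsum("chw,chwij->hwij", query_feat, f_win)
    aff = np.where(v_win > 0, aff, -np.inf).reshape(H, W, -1)
    # softmax over each pixel's own window
    aff = aff - aff.max(axis=-1, keepdims=True)
    wts = np.exp(aff)
    wts /= wts.sum(axis=-1, keepdims=True)
    wts = wts.reshape(H, W, window, window)
    return np.einsum("hwij,dhwij->dhw", wts, lab_win)
```

When the window covers the whole frame (e.g. `window = 2 * max(H, W) - 1`), the result matches global attention; a smaller window trades that coverage for memory.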

fangruizhu commented on September 26, 2024

Hi, the size of the affinity matrix during training is B × (96×96) × (25×25), where 96 is the size of the feature map and 25 is the window size. A 16 GB V100 is enough for training, with 6 images per GPU. At inference, because full-resolution images are used (as in MAST), a 32 GB V100 is needed.
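
A back-of-the-envelope check of the training numbers above, assuming float32 (4 bytes per entry) and counting only the forward affinity matrix (no softmax output, gradients or other activations). The global case is an assumed comparison using the five reference frames from the first reply, each contributing 96×96 positions:

```python
def affinity_bytes(batch, query_positions, ref_positions, bytes_per_entry=4):
    """Size in bytes of a (batch, query_positions, ref_positions) affinity tensor."""
    return batch * query_positions * ref_positions * bytes_per_entry

GIB = 2 ** 30
restricted = affinity_bytes(6, 96 * 96, 25 * 25)       # 138,240,000 bytes, about 0.13 GiB
global_5ref = affinity_bytes(6, 96 * 96, 5 * 96 * 96)  # 10,192,158,720 bytes, about 9.49 GiB
print(f"restricted: {restricted / GIB:.2f} GiB, global (5 refs): {global_5ref / GIB:.2f} GiB")
```

The thread does not say whether the 25×25 window is counted per reference frame; if it is, multiply `ref_positions` by the number of references.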

bo-miao commented on September 26, 2024

Hi,

Great job! May I know when the code will be released?

Thanks.