Comments (7)
Thanks for the attention! The code will be released within this month.
About the training: (1) the five previous frames are used as references, same as [7]; (2) yes, all reference frames are used to reconstruct the query at the same time, with one big affinity matrix computed over all of them and softmax applied afterwards.
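To make point (2) concrete, here is a minimal pure-Python sketch of reconstructing a query from several reference frames with one joint softmax. It is not the authors' implementation: the names `softmax` and `reconstruct`, the dot-product similarity, and the scalar per-pixel values are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reconstruct(query_feats, ref_feats_per_frame, ref_values_per_frame):
    """Reconstruct one value per query pixel from all reference frames jointly.

    query_feats: list of feature vectors, one per query pixel.
    ref_feats_per_frame / ref_values_per_frame: one list per reference frame.
    (Hypothetical helper; dot-product similarity is an assumption.)
    """
    # Concatenate keys and values from every reference frame so that a single
    # softmax normalises over all of them at once.
    keys = [k for frame in ref_feats_per_frame for k in frame]
    values = [v for frame in ref_values_per_frame for v in frame]
    output = []
    for q in query_feats:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)  # one row of the big affinity matrix
        output.append(sum(w * v for w, v in zip(weights, values)))
    return output

# Toy example: 2 reference frames with 2 pixels each, 2-D features, scalar values.
refs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
vals = [[0.0, 1.0], [0.0, 1.0]]
recon = reconstruct([[10.0, 0.0]], refs, vals)
```

In the toy example the query matches the first pixel of both reference frames, so the reconstructed value ends up very close to 0, the value stored at those pixels.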
from self_sup_semivos.
@veizgyauzgyauz
If you want to get a rough idea, please use this code for now: https://github.com/zlai0/MAST
Thanks! @fangruizhu @WeidiXie
I am just curious about the application of the momentum update in the VOS field. It seems that for each target frame, you choose K previous frames from the same video sequence as references and treat them as the frames in the memory bank. In a forward pass, you first update the key encoder with the momentum update and then reconstruct the target frame. In the next iteration, the reference frames from the last iteration are not used again; instead, previous frames from the current video sequence serve as the memory frames. However, in MoCo the keys in the memory dictionary are independent (the images are taken from different scenes) and are used for several iterations until they are dequeued, whereas in VOS the keys from the memory bank are consistent with each other and are dequeued after one iteration. Will this damage the smoothness of the key encoder?
I tried to add the momentum memory mechanism to MAST. Everything is the same as MAST except that I apply the momentum update to the key encoder during finetuning. The pairwise training went smoothly and gave results similar to MAST. But at the finetuning stage, the training loss kept fluctuating. Besides, I got 0.57 J&F-mean, 0.56 J-mean, and 0.60 F-mean, which are lower than the scores before finetuning. I really want to figure it out.
@veizgyauzgyauz Hi, our momentum update is quite different from MoCo's: in our setting the frames in the memory bank change constantly and differ between iterations. Our task does not aim to learn instance discrimination but the matching ability of the key and query encoders, which may not be disturbed by the changing memory. Also, a relatively large momentum value helps maintain the smoothness. We use two encoders (key and query) during training, where the parameters of the query encoder are always updated with backpropagation and those of the key encoder with the momentum update. Besides, in this part https://github.com/zlai0/MAST/blob/master/models/colorizer.py we take all pixels to compute similarity and apply softmax.
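As a rough illustration of the two-encoder scheme described above, the sketch below applies an exponential-moving-average (MoCo-style) update to the key encoder's parameters. Flat lists of floats stand in for real network weights; the function name `momentum_update` and the value 0.999 are assumptions, since the thread only says the momentum is relatively large.

```python
def momentum_update(key_params, query_params, momentum=0.999):
    """Return key-encoder parameters after one exponential-moving-average step.

    Both arguments are flat lists of floats standing in for real weights.
    momentum=0.999 is an assumed value, not one stated in the thread.
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]

query = [1.0, -2.0]   # updated by backpropagation in real training
key = [0.0, 0.0]      # never receives gradients, only tracks the query encoder
for _ in range(1000):
    key = momentum_update(key, query)
```

With a momentum this close to 1, the key parameters move only a small fraction of the way toward the query parameters at each step, which is the smoothing effect the reply refers to.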
Do you mean that you use global attention rather than restricted attention during training and inference? I think it may lead to high memory usage and, not surprisingly, "RuntimeError: CUDA out of memory." occurred at inference! Could you please tell me what GPU you use? Maybe I need to estimate whether my machine can manage it. Thanks! @fangruizhu
Hi, the size of the affinity matrix during training is B x (96x96) x (25x25), where 96 is the size of the feature map and 25 is the window size. A 16G V100 is enough for training, with 6 images per GPU. At inference, because full-resolution images are used (same as MAST), a 32G V100 is needed.
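For a quick back-of-the-envelope check of these sizes, the sketch below counts the bytes held by a single affinity tensor. The helper `affinity_bytes` is hypothetical, and it assumes float32 storage and a single reference frame; it ignores encoder activations, gradients, and any extra reference frames, so the real footprint is larger.

```python
def affinity_bytes(batch, feat_size, window=None, bytes_per_elem=4):
    """Bytes for one affinity tensor of shape B x (H*W) x (keys per query).

    window=None models global attention, where every query attends to all
    H*W positions; otherwise each query attends to a window x window patch.
    float32 (4 bytes per element) is an assumption.
    """
    queries = feat_size * feat_size
    keys = window * window if window is not None else queries
    return batch * queries * keys * bytes_per_elem

restricted = affinity_bytes(batch=6, feat_size=96, window=25)
global_attn = affinity_bytes(batch=6, feat_size=96)
print(f"restricted: {restricted / 2**20:.1f} MiB")
print(f"global:     {global_attn / 2**30:.2f} GiB")
```

Under these assumptions the restricted 25x25 window needs roughly 132 MiB for the tensor, against roughly 1.9 GiB for global attention at the same 96x96 feature size, and the gap grows quickly at full resolution.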
Hi,
Great job! May I know when the code will be released?
Thanks.