
ivs-demo's Issues

Details about training

Hi:

Thanks for sharing this code. Based on it, I'm trying to reproduce the results reported in Table 2, where the model is trained only on DAVIS 2017. The reported numbers are AUC=0.555 and J@60=0.589, but my reproduction only achieves AUC=0.496 and J@60=0.521:

time: [0.00, 4.64, 9.63, 14.19, 18.32, 22.38, 26.16, 29.85, 33.67, 102.02]
J: [0.00, 39.86, 45.37, 48.06, 49.39, 50.31, 50.93, 51.56, 52.08, 52.08]

Since I don't know how my implementation differs from yours, I list my training details below for review:

  1. data preparation:
    1.1 split each sequence in DAVIS 2017 into multiple new sequences, each containing only ONE target object. The total number of new training sequences is 144.
    1.2 calculate the current max_skip_step; max_skip_step is increased linearly from 4 to 8 (by 1 every 20 epochs, fixed at 8 after 80 epochs).
    1.3 sample 8 frames at a fixed interval given by the current skip_step, which is a random integer in [4, max_skip_step] (see the sketch after this list).
    1.4 data augmentation:
    1.4.1 Resize: resize the shortest edge to 480 while keeping the aspect ratio.
    1.4.2 RandomCrop: crop the same 400x400 region from all 8 frames and make sure the cropped sequence still contains the target object.
    1.4.3 RandomAffine: scale=(0.9, 1.1), shear=(-15, 15), rotate=(-25, 25)
    1.4.4 RandomContrast
    1.4.5 AdditiveNoise
    1.4.6 RandomMirror
  2. training details:
    2.1 calculate max_num_interaction (max_num_interaction = 1 if skip_step < 5, 2 if skip_step < 7, and 3 if skip_step >= 7).
    2.2 in the first round, randomly select a frame to be annotated with a scribble by the robot provided by the DAVIS Interactive framework. In the following rounds, the worst-segmented frame is used.
    2.3 infer the current intermediate estimations from the current scribble, the previous estimations, and the aggregated feature (if available).
    2.4 collect CE losses from the multi-scale decoder outputs ((256,256), (64,64), (32,32), (16,16), (8,8)) of the interaction and propagation networks for all intermediate estimations and perform back-propagation.
    2.5 clear all gradients and the graph before the next round.
    2.6 soft-aggregation post-processing is not performed since each training sample contains only a single target object.
    2.7 the number of training epochs is set to 2000; each epoch has 144 training samples, each sample contains 8 frames, and each frame contains at most one target object.
    2.8 we use SGD to optimize all parameters of the interaction and propagation networks except the BN layers. The learning rate is fixed at 5e-5 with momentum = 0.9. The batch size is 1.
    2.9 we initialize the model with weights pre-trained on ImageNet.
    2.10 we train and evaluate this model on a single V100 GPU.
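
To make items 1.2, 1.3 and 2.1 concrete, here is a minimal sketch of the schedules as I implemented them (the function names and the epoch-based update are mine; only the 4-to-8 / 1-to-3 ranges come from the paper):

import random

def get_max_skip_step(epoch):
    # item 1.2: max_skip_step grows linearly from 4 to 8,
    # increased by 1 every 20 epochs and fixed at 8 after epoch 80
    return min(4 + epoch // 20, 8)

def sample_frame_indices(seq_len, epoch, num_frames=8):
    # item 1.3: draw a random skip_step in [4, max_skip_step] and
    # sample num_frames frames with that fixed interval
    skip_step = random.randint(4, get_max_skip_step(epoch))
    start = random.randint(0, max(0, seq_len - 1 - skip_step * (num_frames - 1)))
    return [start + i * skip_step for i in range(num_frames)], skip_step

def get_max_num_interaction(skip_step):
    # item 2.1: number of interaction rounds per training sample
    if skip_step < 5:
        return 1
    if skip_step < 7:
        return 2
    return 3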

Are these training details correct?

Update model.py - CUDA Out of memory

The volatile flag is deprecated. With the latest stable release (0.4.0) you should change "model.py" by wrapping lines 98-99 as follows:

with torch.no_grad():
    self.Prop_forward(target, right_end)
    self.Prop_backward(target, left_end)

Link to commit:
hosseinjavidnia@2bbf2d9
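
For context, here is a generic sketch (not the exact code in model.py) of the old volatile pattern versus the 0.4.0 replacement; the model and input below are stand-ins:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)   # stand-in for the propagation network
frame = torch.randn(1, 3, 64, 64)       # stand-in for an input frame

# Pre-0.4.0 style (deprecated): wrap inputs as Variable(frame, volatile=True).
# 0.4.0+ style: run inference under torch.no_grad() so no autograd graph is
# built and intermediate activations are freed, which avoids the CUDA OOM here.
with torch.no_grad():
    out = model(frame)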

A question about "3.3. Testing Scheme"

Dear Sir:
In "3.3. Testing Scheme" there is the statement "we propagate the object mask until we reach a frame in which user annotations were given in any previous rounds."
I don't understand why the weight should be inverted when we reach a frame where annotations were given before, as shown in the graph in Fig. 4.
Is there some profound meaning behind it, or do I have a misunderstanding?
Looking forward to your reply! Thank you!


Input to the propagation network

Hi, in the paper, you say that: "The two object masks are represented with probabilities and the neutral mask is used if the mask is not available."(Section 3.1 Propagation Network).

However, in the code you write:

def Prop_forward(self, target, end):
    # propagate forward from the interacted frame (target) to the end frame
    for n in range(target+1, end+1):  # [target+1, ..., end]
        print('[MODEL: propagation network] >>>>>>>>> {} to {}'.format(n-1, n))
        self.all_E[:,n], _, self.next_a_ref = self.model_P(self.ref, self.a_ref, self.all_F[:,:,n], self.prev_E[:,n], torch.round(self.all_E[:,n-1]), self.dummy_M, [1,0,0,0,0])

It seems that you take the probability map (self.prev_E[:,n]) produced in the previous round and the binary mask (torch.round(self.all_E[:,n-1])) estimated for the previous frame as inputs to the propagation network. I tried to reproduce the results presented in the paper using your code and the pre-trained model. I would like to know which input I should use.
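
For reference, a minimal illustration of the mask inputs discussed above; the shapes and the 0.5-filled "neutral mask" are my own reading of the quoted sentence, not taken from the repo:

import torch

B, H, W = 1, 480, 854                          # example batch size and frame resolution

prev_round_prob = torch.rand(B, H, W)          # self.prev_E[:, n]: probability map from the previous round
prev_frame_est  = torch.rand(B, H, W)          # self.all_E[:, n-1]: estimate for the previous frame
prev_frame_mask = torch.round(prev_frame_est)  # binarized, as in the call above

# "the neutral mask is used if the mask is not available" -> a 0.5-filled map
neutral_mask = torch.full((B, H, W), 0.5)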

Some details about this network.

Hello, I'm interested in your work and I have some questions about this network.

  1. During pre-training, did you also use the scribbles of the synthetic image data? Was the full network optimized?
  2. During training, the article mentions that "N is gradually increased from 4 to 8" and "the number of rounds also grows from 1 to 3". How do they grow?

Thank you very much. Could you provide more code for the training process?

Could not load the Qt platform plugin "xcb" in ""

(ivs) administrator@ubuntu:~/ivs-demo-master$ python gui.py -seq camel
Interaction Network: initialized
Propagation Network: initialized
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.

Aborted (core dumped)

Please help me figure out how to deal with this error. Thanks.

Propagation indexing

The indexing in the propagation function, Line 102, is wrong:

self.all_E[:,:,f] = weight[f] * self.all_E[:,:,f] + (1-weight[f]) * self.prev_E[:,:,f]

It has to be replaced with:
self.all_E[:, f, :, :] = weight[f] * self.all_E[:, f, :, :] + (1 - weight[f]) * self.prev_E[:, f, :, :]
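
For completeness, a runnable toy example of the corrected line; the linearly decaying weight is only a hypothetical illustration of how the propagated result could be blended with the previous round's estimate between an interacted frame and a previously annotated one:

import torch

num_frames, H, W = 5, 4, 4                 # toy sizes
all_E  = torch.rand(1, num_frames, H, W)   # current-round propagation results
prev_E = torch.rand(1, num_frames, H, W)   # estimates kept from the previous round

# Hypothetical weights: 1.0 at the newly interacted frame (index 0), decaying
# linearly to 0.0 at a frame that was annotated in an earlier round (index 4).
weight = torch.linspace(1.0, 0.0, num_frames)

for f in range(num_frames):
    all_E[:, f, :, :] = weight[f] * all_E[:, f, :, :] + (1 - weight[f]) * prev_E[:, f, :, :]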
