
ivs-demo's Issues

Details about training

Hi:

Thanks for sharing this code. Based on it, I'm trying to reproduce the results reported in Table 2, where the model is trained only on DAVIS 2017. The reported numbers are AUC=0.555 and J@60=0.589, but my reproduction only achieves AUC=0.496 and J@60=0.521:

time: [0.00, 4.64, 9.63, 14.19, 18.32, 22.38, 26.16, 29.85, 33.67, 102.02]
J: [0.00, 39.86, 45.37, 48.06, 49.39, 50.31, 50.93, 51.56, 52.08, 52.08]

Since I don't know how my implementation differs from yours, I list my training details below for review:

  1. data preparation:
    1.1 split each sequence in DAVIS 2017 into multiple new sequences, each containing only ONE target object. The total number of new training sequences is 144.
    1.2 calculate the current max_skip_step; max_skip_step is increased linearly from 4 to 8 (by 1 every 20 epochs, fixed at 8 after 80 epochs).
    1.3 sample 8 frames at a fixed interval given by the current skip_step, which is a random integer in [4, max_skip_step] (see the sketch after this list).
    1.4 data augmentation:
    1.4.1 Resize: resize the shortest edge to 480 while keeping the aspect ratio.
    1.4.2 RandomCrop: crop the same 400x400 region from all 8 frames and make sure the cropped sequence still contains the target object.
    1.4.3 RandomAffine: scale=(0.9, 1.1), shear=(-15, 15), rotate=(-25, 25)
    1.4.4 RandomContrast
    1.4.5 AdditiveNoise
    1.4.6 RandomMirror
  2. training details:
    2.1 calculate max_num_interaction (max_num_interaction = 1 if skip_step < 5, 2 if skip_step < 7, and 3 if skip_step >= 7).
    2.2 in the first round, randomly select a frame to be annotated with a scribble by the robot provided by the DAVIS Interactive framework. In the following rounds, the worst-segmented frame is used.
    2.3 infer the current intermediate estimations from the current scribble, the previous estimations, and the aggregated feature (if available).
    2.4 collect CE losses from the multi-scale decoder outputs ((256,256), (64,64), (32,32), (16,16), (8,8)) of the interaction and propagation networks for all intermediate estimations and perform back-propagation.
    2.5 clear all gradients and the graph before the next round.
    2.6 soft-aggregation post-processing is not performed since each training sample contains only a single target object.
    2.7 the number of training epochs is set to 2000; each epoch has 144 training samples, each sample contains 8 frames, and each frame contains at most one target object.
    2.8 we use SGD to optimize all parameters of the interaction and propagation networks except the BN layers. The learning rate is fixed at 5e-5 with momentum = 0.9. The batch size is 1.
    2.9 we initialize the model with weights pre-trained on ImageNet.
    2.10 we train and evaluate this model on a single V100 GPU.
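
To make items 1.2, 1.3 and 2.1 concrete, here is a minimal sketch of the schedules as I implemented them (the function names and the epoch-based update are mine; only the 4-to-8 / 1-to-3 ranges come from the paper):

import random

def get_max_skip_step(epoch):
    # item 1.2: max_skip_step grows linearly from 4 to 8,
    # increased by 1 every 20 epochs and fixed at 8 after epoch 80
    return min(4 + epoch // 20, 8)

def sample_frame_indices(seq_len, epoch, num_frames=8):
    # item 1.3: draw a random skip_step in [4, max_skip_step] and
    # sample num_frames frames with that fixed interval
    skip_step = random.randint(4, get_max_skip_step(epoch))
    start = random.randint(0, max(0, seq_len - 1 - skip_step * (num_frames - 1)))
    return [start + i * skip_step for i in range(num_frames)], skip_step

def get_max_num_interaction(skip_step):
    # item 2.1: number of interaction rounds per training sample
    if skip_step < 5:
        return 1
    if skip_step < 7:
        return 2
    return 3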

Are these training details correct?

Update model.py - CUDA Out of memory

The volatile flag is deprecated. With the latest stable release (0.4.0) you should change "model.py" by wrapping lines 98-99 as follows:

with torch.no_grad():
    self.Prop_forward(target, right_end)
    self.Prop_backward(target, left_end)

Link to commit:
hosseinjavidnia@2bbf2d9
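
For context, here is a generic sketch (not the exact code in model.py) of the old volatile pattern versus the 0.4.0 replacement; the model and input below are stand-ins:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)   # stand-in for the propagation network
frame = torch.randn(1, 3, 64, 64)       # stand-in for an input frame

# Pre-0.4.0 style (deprecated): wrap inputs as Variable(frame, volatile=True).
# 0.4.0+ style: run inference under torch.no_grad() so no autograd graph is
# built and intermediate activations are freed, which avoids the CUDA OOM here.
with torch.no_grad():
    out = model(frame)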

A question about "3.3. Testing Scheme"

Dear Sir:
In "3.3. Testing Scheme" there is the statement "we propagate the object mask until we reach a frame in which user annotations were given in any previous rounds."
I don't understand why the weight should be inverted when we reach a frame where annotations were given before, as shown in the graph in Fig. 4.
Is there some profound meaning behind it, or do I have a misunderstanding?
Looking forward to your reply! Thank you!


Input to the propagation network

Hi, in the paper, you say that: "The two object masks are represented with probabilities and the neutral mask is used if the mask is not available."(Section 3.1 Propagation Network).

However, in the code you write:

def Prop_forward(self, target, end):
    # propagate forward from the interacted frame (target) to the end frame
    for n in range(target+1, end+1):  # [target+1, ..., end]
        print('[MODEL: propagation network] >>>>>>>>> {} to {}'.format(n-1, n))
        self.all_E[:,n], _, self.next_a_ref = self.model_P(self.ref, self.a_ref, self.all_F[:,:,n], self.prev_E[:,n], torch.round(self.all_E[:,n-1]), self.dummy_M, [1,0,0,0,0])

It seems that you take the probability map (self.prev_E[:,n]) produced in the previous round and the binary mask (torch.round(self.all_E[:,n-1])) estimated for the previous frame as inputs to the propagation network. I tried to reproduce the results presented in the paper using your code and the pre-trained model. I would like to know which input I should use.
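
For reference, a minimal illustration of the mask inputs discussed above; the shapes and the 0.5-filled "neutral mask" are my own reading of the quoted sentence, not taken from the repo:

import torch

B, H, W = 1, 480, 854                          # example batch size and frame resolution

prev_round_prob = torch.rand(B, H, W)          # self.prev_E[:, n]: probability map from the previous round
prev_frame_est  = torch.rand(B, H, W)          # self.all_E[:, n-1]: estimate for the previous frame
prev_frame_mask = torch.round(prev_frame_est)  # binarized, as in the call above

# "the neutral mask is used if the mask is not available" -> a 0.5-filled map
neutral_mask = torch.full((B, H, W), 0.5)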

Some details about this network.

Hello, I'm interested in your work and I have some questions about this network.

  1. During pre-training, did you also use the scribbles of the synthetic image data? Was the full network optimized?
  2. During training, the article mentions that "N is gradually increased from 4 to 8" and "the number of rounds also grows from 1 to 3". How do they grow?

Thank you very much. Could you provide more code for the training process?

Could not load the Qt platform plugin "xcb" in ""

(ivs) administrator@ubuntu:~/ivs-demo-master$ python gui.py -seq camel
Interaction Network: initialized
Propagation Network: initialized
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.

Aborted (core dumped)

Please help me figure out how to deal with this error. Thanks.

Propagation indexing

The indexing in the propagation function, Line 102, is wrong:

self.all_E[:,:,f] = weight[f] * self.all_E[:,:,f] + (1-weight[f]) * self.prev_E[:,:,f]

It has to be replaced with:
self.all_E[:, f, :, :] = weight[f] * self.all_E[:, f, :, :] + (1 - weight[f]) * self.prev_E[:, f, :, :]
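
For completeness, a runnable toy example of the corrected line; the linearly decaying weight is only a hypothetical illustration of how the propagated result could be blended with the previous round's estimate between an interacted frame and a previously annotated one:

import torch

num_frames, H, W = 5, 4, 4                 # toy sizes
all_E  = torch.rand(1, num_frames, H, W)   # current-round propagation results
prev_E = torch.rand(1, num_frames, H, W)   # estimates kept from the previous round

# Hypothetical weights: 1.0 at the newly interacted frame (index 0), decaying
# linearly to 0.0 at a frame that was annotated in an earlier round (index 4).
weight = torch.linspace(1.0, 0.0, num_frames)

for f in range(num_frames):
    all_E[:, f, :, :] = weight[f] * all_E[:, f, :, :] + (1 - weight[f]) * prev_E[:, f, :, :]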
