dstt's People

Contributors

ruiliu-ai

dstt's Issues

About GPU issues

Hi, thanks for sharing your work. I have the following problem:

We have six 2080 Ti graphics cards, but the following error is reported when running:
RuntimeError: CUDA error: out of memory

Questions about pos_embedding

Hello,

This is great work, and thanks for sharing the code!
I have a question: why is there no pos_embedding in the code, given that a transformer is not aware of the temporal order of its inputs? Any hints would be appreciated, thanks!

Best,
Kejie

License

Hey, what is the license of the repo?

About HierarchyEncoder

Hi, the paper states that interaction between feature maps of different scales is isolated by group convolution to preserve the spatial structure. In theory, x0 and out0 should then be concatenated directly, without grouping. In the code, however, the Fj and F1 feature maps are split into groups before being concatenated along the channel dimension. Does this operation lead to information interaction between feature maps of different scales?

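def forward(self, x):
    bt, c, h, w = x.size()
    out = x
    for i, layer in enumerate(self.layers):
        # Before every even-indexed layer except the first, re-inject the input x.
        if i % 2 == 0 and i != 0:
            g = self.group[i//2]
            # Split both tensors into g channel groups: (bt, g, channels_per_group, h, w).
            x0 = x.view(bt, g, -1, h, w)
            out0 = out.view(bt, g, -1, h, w)
            # Concatenate within each group, then flatten the groups back into channels.
            out = torch.cat([x0, out0], 2).view(bt, -1, h, w)
        out = layer(out)
    return out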
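
To make the question concrete, here is a minimal sketch of the channel order that the view / cat / view sequence above produces. It uses plain Python lists of channel labels instead of tensors. The helper name grouped_concat_order and the labels are hypothetical and are not part of the repository. Whether the result isolates scales depends on the layer that follows, for example whether it is a convolution with groups equal to the same g, which this sketch does not check.

```python
def grouped_concat_order(x_channels, out_channels, g):
    """Channel order after view(bt, g, -1, h, w), cat on dim 2, view(bt, -1, h, w).

    x_channels and out_channels are lists of labels standing in for channels.
    """
    if len(x_channels) % g or len(out_channels) % g:
        raise ValueError("channel counts must be divisible by g")
    x_per_group = len(x_channels) // g
    out_per_group = len(out_channels) // g
    order = []
    for k in range(g):
        # Group k of x comes first, immediately followed by group k of out.
        order += x_channels[k * x_per_group:(k + 1) * x_per_group]
        order += out_channels[k * out_per_group:(k + 1) * out_per_group]
    return order


x = ["x0", "x1", "x2", "x3"]
out = ["o0", "o1", "o2", "o3"]
print(grouped_concat_order(x, out, 2))
```

With g = 2 the channels come out as x0, x1, o0, o1, x2, x3, o2, o3, so each group of x sits next to the matching group of out. With g = 1 the result is a plain concatenation of x followed by out.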

Question about the inference speed.

Hi, friend.
When I tested the inference speed of the models, I got results that differ from Figure 1 in your paper.
For STTN, I got about 11 FPS.
Could you tell me how you tested it?
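
For comparing numbers, it may help to state how the FPS was measured. The sketch below is one common way to time a clip-processing callable. It is not the authors' benchmark protocol, and measure_fps, process_clip, and the warm-up and repeat counts are all hypothetical choices.

```python
import time


def measure_fps(process_clip, frames_per_clip, warmup=2, repeats=10):
    """Return frames per second for a callable that processes one clip."""
    for _ in range(warmup):
        process_clip()  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        process_clip()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse for this workload")
    return frames_per_clip * repeats / elapsed


# Stand-in workload; replace with a real forward pass over one clip.
fps = measure_fps(lambda: sum(range(100_000)), frames_per_clip=5)
print(f"{fps:.1f} FPS")
```

When the callable runs a model on a GPU, it should call torch.cuda.synchronize() before returning, because CUDA kernels run asynchronously and the timer would otherwise stop too early. Results also depend on resolution, clip length, and whether data loading is included in the timed region.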

Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed

When I run test.py, it returns an error like this:
python3 test.py -c *** -v *** -m ***
It then returns the following error:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [222,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "test.py", line 162, in <module>
    main_worker()
  File "test.py", line 135, in main_worker
    pred_img = model(masked_imgs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/ProjectRoot/test/Video_inpainting/DSTT-master/model/DSTT.py", line 144, in forward
    enc_feat = self.encoder(masked_frames.view(b*t, c, h, w))
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Input resolutions other than 432x240

I am trying to test your work with input images of resolution 640x448 but I keep getting the following error:

File "/home/cosmos/AI/DSTT/model/DSTT.py", line 241, in forward
key = key.view(b, t, 2, self.h//2, 2, self.w//2, self.head, c_h)
RuntimeError: shape '[1, 11, 2, 10, 2, 18, 4, 128]' is invalid for input of size 11556864

Is it only possible to use 432x240 input images with the pre-trained model?
If so, would I need to train a new model specifically for the 640x448 resolution?
I also tried to change the resolution in youtube-vos.json and run the train.py script but I get a similar error there.

Can you explain how to make it work for resolutions other than 432x240?

Thank you!
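
The mismatch in the error message can be checked with plain arithmetic. The sketch below uses only the numbers printed in the traceback. Reading the target shape as (b, t, 2, h//2, 2, w//2, head, c_h) follows the quoted line of code, and the helper name tokens_per_frame is hypothetical.

```python
from math import prod


def tokens_per_frame(total_elements, b, t, head, c_h):
    """Spatial tokens per frame implied by a flat tensor size."""
    per_token = b * t * head * c_h
    if total_elements % per_token:
        raise ValueError("size is not divisible by b * t * head * c_h")
    return total_elements // per_token


target_shape = (1, 11, 2, 10, 2, 18, 4, 128)  # from the RuntimeError
expected = prod(target_shape[2:6])            # 2 * 10 * 2 * 18 spatial tokens
actual = tokens_per_frame(11556864, b=1, t=11, head=4, c_h=128)
print(expected, actual)
```

The view expects 720 tokens per frame, which matches a 20 x 36 grid stored in self.h and self.w, while the tensor produced from the 640x448 input holds 2052 tokens per frame. This suggests that the grid size used in the reshape is fixed in the model rather than derived from the input, although that reading is an inference from the error message and not something the thread confirms.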

Generate Mask for custom video

Hi, thanks for the code!

Can you recommend any code, technique, or framework for producing the segmentation masks (like --mask examples/schoolgirls) that your algorithm needs as input? Thanks in advance!

A question about the paper

In Section 3.2 of your paper, F is split into s^2 zones, and the total number is then given as t * s^2 * n. Why does this number need to be multiplied by n? Shouldn't it be t * s^2?
Looking forward to your reply.
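
One possible reading, assuming that n denotes the number of tokens inside each zone (an assumption, not something confirmed in this thread): t * s^2 counts zones across all frames, and multiplying by n turns that into a count of tokens. The sketch below checks the arithmetic with hypothetical values borrowed from the reshape in the resolution issue (t = 11, s = 2, zones of 10 x 18 tokens).

```python
def total_tokens(t, s, zone_h, zone_w):
    """Token count over t frames, each split into s*s zones of zone_h x zone_w tokens."""
    n = zone_h * zone_w  # tokens per zone (assumed meaning of n)
    return t * s ** 2 * n


zones = 11 * 2 ** 2                 # t * s^2 counts zones only
tokens = total_tokens(11, 2, 10, 18)
print(zones, tokens)
```

Under this reading the total equals t * h * w for the full 20 x 36 token grid, so the factor n does not add tokens. It only expresses the same count per zone.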

How does this network realize self-training?

The paper devotes most of its space to the generator network. But since ground truth is lacking, do we need self-training to achieve object removal? If so, how is self-training realized? As far as I know, GAN training is supervised learning.
Thanks for your answer!

Question about the dataset

Sorry to bother you, but I can't download the dataset from Google Drive.
I also tried downloading the dataset from OneDrive and ran into a "CSC error".
Is there any other way for us to download the YouTube-VOS dataset? Thank you.
