The dstt from ruiliu-ai

About GPU issues

Hi, thanks for sharing your work. I have the following problem:

We have six 2080 Ti graphics cards, but the following error is reported when running:
RuntimeError: CUDA error: out of memory

This is great work, and thanks for sharing the code!
I have a question: why is there no pos_embedding in the code, given that a transformer is not otherwise aware of the temporal order of its inputs? Any hints would be appreciated, thanks!

Best,
Kejie

License

Hey, what is the license of the repo?

About HierarchyEncoder

Hi, the paper states that group convolution isolates the interaction between feature maps of different scales in order to preserve the spatial structure. In theory, x0 and out0 should then be concatenated directly, without grouping. However, in the code, the Fj and F1 layer feature maps are grouped before being concatenated along the channel dimension. Does this operation cause information to be exchanged between feature maps of different scales?

def forward(self, x):
    bt, c, h, w = x.size()
    out = x
    for i, layer in enumerate(self.layers):
        if i % 2 == 0 and i != 0: 
            g = self.group[i//2]
            x0 = x.view(bt, g, -1, h, w)
            out0 = out.view(bt, g, -1, h, w)
            out = torch.cat([x0, out0], 2).view(bt, -1, h, w) 
        out = layer(out) 
    return out
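
One way to reason about the question above is to trace which channels end up next to each other after the view / cat / view sequence. The sketch below does this on channel labels in plain Python rather than on tensors. The helper names `grouped_concat_order` and `split_into_groups` and the label strings are made up for illustration, and the sketch assumes that x and out have channel counts divisible by g. Whether the layer that follows is a convolution with groups=g is also an assumption, since the snippet does not show it.

```python
def grouped_concat_order(x_channels, out_channels, g):
    """Mimic x.view(bt, g, -1, h, w), out.view(bt, g, -1, h, w),
    torch.cat([x0, out0], 2).view(bt, -1, h, w) on channel labels only,
    returning the resulting channel order."""
    if len(x_channels) % g or len(out_channels) % g:
        raise ValueError("channel counts must be divisible by g")
    x_size = len(x_channels) // g      # channels per group in x
    out_size = len(out_channels) // g  # channels per group in out
    order = []
    for k in range(g):
        # Concatenating along dim 2 places group k of x directly before group k of out.
        order.extend(x_channels[k * x_size:(k + 1) * x_size])
        order.extend(out_channels[k * out_size:(k + 1) * out_size])
    return order


def split_into_groups(channels, g):
    """Contiguous channel blocks, as a convolution with groups=g would see them."""
    size = len(channels) // g
    return [channels[k * size:(k + 1) * size] for k in range(g)]


x = ["x0", "x1", "x2", "x3"]
out = ["o0", "o1", "o2", "o3"]
order = grouped_concat_order(x, out, g=2)
print(order)                        # ['x0', 'x1', 'o0', 'o1', 'x2', 'x3', 'o2', 'o3']
print(split_into_groups(order, 2))  # [['x0', 'x1', 'o0', 'o1'], ['x2', 'x3', 'o2', 'o3']]
```

With this layout, a following convolution with the same group count would see the k-th group of x together with the k-th group of out and nothing from the other groups. A plain concatenation along dim 1 would instead give x0..x3 followed by o0..o3, so the first of two groups would contain only x channels.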

I have only recently started working on video inpainting. The DAVIS and YouTube-VOS test sets provide only one mask for each video. How did you use these datasets for testing?

Question about the inference speed.

Hi, friend.
When I tested the inference speed of the models, I got results that differ from Figure 1 in your paper.
For STTN, I got about 11 FPS.
Could you tell me how you tested it?
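
FPS figures depend heavily on how the timing is done (warm-up runs, what counts as a frame, whether pending GPU work has finished), so it helps to state the protocol when comparing numbers. Below is a minimal timing sketch, not the authors' protocol. `measure_fps` and `dummy_model` are hypothetical names, and the `sync` argument is an assumption about how one would wait for GPU work (for PyTorch, `torch.cuda.synchronize` could be passed in).

```python
import time


def measure_fps(process_frames, num_frames, warmup=2, repeats=5, sync=None):
    """Return frames per second for a callable that handles `num_frames` frames per call.

    `sync` is an optional zero-argument callable that blocks until pending work
    has finished; leave it as None for purely CPU-bound code.
    """
    if num_frames <= 0 or repeats <= 0:
        raise ValueError("num_frames and repeats must be positive")
    for _ in range(warmup):
        process_frames()  # warm-up calls are excluded from the timing
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        process_frames()
    if sync is not None:
        sync()  # make sure all timed work has actually completed
    elapsed = time.perf_counter() - start
    return num_frames * repeats / elapsed


def dummy_model():
    # Stand-in workload; replace with the real forward pass over one clip.
    return sum(i * i for i in range(10000))


fps = measure_fps(dummy_model, num_frames=10)
print(f"{fps:.1f} FPS")
```

Reporting the clip length, resolution, and whether data loading is included makes two measurements easier to compare.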

How to align the consecutive frames or patches?

Nice work! I would like to know whether you apply any alignment, such as optical flow or an affine transformation, to these image patches.

Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed

When I run test.py, it returns an error like this:
python3 test.py -c *** -v *** -m ***
It then returns the following error:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [222,0,0], thread: [95,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "test.py", line 162, in <module>
    main_worker()
  File "test.py", line 135, in main_worker
    pred_img = model(masked_imgs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/ProjectRoot/test/Video_inpainting/DSTT-master/model/DSTT.py", line 144, in forward
    enc_feat = self.encoder(masked_frames.view(bt, c, h, w))
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

When will you release your code?

Can you please point out the download links of the dataset you use?

Hi author,
There seems to be no YouTube-VOS download link at https://competitions.codalab.org/competitions/19544, and there are too many DAVIS download links at https://davischallenge.org/davis2017/code.html
Can you please give us a concrete hint about it?

Input resolutions other than 432x240

I am trying to test your work with input images of resolution 640x448 but I keep getting the following error:

File "/home/cosmos/AI/DSTT/model/DSTT.py", line 241, in forward
key = key.view(b, t, 2, self.h//2, 2, self.w//2, self.head, c_h)
RuntimeError: shape '[1, 11, 2, 10, 2, 18, 4, 128]' is invalid for input of size 11556864

Is it only possible to use 432x240 input images with the pre-trained model?
If so, would I need to train a new model specifically for the 640x448 resolution?
I also tried to change the resolution in youtube-vos.json and run the train.py script but I get a similar error there.

Can you explain how to make it work for resolutions other than 432x240?

Thank you!
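
The numbers in the error message above can be reproduced by simple arithmetic, which suggests the patch grid stored in the model still corresponds to 432x240 input. The sketch below is only that arithmetic. The encoder downsampling factor (4) and the patch kernel, stride, and padding (7, 3, 3) are assumptions chosen because they reproduce both numbers in the message; they are not confirmed from the repository. The frame count (11) and channel count (4 heads x 128) are read from the error message. `patch_grid` and `numel` are hypothetical helper names.

```python
def patch_grid(height, width, downsample=4, kernel=7, stride=3, padding=3):
    """Rows and columns of patches for one frame, under the assumed hyperparameters."""
    feat_h, feat_w = height // downsample, width // downsample
    rows = (feat_h + 2 * padding - kernel) // stride + 1
    cols = (feat_w + 2 * padding - kernel) // stride + 1
    return rows, cols


def numel(frames, rows, cols, channels=4 * 128):
    """Total element count of a tensor shaped (frames, rows, cols, channels)."""
    return frames * rows * cols * channels


expected = patch_grid(240, 432)  # grid implied by the target shape in the error
actual = patch_grid(448, 640)    # grid produced by a 640x448 input
print(expected, numel(11, *expected))  # (20, 36) 4055040
print(actual, numel(11, *actual))      # (38, 54) 11556864
```

Under these assumptions, the view targets a 20x36 grid (2 x 10 by 2 x 18) while the tensor actually holds a 38x54 grid, which gives exactly the 11556864 elements reported. That would mean self.h and self.w need to match the new resolution, not only the input size in the config.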

Algorithm output format (mp4, other, etc)

Hi, thanks for the code. Can I modify the output format, or should I convert the output after running the algorithm (from mp4 to png, for example)?

Some questions about the inference speed

Generate Mask for custom video

Hi, thanks for the code!

Can you recommend any code, technique, or framework for producing the segmentation mask that your algorithm needs as input (like --mask examples/schoolgirls)? Thanks in advance!

Some questions about the paper!

In Section 3.2 of your paper, F is split into s^2 zones, and the total number is then given as t * s^2 * n. Why does this number need to be multiplied by n? Shouldn't it be t * s^2?
Looking forward to your reply.

Is the frame the ground truth? The paper says there is no ground truth for training.

DSTT/core/trainer.py

Line 252 in 0b16ff1

hole_loss = self.l1_loss(pred_img*masks, frames*masks)
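
To make the referenced line concrete, here is a small plain-Python sketch of what it computes on toy values. It assumes that masks is 1 inside the hole and 0 elsewhere, and that self.l1_loss is a mean absolute error (as with torch.nn.L1Loss and its default reduction); both are assumptions, since the line alone does not show them. The flat lists stand in for image tensors.

```python
def l1_loss(pred, target):
    """Mean absolute difference over all elements."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


frames = [0.2, 0.4, 0.6, 0.8]    # original frame pixels (toy values)
masks = [0.0, 1.0, 1.0, 0.0]     # assumed: 1 marks the hole region
pred_img = [0.9, 0.5, 0.6, 0.1]  # model output (toy values)

# Multiplying by the mask zeroes every pixel outside the hole on both sides,
# so only differences inside the hole contribute to the loss.
hole_loss = l1_loss(
    [p * m for p, m in zip(pred_img, masks)],
    [f * m for f, m in zip(frames, masks)],
)
print(hole_loss)
```

In this reading, the large errors at the first and last pixels are ignored, and frames serves as the reference for the masked pixels; how that relates to the paper's statement is the question for the authors.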
