GreenMIM's Issues

MSSP-I is strongly NP-hard

Note: the MSSP in the following screenshot is actually MSSP-I.
[screenshot]

Caprara, Alberto; Kellerer, Hans; Pferschy, Ulrich (2000). The Multiple Subset Sum Problem. SIAM Journal on Optimization, 11(2), 308–319. doi:10.1137/s1052623498348481

How can a strongly NP-hard problem be solved in pseudo-polynomial time?
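For context, these two claims are in tension by definition. The standard statement, going back to Garey and Johnson's work on strong NP-completeness, is:

    % Strong NP-hardness rules out pseudo-polynomial-time algorithms.
    \textbf{Fact (Garey \& Johnson, 1978).} If a problem $\Pi$ is NP-hard in
    the strong sense, then $\Pi$ admits no pseudo-polynomial-time algorithm
    unless $\mathrm{P} = \mathrm{NP}$.

So if MSSP-I is strongly NP-hard, a pseudo-polynomial-time algorithm for it would imply P = NP.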

How to run this Slurm code

Hi, I haven't used Slurm before. Can you tell me how to configure the environment, and is there a demo command for running the scripts?
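For anyone landing here, a minimal single-node sbatch sketch. The entry-point script, its flags, and the environment name are assumptions modeled on MAE-style repositories, not this repo's actual interface:

    #!/bin/bash
    #SBATCH --job-name=greenmim_pretrain
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:8
    #SBATCH --cpus-per-task=32
    #SBATCH --output=%x_%j.log

    # Activate whichever environment holds the dependencies (PyTorch, timm, ...).
    conda activate greenmim  # hypothetical environment name

    # One process per GPU; the script name and flags below are placeholders.
    torchrun --nproc_per_node=8 main_pretrain.py \
        --batch_size 128 \
        --epochs 800 \
        --data_path /path/to/imagenet

Submit with `sbatch pretrain.slurm` and check the queue with `squeue -u $USER`.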

How can I use the weights trained by GreenMIM as pre-trained weights for the official Swin Transformer?

Hi, thanks for the great work!
I've tried GreenMIM on my data and it worked really well (I visualized the masked patches generated by the model). I then wanted to use these weights as the pre-trained weights for the official Swin Transformer classification task, but I found that the following keys are not in the GreenMIM checkpoint:
'head.weight', 'head.bias', 'layers.0.blocks.1.attn_mask', 'layers.1.blocks.1.attn_mask', 'layers.2.blocks.1.attn_mask',
'layers.2.blocks.3.attn_mask', 'layers.2.blocks.5.attn_mask', 'layers.2.blocks.7.attn_mask', 'layers.2.blocks.9.attn_mask',
'layers.2.blocks.11.attn_mask', 'layers.2.blocks.13.attn_mask', 'layers.2.blocks.15.attn_mask', 'layers.2.blocks.17.attn_mask'
I simply used the official Swin Transformer init code, as follows:

    def init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)
            nn.init.constant_(m.weight, 1.0)
Are there any better ways? Many thanks; any help would be highly appreciated!
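A note for readers: those missing keys are harmless in principle. The `head.*` weights are task-specific and newly initialized for fine-tuning, and the `attn_mask` entries are buffers that the Swin model recomputes from its input resolution rather than learns. A minimal loading sketch with `strict=False`; the checkpoint path and model-builder name are hypothetical:

    import torch

    # Pre-training checkpoints often nest the weights under a 'model' key.
    ckpt = torch.load('greenmim_swin_base.pth', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)

    model = build_swin_classifier()  # hypothetical: your Swin classification model
    msg = model.load_state_dict(state_dict, strict=False)

    # Expect 'head.*' and '*.attn_mask' among the missing keys; anything else
    # missing or unexpected is worth investigating.
    print('missing keys:', msg.missing_keys)
    print('unexpected keys:', msg.unexpected_keys)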

About different random masks for each sample in a batch

Thanks for the great work!
I notice that GreenMIM uses the same mask for every sample in a batch for training efficiency. I wonder how much a sample-wise mask would affect the performance. Could you please give some quantitative results?
Many thanks; any help would be highly appreciated!

loss is nan, stopping training!

Wonderful job! When I try to train Swin-B for 800 epochs, I hit this error: 'loss is nan, stopping training'. But the logged loss values themselves look fine. If I skip this error, the loss stays NaN forever afterwards. Do you have any suggestions for this problem? Thanks very much!
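For readers hitting the same thing: the message typically comes from a finiteness check in an MAE-style training loop. A minimal sketch of that check with one common workaround; `model`, `optimizer`, and `data_loader` are placeholders for your own training objects:

    import math

    for samples in data_loader:
        loss = model(samples)   # placeholder forward pass returning the loss
        loss_value = loss.item()

        if not math.isfinite(loss_value):
            # The stock behaviour is to abort here. Common mitigations are a
            # lower learning rate, gradient clipping, or skipping the batch as
            # below -- though if the loss stays NaN afterwards, the weights are
            # likely already corrupted, and restarting from an earlier
            # checkpoint with a smaller learning rate is the safer fix.
            print(f'Loss is {loss_value}, skipping this batch')
            optimizer.zero_grad()
            continue

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()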

Did you try pretraining swin-tiny?

Due to GPU memory restrictions, I can only use Swin-Tiny as my backbone. Did you try pretraining Swin-Tiny with GreenMIM? I tried it, but it seems to perform poorly, even worse than training from scratch, when I ran the downstream task.

what does "ratio" mean in PatchMerging?

Thanks for your impressive work! However, I've run into a problem.
In models_swin.py, line 244:

ratio = H // 7 if H % 7 == 0 else H // 6   # FIXME
x = x.view(B, -1, ratio//2, 2, ratio//2, 2, C)
x = x.permute(0, 1, 2, 4, 3, 5, 6).reshape(B, L//4, 4 * C)

Why do you view x into (B, -1, ratio//2, 2, ratio//2, 2, C) and then reshape back to (B, L//4, 4 * C)? Why not reshape x to (B, H//2, 2, W//2, 2, C) and then reshape it to (B, L//4, 4 * C)?
I found this when I changed the image resolution, and I don't understand why.
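For comparison, PatchMerging in the original Swin implementation operates on an explicit (B, H, W, C) grid, roughly as sketched below. A plausible reading of the snippet above, which the authors would have to confirm, is that during GreenMIM pre-training only the visible subset of tokens is present, so the full H x W grid is unavailable and the `ratio` heuristic recovers a local 2x2 grouping instead:

    import torch

    # PatchMerging as in the original Swin Transformer, lightly abridged:
    # merge each 2x2 neighborhood of tokens into a single token of width 4*C.
    def patch_merging(x, H, W):
        B, L, C = x.shape
        assert L == H * W, 'token count must cover the full spatial grid'
        x = x.view(B, H, W, C)
        x0 = x[:, 0::2, 0::2, :]  # top-left of each 2x2 block
        x1 = x[:, 1::2, 0::2, :]  # bottom-left
        x2 = x[:, 0::2, 1::2, :]  # top-right
        x3 = x[:, 1::2, 1::2, :]  # bottom-right
        x = torch.cat([x0, x1, x2, x3], -1)  # (B, H/2, W/2, 4*C)
        return x.view(B, -1, 4 * C)          # (B, H*W/4, 4*C)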

about mask in relative bias table

Hello! Thanks for your interesting work! I have a doubt about the mask in the relative bias table: if we already compute attn + mask, why do we still need to multiply the mask into the relative position bias, i.e. relative_position_bias = relative_position_bias * rel_pos_mask.view(-1, N, N, 1), in window group attention?
