yuhuan-wu / p2t

[TPAMI22] Pyramid Pooling Transformer for Scene Understanding

Python 99.32% Shell 0.68%

p2t's Issues

The segmentation code is not complete, is it?

Question about the weights

Hello!

Are there pre-trained weights for p2t-small and p2t-base? If so, could you share them with me?

Thank you!

Training log of ImageNet

The ImageNet training log for the tiny version seems to be incomplete. Could you update it?

FPS testing

Could you please release the code for the fps testing?
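
One common way to measure FPS is to run a few warm-up iterations and then time a fixed number of forward passes. The following is a minimal sketch of that idea, not the authors' benchmarking script. `measure_fps` and its parameters are hypothetical names, and the `sync` hook is an assumption standing in for a device barrier such as `torch.cuda.synchronize` when timing on a GPU.

```python
import time
from typing import Callable, Optional


def measure_fps(
    step: Callable[[], object],
    n_warmup: int = 10,
    n_iters: int = 50,
    batch_size: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return items processed per second over n_iters calls to `step`."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()  # make sure pending asynchronous work has finished
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()  # wait for the timed work to finish before stopping the clock
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a model forward pass on a fixed input.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} items/s")
```

For a real model, `step` would wrap a forward pass under inference mode on a fixed-size input, and `batch_size` would match that input.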

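Google Drive is hard to access for students behind the firewall

Because of the firewall, Google Drive cannot be reached here, so students who want to download the weights and logs have to use paid VPN software, which is an extra expense. Could you also upload the weights and logs to Baidu Netdisk or Aliyun Drive, or to some other file-sharing platform?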

Loading the pretrained model (Object Detection Task)

Hello author, I have a question about loading pretrained models for object detection. Below is the model section from my training log. It differs slightly from the log you provided: yours shows 'pretrained=...' directly after type='RetinaNet', while mine has an init_cfg entry inside the backbone. Is there any difference between the two, and did I load the pretrained model successfully? Thank you.
Mine:
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='p2t_tiny',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='pretrained/p2t_tiny.pth'),
    )

Yours:
model = dict(
    type='RetinaNet',
    pretrained='data/pretrained/p2t/p2t_tiny.pth',
    backbone=dict(
        type='p2t_tiny',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'
    )
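
As far as I can tell, MMDetection 2.x treats the top-level `pretrained` argument as deprecated in favour of a backbone-level `init_cfg` of type `'Pretrained'`, so the two configs above are meant to point the backbone at a checkpoint in the same way. That only holds if the backbone class actually honours `init_cfg`, which depends on how `p2t_tiny` is implemented and is not shown in this issue. Below is a sketch of the mapping on plain dicts; `normalize_pretrained` is a hypothetical helper, not an MMDetection function.

```python
import copy


def normalize_pretrained(model_cfg: dict) -> dict:
    """Move a legacy top-level `pretrained` path into backbone.init_cfg."""
    cfg = copy.deepcopy(model_cfg)  # leave the caller's config untouched
    path = cfg.pop("pretrained", None)
    if path is not None:
        backbone = cfg.setdefault("backbone", {})
        # An explicit init_cfg, if already present, is not overwritten.
        backbone.setdefault("init_cfg", dict(type="Pretrained", checkpoint=path))
    return cfg


legacy = dict(
    type="RetinaNet",
    pretrained="data/pretrained/p2t/p2t_tiny.pth",
    backbone=dict(type="p2t_tiny"),
)
print(normalize_pretrained(legacy)["backbone"]["init_cfg"])
```

A config dump alone does not confirm that the weights were applied. Comparing one backbone parameter against the corresponding checkpoint tensor after initialization is a more direct check.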

It's been three months

unexpected keyword argument 'pretrained_cfg'

Hello!
When I ran the command from 'readme.md':

!python -m torch.distributed.launch --nproc_per_node=2 \
    --master_port=$((RANDOM+10000)) --use_env main.py \
    --data-path ~/autodl-tmp/p2t/P2T-main/imagenette2-320/ \
    --batch-size 128 --model p2t_tiny --drop-path 0.1

I got the following error:

Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/root/autodl-tmp/p2t/P2T-main/imagenette2-320/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distributed=True, drop=0.0, drop_path=0.1, epochs=300, eval=False, eval_epoch=10, finetune='', fp32_resume=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='p2t_tiny', momentum=0.9, num_workers=16, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='./checkpoint.pth', sched='cosine', seed=123, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', use_mcloader=False, warmup_epochs=20, warmup_lr=1e-06, weight_decay=0.05, world_size=2)
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "main.py", line 422, in <module>
    main(args)
  File "main.py", line 245, in main
    model = create_model(
  File "/root/autodl-tmp/pytorch-image-models/timm/models/_factory.py", line 114, in create_model
    model = create_fn(
  File "/root/autodl-tmp/p2t/P2T-main/p2t.py", line 353, in p2t_tiny
    model = PyramidPoolingTransformer(
TypeError: __init__() got an unexpected keyword argument 'pretrained_cfg'

I set up the environment with pip install timm. I am new to this repo and don't know whether something has changed so that the code no longer runs. I'm sorry that I couldn't solve this problem by reading 'readme.md' and 'timmdocs'.
Looking forward to your help!
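
The traceback suggests that the installed timm's `create_model` forwards a `pretrained_cfg` keyword to the model entrypoint, and `p2t_tiny` then passes it on to a constructor that does not accept it. Besides installing an older timm release, one possible workaround is to drop unknown keywords before calling the constructor. The sketch below uses a stand-in class: `TinyBackbone`, `tiny_backbone` and `filter_kwargs` are hypothetical names, and the real `PyramidPoolingTransformer` signature is not shown in this issue.

```python
import inspect


def filter_kwargs(cls, kwargs: dict) -> dict:
    """Keep only the keywords that cls.__init__ explicitly accepts."""
    params = inspect.signature(cls.__init__).parameters
    # If the constructor takes **kwargs itself, nothing needs filtering.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class TinyBackbone:
    """Stand-in for a model class with a fixed constructor signature."""

    def __init__(self, num_classes: int = 1000, drop_path_rate: float = 0.0):
        self.num_classes = num_classes
        self.drop_path_rate = drop_path_rate


def tiny_backbone(pretrained: bool = False, **kwargs):
    """Entrypoint in the style of a timm-registered model function."""
    # Extra keywords added by the factory are discarded here.
    return TinyBackbone(**filter_kwargs(TinyBackbone, kwargs))


model = tiny_backbone(num_classes=10, pretrained_cfg=None, pretrained_cfg_overlay=None)
print(model.num_classes)
```

Filtering by signature avoids hard-coding the names of the extra keywords, which may differ between timm versions.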

Questions about your ablation studies

Hello,

I have some questions about your ablation studies on pyramid pooling.
Could you describe the baseline in Table 9 in more detail?
First, you say that you replace P-MHSA with an MHSA that uses a single pooling operation. What are the details of that single pooling operation, for example the pooling ratio?
Second, did you compare your method with the original MHSA?
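
To make the question concrete: with pyramid pooling, the key/value sequence is built from several pooled copies of the feature map, while a single pooling operation uses only one. The sketch below compares the resulting key/value lengths, assuming each ratio r pools an H x W map to roughly round(H/r) x round(W/r) tokens. The ratios and the 56 x 56 map size are illustrative assumptions, not values taken from Table 9, and `pooled_tokens` is a hypothetical helper.

```python
def pooled_tokens(height: int, width: int, ratios) -> int:
    """Total number of key/value tokens after pooling at each ratio."""
    total = 0
    for r in ratios:
        # Each ratio yields one pooled map; keep at least one token per axis.
        h = max(1, round(height / r))
        w = max(1, round(width / r))
        total += h * w
    return total


H = W = 56  # illustrative feature-map size
print("full MHSA keys:      ", H * W)
print("pyramid pooling keys:", pooled_tokens(H, W, [12, 16, 20, 24]))
print("single pooling keys: ", pooled_tokens(H, W, [8]))
```

Which single ratio the baseline uses determines whether it sees more or fewer tokens than the pyramid version, which is why the exact setting matters for interpreting the ablation.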

How to load ImageNet1K pretrained weights into a semantic segmentation model?

Hello, thanks for open-sourcing this!

I use mmseg and load the weights from the image classification checkpoint, and it warns:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias
unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ......
And the experimental results are terrible, as if the weights had been initialized randomly.

So I loaded the weights from the ADE20K checkpoint instead. It works, and warns:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias
And the result is similar to the one you report.

Which weights should I load, ImageNet1K or ADE20K?
Or should I rename the keys in the ImageNet1K weights to match those of the segmentation model?
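
The unexpected keys in the first warning (cls_token, ln1.weight, layers.0.attn.attn.in_proj_weight, ...) do not resemble the backbone.* names the model expects, which may hint that the checkpoint and the model use different architectures or naming schemes rather than just a different prefix. A quick way to check is to compare the two key sets directly. Below is a sketch over plain key lists; `diff_keys` is a hypothetical helper, and the `backbone.` prefix handling is an assumption about how the segmentor wraps the backbone.

```python
def diff_keys(model_keys, ckpt_keys, strip_prefix: str = ""):
    """Return (missing, unexpected) keys after stripping a prefix from model keys."""

    def strip(key: str) -> str:
        if strip_prefix and key.startswith(strip_prefix):
            return key[len(strip_prefix):]
        return key

    model = {strip(k) for k in model_keys}
    ckpt = set(ckpt_keys)
    missing = sorted(model - ckpt)      # expected by the model, absent in the file
    unexpected = sorted(ckpt - model)   # present in the file, unknown to the model
    return missing, unexpected


# Key names copied from the warning in this issue (abbreviated).
model_keys = ["backbone.head.weight", "backbone.head.bias"]
ckpt_keys = ["cls_token", "ln1.weight", "ln1.bias"]
missing, unexpected = diff_keys(model_keys, ckpt_keys, strip_prefix="backbone.")
print("missing:", missing)
print("unexpected:", unexpected)
```

With PyTorch, the two lists would come from the model's `state_dict()` keys and the keys of the loaded checkpoint. Checkpoints often nest the weights under an entry such as `state_dict` or `model`, which is worth verifying for a given file.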

When will the code be released?

I am very interested in your model. Will you release the code? Thanks.

Bug when replacing the PVT backbone with P2T

When I replaced the PVT backbone with P2T in my code, I encountered an error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

About random results

Excuse me, I recently trained a semantic segmentation model with the same configuration several times. The results are close to what you report in your paper, but they vary from run to run.

I was wondering if you could kindly provide some guidance on how to handle this randomness.

Additionally, may I ask if it is appropriate to compare my best result among multiple experiments with your reported result?
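
Some run-to-run variation is normal for GPU training. Two common mitigations are fixing the random seeds and reporting the mean and standard deviation over several runs rather than a single best value. A sketch follows; `set_seed` and `summarize` are hypothetical helpers, NumPy and PyTorch are seeded only if they are installed, and the scores are placeholders rather than real results.

```python
import random
import statistics


def set_seed(seed: int) -> None:
    """Seed the Python RNG, plus NumPy and PyTorch when available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for more repeatable cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of run scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


set_seed(123)
runs = [40.0, 41.0, 42.0]  # placeholder numbers, not real mIoU results
mean, std = summarize(runs)
print(f"{mean:.2f} +/- {std:.2f} over {len(runs)} runs")
```

Even with fixed seeds, some GPU operations are non-deterministic, so small differences between runs can remain.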

grid features

Excuse me, can the object detection model be used to extract grid features?

Pooling layers vs. Conv layers

Thanks for your great work.

I found in Table 7 that the average pooling layer is much better than max pooling and depth-wise conv. However, a depth-wise conv layer can also act as an average pooling layer when all of its weights are equal. It seems that the conv-layer version should therefore perform at least as well, which contradicts the experimental results.

Can you explain the result?

Thank you.
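
The premise can be checked numerically: a strided convolution whose kernel weights all equal 1/k² reproduces average pooling, so average pooling lies within what a depth-wise conv can represent. Whether training actually finds a better solution is a separate, optimization-dependent question that this check does not settle. A single-channel sketch in plain Python, with hypothetical function names:

```python
def avg_pool2d(x, k):
    """Non-overlapping k x k average pooling on a 2D list."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w - k + 1, k)
        ]
        for i in range(0, h - k + 1, k)
    ]


def conv2d_single_channel(x, kernel, stride):
    """Valid (no padding) 2D cross-correlation of one channel with one kernel."""
    k = len(kernel)
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] * kernel[di][dj] for di in range(k) for dj in range(k))
            for j in range(0, w - k + 1, stride)
        ]
        for i in range(0, h - k + 1, stride)
    ]


x = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4 x 4 test input
k = 2
uniform = [[1.0 / (k * k)] * k for _ in range(k)]  # every weight equals 1/k^2

pooled = avg_pool2d(x, k)
conved = conv2d_single_channel(x, uniform, stride=k)
same = all(
    abs(a - b) < 1e-12
    for row_p, row_c in zip(pooled, conved)
    for a, b in zip(row_p, row_c)
)
print("conv with uniform weights matches avg pooling:", same)
```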

resort

Hello author, could you share the configuration of the P2T environment you are using? I am running into some bugs while setting up my own.

P2T on ImageNet-22K?

Hi @yuhuan-wu, thank you for sharing the code of this excellent work! Have you trained P2T on the ImageNet-22K dataset, or do you plan to? If so, could you please share the model pretrained on ImageNet-22K?

Thank you.