yuhuan-wu / p2t
[TPAMI22] Pyramid Pooling Transformer for Scene Understanding
Hello!
Are there pretrained weights for p2t-small and p2t-base? If so, could you send them to me?
Thank you!
pool = pool + l(pool)
It seems that the ImageNet training log of the tiny version is incomplete. Could you update it?
Could you please release the code for FPS testing?
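How should I load the pretrained weights into the model so that I can start training my own model?
Because of the firewall, students **who do not have a server** can only download the weights and logs from Google Drive by using paid VPN software, which adds extra cost. Could you also upload the weights and logs to Baidu Netdisk or Aliyun Drive, or to another file-sharing platform?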
Hello author, I have a question about loading pretrained models for object detection tasks. Below is the part of my training log that loads the pretrained model. It is a little different from the corresponding part of the log file you provided: yours shows 'pretrained=.... ' directly after type='retinanet'. Is there any difference between the two, and did I load the pretrained model successfully? Thank you.
Mine:
model = dict(
type='RetinaNet',
backbone=dict(
type='p2t_tiny',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained',
checkpoint='pretrained/p2t_tiny.pth'),
)
Yours:
model = dict(
type='RetinaNet',
pretrained='data/pretrained/p2t/p2t_tiny.pth',
backbone=dict(
type='p2t_tiny',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch'
)
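As far as I know, in MMDetection 2.x the top-level `pretrained=` key is the older spelling and `init_cfg=dict(type='Pretrained', ...)` on the backbone is the newer one, but whether a custom backbone such as `p2t_tiny` honours `init_cfg` depends on how its weight initialization is written. A config-independent way to answer "did the checkpoint load?" is to compare the parameters of the built model with those in the checkpoint file. The sketch below shows the idea with plain dictionaries standing in for real `state_dict` objects; `compare_state_dicts` and all parameter names are hypothetical, and with real tensors the `==` comparison would need to be replaced by something like `torch.equal`.

```python
def compare_state_dicts(model_state, ckpt_state):
    """Group parameter names by how the model and the checkpoint relate."""
    model_keys, ckpt_keys = set(model_state), set(ckpt_state)
    shared = model_keys & ckpt_keys
    return {
        # Present in both with equal values: these were loaded.
        "identical": sorted(k for k in shared if model_state[k] == ckpt_state[k]),
        # Present in both but with different values: not loaded (or changed since).
        "different": sorted(k for k in shared if model_state[k] != ckpt_state[k]),
        # The model expects these, the checkpoint does not provide them.
        "missing_in_ckpt": sorted(model_keys - ckpt_keys),
        # The checkpoint provides these, the model has no use for them.
        "unexpected_in_ckpt": sorted(ckpt_keys - model_keys),
    }


# Toy stand-ins: names and values are made up for illustration only.
ckpt_state = {"stem.weight": [0.1, 0.2], "stem.bias": [0.0], "head.weight": [0.5]}
model_state = {"stem.weight": [0.1, 0.2], "stem.bias": [0.3], "neck.weight": [1.0]}

report = compare_state_dicts(model_state, ckpt_state)
for group, names in report.items():
    print(group, names)
```

If almost all backbone parameters fall into the "identical" group right after the model is built, the pretrained weights were applied, whichever config style was used.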
Hello!
When I ran the command from the guide in 'readme.md'
(!python -m torch.distributed.launch --nproc_per_node=2 --master_port=$((RANDOM+10000)) --use_env main.py --data-path ~/autodl-tmp/p2t/P2T-main/imagenette2-320/ --batch-size 128 --model p2t_tiny --drop-path 0.1),
I got the following error:
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/root/autodl-tmp/p2t/P2T-main/imagenette2-320/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distributed=True, drop=0.0, drop_path=0.1, epochs=300, eval=False, eval_epoch=10, finetune='', fp32_resume=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='p2t_tiny', momentum=0.9, num_workers=16, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='./checkpoint.pth', sched='cosine', seed=123, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', use_mcloader=False, warmup_epochs=20, warmup_lr=1e-06, weight_decay=0.05, world_size=2)
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "main.py", line 422, in <module>
    main(args)
  File "main.py", line 245, in main
    model = create_model(
  File "/root/autodl-tmp/pytorch-image-models/timm/models/_factory.py", line 114, in create_model
    model = create_fn(
  File "/root/autodl-tmp/p2t/P2T-main/p2t.py", line 353, in p2t_tiny
    model = PyramidPoolingTransformer(
**TypeError: __init__() got an unexpected keyword argument 'pretrained_cfg'**
I set up the environment simply with pip install timm.
I am new to this repo and don't know whether it has been updated in a way that stops the code from running. I am sorry that I couldn't solve this problem by reading 'readme.md' and 'timmdocs'.
Looking forward to your help!
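The traceback suggests that timm's `create_model` handed a `pretrained_cfg` keyword to `p2t_tiny`, which forwarded it unchanged to `PyramidPoolingTransformer.__init__`. Newer timm releases appear to pass such keywords to every registered entry function, so one possible workaround is to discard them inside the entry function before building the model. The sketch below illustrates this with a toy class; `ToyBackbone`, `toy_tiny`, and the exact list of keys in `TIMM_ONLY_KWARGS` are assumptions for illustration, not the repo's code.

```python
# Keyword arguments that recent timm versions are assumed to inject
# when calling a model entry function.
TIMM_ONLY_KWARGS = ("pretrained_cfg", "pretrained_cfg_overlay")


class ToyBackbone:
    """Stand-in for a model class whose __init__ has a fixed signature."""

    def __init__(self, num_classes=1000, drop_path_rate=0.0):
        self.num_classes = num_classes
        self.drop_path_rate = drop_path_rate


def toy_tiny(pretrained=False, **kwargs):
    """Hypothetical entry function that drops timm-only kwargs first."""
    for key in TIMM_ONLY_KWARGS:
        kwargs.pop(key, None)  # ignore the key if it is absent
    return ToyBackbone(**kwargs)


# Mimics how a factory might call the entry function.
model = toy_tiny(
    pretrained=False,
    pretrained_cfg=None,
    pretrained_cfg_overlay=None,
    drop_path_rate=0.1,
)
print(model.drop_path_rate)
```

The alternative is to install the timm version the repo was developed against, if the repo documents one, so that the extra keywords are never passed.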
Hello,
I have some questions about your ablation studies of pyramid pooling.
Could you give more details about the baseline version in Table 9?
First, you say that you replace P-MHSA with an MHSA that uses a single pooling operation. What are the details of this single pooling operation, for example the pooling ratio?
Second, did you compare your method with the original MHSA?
Hello, thanks for open-sourcing this work!
I use mmseg and load the weights from the image classification result. It warns:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias
unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ......
The experimental results are terrible, as if the weights had been initialized randomly.
So I loaded the weights from the ADE20K result instead. That works, and it warns:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias
And the result is similar to the one you provide.
Which weights should I load: ImageNet-1K or ADE20K?
Or should I rename the keys in the ImageNet-1K weights to match the keys in the segmentation model?
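Before renaming anything, it can help to look at which top-level names a checkpoint actually contains, since a file whose names share nothing with the model may simply be the wrong file. If the names differ only by a prefix and by a classification head, a small remapping step may be enough. The sketch below shows both steps on a plain dictionary; `top_level_names`, `remap_keys`, and the example keys are hypothetical, and the right prefix for a given mmseg setup is an assumption to verify against the model's own `state_dict`.

```python
from collections import Counter


def top_level_names(keys):
    """Count keys by their first dotted component, for a quick overview."""
    return Counter(key.split(".", 1)[0] for key in keys)


def remap_keys(state_dict, old_prefix="", new_prefix="", drop=()):
    """Return a copy with old_prefix swapped for new_prefix.

    Keys starting with any entry of `drop` (e.g. a classifier head) are removed.
    """
    remapped = {}
    for key, value in state_dict.items():
        if any(key.startswith(d) for d in drop):
            continue
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        remapped[key] = value
    return remapped


# Toy checkpoint: key names and values are made up for illustration.
ckpt = {"head.weight": 0, "head.bias": 0, "stage1.block.weight": 1}

print(top_level_names(ckpt))
converted = remap_keys(ckpt, new_prefix="backbone.", drop=("head.",))
print(converted)
```

A missing `head.weight` or `head.bias` is usually harmless for segmentation, because the classification head is not used by the decoder.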
I am very interested in your model. Will you release the code? Thanks.
When I replaced the PVT backbone with P2T in my code, I encountered an error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Excuse me, I recently encountered an issue where I trained a semantic segmentation model with the same configuration multiple times, and although the results are similar to what you reported in your paper, they appear to be random.
I was wondering if you could kindly provide some guidance on how to handle this randomness.
Additionally, may I ask if it is appropriate to compare my best result among multiple experiments with your reported result?
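A common way to reduce run-to-run variation is to fix the seeds of every random number generator involved, and a common way to report results is the mean and standard deviation over several runs rather than the single best run. The sketch below is a minimal version of both, assuming NumPy and PyTorch may or may not be installed; `seed_everything` and `summarize` are hypothetical helper names. Even with fixed seeds, some GPU operations are nondeterministic, so small differences can remain.

```python
import os
import random
import statistics


def seed_everything(seed):
    """Seed the available RNGs; NumPy and PyTorch are optional."""
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for more repeatable cuDNN behaviour.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


def summarize(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


seed_everything(123)
first = random.random()
seed_everything(123)
second = random.random()
print(first == second)
```

Passing the metric from each repeated run to `summarize` gives a figure that is easier to compare fairly with a reported number than the best of several attempts.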
Excuse me, can the object detection model be used to extract grid features?
Thanks for your great work.
I found in Table 7 that the average pooling layer is much better than max pooling and depth-wise convolution. However, a depth-wise conv layer can also act as an average pooling layer when all of its weights are equal. It seems that the conv-layer version should therefore perform better, which contradicts the experimental results.
Can you explain the result?
Thank you.
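The equivalence the question relies on can be checked on a small example: a single-channel convolution with a k x k kernel whose weights all equal 1/k², stride k, no padding, and no bias gives the same output as non-overlapping k x k average pooling. The sketch below uses pure Python lists and hypothetical helper names; it only shows that the convolution can represent average pooling, not that training will find those weights.

```python
def avg_pool2d(x, k):
    """Non-overlapping k x k average pooling on a 2D list (one channel)."""
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(0, w - k + 1, k)]
        for i in range(0, h - k + 1, k)
    ]


def depthwise_conv2d(x, kernel, stride):
    """Single-channel 2D cross-correlation without padding or bias."""
    k = len(kernel)
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + di][j + dj] * kernel[di][dj]
             for di in range(k) for dj in range(k))
         for j in range(0, w - k + 1, stride)]
        for i in range(0, h - k + 1, stride)
    ]


k = 2
x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
# All weights equal to 1 / k^2, which is the averaging kernel.
uniform_kernel = [[1.0 / (k * k)] * k for _ in range(k)]

pooled = avg_pool2d(x, k)
convolved = depthwise_conv2d(x, uniform_kernel, stride=k)
print(pooled)     # [[3.5, 5.5], [11.5, 13.5]]
print(convolved)  # [[3.5, 5.5], [11.5, 13.5]]
```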
Hello author, could you share the configuration of the environment you use for P2T? I have run into some bugs while setting up my own environment, which is why I am asking.
Hi @yuhuan-wu, thank you for sharing the code of this excellent work! Have you trained P2T on the ImageNet-22K dataset, or do you plan to? If so, could you please share the model pretrained on ImageNet-22K?
Thank you.