
p2t's Introduction

Hi there 👋

p2t's People

Contributors

yuhuan-wu


p2t's Issues

the weights question

Hello!

Are there pre-training weights for p2t-small and p2t-base? Can you send it to me if there is one?

Thank you!

Training log of ImageNet

It seems that the ImageNet training log of the tiny version is incomplete. Could you update it?

FPS testing

Could you please release the code for FPS testing?
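The thread does not include the authors' FPS script. As a rough, hypothetical sketch (not the repo's code), throughput is usually measured by running a few warm-up iterations, then timing a fixed number of forward passes and dividing the number of images processed by the elapsed time. The harness below is framework-agnostic; `measure_fps` and its parameters are illustrative names, and on a GPU you would pass a synchronization function so that queued kernels finish before the clock is read.

```python
import time
from typing import Callable, Optional


def measure_fps(step: Callable[[], object], batch_size: int, warmup: int = 10,
                iters: int = 50, sync: Optional[Callable[[], None]] = None) -> float:
    """Images per second for `step`, a callable that runs one forward pass on one batch."""
    # Warm-up runs are excluded from timing (allocator caches, kernel selection, etc.).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # wait for asynchronous work to finish before stopping the clock
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed
```

With PyTorch, `step` might be `lambda: model(images)` inside `torch.no_grad()` with the model in eval mode, and `sync` would be `torch.cuda.synchronize`. Any reported FPS also depends on batch size, input resolution, and hardware, so those should be stated alongside the number.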

weights

How should I load the pretrained weights into the model so that I can start training my own model?
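A minimal sketch of one common way to prepare pretrained weights for fine-tuning, assuming the checkpoint is a dict whose weights may be nested under a `'model'` or `'state_dict'` key (the actual layout of the released P2T files is not stated here). The helpers are hypothetical and operate on plain mappings, so they run without PyTorch; dropping `head.` keys assumes the classifier is named `head` and has to be re-initialized when the number of classes changes.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterable, List, Tuple


def unwrap_checkpoint(ckpt: Mapping) -> Mapping:
    """Return the weights, assuming they may be nested under 'model' or 'state_dict'."""
    for key in ("model", "state_dict"):
        inner = ckpt.get(key)
        if isinstance(inner, Mapping):
            return inner
    return ckpt  # already a flat state dict


def drop_prefixes(state: Mapping, prefixes: Iterable[str] = ("head.",)) -> Dict[str, Any]:
    """Remove entries (e.g. the classifier) that should be re-initialized."""
    prefixes = tuple(prefixes)
    return {k: v for k, v in state.items() if not k.startswith(prefixes)}


def match_report(state: Mapping, model_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """List model keys absent from the checkpoint, and checkpoint keys the model lacks."""
    model_keys = set(model_keys)
    missing = sorted(model_keys - set(state))
    unexpected = sorted(set(state) - model_keys)
    return missing, unexpected
```

In PyTorch, the filtered dict would then be passed to `model.load_state_dict(state, strict=False)`, whose return value lists `missing_keys` and `unexpected_keys`; if loading went as intended, only the classifier keys should appear there.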

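On the inconvenience of accessing Google Drive for ** students

Because of the firewall, Google Drive has no servers in **, so students in ** who want to download the weights and logs can only use paid VPN software, which is an extra expense. Could you also upload the weights and logs to Baidu Netdisk or Alibaba Cloud Drive, or to another file-sharing platform?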

Loading the pretrained model (Object Detection Task)

Hello author, I have a question about loading pretrained models for object detection. Below is the part of my training log that shows the pretrained model being loaded. It differs a little from the log file you provided: yours shows 'pretrained=...' directly after type='RetinaNet'. Is there any difference between the two, and did I load the pretrained model successfully? Thank you.
My:
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='p2t_tiny',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained',
                      checkpoint='pretrained/p2t_tiny.pth'),
    ),
)

Yours:
model = dict(
    type='RetinaNet',
    pretrained='data/pretrained/p2t/p2t_tiny.pth',
    backbone=dict(
        type='p2t_tiny',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
    ),
)
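As far as I understand, MMDetection 2.x releases that introduced `init_cfg` keep the top-level `pretrained` field only for backward compatibility, and backbone-level `init_cfg=dict(type='Pretrained', checkpoint=...)` is the newer way to declare the same thing. The hypothetical helpers below (not part of MMDetection) show that both configs above name a checkpoint and how the legacy field maps onto `init_cfg`; whether the weights are actually loaded still depends on the P2T backbone honoring that field, which this sketch cannot check.

```python
import copy
from typing import Any, Dict, Optional


def pretrained_checkpoint(model_cfg: Dict[str, Any]) -> Optional[str]:
    """Return the checkpoint path declared in either config style, or None."""
    # Legacy style: a top-level `pretrained` field on the detector.
    if model_cfg.get("pretrained"):
        return model_cfg["pretrained"]
    # Newer style: `init_cfg=dict(type='Pretrained', checkpoint=...)` on the backbone.
    init_cfg = model_cfg.get("backbone", {}).get("init_cfg") or {}
    if init_cfg.get("type") == "Pretrained":
        return init_cfg.get("checkpoint")
    return None


def to_init_cfg_style(model_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Move a top-level `pretrained` path into the backbone's `init_cfg` (input is not mutated)."""
    cfg = copy.deepcopy(model_cfg)
    path = cfg.pop("pretrained", None)
    if path:
        cfg.setdefault("backbone", {})["init_cfg"] = dict(type="Pretrained", checkpoint=path)
    return cfg
```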

unexpected keyword argument 'pretrained_cfg'

Hello!
When I ran the training command from 'readme.md':

!python -m torch.distributed.launch --nproc_per_node=2 \
    --master_port=$((RANDOM+10000)) --use_env main.py \
    --data-path ~/autodl-tmp/p2t/P2T-main/imagenette2-320/ \
    --batch-size 128 --model p2t_tiny --drop-path 0.1

I got the following error:
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=128, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/root/autodl-tmp/p2t/P2T-main/imagenette2-320/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distributed=True, drop=0.0, drop_path=0.1, epochs=300, eval=False, eval_epoch=10, finetune='', fp32_resume=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='p2t_tiny', momentum=0.9, num_workers=16, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='./checkpoint.pth', sched='cosine', seed=123, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', use_mcloader=False, warmup_epochs=20, warmup_lr=1e-06, weight_decay=0.05, world_size=2)
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
  warnings.warn(
Traceback (most recent call last):
  File "main.py", line 422, in <module>
    main(args)
  File "main.py", line 245, in main
    model = create_model(
  File "/root/autodl-tmp/pytorch-image-models/timm/models/_factory.py", line 114, in create_model
    model = create_fn(
  File "/root/autodl-tmp/p2t/P2T-main/p2t.py", line 353, in p2t_tiny
    model = PyramidPoolingTransformer(
TypeError: __init__() got an unexpected keyword argument 'pretrained_cfg'

I set up the environment with pip install timm. I am new to this repo and don't know whether the code has changed in a way that stops it from running. I'm sorry that I couldn't solve this problem by reading 'readme.md' and 'timmdocs'.
Looking forward to your help!
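The traceback shows timm's `create_model` calling the `p2t_tiny` entrypoint with a `pretrained_cfg` keyword that `PyramidPoolingTransformer.__init__` does not accept; newer timm releases pass such keywords (and, as far as I know, `pretrained_cfg_overlay` as well) into model entrypoints. One workaround, sketched here with a hypothetical stand-in class rather than the real P2T code, is to forward only the keyword arguments the constructor actually accepts:

```python
import inspect
from typing import Any, Callable, Dict


def accepted_kwargs(fn: Callable, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `fn` can accept."""
    params = inspect.signature(fn).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(kwargs)  # fn takes **kwargs, so nothing needs filtering
    names = {p.name for p in params
             if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                           inspect.Parameter.KEYWORD_ONLY)}
    return {k: v for k, v in kwargs.items() if k in names}


class ToyBackbone:
    """Hypothetical stand-in for a model constructor without **kwargs."""

    def __init__(self, drop_path_rate: float = 0.0, num_classes: int = 1000):
        self.drop_path_rate = drop_path_rate
        self.num_classes = num_classes


def toy_tiny(pretrained: bool = False, **kwargs):
    # timm may forward extra keys such as `pretrained_cfg`; drop the ones the class rejects.
    return ToyBackbone(**accepted_kwargs(ToyBackbone, kwargs))
```

Applied to the repo, the same filtering would go inside entrypoints such as `p2t_tiny` in `p2t.py`. An alternative is installing an older timm release that predates `pretrained_cfg`, though the exact version this repo was tested with is not given here. Note that silent filtering also hides genuinely misspelled arguments.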

Questions about your ablation studies

Hello,

I have some questions about your ablation studies of pyramid pooling.
Could you give details about the baseline version in Table 9?
First, you say that you replace P-MHSA with an MHSA that uses a single pooling operation. What are the details of that single pooling operation, e.g., the pooling ratio?
Second, did you compare your method with the original MHSA?
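To make the cost side of this question concrete, here is a rough count of how many key/value tokens attention sees with no pooling (original MHSA), with a single pooling ratio, and with several ratios whose outputs are concatenated. It assumes each ratio r pools an H×W map to round(H/r)×round(W/r) tokens; the ratios used below are hypothetical placeholders, not the settings from the paper or Table 9.

```python
from typing import Optional, Sequence


def pooled_tokens(h: int, w: int, ratios: Sequence[int]) -> int:
    """Key/value tokens after pooling an h x w map once per ratio and concatenating."""
    return sum(max(1, round(h / r)) * max(1, round(w / r)) for r in ratios)


def attention_entries(h: int, w: int, ratios: Optional[Sequence[int]] = None) -> int:
    """Entries in one head's attention matrix: N queries times M keys."""
    n = h * w
    m = n if ratios is None else pooled_tokens(h, w, ratios)  # no pooling: keys = queries
    return n * m
```

On a 56×56 map (assumed here to be stage 1 of a 224×224 input with stride-4 patch embedding), full MHSA has 3136×3136 attention entries per head, while the hypothetical settings give 3136×49 for a single ratio of 8 and 3136×54 for ratios (12, 16, 20, 24) under Python's `round`.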

How to load ImageNet1K pretrained weight to semantic segmentation model?

Hello, thanks for open-sourcing this!

I use mmseg and load the weights from the image classification result, which gives this warning:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias
unexpected key in source state_dict: cls_token, ln1.bias, ln1.weight, layers.0.ln1.bias, layers.0.ln1.weight, layers.0.ln2.bias, layers.0.ln2.weight, layers.0.ffn.layers.0.0.bias, layers.0.ffn.layers.0.0.weight, layers.0.ffn.layers.1.bias, layers.0.ffn.layers.1.weight, layers.0.attn.attn.out_proj.bias, layers.0.attn.attn.out_proj.weight, layers.0.attn.attn.in_proj_bias, layers.0.attn.attn.in_proj_weight, layers.1.ln1.bias, layers.1.ln1.weight, layers.1.ln2.bias, layers.1.ln2.weight, layers.1.ffn.layers.0.0.bias, layers.1.ffn.layers.0.0.weight, layers.1.ffn.layers.1.bias, layers.1.ffn.layers.1.weight, layers.1.attn.attn.out_proj.bias, layers.1.attn.attn.out_proj.weight ......

The experimental results are terrible, just as if the weights had been initialized randomly.

So I loaded the weights from the ADE20K result instead. That works, with this warning:
WARNING - The model and loaded state dict do not match exactly
missing keys in source state_dict: backbone.head.weight, backbone.head.bias

The result is similar to the one you report.

Which weights should I load, ImageNet-1K or ADE20K?
Or should I modify the keys of the ImageNet-1K weights to match the keys used in segmentation?
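The warnings show that the segmentor expects keys prefixed with `backbone.`, while the unexpected keys listed for the classification checkpoint carry no such prefix. A hypothetical remapping, assuming the classification checkpoint's key names otherwise match the segmentation backbone's submodules, would add the prefix and skip the classifier:

```python
from typing import Any, Dict, Iterable, Mapping


def to_segmentor_keys(state: Mapping[str, Any], prefix: str = "backbone.",
                      drop: Iterable[str] = ("head.",)) -> Dict[str, Any]:
    """Prefix classification keys for a segmentor backbone, skipping the classifier."""
    drop = tuple(drop)
    out = {}
    for key, value in state.items():
        if key.startswith(drop):  # str.startswith accepts a tuple of prefixes
            continue
        out[prefix + key] = value
    return out
```

If the names differ beyond the prefix, this remapping alone will not help, and the list of unexpected keys will stay long.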

Bug when replacing the PVT backbone with P2T

When I replaced the PVT backbone with P2T in my code, I encountered this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 512, 3, 3]], which is output 0 of AdaptiveAvgPool2DBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

About random results

Excuse me, I recently trained a semantic segmentation model several times with the same configuration. The results are close to what you reported in your paper, but they vary from run to run.

I was wondering if you could kindly provide some guidance on how to handle this randomness.

Additionally, is it appropriate to compare the best result from my multiple runs with your reported result?
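A common, hedged starting point for reducing run-to-run variance is to fix every random seed and ask cuDNN for deterministic kernels; even then, some GPU operations remain nondeterministic, so small differences can persist. For comparisons, reporting the mean and standard deviation over several runs is a common convention. `seed_everything` and `summarize_runs` are hypothetical helper names; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random
import statistics
from typing import Sequence, Tuple


def seed_everything(seed: int) -> None:
    """Seed Python, and NumPy/PyTorch when available."""
    # Only takes effect in Python processes started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False     # disable autotuning, which can vary per run
    except ImportError:
        pass


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of a metric (e.g. mIoU) over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)
```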

grid features

Excuse me, can the object detection model be used to extract grid features?

Pooling layers vs. Conv layers

Thanks for your great work.

In Table 7, I found that the average pooling layer performs much better than max pooling and depthwise convolution. However, a depthwise conv layer can act as an average pooling layer when all of its weights are equal. So it seems the conv version should perform at least as well, which contradicts the experimental results.

Can you explain the result?

Thank you
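The equivalence this question relies on can be checked directly: a k×k depthwise filter whose weights all equal 1/k², applied with stride k, computes the same result as k×k average pooling (up to floating-point rounding). The single-channel, pure-Python functions below are illustrative only and are not taken from the P2T code.

```python
import math
from typing import List

Grid = List[List[float]]


def avg_pool2d(x: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (stride k) on one channel."""
    h, w = len(x) // k, len(x[0]) // k
    return [[sum(x[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(w)] for i in range(h)]


def depthwise_conv2d(x: Grid, kernel: Grid, stride: int) -> Grid:
    """Valid (unpadded) cross-correlation of one channel with its own k x k kernel."""
    k = len(kernel)
    h = (len(x) - k) // stride + 1
    w = (len(x[0]) - k) // stride + 1
    return [[sum(kernel[a][b] * x[i * stride + a][j * stride + b]
                 for a in range(k) for b in range(k))
             for j in range(w)] for i in range(h)]


def grids_close(p: Grid, q: Grid) -> bool:
    """Element-wise comparison that tolerates floating-point rounding."""
    return len(p) == len(q) and all(
        len(pr) == len(qr) and all(math.isclose(u, v) for u, v in zip(pr, qr))
        for pr, qr in zip(p, q))
```

This only shows that a depthwise conv *can* represent average pooling; it says nothing about which weights training actually converges to.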

resort

Hello author, could you tell me the environment configuration you use for P2T? I have run into some bugs while configuring my own environment, which is why I'm asking.
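To compare environments concretely, a small stdlib-only script can print the installed versions of the packages the issues above rely on (PyTorch, timm, MMDetection, MMSegmentation). The package list is an assumption to be edited for your setup, and `environment_report` is a hypothetical helper built on `importlib.metadata` (Python 3.8+).

```python
import platform
from importlib import metadata
from typing import Dict, Optional, Sequence

# Distribution names on PyPI; adjust to whatever your setup actually uses.
DEFAULT_PACKAGES = ("torch", "torchvision", "timm", "mmcv-full", "mmcv",
                    "mmdet", "mmsegmentation")


def environment_report(packages: Sequence[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package to its installed version, or None if it is not installed."""
    report: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```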

P2T on ImageNet-22K?

Hi @yuhuan-wu, thank you for sharing the code of this excellent work! Have you trained P2T on the ImageNet-22K dataset, or do you plan to? If so, could you please share the ImageNet-22K pretrained model?

Thank you.
