lucidrains / segformer-pytorch

333.0 9.0 44.0 106 KB

Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

License: MIT License

Python 100.00%

artificial-intelligence deep-learning attention-mechanism multilayer-perceptron segmentation image-segmentation

segformer-pytorch's Issues

Is the imgsize parameter of MixVisionTransformer necessary?

SegFormer drops the positional encoding, which lets the model generalize when the test resolution differs from the training resolution.

Why do you use InstanceNorm instead of LayerNorm?

I see you use LayerNorm=partial(InstanceNorm2d).
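For readers weighing the two options, here is a minimal sketch of how they differ on a (B, C, H, W) feature map. InstanceNorm2d normalizes each channel over its spatial positions, while a LayerNorm in the transformer sense normalizes each spatial position over its channels. `ChannelLayerNorm` is a hypothetical helper written for this illustration, not a class from the repository.

```python
import torch
from torch import nn


class ChannelLayerNorm(nn.Module):
    """LayerNorm over the channel axis of a (B, C, H, W) tensor (hypothetical helper)."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        # Per-channel learnable scale and shift, broadcast over batch and space.
        self.gamma = nn.Parameter(torch.ones(1, dim, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, dim, 1, 1))

    def forward(self, x):
        # Statistics are computed across channels, separately for every pixel.
        mean = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + self.eps) * self.gamma + self.beta


torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)

layer_out = ChannelLayerNorm(8)(x)

# InstanceNorm2d computes statistics across H and W, separately for every channel.
instance_out = nn.InstanceNorm2d(8, affine=True)(x)

# Reference: the built-in nn.LayerNorm applied in channels-last layout.
reference = nn.LayerNorm(8)(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(layer_out, reference, atol=1e-5))
print(torch.allclose(layer_out, instance_out, atol=1e-3))
```

The two normalizations produce different outputs on the same input, so swapping one for the other changes the model, though this sketch says nothing about which trains better.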

Model weights + model output HxW

Hi,

Could you please add the model weights so we can start training from them?

Also, why did you choose to train models with an output of size (H/4, W/4) rather than the original (H, W) size?

Great job for the paper, very interesting :)

Where are the pretrained weights?

Thank you for your wonderful code implementation. I would like to ask where the pretrained weights are.

The model configurations for all the SegFormer B0 ~ B5

Hello
How are you?
Thanks for contributing to this project.
Is the model configuration in the README correct for MiT-B0?
I ask because the total number of parameters for that model is 36M.
Could you provide the model configurations for all of SegFormer B0 ~ B5?
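To check a parameter count like the one mentioned above, a small helper along these lines can be applied to any `nn.Module`. The name `count_parameters` and the toy two-layer model are assumptions made for illustration; substitute the model built from the README configuration.

```python
from torch import nn


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Sum the element counts of the model's parameters (hypothetical helper)."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


# Toy stand-in for a real network; replace with the model under inspection.
toy = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4, padding=3),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
)

total = count_parameters(toy)
print(f"{total / 1e6:.3f}M parameters")
```

Comparing this number between two implementations of the same variant is a quick first check before looking at individual layers.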

The decoder in your implementation uses conv2d, which is different from the MLP decoder used in the SegFormer paper?

Did you test accuracy and inference speed? Is conv2d better than the MLP decoder?
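One point that may help frame the question: a `Conv2d` with `kernel_size=1` computes the same function as a `Linear` layer applied independently at every pixel. The sketch below copies the weights from one into the other and compares outputs. It only covers the 1x1 case, and it does not establish that the repository's decoder matches the paper's layer for layer.

```python
import torch
from torch import nn

torch.manual_seed(0)
c_in, c_out = 16, 8

conv = nn.Conv2d(c_in, c_out, kernel_size=1)
linear = nn.Linear(c_in, c_out)

# Copy the 1x1 conv kernel (c_out, c_in, 1, 1) into the linear weight (c_out, c_in).
with torch.no_grad():
    linear.weight.copy_(conv.weight.view(c_out, c_in))
    linear.bias.copy_(conv.bias)

x = torch.randn(2, c_in, 5, 7)

conv_out = conv(x)
# Linear acts on the last axis, so move channels last and back again.
linear_out = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

max_diff = (conv_out - linear_out).abs().max().item()
print(max_diff)
```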

Pretrained weights

Which pretrained weights can be used?

How to output the original H x W size?
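A common way to get full-resolution predictions from logits produced at (H/4, W/4) is to upsample them with bilinear interpolation before taking the argmax. This is a minimal sketch; `upsample_logits` is a hypothetical helper, and the random `logits` tensor stands in for the model output.

```python
import torch
import torch.nn.functional as F


def upsample_logits(logits, size):
    """Bilinearly resize (B, num_classes, h, w) logits to the given (H, W)."""
    return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


images = torch.randn(1, 3, 256, 256)
logits = torch.randn(1, 4, 64, 64)  # stand-in for model output at H/4, W/4

full_res = upsample_logits(logits, images.shape[-2:])
pred = full_res.argmax(dim=1)  # per-pixel class indices at the input resolution

print(full_res.shape, pred.shape)
```

Upsampling the logits rather than the argmax map keeps class boundaries smoother, since the interpolation happens on continuous scores.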

BatchNorm or LayerNorm?

According to the mmsegmentation config file in the official repository, the authors use BatchNorm rather than LayerNorm. Am I misinterpreting this?

A question about the k, v reshape in Efficient Self-Attention

Thanks for sharing your work. Your code is elegant and has inspired me a lot.
Here is a question about the implementation of Efficient Self-Attention.

It seems you use a mean operation to reduce k and v,
whereas the official implementation uses a (learnable) linear mapping to reduce k and v.

May I ask whether this difference matters significantly in your experiments?

In your code:

k, v = map(lambda t: reduce(t, 'b c (h r1) (w r2) -> b c h w', 'mean', r1 = r, r2 = r), (k, v))

The original implementation uses:

# defined in __init__
self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
self.norm = nn.LayerNorm(dim)

# applied in forward, where x has shape (B, N, C) with N = H * W
x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)
x_ = self.norm(x_)
kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
k, v = kv[0], kv[1]
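To make the comparison concrete, the sketch below puts the two reductions side by side on a toy tensor. Mean pooling over non-overlapping r x r blocks has no parameters, while the strided convolution has learnable weights and can represent mean pooling as a special case. The shapes and variable names here are assumptions chosen for illustration, and the sketch does not measure any effect on accuracy.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
B, C, H, W, r = 2, 32, 16, 16, 4
feat = torch.randn(B, C, H, W)

# Parameter-free reduction: average each non-overlapping r x r block.
pooled = F.avg_pool2d(feat, kernel_size=r, stride=r)

# Learnable reduction: strided conv with the same kernel size and stride.
sr = nn.Conv2d(C, C, kernel_size=r, stride=r)
learned = sr(feat)
extra_params = sum(p.numel() for p in sr.parameters())

# Set the conv weights so that it reproduces mean pooling exactly:
# each output channel averages only its own input channel.
with torch.no_grad():
    sr.weight.zero_()
    sr.bias.zero_()
    for c in range(C):
        sr.weight[c, c] = 1.0 / (r * r)
as_mean = sr(feat)

print(pooled.shape, learned.shape, extra_params)
print(torch.allclose(pooled, as_mean, atol=1e-5))
```

Both reductions shrink the key/value sequence by a factor of r * r; the learnable version adds C * C * r * r + C parameters per attention layer in this setup.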

Something is wrong with your implementation.

Hello!

First of all, I really like the repo. The implementation is clean and so much easier to understand than the official repo. But after doing some digging, I realized that the number of parameters and layers (especially conv2d) is quite different from the official implementation. This is the case for all variants I have tested (B0 and B5).

Check out the README in my repo here, and you'll see what I mean. I also included images of the execution graphs of the two different implementations in the 'src' folder, which could help to debug.

I don't quite have time to dig into the source of the problem, but I just thought I'd share my observations with you.

How to use pretrained weights?

Hi, your code implementation helped me a lot! I am now working on a new segmentation task and want to use pretrained network weights, such as ImageNet weights. How can I modify the code? Thanks!
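In general PyTorch terms, one way to reuse a checkpoint whose head does not match the new task is to load only the entries whose names and shapes agree with the target model. This is a sketch under stated assumptions: `load_matching_weights` and `make_net` are hypothetical, the toy networks stand in for a real backbone, and nothing in this thread confirms that a compatible checkpoint exists for this repository.

```python
import torch
from torch import nn


def load_matching_weights(model, state_dict):
    """Load only entries whose key and shape match the model (hypothetical helper)."""
    own = model.state_dict()
    matched = {
        k: v for k, v in state_dict.items()
        if k in own and own[k].shape == v.shape
    }
    # strict=False tolerates the keys that were filtered out.
    model.load_state_dict(matched, strict=False)
    skipped = sorted(set(state_dict) - set(matched))
    return sorted(matched), skipped


def make_net(num_classes):
    # Toy network: a shared feature layer and a task-specific head.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, num_classes, kernel_size=1),
    )


pretrained = make_net(1000)  # stand-in for a checkpoint trained on another task
model = make_net(4)          # new segmentation task with 4 classes

# With a real file this would be torch.load(path, map_location="cpu"),
# where path is whatever checkpoint you have; none is assumed here.
loaded, skipped = load_matching_weights(model, pretrained.state_dict())
print("loaded:", loaded)
print("skipped:", skipped)
```

Printing the skipped keys is worth keeping in real use, because a naming mismatch between implementations would otherwise silently load nothing.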

patch size not used

Hi again,

I have also noticed that you don't use the patch_size parameter in the SegFormer constructor.

Is this okay?
