lucidrains / segformer-pytorch

Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

License: MIT License

Python 100.00%

artificial-intelligence deep-learning attention-mechanism multilayer-perceptron segmentation image-segmentation

segformer-pytorch's Introduction

Segformer - Pytorch

Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch.

Install

$ pip install segformer-pytorch

Usage

For example, MiT-B0

import torch
from segformer_pytorch import Segformer

model = Segformer(
    dims = (32, 64, 160, 256),      # dimensions of each stage
    heads = (1, 2, 5, 8),           # heads of each stage
    ff_expansion = (8, 8, 4, 4),    # feedforward expansion factor of each stage
    reduction_ratio = (8, 4, 2, 1), # reduction ratio of each stage for efficient attention
    num_layers = 2,                 # num layers of each stage
    decoder_dim = 256,              # decoder dimension
    num_classes = 4                 # number of segmentation classes
)

x = torch.randn(1, 3, 256, 256)
pred = model(x) # (1, 4, 64, 64): one (H/4, W/4) map per segmentation class

Make sure each per-stage keyword argument is a tuple of at most 4 values (a single value, such as num_layers above, applies to every stage), as this repository is hard-coded to give the MiT 4 stages as done in the paper.

Citations

@misc{xie2021segformer,
    title   = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers}, 
    author  = {Enze Xie and Wenhai Wang and Zhiding Yu and Anima Anandkumar and Jose M. Alvarez and Ping Luo},
    year    = {2021},
    eprint  = {2105.15203},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

segformer-pytorch's Issues

pretrain weight

Which pretrained weights can be used?

Why do you use InstanceNorm instead of LayerNorm?

I see you use LayerNorm = partial(InstanceNorm2d).
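For readers comparing the two, here is a minimal sketch of how they differ on a (B, C, H, W) feature map. It assumes "LayerNorm" means normalizing each pixel across its channels, which is how it is usually applied to transformer tokens. The tensor sizes are arbitrary and are not taken from this repository.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(2, 8, 4, 4)  # (batch, channels, height, width), illustrative sizes

# InstanceNorm2d: each channel of each sample is normalized over its H x W positions.
inst = nn.InstanceNorm2d(8)(x)

# Channel-wise LayerNorm: each spatial position is normalized over its C channels.
# nn.LayerNorm acts on the last dimension, so move channels last and back again.
layer_norm = nn.LayerNorm(8)
chan = layer_norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(inst.mean(dim=(2, 3)).abs().max())  # close to 0: zero mean per channel map
print(chan.mean(dim=1).abs().max())       # close to 0: zero mean per pixel
```

The two operations normalize over different axes, so they generally produce different activations for the same input.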

The model configurations for all the SegFormer B0 ~ B5

Hello,
How are you?
Thanks for contributing to this project.
Is the MiT-B0 model configuration in the README correct?
I ask because the total number of params for that model comes out to 36M.
Could you provide the model configurations for all of SegFormer B0 ~ B5?
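A parameter count like the one quoted in this issue can be reproduced with a small helper. The sketch below is an assumption about how one might check it: `count_parameters` is a hypothetical helper, and the toy module is a stand-in so the snippet runs without this package installed.

```python
import torch
from torch import nn

def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """Sum the element counts of the model's parameters (hypothetical helper)."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )

# Stand-in module; replace with the Segformer instance from the README to check it.
toy = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=4, padding=3),
    nn.Conv2d(32, 4, kernel_size=1),
)

n_params = count_parameters(toy)
print(f"{n_params / 1e6:.3f}M parameters")
```

Passing the model built in the Usage section to `count_parameters` would give the figure to compare against the paper's reported sizes.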

How to output the original H x W size?
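One common way to get a full-resolution prediction is to upsample the (H/4, W/4) logits to the input size. The sketch below assumes bilinear interpolation is acceptable; `upsample_logits` is a hypothetical helper and is not part of this repository. A random tensor stands in for `model(x)` so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F

def upsample_logits(logits: torch.Tensor, size) -> torch.Tensor:
    """Bilinearly resize (B, num_classes, h, w) logits to the given (H, W)."""
    return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)

x = torch.randn(1, 3, 256, 256)
pred = torch.randn(1, 4, 64, 64)  # stand-in for model(x) from the README example

full = upsample_logits(pred, x.shape[-2:])  # (1, 4, 256, 256)
mask = full.argmax(dim=1)                   # (1, 256, 256) class index per pixel
```

Upsampling the logits before taking the argmax tends to give smoother class boundaries than resizing the integer mask.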

Is the imgsize parameter of MixVisionTransformer necessary?

SegFormer drops the position encoding, so the model should still work when the test resolution differs from the training resolution.

Something is wrong with your implementation.

Hello!

First of all, I really like the repo. The implementation is clean and so much easier to understand than the official repo. But after doing some digging, I realized that the number of parameters and layers (especially conv2d) is quite different from the official implementation. This is the case for all variants I have tested (B0 and B5).

Check out the README in my repo here, and you'll see what I mean. I also included images of the execution graphs of the two different implementations in the 'src' folder, which could help to debug.

I don't quite have time to dig into the source of the problem, but I just thought I'd share my observations with you.

BatchNorm or LayerNorm?

According to the mmsegmentation config file in the official repository, the author appears to use BatchNorm, not LayerNorm. Am I misinterpreting this?

Model weights + model output HxW

Hi,

Could you please add the model weights so we can start training from them?

Also, why did you choose to train models with an output of size (H/4, W/4) rather than the original (H, W) size?

Great job on the paper, very interesting :)

The decoder in your implementation uses conv2d, which differs from the MLP decoder used in the SegFormer paper?

Did you test performance and inference speed? Is conv2d better than the MLP decoder?
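If the convolutions in question use a 1x1 kernel (an assumption here, not verified against this repository), they compute the same per-pixel linear map as an MLP layer applied to each position. A minimal sketch of that equivalence, with illustrative layer sizes:

```python
import torch
from torch import nn

torch.manual_seed(0)

linear = nn.Linear(32, 256)               # per-token MLP projection
conv = nn.Conv2d(32, 256, kernel_size=1)  # 1x1 convolution over the feature map

# Copy the linear weights into the conv so both layers hold the same parameters.
with torch.no_grad():
    conv.weight.copy_(linear.weight[:, :, None, None])
    conv.bias.copy_(linear.bias)

feat = torch.randn(2, 32, 16, 16)  # (B, C, H, W)

out_conv = conv(feat)
# Apply the linear layer with channels last, then restore the (B, C, H, W) layout.
out_linear = linear(feat.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(out_conv, out_linear, atol=1e-4))
```

Under that assumption the two decoders differ in tensor layout, not in what they compute, so any speed difference would come from the layout and the kernels used.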

patch size not used

Hi again,

I have also noticed that you don't use the patch_size parameter in the Segformer constructor.

Is this okay?

May I ask whether this difference matters significantly in your experiments?

In your code:

k, v = map(lambda t: reduce(t, 'b c (h r1) (w r2) -> b c h w', 'mean', r1 = r, r2 = r), (k, v))

The original implementation uses:

self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
self.norm = nn.LayerNorm(dim)

x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)
x_ = self.norm(x_)
kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
k, v = kv[0], kv[1]
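The two snippets reduce the key/value resolution differently: the first averages each r x r block, while the second uses a learned strided convolution followed by LayerNorm. As a rough sketch of how they relate, the code below shows that block averaging is the special case of a kernel-size-r, stride-r convolution whose weights are fixed to 1/r² on the channel diagonal. Sizes are illustrative, and `F.avg_pool2d` stands in for the einops `reduce(..., 'mean')` call.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
r, dim = 2, 8
x = torch.randn(1, dim, 16, 16)  # (B, C, H, W) with H and W divisible by r

# Mean over non-overlapping r x r blocks, matching the einops 'mean' reduction.
pooled = F.avg_pool2d(x, kernel_size=r, stride=r)

# Strided convolution as in the original `self.sr`, with weights set by hand
# so that each output channel averages only its own input channel.
sr = nn.Conv2d(dim, dim, kernel_size=r, stride=r)
with torch.no_grad():
    sr.weight.zero_()
    sr.bias.zero_()
    for c in range(dim):
        sr.weight[c, c] = 1.0 / (r * r)

reduced = sr(x)
print(torch.allclose(pooled, reduced, atol=1e-5))
```

The learned convolution can therefore represent the averaging but can also mix channels and weight positions unevenly. Whether that extra capacity matters in practice is the open question in this issue.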