lucidrains / segformer-pytorch
Implementation of SegFormer, an attention + MLP neural network for segmentation, in PyTorch
License: MIT License
SegFormer drops the positional encoding, which makes the model robust when the test resolution differs from the training resolution.
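The point above can be illustrated with a minimal sketch (the module names here are illustrative, not the repo's API): a convolutional token mixer, as in SegFormer's Mix-FFN, carries no fixed positional table and so accepts any input resolution, whereas a ViT-style learned positional embedding is tied to the training grid.

```python
import torch
import torch.nn as nn

# A depthwise conv mixer (as in Mix-FFN) has no fixed positional table,
# so it works at any spatial resolution.
conv_mixer = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)

# A ViT-style learned positional embedding is fixed to the training grid.
pos_embed = nn.Parameter(torch.zeros(1, 32, 64, 64))  # "trained" at 64x64

x_train = torch.randn(1, 32, 64, 64)
x_test = torch.randn(1, 32, 96, 96)  # a different test resolution

# Conv mixer: both resolutions work out of the box.
assert conv_mixer(x_train).shape == (1, 32, 64, 64)
assert conv_mixer(x_test).shape == (1, 32, 96, 96)

# Fixed positional embedding: shape mismatch at the new resolution,
# which is why ViT-style models must interpolate their pos-embeds.
try:
    _ = x_test + pos_embed
except RuntimeError:
    print("fixed pos-embed fails at 96x96")
```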
I see you use LayerNorm = partial(InstanceNorm2d).
Hi,
Could you please add the model weights so we can start training from them?
Also, why did you choose to train models with an output of size (H/4, W/4) rather than the original (H, W) size?
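For context on the output-size question: predicting at 1/4 resolution keeps the decoder cheap, and the logits are typically upsampled back to full resolution before computing the loss or taking the argmax. A small sketch of that common pattern (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

H, W, num_classes = 256, 256, 4

# Decoder output at 1/4 resolution, as in the repo.
logits = torch.randn(1, num_classes, H // 4, W // 4)

# For a full-resolution mask, bilinearly upsample the logits; this is
# the usual step before argmax / loss in segmentation pipelines.
full = F.interpolate(logits, size=(H, W), mode='bilinear', align_corners=False)
assert full.shape == (1, num_classes, H, W)

mask = full.argmax(dim=1)  # (1, H, W) predicted class per pixel
assert mask.shape == (1, H, W)
```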
Great job on the paper, very interesting :)
Thank you for your wonderful code implementation. May I ask where the pretrained weights are?
Hello
How are you?
Thanks for contributing to this project.
Is the model configuration in the README correct for MiT-B0?
I ask because the total number of parameters for that configuration comes out to 36M.
Could you provide all the model configurations for SegFormer B0–B5?
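For reference, here are the MiT-B0 to B5 encoder configurations as given in the SegFormer paper; treat this dict as a convenience summary to double-check against the official repo, not as authoritative.

```python
# SegFormer encoder configurations per variant (from the paper's
# architecture table); verify against the official repo before use.
MIT_CONFIGS = {
    'B0': dict(dims=(32, 64, 160, 256),  depths=(2, 2, 2, 2),  decoder_dim=256),
    'B1': dict(dims=(64, 128, 320, 512), depths=(2, 2, 2, 2),  decoder_dim=256),
    'B2': dict(dims=(64, 128, 320, 512), depths=(3, 4, 6, 3),  decoder_dim=768),
    'B3': dict(dims=(64, 128, 320, 512), depths=(3, 4, 18, 3), decoder_dim=768),
    'B4': dict(dims=(64, 128, 320, 512), depths=(3, 8, 27, 3), decoder_dim=768),
    'B5': dict(dims=(64, 128, 320, 512), depths=(3, 6, 40, 3), decoder_dim=768),
}
# All variants share heads = (1, 2, 5, 8) and reduction_ratio = (8, 4, 2, 1).
```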
Did you test performance and inference speed? Is the conv2d decoder better than the MLP decoder?
Which pretrained weights can be used?
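On the conv2d-vs-MLP decoder question: a 1×1 convolution is mathematically the same per-pixel linear map as an nn.Linear applied to flattened tokens, so any speed difference comes from memory layout rather than the math. A runnable sketch (shapes are illustrative):

```python
import torch
import torch.nn as nn

dim_in, dim_out = 64, 150
x = torch.randn(2, dim_in, 16, 16)

conv = nn.Conv2d(dim_in, dim_out, kernel_size=1)
linear = nn.Linear(dim_in, dim_out)

# Share weights so the two heads compute the identical map.
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1).squeeze(-1))
    linear.bias.copy_(conv.bias)

y_conv = conv(x)
y_lin = linear(x.flatten(2).transpose(1, 2))       # (b, h*w, dim_out)
y_lin = y_lin.transpose(1, 2).reshape_as(y_conv)   # back to (b, c, h, w)

# Identical outputs up to floating-point error.
assert torch.allclose(y_conv, y_lin, atol=1e-5)
```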
I see the authors using BatchNorm, not LayerNorm, according to the mmsegmentation config file in the official repo. Am I misinterpreting this?
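Part of the confusion here may be that a "LayerNorm over channels" for a 2D feature map and nn.InstanceNorm2d normalize over different axes, so they are not interchangeable. A sketch of the difference in statistics:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 8, 4, 4)

# "LayerNorm over channels" for a (B, C, H, W) map: each spatial
# position is normalized across its C channels.
ln = nn.LayerNorm(8, elementwise_affine=False)
y_ln = ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

# InstanceNorm2d: each channel is normalized across its HxW positions.
inorm = nn.InstanceNorm2d(8, affine=False)
y_in = inorm(x)

# Different statistics, so the outputs generally disagree.
assert not torch.allclose(y_ln, y_in, atol=1e-3)

# Sanity check: after channel-wise LayerNorm, the per-position mean
# over channels is ~0.
assert torch.allclose(y_ln.mean(dim=1), torch.zeros(2, 4, 4), atol=1e-5)
```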
Thanks for sharing your work, your code is so elegant, and inspired me a lot.
Here is a question about the implementation of Efficient Self-Attention.
It seems you use a "mean op" to reshape k and v,
while the official implementation uses a (learnable) linear mapping to reshape k and v.
May I ask whether this difference matters significantly in your experiments?
In your code:
k, v = map(lambda t: reduce(t, 'b c (h r1) (w r2) -> b c h w', 'mean', r1 = r, r2 = r), (k, v))
The original implementation uses:
self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)
self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
self.norm = nn.LayerNorm(dim)
x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)
x_ = self.norm(x_)
kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
k, v = kv[0], kv[1]
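To make the comparison concrete, here is a runnable sketch (toy shapes) showing that the repo's "mean op" is exactly average pooling, while the official strided-conv reduction is a learnable map that contains the mean as a special case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

b, c, h, w, r = 2, 16, 8, 8, 4
x = torch.randn(b, c, h, w)

# The repo's mean reduction (einops reduce with 'mean') is exactly
# average pooling with kernel and stride r.
mean_reduced = x.reshape(b, c, h // r, r, w // r, r).mean(dim=(3, 5))
assert torch.allclose(mean_reduced, F.avg_pool2d(x, r), atol=1e-6)

# The official spatial reduction: a learnable strided conv with the
# same output resolution (the official code then applies LayerNorm
# and a linear kv projection).
sr = nn.Conv2d(c, c, kernel_size=r, stride=r)
assert sr(x).shape == mean_reduced.shape == (b, c, h // r, w // r)

# Setting the conv weights to 1/(r*r) on the diagonal (zero bias)
# reproduces the mean exactly, so the mean is one fixed point in the
# conv's hypothesis space.
with torch.no_grad():
    sr.weight.zero_()
    for i in range(c):
        sr.weight[i, i] = 1.0 / (r * r)
    sr.bias.zero_()
assert torch.allclose(sr(x), mean_reduced, atol=1e-5)
```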
Hello!
First of all, I really like the repo. The implementation is clean and so much easier to understand than the official repo. But after doing some digging, I realized that the number of parameters and layers (especially conv2d) is quite different from the official implementation. This is the case for all variants I have tested (B0 and B5).
Check out the README in my repo here, and you'll see what I mean. I also included images of the execution graphs of the two different implementations in the 'src' folder, which could help to debug.
I don't quite have time to dig into the source of the problem, but I just thought I'd share my observations with you.
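When comparing the two implementations, a quick parameter count is the easiest sanity check for this kind of discrepancy. A minimal helper (the toy model is just a stand-in for the two Segformer variants being compared):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy example; substitute the two implementations being compared.
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 8, 1))
# conv1: 8*3*3*3 + 8 = 224 params; conv2: 8*8*1*1 + 8 = 72 params
assert count_params(toy) == 224 + 72
```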
Hi, your code implementation helped me a lot! I am working on a new segmentation task, and I want to start from pretrained network weights (e.g. ImageNet-pretrained). How can I modify the code to do this? Thanks!
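On reusing pretrained weights for a new task: the usual PyTorch pattern is to filter the checkpoint down to the keys whose shapes still match, then load with strict=False so the task-specific head stays randomly initialized. A sketch with toy models (the module names are illustrative, not the repo's):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a pretrained backbone + task-specific head.
pretrained = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 10))  # 10 classes
new_model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 21))   # 21 classes

state = pretrained.state_dict()
target = new_model.state_dict()
# Keep only entries whose shapes still match the new model.
filtered = {k: v for k, v in state.items()
            if k in target and v.shape == target[k].shape}

missing, unexpected = new_model.load_state_dict(filtered, strict=False)

# The backbone weights transferred; only the new head stays random.
assert torch.equal(new_model[0].weight, pretrained[0].weight)
assert '1.weight' in missing  # the 21-class head was not loaded
```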
Hi again,
I have also noticed that you don't use the patch_size parameter in the Segformer constructor.
Is this okay?