thanos-db / fullyconvolutionaltransformer Goto Github PK

View Code? Open in Web Editor NEW

111.0 3.0 12.0 548 KB

Official implementation of The Fully Convolutional Transformer for Medical Image Segmentation

Home Page: https://chaitanya-kaul.github.io/

Python 32.85% Jupyter Notebook 67.15%

acdc fct segmentation synapse wacv2023 convolutional-transformer

fullyconvolutionaltransformer's Introduction

The Fully Convolutional Transformer for Medical Image Segmentation

Overview

We propose a novel transformer model, capable of segmenting medical images of varying modalities. Challenges posed by the fine grained nature of medical image analysis mean that the adaptation of the transformer for their analysis is still at nascent stages. The overwhelming success of the UNet lay in its ability to appreciate the fine-grained nature of the segmentation task, an ability which existing transformer based models do not currently posses. To address this shortcoming, we propose The Fully Convolutional Transformer (FCT), which builds on the proven ability of Convolutional Neural Networks to learn effective image representations, and combines them with the ability of Transformers to effectively capture long-term dependencies in its inputs. The FCT is the first fully convolutional Transformer model in medical imaging literature. It processes its input in two stages, where first, it learns to extract long range semantic dependencies from the input image, and then learns to capture hierarchical global attributes from the features. FCT is compact, accurate and robust. Our results show that it outperforms all existing transformer architectures by large margins across multiple medical image segmentation datasets of varying data modalities without the need for any pre-training. FCT outperforms its immediate competitor on the ACDC dataset by 1.3%, on the Synapse dataset by 4.4%, on the Spleen dataset by 1.2% and on ISIC 2017 dataset by 1.1% on the dice metric, with up to five times fewer parameters.

[Paper]

Further information please contact Chaitanya Kaul.

Author's Implementations

The experiments in our paper are done with the TensorFlow 2 implementation. The PyTorch implementation is provided here, but is still under development and not fully tested. We will continue to update this repository with more code in both TensorFlow 2 as well as PyTorch to enable maximum reproducibility.

Docker image

chaitanyakaul14/fct-tf:v2

Results

The performance is evaluated on Dice (higher is better). We release results on the ACDC dataset for now to enable researchers to use our work. The rest of the results will be updated with time.

Method	Avg.	RV	MYO	LV
R50 UNet	87.55	87.10	80.63	94.92
R50 Att-UNet	86.75	87.58	79.20	93.47
ViT	81.45	81.46	70.71	92.18
R50 ViT	87.57	86.07	81.88	94.75
TransUNet	89.71	88.86	84.53	95.73
Swin UNet	90.00	88.55	85.62	95.83
LeViT-UNet₃₈₄	90.32	89.55	87.64	93.76
nnUNet	91.61	90.24	89.24	95.36
nnFormer	91.78	90.22	89.53	95.59
FCT224 w/o D.S.	91.49	90.32	89.00	95.17
FCT224 full D.S.	91.49	90.49	88.76	95.23
FCT₂₂₄	92.84	92.02	90.61	95.89
FCT₃₈₄	93.02	92.64	90.51	95.90

Qualitative results on the different segmentation datasets. From the top - ACDC Segmentation Dataset [Colours - Maroon (LV), Blue (RV), Green (MYO)], Spleen Segmentation Dataset [Colours - Blue (Spleen)], Synapse Segmentation Dataset [Colours - Blue (Aorta), Purple (Gallbladder), Navy (Left Kidney), Aquatic (Right Kidney), Green (Liver), Yellow (Pancreas), Red (Spleen), Maroon (Stomach)] and ISIC 2017 Skin Cancer Segmentation Dataset [Colours - Blue (Skin Cancer)]. The images alternate between the ground truth and the segmentation map predicted by FCT. Best viewed in colour.

Citation

Please cite this paper if you want to use it in your work,

@article{tragakis2022fully,
title={The Fully Convolutional Transformer for Medical Image Segmentation},
author={Tragakis, Athanasios and Kaul, Chaitanya and Murray-Smith, Roderick and Husmeier, Dirk},
journal={arXiv preprint arXiv:2206.00566},
year={2022}
}

License

MIT License

fullyconvolutionaltransformer's People

Contributors

Stargazers

Watchers

Forkers

efss24 olawaleibrahim cv-seg omarkhaled1504 chadkowski tarekegn82 snaka99 mabrokma ayaelgebaly caz-t kjm42 pinghe-stan

fullyconvolutionaltransformer's Issues

Loading the model

When i save the model using torch.save(model.state_dict(), "path/name.pt") then load it using model.load_state_dict(torch.load("path/name.p")) to continue training, it starts training from the beginnig, is there another way to achieve what i want to do? as i see in your code you save the model using the same method.

Question Regarding Training

Hello! I hope that you are still able to have a discussion regarding your repository.

As for now, I have tried to use your random splits mentioned in #6, tried not to change anything from your repo, and tried to start the training myself, but I wasn't able to reproduce your results (I achieved significantly lower results compared to your notebook). May I have your solution from this matter? I was expecting that the problem comes from the differences between the libraries version installed on my device and yours, but I would like to hear from you. Thanks in advance!

ResourceExhaustedError:

ResourceExhaustedError: OOM when allocating tensor with shape[2,256,256,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_3/up_sampling2d_27/resize/ResizeNearestNeighbor (defined at C:\Users\adnan\AppData\Local\Temp\ipykernel_3992\1459390362.py:4) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_118790]

Function call stack:
train_function

I am using RtX2070 GPU for this Tensorflow model. But I am facing this issue after decreasing the batch size. Can you figure out the issue? Thank you

number of filters and attention heads

in papers number of filters per stage is [16, 32, 64, 128, 384, 128, 64, 32, 16], number of attention
heads per stage is[2, 4, 8, 12, 16, 12, 8, 4, 2].
but in code,is different.
i want to train Synapse,which should i use att_heads = [2, 2, 2, 2, 2, 2, 2, 2, 2],filters = [8, 16, 32, 64, 128, 64, 32, 16, 8] ?or filters = [32, 64, 128, 256, 512, 256, 128, 64, 32]

some questions about dice in ACDC dataset

In synapse dataset, all metrics is in 2D slices of synapse dataset. But in ACDC dataset, the dice of nnFormer is 91.78,I konw this result appeares in 3D data ，and your method is 93.02, so has your model on 2D slices outperformed other models on 3D data ?

Pretrained Models?

Hi,
first of all, you have done a great job....Hats off
can you please provide the pre-trained models of your experiments?
Thanks!

which dataset did you use for this code?

Thank you for your code. Can I download a dataset to work with your code? Thanks.

Parameter quantity

Hello, can you tell me the parameter quantities of the FCT224 and FCT384 models used for ACDC dataset segmentation？

output images

Hi.
firstly, you have done a great job thanks for your code.
is it possible to save the output images? how can I save the predicted images?

Can you provide the online download address of Spleen Segmentation Dataset?

pytorch model version

Thanks your work! But I find some problems in pytorch's model code. I used your code to do a bineary semantic segmentation experiment, and the effect was very bad. Even if all the predictions were background classes, the model would not converge at all. I ensured that the data processing in my experiment was effective, and I replaced it with the simplest unet model to ensure the accuracy of my data processing process. But the unet model get correct result.

here are my FCT model:

from torch import nn
import torch
import torch.nn.functional as F
import numpy as np


class Attention(nn.Module):
    def __init__(self,
                 channels,
                 num_heads,
                 proj_drop=0.0,
                 kernel_size=3,
                 stride_kv=1,
                 stride_q=1,
                 padding_q=1,
                 padding_kv=1,
                 attention_bias=True
                 ):
        super().__init__()
        self.stride_kv = stride_kv
        self.stride_q = stride_q
        self.num_heads = num_heads
        self.proj_drop = proj_drop

        self.conv_q = nn.Conv2d(channels, channels, kernel_size, stride_q, padding_q, bias=attention_bias,
                                groups=channels)
        self.layernorm_q = nn.LayerNorm(channels, eps=1e-5)
        self.conv_k = nn.Conv2d(channels, channels, kernel_size, stride_kv, padding_kv, bias=attention_bias,
                                groups=channels)
        self.layernorm_k = nn.LayerNorm(channels, eps=1e-5)
        self.conv_v = nn.Conv2d(channels, channels, kernel_size, stride_kv, padding_kv, bias=attention_bias,
                                groups=channels)
        self.layernorm_v = nn.LayerNorm(channels, eps=1e-5)

        self.attention = nn.MultiheadAttention(embed_dim=channels,
                                               bias=attention_bias,
                                               num_heads=1)

    def _build_projection(self, x, qkv):

        if qkv == "q":
            x1 = F.relu(self.conv_q(x))
            x1 = x1.permute(0, 2, 3, 1)
            x1 = self.layernorm_q(x1)
            proj = x1.permute(0, 3, 1, 2)
        elif qkv == "k":
            x1 = F.relu(self.conv_k(x))
            x1 = x1.permute(0, 2, 3, 1)
            x1 = self.layernorm_k(x1)
            proj = x1.permute(0, 3, 1, 2)
        elif qkv == "v":
            x1 = F.relu(self.conv_v(x))
            x1 = x1.permute(0, 2, 3, 1)
            x1 = self.layernorm_v(x1)
            proj = x1.permute(0, 3, 1, 2)
        else:
            proj = None
            ValueError('qkv is error')

        return proj

    def forward_conv(self, x):
        q = self._build_projection(x, "q")
        k = self._build_projection(x, "k")
        v = self._build_projection(x, "v")

        return q, k, v

    def forward(self, x):
        q, k, v = self.forward_conv(x)
        q = q.view(x.shape[0], x.shape[1], x.shape[2] * x.shape[3])
        k = k.view(x.shape[0], x.shape[1], x.shape[2] * x.shape[3])
        v = v.view(x.shape[0], x.shape[1], x.shape[2] * x.shape[3])
        q = q.permute(0, 2, 1)
        k = k.permute(0, 2, 1)
        v = v.permute(0, 2, 1)
        # 因为multi-head会有输出有两个值，所以就会后面只取x[0]
        x1 = self.attention(query=q, value=v, key=k, need_weights=False)

        x1 = x1[0].permute(0, 2, 1)
        x1 = x1.view(x1.shape[0], x1.shape[1], np.sqrt(x1.shape[2]).astype(int), np.sqrt(x1.shape[2]).astype(int))
        x1 = F.dropout(x1, self.proj_drop)

        return x1


class Transformer(nn.Module):

    def __init__(self,
                 channels,
                 num_heads,
                 proj_drop=0.0,
                 attention_bias=True,
                 padding_q=1,
                 padding_kv=1,
                 stride_kv=1,
                 stride_q=1):
        super().__init__()

        self.attention_output = Attention(channels=channels,
                                          num_heads=num_heads,
                                          proj_drop=proj_drop,
                                          padding_q=padding_q,
                                          padding_kv=padding_kv,
                                          stride_kv=stride_kv,
                                          stride_q=stride_q,
                                          attention_bias=attention_bias,
                                          )

        self.conv1 = nn.Conv2d(channels, channels, 3, 1, padding=1)
        self.layernorm = nn.LayerNorm(self.conv1.out_channels, eps=1e-5)
        self.wide_focus = Wide_Focus(channels, channels)

    def forward(self, x):
        x1 = self.attention_output(x)
        x1 = self.conv1(x1)
        x2 = torch.add(x1, x)
        # 因为是layer normalization ，所以要执行两次permute交换特征维度
        x3 = x2.permute(0, 2, 3, 1)
        x3 = self.layernorm(x3)
        x3 = x3.permute(0, 3, 1, 2)
        x3 = self.wide_focus(x3)
        x3 = torch.add(x2, x3)
        return x3


class Wide_Focus(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 padding_number=1):
        super().__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number)
        self.conv2 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number * 2, dilation=2)
        self.conv3 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number * 3, dilation=3)
        self.conv4 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number)

    def forward(self, x):
        x1 = self.conv1(x)
        x1 = F.gelu(x1)
        x1 = F.dropout(x1, 0.1)
        x2 = self.conv2(x)
        x2 = F.gelu(x2)
        x2 = F.dropout(x2, 0.1)
        x3 = self.conv3(x)
        x3 = F.gelu(x3)
        x3 = F.dropout(x3, 0.1)
        added = torch.add(x1, x2)
        added = torch.add(added, x3)
        x_out = self.conv4(added)
        x_out = F.gelu(x_out)
        x_out = F.dropout(x_out, 0.1)
        return x_out


class BlockEncoderBottleneck(nn.Module):
    def __init__(self, blk, in_channels, out_channels, att_heads, dpr, padding_number=1):
        super().__init__()
        self.blk = blk
        if (self.blk == "first") or (self.blk == "bottleneck"):
            self.layernorm = nn.LayerNorm(in_channels, eps=1e-5)
            self.conv1 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number)
            self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=padding_number)
            self.trans = Transformer(out_channels, att_heads, dpr)
        elif (self.blk == "second") or (self.blk == "third") or (self.blk == "fourth"):
            self.layernorm = nn.LayerNorm(in_channels, eps=1e-5)
            self.conv1 = nn.Conv2d(1, in_channels, 3, 1, padding=padding_number)
            self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=padding_number)
            self.conv3 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=padding_number)
            self.trans = Transformer(out_channels, att_heads, dpr)

    def forward(self, x, scale_img=None):
        if (self.blk == "first") or (self.blk == "bottleneck"):
            x1 = x.permute(0, 2, 3, 1)
            x1 = self.layernorm(x1)
            x1 = x1.permute(0, 3, 1, 2)
            x1 = F.relu(self.conv1(x1))
            x1 = F.relu(self.conv2(x1))
            x1 = F.dropout(x1, 0.3)
            x1 = F.max_pool2d(x1, (2, 2))
            out = self.trans(x1)
            # without skip
        elif (self.blk == "second") or (self.blk == "third") or (self.blk == "fourth"):
            x1 = x.permute(0, 2, 3, 1)
            x1 = self.layernorm(x1)
            x1 = x1.permute(0, 3, 1, 2)
            x1 = torch.cat((F.relu(self.conv1(scale_img)), x1), dim=1)
            x1 = F.relu(self.conv2(x1))
            x1 = F.relu(self.conv3(x1))
            x1 = F.dropout(x1, 0.3)
            x1 = F.max_pool2d(x1, (2, 2))
            out = self.trans(x1)
            # with skip
        else:
            out = None
            ValueError("blk is error")
        return out


class BlockDecoder(nn.Module):
    def __init__(self, in_channels, out_channels, att_heads, dpr, padding_number=1):
        super().__init__()
        self.layernorm = nn.LayerNorm(in_channels, eps=1e-5)
        self.upsample = nn.Upsample(scale_factor=2)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number)
        self.conv2 = nn.Conv2d(out_channels * 2, out_channels, 3, 1, padding=padding_number)
        self.conv3 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=padding_number)
        self.trans = Transformer(out_channels, att_heads, dpr)

    def forward(self, x, skip):
        x1 = x.permute(0, 2, 3, 1)
        x1 = self.layernorm(x1)
        x1 = x1.permute(0, 3, 1, 2)
        x1 = self.upsample(x1)
        x1 = F.relu(self.conv1(x1))
        x1 = torch.cat([skip, x1], dim=1)
        x1 = F.relu(self.conv2(x1))
        x1 = F.relu(self.conv3(x1))
        x1 = F.dropout(x1, 0.3)
        out = self.trans(x1)
        return out


class DsOut(nn.Module):
    def __init__(self, in_channels, out_channels, num_classes=2, padding_number=1):
        super().__init__()
        self.num_classes = num_classes
        self.upsample = nn.Upsample(scale_factor=2)
        self.layernorm = nn.LayerNorm(in_channels, eps=1e-5)
        self.conv1 = nn.Conv2d(in_channels, in_channels, 3, 1, padding=padding_number)
        self.conv2 = nn.Conv2d(in_channels, out_channels, 3, 1, padding=padding_number)
        self.conv3 = nn.Conv2d(out_channels, self.num_classes, 3, 1, padding=padding_number)

    def forward(self, x):
        x1 = self.upsample(x)
        x1 = x1.permute(0, 2, 3, 1)
        x1 = self.layernorm(x1)
        x1 = x1.permute(0, 3, 1, 2)
        x1 = F.relu(self.conv1(x1))
        x1 = F.relu(self.conv2(x1))
        out = F.sigmoid(self.conv3(x1))

        return out


class FCT(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes
        # attention heads and filters per block
        att_heads = [2, 2, 2, 2, 2, 2, 2, 2, 2]
        filters = [8, 16, 32, 64, 128, 64, 32, 16, 8]

        # number of blocks used in the model
        blocks = len(filters)

        stochastic_depth_rate = 0.0

        # probability for each block
        dpr = [x for x in np.linspace(0, stochastic_depth_rate, blocks)]

        self.drp_out = 0.3

        # Multi-scale input
        self.scale_img = nn.AvgPool2d(2, 2)

        # model
        self.block_1 = BlockEncoderBottleneck("first", 1, filters[0], att_heads[0], dpr[0])
        self.block_2 = BlockEncoderBottleneck("second", filters[0], filters[1], att_heads[1], dpr[1])
        self.block_3 = BlockEncoderBottleneck("third", filters[1], filters[2], att_heads[2], dpr[2])
        self.block_4 = BlockEncoderBottleneck("fourth", filters[2], filters[3], att_heads[3], dpr[3])
        self.block_5 = BlockEncoderBottleneck("bottleneck", filters[3], filters[4], att_heads[4], dpr[4])
        self.block_6 = BlockDecoder(filters[4], filters[5], att_heads[5], dpr[5])
        self.block_7 = BlockDecoder(filters[5], filters[6], att_heads[6], dpr[6])
        self.block_8 = BlockDecoder(filters[6], filters[7], att_heads[7], dpr[7])
        self.block_9 = BlockDecoder(filters[7], filters[8], att_heads[8], dpr[8])

        self.ds7 = DsOut(filters[6], 4, self.num_classes)
        self.ds8 = DsOut(filters[7], 4, self.num_classes)
        self.ds9 = DsOut(filters[8], 4, self.num_classes)

    def forward(self, x):
        # Multi-scale input
        scale_img_2 = self.scale_img(x)
        scale_img_3 = self.scale_img(scale_img_2)
        scale_img_4 = self.scale_img(scale_img_3)

        x = self.block_1(x)
        # print(f"Block 1 out -> {list(x.size())}")
        skip1 = x
        x = self.block_2(x, scale_img_2)
        # print(f"Block 2 out -> {list(x.size())}")
        skip2 = x
        x = self.block_3(x, scale_img_3)
        # print(f"Block 3 out -> {list(x.size())}")
        skip3 = x
        x = self.block_4(x, scale_img_4)
        # print(f"Block 4 out -> {list(x.size())}")
        skip4 = x
        x = self.block_5(x)
        # print(f"Block 5 out -> {list(x.size())}")
        x = self.block_6(x, skip4)
        # print(f"Block 6 out -> {list(x.size())}")
        x = self.block_7(x, skip3)
        # print(f"Block 7 out -> {list(x.size())}")
        skip7 = x
        x = self.block_8(x, skip2)
        # print(f"Block 8 out -> {list(x.size())}")
        skip8 = x
        x = self.block_9(x, skip1)
        # print(f"Block 9 out -> {list(x.size())}")
        skip9 = x

        out7 = self.ds7(skip7)
        # print(f"DS 7 out -> {list(out7.size())}")
        out8 = self.ds8(skip8)
        # print(f"DS 8 out -> {list(out8.size())}")
        out9 = self.ds9(skip9)
        # print(f"DS 9 out -> {list(out9.size())}")

        return out7, out8, out9


def init_weights(m):
    if isinstance(m, nn.Conv2d):
        torch.nn.init.kaiming_normal(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)


if __name__ == '__main__':
    fct = FCT(num_classes=2)
    print(fct)
    data = torch.rand((2, 1, 224, 224), dtype=torch.float)
    fct(data)

cannot reproduce tensorflow version's result

Hi!I ran the jupyter notebook and just changed the location of dataset and batch size.And I still can not reproduce the result.
Here is my notebook and result:
main_ACDC_92.84.md

Can you give a docker image or a big zip package contains code and data that I can reproduce the result ?

gradient vanishing

Hi! I am reproducing your impressive work.But I came ito gradient vanishing in the block_1.

And the dice stay at 87.

for details,plz check my FCT-Pytorch repo,and see the release.

mistake?

https://github.com/Thanos-DB/FullyConvolutionalTransformer/blob/6626aeac67bce5056716030d88b3b4b081ad0c99/PyTorch/main.py#L160C1-L160C17

Help

Dear Thanos,
I am sorry to disturb you in your busy schedule.I'm a postgraduate student from Shandong,China.
My research direction is also heart segmentation，I also used the ACDC database. Your paper gives me great inspiration. I am currently facing some graduation troubles. I see that the source code you posted is not complete, and I know it involves confidentiality. If possible,can you send me the source code? I will only use it for my graduation research and will not use it for any for-profit activities.
No matter whether you agree or not, best wishes to you. I would appreciate it if you could help me. My email is [email protected]
Thanks and regards.
Sincerely,
Zhai chunlin
2023/10/11

About the input format is RGB image

Hi! Thank you very much for open source your code, if the input is three-channel RGB image, then how should I modify the code?

Bad neural network

The pytorch version of the network you provided has extremely low training efficiency and requires too much time.

about deeply supervision function

Hello,I have a question about computing loss between [out7,out8,out9] and GT mask. In the original code，out7 has an output shape:(None, 56, 56, 4) , out8: (None, 112, 112, 4), out9:(None, 224, 224, 4). To my knowledge,when compute the loss between segmentaion map and mask, shouldn't the output and mask have the same dimension(all of them have same shape:(None, 224, 224, 4))?

Can I get prediction results of Synapse and ACDC?

I request the data because I want to see the qualitative results of your FCT model.
It's because when I trained FCT, it didn't come out as well as the paper's performance.

The segmentation result of Synapse

您好，我有Synapse的训练和测试集，但是我想知道作者您的Synapse可视化结果怎么得到的？是用哪种图吗？我没有拿来做可视化的原图。谢谢您。

low dice on test set

ACDC dataset now have two folders: training and testing

After using this split #6 to divide the training folder into 3 parts .I reproduced your result:

But when I tested the model on test folder which patient number from 101 to 150.the result is low:

Did all the results on the ACDC leaderboard test on the testing folder or on the training folder, like in your paper?
What's more,in my this repo https://github.com/kingo233/FCT-Pytorch. I trained using the training folder and get dice 90 on testing folder.But I can't get improvement any more...is this the upper limit of FCT?

environment configuration

Hello author, I am very interested in your paper and experiments. Could you please provide a detailed environment configuration file, proportional to requirements.txt. I see that your readme doesn't give too many operational details for experimental reproduction. Thank you very much if you can reply

Best regards,
Benedikt