askerlee / segtran

Medical Image Segmentation using Squeeze-and-Expansion Transformers

Python 99.94% MATLAB 0.05% Shell 0.01%

segtran's Issues

Why does your code do localization?

Hi! Quick question: why does your repo have logic for doing localization?

For instance:

segtran/code/train3d.py

Line 68 in 47cc13e

parser.add_argument("--locprob", dest='localization_prob', default=0.5,

segtran/code/dataloaders/datasets3d.py

Line 115 in 47cc13e

def localize(image, mask, min_output_size):

I looked at both papers and neither mentioned localization during training. Did you find that localization helped improve model performance?

checkpoint

Hi,
Thank you for the great code.
How can the checkpoints be accessed? I would be grateful if you could provide a Google Drive link.

Any suggestions on how to adapt the model to work on a dataset with much smaller tumors or areas of interest?

Your advice is appreciated.

how to test the model during training

Hi Dr. Lee,

Is test/validation implemented during model training? I want to validate the ongoing performance during training but could not figure out how to do that from the code. I did see that there are three data files (all.list, train.list and test.list), and I am wondering how to test and report the performance during training.

Best
Wendy
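
For readers with the same question, here is a generic sketch of periodic validation inside an iteration-based training loop. It is not taken from train3d.py; `train_step`, `validate` and `val_interval` are hypothetical names, and the validation function is assumed to return a single score computed on the held-out list.

```python
def train_with_validation(train_step, validate, max_iter, val_interval):
    """Run `train_step` every iteration and `validate` every `val_interval`
    iterations (and once more at the end). Returns [(iteration, score), ...]."""
    history = []
    for it in range(1, max_iter + 1):
        train_step(it)
        if it % val_interval == 0 or it == max_iter:
            history.append((it, validate()))
    return history


# Toy usage with stand-in callables instead of a real model.
seen = []
history = train_with_validation(
    train_step=lambda it: seen.append(it),  # would run one optimizer step
    validate=lambda: 0.5,                   # would return e.g. mean Dice
    max_iter=5,
    val_interval=2,
)
print(history)
```

Keeping validation as a separate callable makes it easy to reuse the same evaluation code that the standalone test script runs.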

Where can we find the code that generates the prototype?

problem about data process

Baseline comparison for 2D dataset

Hi Dr. Lee, do you have the code that you used to compare with various baselines in section 5.2 (list of other models) for 2D dataset? Is there a similar comparison for the 3D dataset?

Problem about the mask of RIM dataset

In the REFUGE dataset, the pixel values in each channel of the mask look like this:
channel 0 ---> {0: 243058, 255: 88718}
channel 1 ---> {0: 294881, 255: 36895}
channel 2 ---> {0: 331776}
This means the mask values are either 0 or 255. (The key is the pixel value; the value is the number of pixels with that value.)

But in the RIM dataset, the pixel values in each channel of the mask can be:
channel 0 ---> {0: 281893, 138: 27, 158: 84, 98: 20, 222: 84, 250: 13, 254: 480, 242: 37, 221: 31, 3: 13, 31: 72, 114: 17, 253: 66, 255: 48551, 225: 21, 19: 21, 35: 13, 95: 61, 115: 9, 194: 26, 83: 39, 27: 29, 233: 21, 59: 49, 218: 17, 11: 23, 94: 5, 154: 10, 170: 21, 193: 13, 234: 6, 226: 4}
channel 1 ---> {0: 318085, 11: 14, 31: 42, 35: 10, 95: 46, 154: 7, 254: 248, 253: 42, 98: 10, 158: 44, 194: 14, 255: 12965, 193: 9, 222: 29, 242: 18, 221: 16, 27: 17, 59: 27, 170: 15, 233: 13, 138: 15, 218: 9, 19: 12, 83: 21, 250: 9, 225: 7, 94: 3, 114: 7, 3: 8, 234: 2, 226: 6, 115: 6}
channel 2 ---> {0: 331776}

We can see that the mask values in the RIM dataset can be any integer between 0 and 255. My question is, how should this be handled?
For example, can we treat any value > 0 as the target and any value == 0 as the background?
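
For illustration only, a minimal sketch of the thresholding idea raised above. Whether `> 0` or a mid-range cutoff is the right rule for RIM is not confirmed anywhere in this thread. The intermediate values (138, 222, 254, ...) look like interpolation or compression artifacts along mask edges, but that is an assumption. The function names and the toy values are hypothetical.

```python
from collections import Counter


def binarize_channel(channel, threshold=0):
    """Map every pixel strictly above `threshold` to 255 and the rest to 0.

    `channel` is a 2D list of ints in [0, 255]. threshold=0 implements the
    "any value > 0 is target" rule; threshold=127 would instead drop faint
    edge values. Which rule suits RIM is an assumption, not a known fact.
    """
    return [[255 if v > threshold else 0 for v in row] for row in channel]


def value_counts(channel):
    """Histogram of pixel values in the same {value: count} form used above."""
    return dict(Counter(v for row in channel for v in row))


# Toy 2x4 channel containing a few of the intermediate values reported above.
toy = [[0, 3, 138, 255],
       [0, 0, 254, 255]]

strict = binarize_channel(toy, threshold=0)
mid = binarize_channel(toy, threshold=127)
print(value_counts(strict))
print(value_counts(mid))
```

Comparing the two histograms on a real mask shows how many edge pixels the choice of threshold actually moves between foreground and background.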

about nnunet

Dear sir, we want to use the nnU-Net code on fundus (2D) images. We have run into some problems; could you give us some guidance? Thank you!
We also noticed you mentioned: 'It is primarily designed for 3D tasks, but can also handle 2D images after converting them to pseudo-3D.'
Here are our settings and traceback:
The parameters are: --nproc_per_node=1 --master_port=7152 /data/sementation/code/train2d.py --task fundus --ds train --split train --translayers 3 --layercompress 1,1,2,2 --net nnunet --bb resnet101 --maxiter 5000 --bs 32 --noqkbias

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/root/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data/sementation/code/train2d.py", line 1434, in <module>
    outputs = net(image_batch)
  File "/data/hliu/anaconda3/install/envs/segtran-master1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/hliu/anaconda3/install/envs/segtran-master1/lib/python3.8/site-packages/nnunet/network_architecture/generic_UNet.py", line 400, in forward
    x = torch.cat((x, skips[-(u + 1)]), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 6 but got size 5 for tensor number 1 in the list.
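
A mismatch such as "Expected size 6 but got size 5" at a skip connection is the kind of error that appears when a feature map with an odd side length is halved (rounding up to 3) and then doubled again (giving 6). This is a guess at the cause, not a confirmed diagnosis. The sketch below assumes stride-2 downsampling that rounds up and exact 2x upsampling; `num_pool` and the sizes used are hypothetical.

```python
def down(size):
    """Stride-2 downsampling that rounds up (e.g. kernel 3, padding 1)."""
    return (size + 1) // 2


def first_mismatch(size, num_pool):
    """Return (skip_size, upsampled_size) at the first level where a 2x
    upsampled map no longer matches its skip connection, or None."""
    for _ in range(num_pool):
        smaller = down(size)
        if smaller * 2 != size:
            return size, smaller * 2
        size = smaller
    return None


def pad_to_multiple(size, num_pool):
    """Smallest size >= `size` that is divisible by 2 ** num_pool."""
    factor = 2 ** num_pool
    return ((size + factor - 1) // factor) * factor


print(first_mismatch(320, 7))                      # skip of 5 vs upsampled 6
print(pad_to_multiple(320, 7))                     # a size that survives 7 halvings
print(first_mismatch(pad_to_multiple(320, 7), 7))  # no mismatch after padding
```

If this is indeed the cause, padding or resizing the input so each side is divisible by 2 raised to the number of pooling stages would avoid the error.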

Own dataset

How can I train with my own dataset? I have prepared the masks and images folders (2D segmentation) similar to fundus, but I am not sure how to generate the json file for my dataset before training and testing.

The file is damaged

The file is damaged and cannot be downloaded

error on testing

Hi askerlee,

I've tried your project with a small amount of images: 10 images.

Could you help me with this error?

Thanks!

problem about dataset

Thanks for your great project!
I downloaded the REFUGE dataset, but the names and number of image files are different from the names in this repo! For example, the training part has 360 files with names 0001.jpg, ...

Confirmation Question: Input to Model?

Hi! Quick question: what is the actual input to the network? Is the input the full 2D/3D image, or is the input smaller (perhaps random) sections of the full image?

I think the answer is the full image, but I want to confirm.

Are checkpoints of this model available?

Hi, thanks for your excellent work!

I want to re-implement this model. Will you provide the trained models that achieve the SOTA results reported in your paper, so that we can use them for inference only?

I am especially interested in the models for REFUGE and BraTS, where segtran performs extraordinarily well.

Best,

Brats iter-8000 checkpoint is not loading on test

Hi Dr Lee,

I got this error when I tried to run the test command using your recently updated Brats iter_8000 checkpoint, any advice?

python3 test3d.py --task brats --split all --bs 5 --ds 2019valid --net segtran --attractors 1024 --translayers 1 --cpdir ./ --iters 8000

Traceback (most recent call last):
File "test3d.py", line 426, in
allcls_avg_metric = test_calculate_metric(iter_nums)
File "test3d.py", line 350, in test_calculate_metric
load_model(net, args, checkpoint_path)
File "test3d.py", line 311, in load_model
if (k not in ignored_keys) and (args2.__dict__[k] != cp_args[k]):
KeyError: 'qk_have_bias'

The error is raised by "load_model(net, args, checkpoint_path)" in the following:

for iter_num in iter_nums:
    if args.checkpoint_dir:
        checkpoint_path = os.path.join(args.checkpoint_dir, 'iter_' + str(iter_num) + '.pth')
        load_model(net, args, checkpoint_path)
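
The `KeyError` suggests that the arguments stored in the checkpoint predate the `qk_have_bias` option. Below is a hedged sketch of a comparison that tolerates keys missing from the checkpoint. `find_arg_conflicts` and the toy dictionaries are hypothetical; this is not the repository's `load_model`.

```python
def find_arg_conflicts(current_args, cp_args, ignored_keys=()):
    """Compare current arguments with those stored in a checkpoint.

    Returns (conflicts, missing): `conflicts` maps each key whose values
    differ to a (current, checkpoint) pair; `missing` lists keys the
    checkpoint never stored, which are skipped instead of raising KeyError.
    """
    conflicts, missing = {}, []
    for key, value in current_args.items():
        if key in ignored_keys:
            continue
        if key not in cp_args:
            missing.append(key)
        elif cp_args[key] != value:
            conflicts[key] = (value, cp_args[key])
    return conflicts, missing


# Toy values: the checkpoint was saved before `qk_have_bias` existed.
current = {"num_attractors": 1024, "num_translayers": 1, "qk_have_bias": True}
saved = {"num_attractors": 1024, "num_translayers": 2}

conflicts, missing = find_arg_conflicts(current, saved)
print(conflicts)
print(missing)
```

Reporting missing keys separately keeps older checkpoints loadable while still flagging genuine mismatches.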

dataset

Hi,
I downloaded the dataset from the link, but the file names are different from those in the data folder.

Some questions about test-vcdr.py

Confirmation: Multiple classes possible simultaneously

Quick question regarding the following line:

https://github.com/askerlee/segtran/blob/master/code/train3d.py#L695

The tensor passed to sigmoid has shape (batch size, num classes, H, W, D). Because the sigmoid is applied element-wise, this suggests to me that the classes are not competitive. By this I mean that multiple classes can be present at the same voxel simultaneously. Can you confirm that this is correct?

Thanks in advance!
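
To make the distinction in this question concrete, the sketch below contrasts an element-wise sigmoid with a softmax on the logits of a single voxel. It only illustrates the general behaviour of the two functions; it does not confirm how the repository interprets its outputs, and the logits are made up.

```python
import math


def sigmoid(x):
    """Logistic function applied to one logit."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Logits for one voxel across three hypothetical classes.
logits = [2.0, 1.5, -3.0]

sig = [sigmoid(x) for x in logits]
soft = softmax(logits)

# Element-wise sigmoid: each class is thresholded on its own, so several
# classes can be "on" for the same voxel and the scores need not sum to 1.
active = [p > 0.5 for p in sig]
print(active, round(sum(sig), 3))

# Softmax: the classes compete and the probabilities sum to 1.
print(round(sum(soft), 3))
```

This per-class behaviour is what a multi-label setup with nested regions would need, since a voxel can then belong to more than one region.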

Figure 2 in the paper

Thanks for your great work!
In Figure 2 of your paper, blue blobs and light-colored dots are used to indicate the gradients. Is there any difference between the dots of different colors?

train

Thanks for your great project! Sir, I want to train on new 2D data with a pretrained model. How could I do that?

Question of brats_processing.py usage

Hi askerlee! Thanks for your nice work and repo!
I have a question about how to use brats_processing.py to process my BraTS19 or BraTS20 dataset.
After reading the code, I did something like this for the train and val datasets respectively:

python3 brats_processing.py h5 brats_trainset_root
python3 brats_processing.py label brats_valset_root

Is this usage right? The label processing seems to have no effect on the data, and I'm confused about what python3 brats_processing.py label brats_valset_root really does.

CUDA out of memory.

Hello,

I ran this command with batch size 2 on the sample data provided and got a "CUDA out of memory" error. Your advice is appreciated.

~/segtran/code$ python3 train3d.py --task brats --split all --bs 1 --maxiter 10000 --randscale 0.1 --net segtran --attractors 1024 --translayers 1

Traceback (most recent call last):
File "train3d.py", line 683, in
outputs = net(volume_batch)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/segtran/code/networks/segtran3d.py", line 541, in forward
vfeat_fused_fpn = self.out_fpn_forward(batch_base_feats, vfeat_fused)
File "/home/ubuntu/segtran/code/networks/segtran3d.py", line 428, in out_fpn_forward
align_corners=False)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/functional.py", line 3712, in interpolate
return torch._C._nn.upsample_trilinear3d(input, output_size, align_corners, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 1.15 GiB (GPU 0; 15.78 GiB total capacity; 11.45 GiB already allocated; 900.00 MiB free; 13.54 GiB reserved in total by PyTorch)

Train Polyformer (source) Problem

I got the error AttributeError: 'EasyDict' object has no attribute 'use_mince_transformer' as shown. I tried to fix it but it didn't work. Please tell me how to fix it. Thank you.

Model training

Hello, I am trying to reproduce the model training with 2019 Brats datasets (LGG and HGG). I am using BS = 1 due to memory limitation.

I have trained 10k+ iterations now and the loss curves look like this. Is this trending in the right direction, as expected?

brats dice seems low?

Dear Shaohua,
While checking the training process, I see that the Dice is around 9% after a percentage of the total epochs.
The total training on 8 NVIDIA RTX 8000 GPUs is estimated to take around 25 hours.
What is the cause of such a low Dice value?
Waiting to hear from you.
Thanks

hardware

hi
Thanks for the great project
What is the hardware required to run the code?
And what is the approximate running time?

atria error

Hello, thank you for the wonderful project. I tried to run the atria task and got the following error. Any advice?

python3 train3d.py --task atria --split all --bs 2 --maxiter 10000 --randscale 0.1 --net segtran --attractors 1024 --translayers 1

Traceback (most recent call last):
File "train3d.py", line 549, in
xyz_permute=args.xyz_permute)
TypeError: __init__() got an unexpected keyword argument 'mask_num_classes'

patch size

I would like to try a smaller patch size (orig_patch_size) such as (80,80,80) so that I can possibly increase the batch_size to 2. It gave this error. Any advice?
Traceback (most recent call last):
File "train3d.py", line 781, in
outputs = net(volume_batch)
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/segtran/code/networks/segtran3d.py", line 488, in forward
feats_dict = self.backbone.extract_features(fakeRGB_batch)
File "/home/ubuntu/segtran/code/networks/aj_i3d/aj_i3d.py", line 331, in extract_features
pooled_feat = self.avg_pool(x)
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 701, in forward
self.padding, self.ceil_mode, self.count_include_pad, self.divisor_override)
RuntimeError: input image (T: 10 H: 5 W: 5) smaller than kernel size (kT: 2 kH: 7 kW: 7)
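
The numbers in this error are consistent with the backbone shrinking an 80-voxel side by a factor of 16 in H and W (80 / 16 = 5) and by 8 in T (80 / 8 = 10) before a pooling layer with a 2 x 7 x 7 kernel. Those strides are inferred from the message rather than read from the code, so treat them as assumptions. Under them, the smallest workable patch can be estimated as follows.

```python
def feature_size(patch_size, strides):
    """Feature-map size after dividing each axis by its total stride."""
    return tuple(p // s for p, s in zip(patch_size, strides))


def min_patch_size(kernel, strides):
    """Smallest patch whose feature map is at least as large as the kernel."""
    return tuple(k * s for k, s in zip(kernel, strides))


STRIDES = (8, 16, 16)  # assumed total downsampling for (T, H, W)
KERNEL = (2, 7, 7)     # pooling kernel size reported in the error

print(feature_size((80, 80, 80), STRIDES))  # matches T: 10 H: 5 W: 5
print(min_patch_size(KERNEL, STRIDES))
```

If the assumption holds, H and W would each need to be at least 112 for the pooling layer to fit, so an 80-voxel side is too small for this backbone as configured.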

about ModuleNotFoundError

Hi, sir. I moved the project to a new computer and now I get ModuleNotFoundError: No module named 'networks.segtran2d'. It would be great if you could give me a hint. Thank you!

how to test on other brats dataset such as 2020?

Hi dear Askerlee,
First of all, thank you for your valuable code.
Please explain how I can test this code on the BraTS 2020 dataset.
I placed that dataset in the brats path and set the parameters in train3d.py, but the .h5 files do not exist in the case paths.
Waiting to hear from you,
Thanks

Different results on zero-shot learning

Hi. Thanks for the great work. I was trying to reproduce the baseline. I used REFUGE as the source domain and trained on the REFUGE train and valid data. Then I tested this model on RIM-ONE without any adaptation. To simplify the task, I only did disc segmentation, i.e., I considered both cup and disc as the disc. In this setting, the upper bound I got on RIM-ONE (i.e., trained and tested on RIM-ONE) is ~0.89 (Dice) and the lower bound (trained on REFUGE and tested on RIM-ONE) is only ~0.55. Compared with the results in the paper, this shows a big gap: the upper bound is lower than the paper's few-shot learning results and the lower bound is much lower than its zero-shot learning results. I was wondering if you could provide more detail on data preprocessing, training details, etc.

About the index out of range

Hello! Sorry to bother you. I've just cloned the project on Colab and uploaded 13 images and masks from the CVC-300 dataset to the project. But when I run main.py, it shows:
Epoch : 1
3/3 [==============================] - 23s 2s/step - loss: 0.6957 - dice_coef: 0.2971 - jacard: 0.1746 - accuracy: 0.5558
1/1 [==============================] - 4s 4s/step
Traceback (most recent call last):
File "main.py", line 234, in
trainStep(model, X_train, Y_train, X_test, Y_test, epochs=150, batchSize=4)
File "main.py", line 216, in trainStep
evaluateModel(model,X_test, Y_test,batchSize)
File "main.py", line 147, in evaluateModel
plt.imshow(X_test[i])
IndexError: index 3 is out of bounds for axis 0 with size 3
I have no idea how to deal with this.
I hope you can help me. So sorry to bother you. 🙏🙏🙏
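
The traceback points at a plotting loop that indexes `X_test` up to `batchSize` (4) while the test split holds only 3 images. Assuming that reading is right, bounding the loop by the size of the test set avoids the `IndexError`. `indices_to_plot` is a hypothetical helper, not part of the `main.py` in question.

```python
def indices_to_plot(num_test_samples, batch_size):
    """Indices that are safe to visualize: never more than the test set has."""
    return list(range(min(num_test_samples, batch_size)))


# With 3 test images and batchSize=4, only indices 0..2 are valid.
print(indices_to_plot(3, 4))
```

With only 13 images in total, uploading more data would also give the test split at least as many samples as the batch size.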

about calling test from another file.py

Hi
Dear Shaohua

Is there any part of this code that lets me set just the trained model path and a new sample path from another .py file, and then get the result from these parameters without going through argument parsing?
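
One general way to drive an argparse-based script from another Python file is to pass an explicit argument list to `parse_args` instead of relying on the command line. The sketch below uses a trimmed-down, hypothetical parser: `--cpdir` and `--iters` mirror flags seen elsewhere in this thread, while `--input` and `run_test` are invented for illustration and do not exist in the repository.

```python
import argparse


def build_parser():
    """Hypothetical, minimal parser; the real scripts define many more options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cpdir", dest="checkpoint_dir", type=str, default=None)
    parser.add_argument("--iters", type=str, default="")
    # Invented flag standing in for "path to a new sample".
    parser.add_argument("--input", dest="input_path", type=str, default=None)
    return parser


def run_test(argv):
    """Parse an explicit list of arguments; a real script would then load
    the model from args.checkpoint_dir and run inference on args.input_path."""
    return build_parser().parse_args(argv)


# Called from another file, with no dependence on sys.argv.
args = run_test(["--cpdir", "./model", "--iters", "9500", "--input", "./sample.png"])
print(args.checkpoint_dir, args.iters, args.input_path)
```

Wrapping the script body in a function that accepts `argv` keeps the command-line interface intact while making the same code importable.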

Great project! But I encountered some problems.

When I train on the polyp datasets, an error occurs. The logs are as follows:

10 epochs, 1002 itertations each.
  0%|                                          | 0/10 [00:00<?, ?it/s]

Image scales: 8x8. Voxels: [1, 1600, 1792]
outputs shape: torch.Size([1, 2, 320, 320])
mask_batch shape: torch.Size([1, 3, 320, 320])
  0%|                                          | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train2d.py", line 1100, in <module>
    mask_batch.permute([0, 2, 3, 1]))
  File "/root/anaconda3/envs/python377/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/python377/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 617, in forward
    reduction=self.reduction)
  File "/root/anaconda3/envs/python377/lib/python3.7/site-packages/torch/nn/functional.py", line 2433, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1, 320, 320, 3])) must be the same as input size (torch.Size([1, 320, 320, 2]))

Could you tell me how to solve this problem? Thank you!

data problem on Polymorphic Transformers

Hi, please help me figure out some questions.

I found that " python3 train2d.py --task refuge --ds train,valid,test --split all --maxiter 10000 --net unet-scratch "
should use "--task fundus"; otherwise it reports errors.
The polyp data downloaded from https://github.com/DengPingFan/PraNet (search for "testing data") is not complete. Some images are missing; I guess they are included in the training data. But in the training data, several datasets are mixed together into two folders (image and mask). So should I manually select the images for our folders? Could you have a look?

Many thanks in advance.

test3d.py failure

Hi Dr Lee,

I tried to run test3d.py on a set of 110 test images a few times, but it always exits unexpectedly after around 70 images without any error. So far, I have never successfully finished all 110 images.

Any advice?

Best,
Wendy

Some questions about 3Ddataset

What's the meaning of BraTS19_CBICA_ASK_1_t1ce.nii?
What's the meaning of BraTS19_CBICA_ASK_1_t2.nii?
And what's the difference between BraTS19_CBICA_ASK_1_t1ce.nii and BraTS19_CBICA_ASK_1_t1.nii?

test.py checking

Hi again.
I also tested the multi-GPU form of train.py, and training has worked fine so far.
However, I need to use the trained model for testing on new samples.
Have you checked the test.py file?
The error is shown below:
TypeError: __init__() got an unexpected keyword argument 'mask_num_classes'
There are some other errors as well, such as the undefined names visualize_model, evalrobustness, testloader, ...
Please check test.py.
Waiting to hear from you.
Thanks

test result dice : 0.000

Thanks for sharing this great project. I ran into some problems when using it for LUNA16 dataset segmentation. The training loss is low, but the test results are: dice: 0.000, jc: 0.000, hd: nan, asd: nan. Can you give me some advice? Thanks.
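
A Dice of exactly 0.000 together with `nan` distance metrics is what one would typically see when the predicted mask is empty or never overlaps the ground truth, since surface distances are undefined without a predicted surface. That is one possible explanation, not a diagnosis of this run. The sketch below shows the Dice arithmetic on flat binary lists; the both-empty convention it uses is an assumption.

```python
def dice(pred, target):
    """Dice coefficient between two flat binary lists (1 = foreground)."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


target = [0, 1, 1, 0, 1]
print(dice([0, 0, 0, 0, 0], target))  # empty prediction
print(dice([0, 1, 1, 0, 0], target))  # two of three foreground voxels found
```

Counting the foreground voxels in a few saved predictions is a quick way to check whether the network is predicting background everywhere.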

checkpoint detail

Dear author:
Which pretrained CNN backbone does the REFUGE20 checkpoint use?

Great project! But I encountered some problems about test

When I try test2d.py, the following error occurs:
python3 test2d.py --task fundus --split all --ds valid2 --net segtran --bb resnet101 --translayers 3 --layercompress 1,1,2,2 --cpdir ../model/segtran-fundus-train,valid,test,drishti,rim-05011826 --iters 9500 --outorigsize
'fundus' mean/std loaded from 'fundus-cropped-gray0.5-stats.json'
'all' 400 samples of size 576 chosen (total 800) in '../data/fundus/valid2'
'args' orig in-feat: 2048, in-feat: 2048, out-feat: 512, in-scheme: AN, out-scheme: AN, translayer_dims: [2048, 2048, 1024, 512]
Namespace(ablate_multihead=False, attn_clip=500, backbone_type='resnet101', batch_size=8, bb_feat_upsize=True, binarize=False, calc_flop=False, checkpoint_dir='../model/segtran-fundus-train,valid,test,drishti,rim-05011826', debug=False, device='cuda', do_remove_frag=False, ds_class='SegCrop', ds_name='valid2', ds_split='all', eval_robustness=False, gpu='0', gray_alpha=0.5, has_FFN_in_squeeze=False, in_fpn_layers='34', in_fpn_scheme='AN', in_fpn_use_bn=False, iters='9500', job_name='fundus-valid2', mean=[0.578, 0.429, 0.318], mid_type='shared', mince_channel_props=None, mince_scales=None, net='segtran', num_attractors=256, num_classes=3, num_modalities=0, num_modes=4, num_translayers=3, num_workers=4, orig_input_size=(576, 576), out_fpn_layers='1234', out_fpn_scheme='AN', out_origsize=True, output_upscale=2.0, patch_size=(288, 288), polyformer_mode=None, pos_bias_radius=7, pos_code_type='lsinu', pos_code_weight=1.0, qk_have_bias=True, reload_mask=False, reshape_mask_type=None, robust_aug_degrees=[0.5, 1.5], robust_aug_types=None, robust_ref_cp_path=None, robust_sample_num=120, robustness_augs=None, sample_num=-1, save_ext='png', save_features_img_count=0, save_results=True, std=[0.184, 0.162, 0.144], task_name='fundus', test_interp=None, tie_qk_scheme='none', trans_output_type='private', translayer_compress_ratios=[1.0, 1.0, 2.0, 2.0], use_exclusive_masks=False, use_global_bias=False, use_mince_transformer=False, use_pretrained=True, use_squeezed_transformer=True, verbose_output=False, vis_layers=None, vis_mode=None)
Segtran Fusion Encoder with 3 trans-layers
Learnable Sinusoidal positional encoding
Fusion0-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion0-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion0-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion0-squeeze-out in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion1-in-squeeze in_feat_dim: 2048, feat_dim: 2048, qk_have_bias: True
Fusion1-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion1-squeeze-out in_feat_dim: 2048, feat_dim: 1024, qk_have_bias: True
Fusion2-in-squeeze: v_has_bias: False, has_FFN: False, has_input_skip: False
Fusion2-in-squeeze in_feat_dim: 1024, feat_dim: 1024, qk_have_bias: True
Fusion2-squeeze-out: v_has_bias: False, has_FFN: True, has_input_skip: False
Fusion2-squeeze-out in_feat_dim: 1024, feat_dim: 512, qk_have_bias: True
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /root/.cache/torch/hub/checkpoints/resnet101-5d3b4d8f.pth
**resnet101 created**
Parameter Count: 172737073
**args[backbone_type]=resnet101, checkpoint args[backbone_type]=eff-b4, inconsistent!**
In test2d.py I found: "parser.add_argument('--bb', dest='backbone_type', type=str, default='eff-b4', help='Segtran backbone'"
Then, in resnet.py, I found that when test2d.py is running, it downloads resnet101 from the web. But it still fails with this error, and I don't know how to solve it. Could you tell me how to solve this problem? Thank you!

Settings of 'Extension of nnU-Net'

Could you provide the training parameters (max-iteration, batch-size, etc.) of 'Extension of nnU-Net' (shown in Table 5.)?

inference process

Can you explain your inference process? Is it done patch by patch, with the patches then merged together, or on the full image at once?

Sensitivity / fp_per_case

Dr Lee, do you have, or have you implemented, metrics/analysis of how sensitivity changes with different fp_per_case? Thanks
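
For context, sensitivity as a function of false positives per case is usually computed FROC-style: sort detections by confidence, then lower the threshold until the false-positive budget is used up. The sketch below is a generic illustration of that idea, not code from the repository. It assumes every true-positive detection hits a distinct lesion and ignores score ties; all names and toy numbers are hypothetical.

```python
def sensitivity_at_fp_rates(detections, num_lesions, num_cases, fp_rates):
    """detections: list of (score, is_true_positive) pooled over all cases.

    For each allowed average number of false positives per case, return the
    highest sensitivity reachable by thresholding on the detection score.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    results = {}
    for rate in fp_rates:
        max_fp = rate * num_cases  # total false-positive budget
        tp = fp = best_tp = 0
        for _, is_tp in ordered:
            if is_tp:
                tp += 1
            else:
                fp += 1
            if fp > max_fp:
                break
            best_tp = tp
        results[rate] = best_tp / num_lesions
    return results


# Toy data: 5 detections over 2 cases that contain 4 lesions in total.
toy = [(0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
froc = sensitivity_at_fp_rates(toy, num_lesions=4, num_cases=2, fp_rates=[0, 0.5, 1])
print(froc)
```

Plotting the returned values against the false-positive rates gives the usual FROC curve.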

about multi GPU in train.py

Nice job! But I want to use multiple GPUs for training. Can you tell me which parameter can be modified to use multiple GPUs? Thanks!

TRANSUNET SAYS KEY ERROR

I am trying to use transunet as the --net argument. However, it throws an error as depicted in the image. The error reads: "Traceback (most recent call last):
File "/content/segtran/code/train2d.py", line 976, in
transunet_config = TransUNet_CONFIGS[args.backbone_type]
KeyError: 'eff-b4'".

Even if I change the backbone to eff-b1 or resnet101, a new key error is raised for those backbones. Please help me out.
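
The traceback shows `TransUNet_CONFIGS` being indexed directly with the `--bb` value, so any backbone name that is not a key of that dictionary raises `KeyError`. The actual keys are not listed in this thread. The sketch below uses a stand-in dictionary with made-up keys to show a lookup that reports the valid choices; `pick_config` and `EXAMPLE_CONFIGS` are hypothetical.

```python
def pick_config(configs, backbone_type):
    """Look up a config by name and list the valid names on failure."""
    if backbone_type not in configs:
        valid = ", ".join(sorted(configs))
        raise ValueError(
            f"Unknown backbone '{backbone_type}'. Valid choices: {valid}"
        )
    return configs[backbone_type]


# Stand-in for TransUNet_CONFIGS; these keys are placeholders, not real names.
EXAMPLE_CONFIGS = {
    "config-a": {"hidden_size": 768},
    "config-b": {"hidden_size": 1024},
}

try:
    pick_config(EXAMPLE_CONFIGS, "eff-b4")
except ValueError as err:
    print(err)
```

Printing the keys of the real `TransUNet_CONFIGS` would show which `--bb` values the `transunet` option accepts.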