
luna16_multi_size_3dcnn's Introduction

Attention

DON'T STAR THIS REPO ANYMORE, IT'S A BAD IMPLEMENTATION

luna16_multi_size_3dcnn

An implementation of the paper "Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection".

Details about the paper can be found at luna16 3DCNN.

0 Requirements

  • numpy

  • PIL (or Pillow)

  • SimpleITK

  • pandas

  • matplotlib

  • TensorFlow > 1.3

1 Data

You can download the data from the official LUNA16 website.

1.1 Data overview

The original data from LUNA16 consist of the following:

  • subset0.zip to subset9.zip: 10 zip files which contain all CT images
  • annotations.csv: csv file that contains the annotations used as reference standard for the 'nodule detection' track
  • sampleSubmission.csv: an example of a submission file in the correct format
  • candidates_V2.csv: csv file that contains the candidate locations for the ‘false positive reduction’ track

As you can see, the positive samples (annotations.csv) and the false positive candidates (candidates_V2.csv) are already annotated. What we need to do is simply extract them from the medical image format (CT volumes stored as .mhd files) into image cubes. There is no need to worry about labeling positive/negative data ourselves.
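As a minimal sketch of that extraction path (assuming SimpleITK and the standard LUNA16 .mhd layout; the file path below is only an example), a CT volume and the metadata needed to map the CSV world coordinates to voxel indices can be read like this:

```python
import SimpleITK as sitk
import numpy as np

# Minimal sketch: load one LUNA16 CT scan and read the metadata needed to
# map the CSV world coordinates (mm) to voxel indices. The path is a placeholder.
mhd_path = "subset0/1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.mhd"

itk_img = sitk.ReadImage(mhd_path)
volume = sitk.GetArrayFromImage(itk_img)   # ndarray of shape (num_slices, height, width), i.e. (z, y, x)
origin = np.array(itk_img.GetOrigin())     # world coordinates (x, y, z) of voxel (0, 0, 0), in mm
spacing = np.array(itk_img.GetSpacing())   # voxel size (x, y, z) in mm

print(volume.shape, origin, spacing)
```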

annotations.csv

seriesuid coordX coordY coordZ diameter_mm
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -128.6994211 -175.3192718 -298.3875064 5.651470635
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 103.7836509 -211.9251487 -227.12125 4.224708481
1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793540579077826395208 69.63901724 -140.9445859 876.3744957 5.786347814
1.3.6.1.4.1.14519.5.2.1.6279.6001.100621383016233746780170740405 -24.0138242 192.1024053 -391.0812764 8.143261683

The units of coordX, coordY, coordZ and diameter_mm are millimeters, and there are 1187 lines in this CSV file.

candidates_V2.csv

seriesuid coordX coordY coordZ class
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 68.42 -74.48 -288.7 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -95.20936148 -91.80940617 -377.4263503 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -24.76675476 -120.3792939 -273.3615387 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -63.08 -65.74 -344.24 0

The value of the class column indicates positive (1) or negative (0). There are 754976 lines in this CSV file.

The positive/negative sample ratio is 1187 vs 754976, roughly 1:636, so data augmentation is essential.
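A quick sanity check of those counts with pandas could look like the following sketch (the CSV paths are assumptions about where the files were unpacked):

```python
import pandas as pd

# Sketch: count positives and candidates to verify the class imbalance.
# The paths are placeholders for wherever the LUNA16 CSV files were extracted.
annotations = pd.read_csv("CSVFILES/annotations.csv")
candidates = pd.read_csv("CSVFILES/candidates_V2.csv")

num_pos = len(annotations)                   # true nodules, annotated with diameters
num_neg = (candidates["class"] == 0).sum()   # false positive candidates

print("positives:", num_pos)
print("negatives:", num_neg)
print("ratio 1:%.0f" % (num_neg / num_pos))
```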

1.2 How to prepare data

We have the center coordinates and diameter of every true positive nodule, plus a huge number of false positive candidates (center coordinates without diameters), so what we need to do is clear: extract cubes around them at multiple scales.

The paper suggests that the scales below are appropriate:

  • $20\times 20\times 6$
  • $30\times 30\times 10$
  • $40\times 40\times 26$

Since positives are annotated with diameters while negatives are not, we use a simple, crude method: extract fixed-size cubes centered on every candidate (both real and fake nodules).
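A minimal sketch of that crude extraction, using the same SimpleITK loading as above (the helper extract_cube and the coordinate handling are illustrative, not the repo's exact data_prepare.py code):

```python
import SimpleITK as sitk
import numpy as np

def extract_cube(volume, origin, spacing, world_center, shape_zyx):
    """Crop a cube of shape (depth, height, width) around a world-coordinate
    center (x, y, z in mm). Illustrative helper, not the repo's exact code."""
    # World (x, y, z) in mm -> voxel indices, then reorder to (z, y, x).
    voxel_xyz = np.rint((np.array(world_center) - origin) / spacing).astype(int)
    cz, cy, cx = voxel_xyz[2], voxel_xyz[1], voxel_xyz[0]
    dz, dy, dx = shape_zyx
    return volume[cz - dz // 2: cz - dz // 2 + dz,
                  cy - dy // 2: cy - dy // 2 + dy,
                  cx - dx // 2: cx - dx // 2 + dx]

# The three scales from the paper, written as (depth, height, width):
scales = [(6, 20, 20), (10, 30, 30), (26, 40, 40)]

itk_img = sitk.ReadImage("subset0/some_seriesuid.mhd")   # placeholder path
volume = sitk.GetArrayFromImage(itk_img)
origin = np.array(itk_img.GetOrigin())
spacing = np.array(itk_img.GetSpacing())

# coordX, coordY, coordZ taken from one annotations.csv row
cubes = [extract_cube(volume, origin, spacing, (-128.70, -175.32, -298.39), s) for s in scales]
```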

There is a better way to prepare positive samples. An idea borrowed from object detection and localization methods such as SSD or Faster R-CNN is bounding box (anchor) generation: slide cubes over the whole 3D CT volume and keep those whose IoU with an annotated nodule exceeds a threshold (e.g. 0.7, as in Faster R-CNN) as positive samples. This idea comes from a teacher at Shanghai Jiao Tong University; I'll implement it soon.
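For reference, a hedged sketch of the 3D IoU computation such an anchor scheme would need (axis-aligned cubes described by (z, y, x) centers and sizes; the function name is illustrative):

```python
import numpy as np

def iou_3d(center_a, size_a, center_b, size_b):
    """IoU of two axis-aligned 3D boxes given as (z, y, x) centers and sizes.
    Illustrative helper for the anchor/IoU idea described above."""
    a_min = np.array(center_a) - np.array(size_a) / 2.0
    a_max = np.array(center_a) + np.array(size_a) / 2.0
    b_min = np.array(center_b) - np.array(size_b) / 2.0
    b_max = np.array(center_b) + np.array(size_b) / 2.0

    # Per-axis overlap, clipped at zero, multiplied into an intersection volume.
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    union = np.prod(a_max - a_min) + np.prod(b_max - b_min) - inter
    return inter / union

# A sliding cube would count as a positive sample if, e.g., IoU >= 0.7.
print(iou_3d((10, 20, 20), (6, 20, 20), (11, 22, 21), (6, 20, 20)))
```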

1.3 Data augmentation

  • image flip/rotation: currently only 90-, 180- and 270-degree rotations of the positive samples are done, following the paper.
  • data normalization: all radiodensity (HU) values are truncated to the range [-1000, 400] and normalized to [0, 1].
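A sketch of those two steps with NumPy (assuming a (z, y, x) cube as produced above; the function names are illustrative and not necessarily the ones in data_prepare.py):

```python
import numpy as np

def truncate_and_normalize(cube, hu_min=-1000.0, hu_max=400.0):
    """Clip radiodensity to [-1000, 400] HU and rescale to [0, 1]."""
    cube = np.clip(cube.astype(np.float32), hu_min, hu_max)
    return (cube - hu_min) / (hu_max - hu_min)

def rotations(cube):
    """Yield the 90/180/270-degree in-plane rotations used to augment positives."""
    for k in (1, 2, 3):
        yield np.rot90(cube, k=k, axes=(1, 2))   # rotate each axial (y, x) slice

cube = np.random.uniform(-1200, 600, size=(6, 20, 20))   # stand-in for an extracted cube
augmented = [truncate_and_normalize(r) for r in rotations(cube)]
```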

2 Processing steps

First run data_prepare.py to extract cubes (both real nodules and fake ones) from the raw CT files. This may take hours, and the output of this step is

  • cubic_npy
  • cubic_normalization_npy
  • cubic_normalization_test

The total size of these files is around 100 GB, and the step took one night on my PC (16 GB RAM, i5), so please leave enough disk space. There will be some ValueErrors like:

<class 'Exception'> : could not broadcast input array from shape (40,40,25) into shape (40,40,26)
  File "H:/workspace/luna16_multi_size_3dcnn/data_prepare.py", line 142, in extract_fake_cubic_from_mhd
    int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,25) into shape (40,40,26)
Traceback (most recent call last):

It's OK to continue, because not all false positive candidates are needed; read the CSV files and you'll see that there are far more false positive candidates than positive samples.
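The broadcast error appears when a candidate lies so close to the volume boundary that the fixed-size slice comes up short. One way to guard against it, shown here as a sketch rather than the repo's actual fix, is to clamp the crop so it always fits inside the volume (or skip candidates that cannot fit):

```python
import numpy as np

def safe_crop(volume, center_zyx, shape_zyx):
    """Crop a cube of shape_zyx around center_zyx, clamping the crop window so
    it stays inside the volume. Returns None if the volume is too small."""
    starts = []
    for c, d, limit in zip(center_zyx, shape_zyx, volume.shape):
        if limit < d:
            return None                        # volume smaller than the cube: skip
        start = int(round(c)) - d // 2
        start = max(0, min(start, limit - d))  # clamp so start + d <= limit
        starts.append(start)
    z, y, x = starts
    dz, dy, dx = shape_zyx
    return volume[z:z + dz, y:y + dy, x:x + dx]

vol = np.zeros((120, 512, 512), dtype=np.float32)
cube = safe_crop(vol, (118, 500, 5), (26, 40, 40))   # near the boundary, still full size
assert cube.shape == (26, 40, 40)
```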

Then run main.py to train the model; the inference step runs afterwards. This step is rather slow because of the huge amount of data.


luna16_multi_size_3dcnn's Issues

Running the program

Hello! After running, I get ValueError: Cannot feed value of shape (0,) for Tensor 'Placeholder_10:0', which has shape '(?, 6, 20, 20)'. How should I fix this?

Accuracy never changes during training

I took 100 negative samples from each CT image and ran 1000 epochs; the accuracy stays at 0.593750 and the loss barely changes either. What could the problem be? How did your own experiments turn out?

IndexError: list index out of range when running the program


Traceback (most recent call last):
File "/home/ch/luna_1/luna16_multi_size_3dcnn/data_prepare.py", line 267, in get_test_batch
arr = np.load(npy)
File "/root/Anaconda/lib/python3.6/site-packages/numpy/lib/npyio.py", line 370, in load
fid = open(file, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 16, in
model.inference(normalazation_output_path,test_path,0,True)
File "/home/ch/luna_1/luna16_multi_size_3dcnn/model.py", line 178, in inference
test_batch,test_label = get_test_batch(test_path)
File "/home/ch/luna_1/luna16_multi_size_3dcnn/data_prepare.py", line 278, in get_test_batch
batch_array.append(batch_array[-1]) # some nodule process error leading nonexistent of the file, using the last file copy to fill
I don't know what causes this; hoping for an answer!

About the paper implementation

Hi authors, thanks for the wonderful implementation. Could you please explain a little bit about Why this implementation is bad? Also, do you have any reliable implementation for the published paper? Thank you so much!

new_model.py

What is the difference between new_model.py and model.py? Does running new_model.py already train the model, so there is no need to run model.py? Thanks!

question about training

It seems that, as in the paper, this implementation assumes the nodule is at the center of the cropped cube; no shifting of the center is provided. But in real life we do not know whether the nodule is at the center of the crop we take from the scan. Correct me if I'm wrong, but this makes it not directly useful for diagnosis.

About the training loss

Sorry to bother you. I'd like to ask whether the loss non-convergence problem you mentioned before still exists. How well does the project work now, and is it usable? @shartoo

Size of the generated data

How large will the generated data be? I ran the code and it has already generated more than 100 GB of data, and it is still running. So, can you tell me how large the generated data will get? @shartoo

test

Hello, after training the model with python main.py, how do I run testing? (I want to test on one tenth of the dataset; what code should I use for that?) Thanks!

Can I still ask questions about data_prepare.py?

Hello, I'm studying the data_prepare.py code. I have questions about the ndarray and its transpose, and also about the truncate_hu and normalazation functions. I hope you can help me with these when you have time. Thanks.

test_path is not passed in

Hello, in the get_test_batch part of data_prepare, in for npy in files: try: arr = np.load(npy), the npy here is not defined; it should refer to the test path. When printed out it shows 'file not exists! E'. How should this be solved? Thank you very much!

Running data_prepare.py raises a ValueError

process images 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492.mhd error...
(<type 'exceptions.Exception'>, ':', ValueError('could not broadcast input array from shape (40,40,25) into shape (40,40,26)',))
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,25) into shape (40,40,26)
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,0) into shape (40,40,26)
process images 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492.mhd error...
(<type 'exceptions.Exception'>, ':', ValueError('could not broadcast input array from shape (40,40,0) into shape (40,40,26)',))
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,22) into shape (40,40,26)

How to get the LUNA16 dataset?

I have not joined the LUNA16 challenge. Could I still get the LUNA16 dataset, and how?
If you see this, please tell me the answer. Thank you!

Unable to download dataset from Baidu Network

Unable to download the dataset from Baidu Network. Could you please upload the dataset somewhere else, like Google Drive/Dropbox or any other site that can be accessed from the US?
I tried to download from the official LUNA16 website, but the download keeps failing with an error.

Two requests: model & metrics

@shartoo Hi, having seen the questions others raised earlier, I have two requests I hope you can consider:
1. Could you share a trained model? It's fine if it doesn't perform well or only covers a single scale.
2. Could you publish some results so we can see how the model performs (e.g. the original csv result of lfz/DeepLung is around 84; what is the FROC after classification)?
Whether or not it's convenient to reply or adopt these, thank you for open-sourcing the project and the paper!

Reproducing the paper's results

Hello, have you reproduced the results of the paper before, and how fast does it run? Please reply, thanks.

FileNotFoundError

I ran into this problem when running data_prepare.py:
FileNotFoundError: File b'd:/data/luna/CSVFILLES/annotation.csv' does not exist
Later I created a folder at that path myself, but that didn't help either. Why is that?
Is the dataset you provide on the homepage the one the code expects? Imagenet32_train.zip

The loss does not converge

When I run it the loss does not converge, and the accuracy stays at 0. Could the author please help me figure this out?

ValueError: setting an array element with a sequence.

This is a great project. The data-processing part ran without much trouble, but the following problem occurred during training:
Traceback (most recent call last):
File "main.py", line 16, in
model.inference(normalazation_output_path,test_path,0,True)
File "/home/wangqiuli/Documents/luna16_multi_size_3dcnn/model.py", line 145, in inference
_,summary = sess.run([train_step, merged],feed_dict =feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1104, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py", line 492, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Could the author please help with this?

Error

When I run data_prepare, it says: No such file or directory: '/.../cubic_npy'
Can you help me with this?
