
luna16_multi_size_3dcnn's Introduction

Attention

DON'T STAR THIS REPO ANYMORE, IT'S A BAD IMPLEMENTATION

luna16_multi_size_3dcnn

An implementation of the paper "Multi-level Contextual 3D CNNs for False Positive Reduction in Pulmonary Nodule Detection".

Details about the paper can be found at luna16 3DCNN.

0 Requirements

  • numpy

  • PIL (or Pillow)

  • SimpleITK

  • pandas

  • matplotlib

  • TensorFlow > 1.3

1 Data

You can download the data from the official LUNA16 website.

1.1 Data overview

The original data from LUNA16 consist of the following:

  • subset0.zip to subset9.zip: 10 zip files which contain all CT images
  • annotations.csv: csv file that contains the annotations used as reference standard for the 'nodule detection' track
  • sampleSubmission.csv: an example of a submission file in the correct format
  • candidates_V2.csv: csv file that contains the candidate locations for the ‘false positive reduction’ track

As you can see, the positive samples (annotations.csv) and the false positive candidates (candidates_V2.csv) are already annotated. What we need to do is simply extract them from the medical image format (CT volumes stored as .mhd files) into image cubes. There is no need to worry about labeling positive/negative data ourselves.
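As a minimal sketch of that extraction path (assuming SimpleITK and the standard LUNA16 .mhd layout; the file path below is only an example), a CT volume and the metadata needed to map the CSV world coordinates to voxel indices can be read like this:

```python
import SimpleITK as sitk
import numpy as np

# Minimal sketch: load one LUNA16 CT scan and read the metadata needed to
# map the CSV world coordinates (mm) to voxel indices. The path is a placeholder.
mhd_path = "subset0/1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.mhd"

itk_img = sitk.ReadImage(mhd_path)
volume = sitk.GetArrayFromImage(itk_img)   # ndarray of shape (num_slices, height, width), i.e. (z, y, x)
origin = np.array(itk_img.GetOrigin())     # world coordinates (x, y, z) of voxel (0, 0, 0), in mm
spacing = np.array(itk_img.GetSpacing())   # voxel size (x, y, z) in mm

print(volume.shape, origin, spacing)
```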

annotations.csv

seriesuid coordX coordY coordZ diameter_mm
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -128.6994211 -175.3192718 -298.3875064 5.651470635
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 103.7836509 -211.9251487 -227.12125 4.224708481
1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793540579077826395208 69.63901724 -140.9445859 876.3744957 5.786347814
1.3.6.1.4.1.14519.5.2.1.6279.6001.100621383016233746780170740405 -24.0138242 192.1024053 -391.0812764 8.143261683

The units of coordX, coordY, coordZ and diameter_mm are millimeters, and there are 1187 lines in this CSV file.

candidates_V2.csv

seriesuid coordX coordY coordZ class
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 68.42 -74.48 -288.7 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -95.20936148 -91.80940617 -377.4263503 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -24.76675476 -120.3792939 -273.3615387 0
1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860 -63.08 -65.74 -344.24 0

The value of the class column indicates positive (1) or negative (0). There are 754976 lines in this CSV file.

The positive/negative sample ratio is 1187 vs 754976, roughly 1:636, so data augmentation is essential.
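A quick sanity check of those counts with pandas could look like the following sketch (the CSV paths are assumptions about where the files were unpacked):

```python
import pandas as pd

# Sketch: count positives and candidates to verify the class imbalance.
# The paths are placeholders for wherever the LUNA16 CSV files were extracted.
annotations = pd.read_csv("CSVFILES/annotations.csv")
candidates = pd.read_csv("CSVFILES/candidates_V2.csv")

num_pos = len(annotations)                   # true nodules, annotated with diameters
num_neg = (candidates["class"] == 0).sum()   # false positive candidates

print("positives:", num_pos)
print("negatives:", num_neg)
print("ratio 1:%.0f" % (num_neg / num_pos))
```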

1.2 How to prepare data

We have the center coordinates and diameter of every true positive nodule, plus a huge number of false positive candidates (center coordinates without diameters), so what we need to do is clear: extract cubes around them at multiple scales.

The paper suggests that the scales below are appropriate:

  • $20\times 20\times 6$
  • $30\times 30\times 10$
  • $40\times 40\times 26$

Since positives are annotated with diameters while negatives are not, we use a simple, crude method: extract fixed-size cubes centered on every candidate (both real and fake nodules).
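A minimal sketch of that crude extraction, using the same SimpleITK loading as above (the helper extract_cube and the coordinate handling are illustrative, not the repo's exact data_prepare.py code):

```python
import SimpleITK as sitk
import numpy as np

def extract_cube(volume, origin, spacing, world_center, shape_zyx):
    """Crop a cube of shape (depth, height, width) around a world-coordinate
    center (x, y, z in mm). Illustrative helper, not the repo's exact code."""
    # World (x, y, z) in mm -> voxel indices, then reorder to (z, y, x).
    voxel_xyz = np.rint((np.array(world_center) - origin) / spacing).astype(int)
    cz, cy, cx = voxel_xyz[2], voxel_xyz[1], voxel_xyz[0]
    dz, dy, dx = shape_zyx
    return volume[cz - dz // 2: cz - dz // 2 + dz,
                  cy - dy // 2: cy - dy // 2 + dy,
                  cx - dx // 2: cx - dx // 2 + dx]

# The three scales from the paper, written as (depth, height, width):
scales = [(6, 20, 20), (10, 30, 30), (26, 40, 40)]

itk_img = sitk.ReadImage("subset0/some_seriesuid.mhd")   # placeholder path
volume = sitk.GetArrayFromImage(itk_img)
origin = np.array(itk_img.GetOrigin())
spacing = np.array(itk_img.GetSpacing())

# coordX, coordY, coordZ taken from one annotations.csv row
cubes = [extract_cube(volume, origin, spacing, (-128.70, -175.32, -298.39), s) for s in scales]
```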

There is a better way to prepare positive samples. An idea borrowed from object detection and localization methods such as SSD or Faster R-CNN is bounding box (anchor) generation: slide cubes over the whole 3D CT volume and keep those whose IoU with an annotated nodule exceeds a threshold (e.g. 0.7, as in Faster R-CNN) as positive samples. This idea comes from a teacher at Shanghai Jiao Tong University; I'll implement it soon.
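For reference, a hedged sketch of the 3D IoU computation such an anchor scheme would need (axis-aligned cubes described by (z, y, x) centers and sizes; the function name is illustrative):

```python
import numpy as np

def iou_3d(center_a, size_a, center_b, size_b):
    """IoU of two axis-aligned 3D boxes given as (z, y, x) centers and sizes.
    Illustrative helper for the anchor/IoU idea described above."""
    a_min = np.array(center_a) - np.array(size_a) / 2.0
    a_max = np.array(center_a) + np.array(size_a) / 2.0
    b_min = np.array(center_b) - np.array(size_b) / 2.0
    b_max = np.array(center_b) + np.array(size_b) / 2.0

    # Per-axis overlap, clipped at zero, multiplied into an intersection volume.
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    union = np.prod(a_max - a_min) + np.prod(b_max - b_min) - inter
    return inter / union

# A sliding cube would count as a positive sample if, e.g., IoU >= 0.7.
print(iou_3d((10, 20, 20), (6, 20, 20), (11, 22, 21), (6, 20, 20)))
```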

1.3 Data augmentation

  • image flip/rotation: currently only 90-, 180- and 270-degree rotations of the positive samples are done, following the paper.
  • data normalization: all radiodensity (HU) values are truncated to the range [-1000, 400] and normalized to [0, 1].
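A sketch of those two steps with NumPy (assuming a (z, y, x) cube as produced above; the function names are illustrative and not necessarily the ones in data_prepare.py):

```python
import numpy as np

def truncate_and_normalize(cube, hu_min=-1000.0, hu_max=400.0):
    """Clip radiodensity to [-1000, 400] HU and rescale to [0, 1]."""
    cube = np.clip(cube.astype(np.float32), hu_min, hu_max)
    return (cube - hu_min) / (hu_max - hu_min)

def rotations(cube):
    """Yield the 90/180/270-degree in-plane rotations used to augment positives."""
    for k in (1, 2, 3):
        yield np.rot90(cube, k=k, axes=(1, 2))   # rotate each axial (y, x) slice

cube = np.random.uniform(-1200, 600, size=(6, 20, 20))   # stand-in for an extracted cube
augmented = [truncate_and_normalize(r) for r in rotations(cube)]
```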

2 Processing steps

First run data_prepare.py to extract cubes (both real nodules and fake ones) from the raw CT files. This may take hours, and the output of this step is

  • cubic_npy
  • cubic_normalization_npy
  • cubic_normalization_test

The total size of these files is around 100 GB, and the step took one night on my PC (16 GB RAM, i5), so please leave enough disk space. There will be some ValueErrors like:

<class 'Exception'> : could not broadcast input array from shape (40,40,25) into shape (40,40,26)
  File "H:/workspace/luna16_multi_size_3dcnn/data_prepare.py", line 142, in extract_fake_cubic_from_mhd
    int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,25) into shape (40,40,26)
Traceback (most recent call last):

It's OK to continue, because not all false positive candidates are needed; read the CSV files and you'll see that there are far more false positive candidates than positive samples.
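The broadcast error appears when a candidate lies so close to the volume boundary that the fixed-size slice comes up short. One way to guard against it, shown here as a sketch rather than the repo's actual fix, is to clamp the crop so it always fits inside the volume (or skip candidates that cannot fit):

```python
import numpy as np

def safe_crop(volume, center_zyx, shape_zyx):
    """Crop a cube of shape_zyx around center_zyx, clamping the crop window so
    it stays inside the volume. Returns None if the volume is too small."""
    starts = []
    for c, d, limit in zip(center_zyx, shape_zyx, volume.shape):
        if limit < d:
            return None                        # volume smaller than the cube: skip
        start = int(round(c)) - d // 2
        start = max(0, min(start, limit - d))  # clamp so start + d <= limit
        starts.append(start)
    z, y, x = starts
    dz, dy, dx = shape_zyx
    return volume[z:z + dz, y:y + dy, x:x + dx]

vol = np.zeros((120, 512, 512), dtype=np.float32)
cube = safe_crop(vol, (118, 500, 5), (26, 40, 40))   # near the boundary, still full size
assert cube.shape == (26, 40, 40)
```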

Then run main.py to train the model; the inference step runs afterwards. This step is rather slow because of the huge amount of data.


luna16_multi_size_3dcnn's Issues

Running the program

Hello! After running, I get ValueError: Cannot feed value of shape (0,) for Tensor 'Placeholder_10:0', which has shape '(?, 6, 20, 20)'. How should I fix this?

Accuracy never changes during training

I took 100 negative samples from each CT image and ran 1000 epochs; the accuracy stays at 0.593750 and the loss barely changes either. What could the problem be? How did your own experiments turn out?

IndexError: list index out of range when running the program


Traceback (most recent call last):
File "/home/ch/luna_1/luna16_multi_size_3dcnn/data_prepare.py", line 267, in get_test_batch
arr = np.load(npy)
File "/root/Anaconda/lib/python3.6/site-packages/numpy/lib/npyio.py", line 370, in load
fid = open(file, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 16, in
model.inference(normalazation_output_path,test_path,0,True)
File "/home/ch/luna_1/luna16_multi_size_3dcnn/model.py", line 178, in inference
test_batch,test_label = get_test_batch(test_path)
File "/home/ch/luna_1/luna16_multi_size_3dcnn/data_prepare.py", line 278, in get_test_batch
batch_array.append(batch_array[-1]) # some nodule process error leading nonexistent of the file, using the last file copy to fill
I don't know what causes this; hoping for an answer!

About the paper implementation

Hi authors, thanks for the wonderful implementation. Could you please explain a little bit about Why this implementation is bad? Also, do you have any reliable implementation for the published paper? Thank you so much!

new_model.py

What is the difference between new_model.py and model.py? Does running new_model.py already train the model, so there is no need to run model.py? Thanks!

question about training

It seems that, as in the paper, this implementation assumes the nodule is at the center of the cropped cube; no shifting of the center is provided. But in real life we do not know whether the nodule is at the center of the crop we take from the scan. Correct me if I'm wrong, but this makes it not directly useful for diagnosis.

About the training loss

Sorry to bother you. I'd like to ask whether the loss non-convergence problem you mentioned before still exists. How well does the project work now, and is it usable? @shartoo

Size of the generated data

How large will the generated data be? I ran the code and it has already generated more than 100 GB of data, and it is still running. So, can you tell me how large the generated data will get? @shartoo

test

Hello, after training the model with python main.py, how do I run testing? (I want to test on one tenth of the dataset; what code should I use for that?) Thanks!

Can I still ask questions about data_prepare.py?

Hello, I'm studying the data_prepare.py code. I have questions about the ndarray and its transpose, and also about the truncate_hu and normalazation functions. I hope you can help me with these when you have time. Thanks.

test_path is not passed in

Hello, in the get_test_batch part of data_prepare, in for npy in files: try: arr = np.load(npy), the npy here is not defined; it should refer to the test path. When printed out it shows 'file not exists! E'. How should this be solved? Thank you very much!

Running data_prepare.py raises a ValueError

process images 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492.mhd error...
(<type 'exceptions.Exception'>, ':', ValueError('could not broadcast input array from shape (40,40,25) into shape (40,40,26)',))
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,25) into shape (40,40,26)
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,0) into shape (40,40,26)
process images 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492.mhd error...
(<type 'exceptions.Exception'>, ':', ValueError('could not broadcast input array from shape (40,40,0) into shape (40,40,26)',))
Traceback (most recent call last):
File "/home/.../luna16_multi_size_3dcnn-master/data_prepare.py", line 145, in extract_fake_cubic_from_mhd
int(v_center[2] - 13):int(v_center[2] + 13)]
ValueError: could not broadcast input array from shape (40,40,22) into shape (40,40,26)

How to get the LUNA16 dataset?

I have not joined the LUNA16 challenge. Could I still get the LUNA16 dataset, and how?
If you see this, please tell me the answer. Thank you!

Unable to download dataset from Baidu Network

Unable to download the dataset from Baidu Network. Could you please upload the dataset somewhere else, like Google Drive/Dropbox or any other site that can be accessed from the US?
I tried to download from the official LUNA16 website, but the download keeps failing with an error.

Two requests: model & metrics

@shartoo Hi, having seen the questions others raised earlier, I have two requests I hope you can consider:
1. Could you share a trained model? It's fine if it doesn't perform well or only covers a single scale.
2. Could you publish some results so we can see how the model performs (e.g. the original csv result of lfz/DeepLung is around 84; what is the FROC after classification)?
Whether or not it's convenient to reply or adopt these, thank you for open-sourcing the project and the paper!

Reproducing the paper's results

Hello, have you reproduced the results of the paper before, and how fast does it run? Please reply, thanks.

FileNotFoundError

I ran into this problem when running data_prepare.py:
FileNotFoundError: File b'd:/data/luna/CSVFILLES/annotation.csv' does not exist
Later I created a folder at that path myself, but that didn't help either. Why is that?
Is the dataset you provide on the homepage the one the code expects? Imagenet32_train.zip

The loss does not converge

When I run it the loss does not converge, and the accuracy stays at 0. Could the author please help me figure this out?

ValueError: setting an array element with a sequence.

This is a great project. The data-processing part ran without much trouble, but the following problem occurred during training:
Traceback (most recent call last):
File "main.py", line 16, in
model.inference(normalazation_output_path,test_path,0,True)
File "/home/wangqiuli/Documents/luna16_multi_size_3dcnn/model.py", line 145, in inference
_,summary = sess.run([train_step, merged],feed_dict =feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1104, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py", line 492, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Could the author please help with this?

Error

When I run data_prepare, it says: No such file or directory: '/.../cubic_npy'
Can you help me with this?
