
kaggle_ndsb2's Introduction

This is the source code for the 3rd place solution to the Second National Data Science Bowl hosted by Kaggle.com. For documentation about the approach, look here.

Dependencies & data

I used the Anaconda default distribution with all the libraries that come with it. In addition I used OpenCV (cv2), pydicom, and MXNet (build 20151228, but later versions will most probably work). For more detailed Windows 64-bit installation instructions, look here.
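A quick way to check that the libraries are importable (a minimal sketch; note that pydicom versions from that era were imported as "dicom"):

import cv2
import dicom      # pydicom < 1.0 exposed the top-level module name "dicom"
import mxnet as mx

print("OpenCV version:", cv2.__version__)
print("MXNet loaded from:", mx.__file__)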

The DICOM data needs to be downloaded from Kaggle and extracted into the data_kaggle/train, data_kaggle/validate, and data_kaggle/test folders.
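A small sketch to verify that layout before running the pipeline (paths as described above):

import os

# Check that the three expected Kaggle data folders are present.
for sub in ("train", "validate", "test"):
    path = os.path.join("data_kaggle", sub)
    print(path, "OK" if os.path.isdir(path) else "MISSING")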

Adjust settings

In settings.py you can adjust some parameters. The most important one is the special "quick mode", which makes training the model roughly 5x faster at the expense of some data-science rigor: instead of training different folds to calibrate on (to prevent overfitting), we train only one fold. This overfits a bit in steps 3 and 4 but still results in a solid 0.0105 score, which is enough for 3rd place on the leaderboard. Without quick mode, training takes much longer but overfits less, giving 0.0101 on the leaderboard. That is almost 2nd place, and with some luck it might have been.
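For illustration, such a toggle might look like the following (variable names here are hypothetical, not necessarily those in the repository's settings.py):

# settings.py (hypothetical names, for illustration only)
QUICK_MODE = True   # True: train one fold, ~5x faster, slight overfit (~0.0105 LB)
                    # False: train all folds for calibration (~0.0101 LB)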

Run the solution

Run the five steps below in order; a minimal driver-script sketch follows the list.

  1. python step0_preprocess.py
    As a result, the /data_preprocessed_images folder will contain ~329,000 preprocessed images, and some extra CSV files will be generated in the root folder.
  2. python step1_train_segmenter.py
    As a result, you will have one or more trained models in the root folder. Depending on the fold, the RMSE should be around 0.049 (train) and 0.052 (validate).
  3. python step2_predict_volumes.py
    As a result, you will have a CSV file containing raw predictions for all 1,140 patients. The data_patient_predictions folder will also contain all generated overlays and per-patient CSV data for debugging. In the logs, the average error should be around 10 ml.
  4. python step3_calibrate.py
    As a result, you will have a CSV file containing all the calibrated predictions. In the logs, the average error should go down by roughly 1 ml.
  5. python step4_submission.py
    As a result, the /data_submission_files folder will contain a submission file. In the logs, the CRPS should be around 0.010.
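The sketch below chains the five steps and stops on the first failure, assuming the scripts are run from the repository root:

import subprocess
import sys

# Run the pipeline steps in order; check_call raises on a non-zero exit code.
steps = [
    "step0_preprocess.py",
    "step1_train_segmenter.py",
    "step2_predict_volumes.py",
    "step3_calibrate.py",
    "step4_submission.py",
]
for script in steps:
    print("Running", script)
    subprocess.check_call([sys.executable, script])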

Hardware

The solution should be gentle on the GPU because of the small batch size; any recent GPU supported by MXNet should do the job, I figure. The lowest-end card I tried that worked was a GT 740.


kaggle_ndsb2's Issues

Keras Implementation

Hi,

Do you know if there is a Keras implementation of this, and was there something missing in Keras that prompted you to use MXNet?

Rahul.V.

Determining the size of the segmentation

I was going through your code. How do we:

  1. Determine the area covered by the segmentation as a percentage of the complete image?
  2. Determine the size of the segmentations that are detected?

Your suggestions would be highly appreciated.
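For what it's worth, a minimal sketch of both computations, assuming the prediction is a 2-D binary mask and the DICOM pixel spacing (in mm) is available; the names here are illustrative, not from the repository:

import numpy as np

def segmentation_stats(mask, pixel_spacing):
    # mask: 2-D binary array (nonzero = segmented pixel)
    # pixel_spacing: (row_mm, col_mm), e.g. from the DICOM PixelSpacing tag
    seg_pixels = int(np.count_nonzero(mask))
    percent_covered = 100.0 * seg_pixels / mask.size
    area_mm2 = seg_pixels * pixel_spacing[0] * pixel_spacing[1]  # physical area
    return percent_covered, area_mm2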

step2_predict_volumes: errors when iterating an ndarray

Dear Julian,

Thank you for your valuable work and your practical code.

I was running step2_predict_volumes.py. At line 130:
predictions = pred_model.predict(pred_iter)

the code goes into models.py from the MXNet .egg file, and it raises an error which says:
"d:\chhong\mxnet\include\mxnet./ndarray.h:217: Check failed: (shape_[0]) >= (end) Slice end index out of range"

I read about this problem but I couldn't fix it.

Would you please help me?
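Not a confirmed diagnosis, but this particular MXNet error is often reported when the sample count is not an exact multiple of the iterator's batch size, so the last slice runs past the end of the array. One possible workaround is to pad the input up to a multiple of the batch size and drop the padded predictions afterwards; a sketch:

import numpy as np

def pad_to_batch_multiple(data, batch_size):
    # Pad along axis 0 (by repeating the last sample) so that
    # data.shape[0] becomes a multiple of batch_size.
    remainder = data.shape[0] % batch_size
    if remainder == 0:
        return data, data.shape[0]
    padding = np.repeat(data[-1:], batch_size - remainder, axis=0)
    return np.concatenate([data, padding], axis=0), data.shape[0]

# Illustrative usage (make_iter is a placeholder for building the iterator):
# padded, n_real = pad_to_batch_multiple(images, batch_size)
# predictions = pred_model.predict(make_iter(padded))[:n_real]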

Several questions regarding your solution

Hi Julian,

Thanks for sharing the code. I have several questions after reading your solution document.

According to the U-Net paper, the output map has size (row, column, 2), i.e., it has two feature channels, but it looks like you use only one channel. Is that right? Could you explain this in more detail?

You once mentioned that "segmentation nets are numerically unstable"; would you elaborate on this point? Are there any references discussing it?

You mentioned, "Note that I used relu activations and batch normalization after every convolution". Regarding "batch normalization": do you mean you add a batch-normalization layer after each convolution layer? If I remember correctly, I once heard that a normalization layer may not be needed if batch normalization is used in the optimization method. Specifically, I am not clear what you mean by "batch normalization after every convolution".
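For reference, the pattern usually looks like the following in MXNet's symbol API (a minimal sketch, not code copied from this repository):

import mxnet as mx

def conv_bn_relu(data, num_filter, kernel=(3, 3), pad=(1, 1)):
    # convolution, then a batch-normalization layer, then a relu activation
    net = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel, pad=pad)
    net = mx.sym.BatchNorm(data=net)
    net = mx.sym.Activation(data=net, act_type="relu")
    return net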

How many epochs do you use for training?

Thanks for the help.

Dropout placement

Hey, thanks for the nice code and blog post!

I don't know if it's important, but it confused me a bit. In the blog post the first dropout layer is placed after all the downsampling layers, i.e. after pool5, but in the code I see that it is placed after pool4. I think it would be more consistent to put it after pool5, as in the post.

...
pool4 = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 4, down_pool=True)
net = pool4
net = mx.sym.Dropout(net)  # dropout is applied here, directly after pool4
pool5 = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 8, down_pool=True)
net = pool5
net = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 4, up_pool=True)
...
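For clarity, the rearrangement suggested above would look like this (same building blocks, with the dropout moved after pool5):

...
pool4 = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 4, down_pool=True)
net = pool4
pool5 = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 8, down_pool=True)
net = pool5
net = mx.sym.Dropout(net)  # moved here, after pool5, to match the blog post
net = convolution_module(net, kernel_size, pad_size, filter_count=filter_count * 4, up_pool=True)
...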
