ICU Hypothesis Project

This is the repository for generating and testing complex hypotheses using the RETAIN model (Choi et al. 2017) and a custom attention-based LSTM generator.

Table of Contents

  1. Requirements
    1. Package requirements
    2. Hardware requirement
  2. Run code
    1. Data
    2. RETAIN Predictor
    3. LSTM Generator
  3. Completed Runs
  4. Contributors

Requirements

Package requirements

  • Python 3.x
  • Tensorflow 2.x
  • Keras 2.x
  • Scikit-learn
  • NumPy
  • Pandas
  • Matplotlib

Hardware requirement

The code was run on tigergpu. Training RETAIN on a local machine is not recommended, but training the generator locally is feasible.

Run code

Data

You can use the MIMIC-III dataset after obtaining access approval and completing the required training. The data must then be parsed by running process_mimic_modified.py in the retain-keras folder. How to run this file is explained in its commented header.

In the meantime, you can use the fake dataset. The fake data folder contains the *interact*.py generator scripts.

python generate_fake_interact1.py /output/directory/foldername NUM TIME PROP

where NUM is the number of samples to generate, TIME is the maximum number of visits per sample, and PROP is the training proportion.
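To make the roles of NUM, TIME, and PROP concrete, here is a minimal sketch of a fake-data generator. It is not the logic of generate_fake_interact1.py. The helper name `make_fake_split`, the vocabulary size `num_codes`, the one-code-per-visit layout, and the random binary target are all assumptions made for illustration.

```python
import random

def make_fake_split(num, time, prop, num_codes=10, seed=0):
    """Hypothetical helper: build `num` fake patients and split them by `prop`."""
    rng = random.Random(seed)
    patients = []
    for _ in range(num):
        n_visits = rng.randint(1, time)  # between 1 and `time` visits per patient
        visits = [[rng.randrange(num_codes)] for _ in range(n_visits)]  # one code per visit (assumed)
        target = rng.randint(0, 1)  # placeholder binary outcome (assumed)
        patients.append((visits, target))
    n_train = int(num * prop)  # the first `prop` fraction is used for training
    return patients[:n_train], patients[n_train:]

train, test = make_fake_split(num=100, time=5, prop=0.8)
print(len(train), len(test))
```

With these arguments, 80 of the 100 samples land in the training split and no sample has more than 5 visits.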

RETAIN Predictor

The RETAIN model used is an adaptation of the Keras reimplementation of RETAIN. For a more detailed description of each field, refer to that repo. Below are the commands used specifically for the Independent Work Project. The relevant files can be found in the retain-keras folder.

  1. Train:

python retain_train.py --num_codes=N --epochs=M --path_data_train=/path/to/train/data.pkl --path_data_test=/path/to/test/data.pkl --path_target_train=/path/to/train/target.pkl --path_target_test=/path/to/test/target.pkl --directory=/output/directory

where N is the number of codes (medical or fake) in the dataset and M is the number of epochs to train. Other fields can also be specified.

  2. Evaluate:

First, check the training output directory and look at output.txt, which lists the loss for each epoch. Use the best model, whose weights are stored in weight-xx.h5.

python retain_evaluation.py --path_model=/path/to/model/weight-xx.h5 --path_data=/path/to/test/data.pkl --path_target=/path/to/test/target.pkl

  3. Interpret:

For now, interpretation gives you the mortality probability, plus the visit and feature importance weights, for each patient/sample. Make sure you have a dictionary.pkl that maps each unique code to a string describing what it represents. For instance, medical code 0 could be 'low medicine 1 dosage'.

python retain_interpretation.py --path_model=/path/to/model/weight-xx.h5 --path_data=/path/to/test/data.pkl --path_dictionary=dictionary.pkl
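As an illustration of the dictionary file described above, the following sketch writes and reloads a small code-to-description mapping with pickle. It assumes the interpretation script expects a plain Python dict keyed by integer code. The second entry is made up for the example, and the file is written to a temporary directory so nothing in your working folder is overwritten.

```python
import os
import pickle
import tempfile

# Hypothetical mapping; the real entries depend on your dataset.
code_to_name = {
    0: "low medicine 1 dosage",
    1: "high medicine 1 dosage",  # invented entry for illustration
}

path = os.path.join(tempfile.mkdtemp(), "dictionary.pkl")
with open(path, "wb") as f:
    pickle.dump(code_to_name, f)  # serialize the dict to disk

with open(path, "rb") as f:
    loaded = pickle.load(f)  # read it back to confirm the round trip
print(loaded[0])
```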

LSTM Generator

These are the commands used to run the generator code. The files can be found in the generator folder.

  1. Train: The code takes the data and converts it into sequences of codes. A sentinel code represents the "nothing happening" status. It is padded at the front of the list when a patient's data contains fewer than maxlen codes.

python code_generator.py --num_codes=N --epochs=M --emb_size=P --maxlen=Q --path_data_train=/path/to/train/data.pkl --path_target_train=/path/to/train/target.pkl --directory=/output/directory --simple=True/False

Q (maxlen) is the maximum length of an input sequence. For the study, we used Q=15 and an embedding size of P=10. Set --simple=True when using interact3. When using interact2, use --simple=False (the default).
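The front-padding step described for training can be sketched as follows. This is an illustration rather than the code in code_generator.py: the helper name `pad_front`, the sentinel value 0, and the choice to keep only the most recent maxlen codes of an over-long sequence are assumptions.

```python
def pad_front(codes, maxlen, sentinel):
    """Left-pad `codes` with `sentinel` up to `maxlen`; keep the last `maxlen` codes if longer."""
    codes = list(codes)[-maxlen:]  # truncation rule is an assumption
    return [sentinel] * (maxlen - len(codes)) + codes

SENTINEL = 0  # assumed value for the "nothing happening" code
padded = pad_front([4, 7, 2], maxlen=5, sentinel=SENTINEL)
print(padded)  # [0, 0, 4, 7, 2]
```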

  2. Generate: Next, take the best trained model and generate data. The folder has two files, code_generator_evaluation.py and code_generator_evaluation_med1.py, which take the same inputs. The latter generates data for patients who receive 2 dosages of med1. The former generates data for patients who receive 2 dosages of med2, or 1 of med1 and 1 of med2 (mini-experiment).

python code_generator_evaluation.py --path_model=/path/to/model/weight-xx.h5 --directory=/output/of/data/folder --maxlen=Q --num_generate=N --max_visits=M

where Q is the same maxlen used during training, N is the number of samples to generate, and M is the maximum number of codes predicted per patient (fewer if the patient expired before M codes).
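The stopping rule above (at most M codes per patient, fewer if the patient expires) can be sketched with a stand-in for the trained model. Everything here is hypothetical: `generate_patient`, `toy_next_code`, the all-sentinel starting window, and the specific sentinel and death code values are assumptions. The actual evaluation scripts may seed and sample differently.

```python
import random

def generate_patient(next_code, maxlen, max_visits, sentinel, death_code):
    """Generate up to `max_visits` codes, stopping early once `death_code` appears."""
    window = [sentinel] * maxlen  # assumed starting context: all sentinel codes
    generated = []
    for _ in range(max_visits):
        code = next_code(window)  # stand-in for sampling from the trained model
        generated.append(code)
        if code == death_code:  # patient expired, so stop before reaching max_visits
            break
        window = window[1:] + [code]  # slide the fixed-length context forward
    return generated

rng = random.Random(0)

def toy_next_code(window):
    """Toy predictor that ignores the context and draws a random code from 1 to 5."""
    return rng.randrange(1, 6)

samples = [
    generate_patient(toy_next_code, maxlen=15, max_visits=10, sentinel=0, death_code=5)
    for _ in range(3)
]
print(samples)
```

Each generated list has at most 10 codes, and the death code can only appear as its last element.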

Completed Runs

The RETAIN evaluations of the various experiments have been completed and stored in the images folder. example_attempt2, example_mod1, and example_modrev are the RETAIN evaluations on MIMIC-III data using bidirectional, forward, and reverse input, respectively. fake_interactx holds the RETAIN run on interactx. The weight folders hold RETAIN runs with specific weighting (you shouldn't need to worry about these). Lastly, the images/hypothesize_true3 folder holds the runs that use generated data with interact3, where the generator was trained on different numbers of samples.

Contributors

  • Daniel Chae (Princeton Class of 2020), advised by Michael Guerzhoy
