Premier Analysis

Sean Browning, Scott Lee, Karen Wong

Deep models using EHR data.

Data

The required data lives in a private submodule. If you have access, you can clone it recursively along with the repository:

git clone --recursive git@github.com:cdcai/premier_analysis.git

Alternatively, if you've already cloned the repository and didn't pull the submodule, you can update it from the project root:

git submodule update --init --recursive

Data/Model flow

1. python/feature_extraction.py

The main pre-processing for all EHR data.

Input: Several EHR views in parquet format

Output:

  • A pandas DataFrame with one column of tokenized features per source table, aggregated to days, hours, or seconds from the index date.
  • A feature dictionary containing the abbreviated feature names and their descriptions.

Steps:

  • Join all visit-level data across EHR views
  • Discretize continuous values by quantile (labs, vitals)
  • Abbreviate categorical values to tokens (billing data, qualitative lab result)
  • Aggregate all features to days, hours, or seconds from the medical record index date
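
The discretize, tokenize, and aggregate steps above can be sketched with pandas as follows. This is a minimal illustration, not the code in feature_extraction.py: the table layout, the column names (visit_id, days_from_index, test, value), the "vtl_" token prefix, and the choice of quartiles are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical long-format vitals table; the column names are illustrative only.
vitals = pd.DataFrame({
    "visit_id": [1, 1, 1, 2, 2, 2, 2, 2],
    "days_from_index": [0, 0, 1, 0, 0, 1, 1, 2],
    "test": ["hr"] * 8,
    "value": [60.0, 72.0, 85.0, 90.0, 101.0, 110.0, 64.0, 78.0],
})

# Discretize the continuous values into quartile bins, coded 0 to 3.
vitals["bin"] = pd.qcut(vitals["value"], q=4, labels=False)

# Abbreviate each observation to a short string token, e.g. "vtl_hr_q2".
vitals["token"] = "vtl_" + vitals["test"] + "_q" + vitals["bin"].astype(str)

# Aggregate the tokens to one space-separated string per visit and day.
by_day = (
    vitals.groupby(["visit_id", "days_from_index"])["token"]
    .agg(" ".join)
    .reset_index()
)

print(by_day)
```

Each row of by_day holds every token observed for one visit on one day, which is the shape the next script expects as input.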

2. python/features_to_integers.py

Further aggregates the DataFrame of string tokens by time step and converts it to an integer-coded nested list.

Input: pandas DataFrame from feature_extraction.py

Output:

  • Nested list-of-lists with all features encoded into integer representation
  • Dictionary of tokens and their encoded values
  • Dictionary of visits with their length of stay (LOS) and outcome indicators

Steps:

  • Join all feature columns into a single string per time step within each visit
  • Use sklearn CountVectorizer to encode tokens to integer representation (minimum document frequency: 5)
  • Save list-of-lists
  • Compute LOS, outcomes for each visit, save to dictionary
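
As a rough sketch of the encoding step, the snippet below fits scikit-learn's CountVectorizer on per-time-step strings and maps each token to an integer. The toy tokens, the min_df=2 threshold (the pipeline uses 5), and the choice to shift codes by one so that 0 stays free for padding are assumptions for illustration; features_to_integers.py may do this differently.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy input: one list per visit, one space-separated token string per time step.
visits = [
    ["lab_glu_q1 vtl_hr_q2", "lab_glu_q1"],
    ["vtl_hr_q2 bill_x99", "lab_glu_q1 vtl_hr_q2"],
]

# Treat every time step as a document when counting document frequency.
docs = [step for visit in visits for step in visit]

# min_df=2 is used here only because the toy data is tiny.
vec = CountVectorizer(min_df=2, binary=True)
vec.fit(docs)

vocab = vec.vocabulary_          # token -> column index
analyzer = vec.build_analyzer()  # same tokenization the vectorizer used

# Encode each time step as a list of integer codes, dropping rare tokens.
# Codes are shifted by 1 so that 0 can be reserved for padding (an assumption).
encoded = [
    [[int(vocab[tok]) + 1 for tok in analyzer(step) if tok in vocab]
     for step in visit]
    for visit in visits
]

print(encoded)
```

Tokens that fall below the document-frequency threshold, such as bill_x99 here, are simply absent from the vocabulary and drop out of the encoded sequences.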

3. python/model_prep.py

Trims sequences to the appropriate lookback period according to which COVID visit is of interest.

Under the working definition, the visit of interest is the first one to occur. The script also labels the sequences according to the outcome of interest.

Input:

  • List-of-lists from features_to_integers
  • Dictionary of visits and their LOS + outcome indicators from features_to_integers

Output:

  • A tuple: (trimmed list-of-lists, list of labels)

Steps:

  • Using the dictionary and global settings, compute the appropriate lookback period (tp.find_cutpoints)
  • Trim sequences to that length and combine them with the labels to form the tuple
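
The trimming logic can be pictured with the pure-Python sketch below. It does not reproduce tp.find_cutpoints, whose signature is not shown here; find_window, the covid_start and death keys, and the LOOKBACK and HORIZON settings are hypothetical names introduced only for this example.

```python
from typing import Dict, List, Tuple

Sequence = List[List[int]]  # one list of integer codes per time step

LOOKBACK = 2  # hypothetical setting: time steps to keep before the visit starts
HORIZON = 1   # hypothetical setting: time steps to keep from the visit itself


def find_window(covid_start: int, lookback: int, horizon: int) -> Tuple[int, int]:
    """Return (start, end) slice bounds around the visit of interest."""
    start = max(0, covid_start - lookback)
    end = covid_start + horizon
    return start, end


# Toy inputs standing in for the outputs of features_to_integers.
sequences: Dict[str, Sequence] = {
    "p1": [[1], [2, 3], [4], [5], [6, 7]],
    "p2": [[8], [9]],
}
visit_info: Dict[str, Dict[str, int]] = {
    "p1": {"covid_start": 3, "death": 1},
    "p2": {"covid_start": 0, "death": 0},
}

trimmed: List[Sequence] = []
labels: List[int] = []
for pid, seq in sequences.items():
    start, end = find_window(visit_info[pid]["covid_start"], LOOKBACK, HORIZON)
    trimmed.append(seq[start:end])
    labels.append(visit_info[pid]["death"])

# The (sequences, labels) tuple mirrors the output described above.
dataset = (trimmed, labels)
print(dataset)
```

Clamping the start of the window at zero handles patients whose record begins less than a full lookback period before the visit of interest.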

4. Keras modelling / Hyperparameter tuning

Either python/model.py or python/hp_tune.py

Splits the data into train, validation, and test generators and converts it to a RaggedTensor representation. This is then fed into the Keras model defined in the script, and the metrics are evaluated.
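
To show why a ragged representation suits this data, the sketch below converts a small nested list into a tf.RaggedTensor. It covers only the conversion, not the generators or the model in model.py, and the toy sequences are made up for the example.

```python
import tensorflow as tf

# Toy trimmed sequences: visits have different numbers of time steps,
# and time steps have different numbers of integer codes.
trimmed = [[[2, 3], [4], [5]], [[8]]]
labels = [1, 0]

# A RaggedTensor stores the variable-length dimensions without padding.
x = tf.ragged.constant(trimmed, dtype=tf.int64)
y = tf.constant(labels)

print(x.shape)        # both inner dimensions are reported as None
print(x.ragged_rank)  # number of ragged (variable-length) dimensions
```

Because both the number of time steps per visit and the number of codes per time step vary, a dense tensor would need padding along two axes; the ragged form avoids that until a layer actually requires it.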
