BD4H-Project-MIMIC-Extract

About

Georgia Tech
CSE 6250
Spring 2023
William Hoynes and Peter Berryman

Based on the research paper "MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III" by Shirly Wang, Matthew B.A. McDermott, Geeticka Chauhan, Marzyeh Ghassemi, Michael C. Hughes, and Tristan Naumann, and on its Git repository: https://github.com/MLforHealth/MIMIC_Extract

The purpose is to use the original work as a basis for easier downstream manipulation of MIMIC-III datasets, using the PhysioNet BigQuery database from Scala/Spark.

Original MIMIC Extract Pipeline

Simplified Updated Physionet Dataset

Instructions - For Grading TA Only

1. Request Editor access to the Google Cloud project found here: https://console.cloud.google.com/iam-admin/iam?project=bdh6250-380417
2. Install the Google Cloud CLI using the instructions found here: https://cloud.google.com/sdk/docs/install-sdk
3. Run the following command from the command line:
   <path to Google Cloud CLI installation>/google-cloud-sdk/bin/gcloud auth application-default login
4. When prompted, choose the Google account that has credentials for the project.
5. Run the following command from the command line:
   <path to Google Cloud CLI installation>/google-cloud-sdk/bin/gcloud config set project bdh6250-380417
6. Run src/main/scala/bdh_mimic/main/Main.scala from this project.
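
The gcloud steps listed above can be sketched as one small shell helper. This is only an illustration and is not part of the project: the GCLOUD_BIN variable and the setup_gcp_access function are hypothetical names, and the final read-back with "gcloud config get-value project" is an extra sanity check that the original instructions do not ask for.

```shell
#!/bin/sh
# Sketch only: wraps the gcloud steps listed above in one function.
# GCLOUD_BIN is a hypothetical variable (not part of this project); point it at
# <path to Google Cloud CLI installation>/google-cloud-sdk/bin/gcloud if gcloud
# is not on your PATH.
GCLOUD_BIN="${GCLOUD_BIN:-gcloud}"
PROJECT_ID="bdh6250-380417"   # project ID taken from the instructions above

setup_gcp_access() {
    # Opens a browser window to choose the Google account with project access.
    "$GCLOUD_BIN" auth application-default login || return 1
    # Make the course project the active gcloud project.
    "$GCLOUD_BIN" config set project "$PROJECT_ID" || return 1
    # Read the setting back to confirm it took effect (extra check, an assumption).
    active_project="$("$GCLOUD_BIN" config get-value project 2>/dev/null)"
    if [ "$active_project" = "$PROJECT_ID" ]; then
        echo "Active project: $active_project"
    else
        echo "Unexpected active project: '$active_project'" >&2
        return 1
    fi
}
```

After sourcing the file, calling setup_gcp_access runs the login and project selection in order and stops at the first step that fails.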

GCP BigQuery

Request access to the BigQuery database provided by PhysioNet (requires credentialed access): https://physionet.org/content/mimiciii/1.4/
This gives access to the database used in this pipeline, named MIMIC_Extract, along with its corresponding queries.

Instructions - For Other Users

Request access to the BigQuery database provided by PhysioNet (requires credentialed access): https://physionet.org/content/mimiciii/1.4/
See the file Bigquery.txt for the queries that create the databases used by the Scala code.
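
Once access has been granted, one way to confirm that BigQuery is reachable is to count the rows of a single table with the bq command-line tool that ships with the Google Cloud CLI. The sketch below is an assumption rather than part of this project: check_bq_access is a hypothetical helper, and the table name must be supplied by you, since the actual table names are defined by the queries in Bigquery.txt.

```shell
#!/bin/sh
# Sketch only: hypothetical helper that counts the rows of one table to
# confirm that BigQuery access works.
# Argument: a fully qualified table name in the form project.dataset.table
# (a placeholder; the real names come from the queries in Bigquery.txt).
check_bq_access() {
    table="$1"
    if [ -z "$table" ]; then
        echo "usage: check_bq_access <project.dataset.table>" >&2
        return 2
    fi
    # --use_legacy_sql=false selects standard SQL, which uses backtick-quoted
    # table names.
    bq query --use_legacy_sql=false "SELECT COUNT(*) AS n FROM \`$table\`"
}
```

A call such as check_bq_access "<project>.<dataset>.<table>" should print a single row count if the credentials and table name are valid, and an error from bq otherwise.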
