Georgia Tech
CSE 6250
Spring 2023
William Hoynes and Peter Berryman
Based on the research paper "MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III" by Shirly Wang, Matthew B.A. McDermott, Geeticka Chauhan, Marzyeh Ghassemi, Michael C. Hughes, and Tristan Naumann, and its Git repo: https://github.com/MLforHealth/MIMIC_Extract
The purpose is to use the original work as a basis for easier downstream manipulation of MIMIC-III datasets, using the PhysioNet BigQuery database from Scala/Spark.
- Request Editor access to the Google Cloud project found here: https://console.cloud.google.com/iam-admin/iam?project=bdh6250-380417
- Install the Google Cloud CLI using the instructions found here: https://cloud.google.com/sdk/docs/install-sdk
- Run the following command from the command line:
<path to Google Cloud CLI installation>/google-cloud-sdk/bin/gcloud auth application-default login
- When prompted, choose the Google account that has credentials for the project
- Run the following command from the command line:
<path to Google Cloud CLI installation>/google-cloud-sdk/bin/gcloud config set project bdh6250-380417
- Run src/main/scala/bdh_mimic/main/Main.scala from this project.
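Before running Main.scala, the Google Cloud steps above can be sanity-checked from the command line. The following is a minimal sketch, not part of this repository: the function name `check_gcloud_setup` is hypothetical, and it assumes `gcloud` is on your PATH (otherwise substitute the full path to your Google Cloud CLI installation).

```shell
# Hypothetical helper, not part of this repository.
# Usage: check_gcloud_setup <expected-project-id>
check_gcloud_setup() {
  expected_project="$1"

  # Fail early if the Google Cloud CLI is not on PATH.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH" >&2
    return 1
  fi

  # Application-default credentials exist only after
  # 'gcloud auth application-default login' has succeeded.
  if ! gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "no application-default credentials found" >&2
    return 1
  fi

  # Compare the active project with the expected one.
  current_project="$(gcloud config get-value project 2>/dev/null)"
  if [ "$current_project" != "$expected_project" ]; then
    echo "project mismatch: expected $expected_project, got '$current_project'" >&2
    return 1
  fi

  echo "gcloud setup OK for project $current_project"
}
```

For this project the call would be `check_gcloud_setup bdh6250-380417`; a non-zero exit status indicates which of the setup steps still needs attention.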
- Request access to the BigQuery database provided by PhysioNet (requires credentialed access): https://physionet.org/content/mimiciii/1.4/
- This gives access to the database used in this pipeline, named MIMIC_Extract, along with its corresponding queries
- See the file Bigquery.txt for the queries that create the databases used by the Scala code
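Once access has been granted, one way to confirm that BigQuery is reachable is to count the rows of a table with the `bq` tool that ships with the Google Cloud CLI. This is a rough sketch under stated assumptions: the function name `count_rows` is hypothetical, `bq` is assumed to be on your PATH, and the table name in the usage note is an assumption about how the PhysioNet dataset is exposed, not something taken from this project.

```shell
# Hypothetical helper, not part of this repository.
# Usage: count_rows <project.dataset.table>
count_rows() {
  table="$1"

  # Fail early if the BigQuery CLI is not on PATH.
  if ! command -v bq >/dev/null 2>&1; then
    echo "bq not found on PATH" >&2
    return 1
  fi

  # Run a standard-SQL query with CSV output and keep only the
  # last line of output, which is expected to hold the count.
  bq query --use_legacy_sql=false --format=csv \
    "SELECT COUNT(*) AS n FROM \`$table\`" | tail -n 1
}
```

A call might look like `count_rows "physionet-data.mimiciii_clinical.patients"`, where the table name is assumed and should be replaced with whichever table your account can actually see. An access error here usually means the credentialed-access request has not been approved yet.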