mimic-iv-data-pipeline's People

Contributors

bgallamoza, mehak25, ncutrona, udpranjal

mimic-iv-data-pipeline's Issues

No license is declared

Could you please declare a license in your repo? Without a LICENSE file or declaration of the license terms in the README.md, the code is technically "all rights reserved" by the authors and reuse of the code is limited. If you are unopinionated on the matter, I suggest the MIT license as it is highly compatible--particularly with university IP licenses (such as this one).

If an MIT license is acceptable I'd be happy to submit a PR to include it.

Thanks!

non-ICU data

I get the following error whenever I try to preprocess non-ICU data:
ValueError: could not broadcast input array from shape (36,13) into shape (35,13)
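
For context, NumPy raises this message when an array is assigned into a destination whose shape differs and cannot be broadcast. The snippet below is only a minimal reproduction with made-up arrays, not the pipeline's code; the shapes are copied from the error message, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical arrays; only the shapes come from the reported error.
dest = np.zeros((35, 13))   # destination with 35 rows
src = np.ones((36, 13))     # source with one extra row

try:
    dest[:] = src           # row counts differ, so the assignment fails
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

If the pipeline behaves the same way, the source block has one more row than the slot it is written into, so comparing the two row counts just before the failing assignment may be a useful first check.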

Add time information to procedures

For the procedures data in data\long_format\proc\long_proc_icd10_norm.csv.gz:
Add a time column that gives the hour of the admission at which each procedure happened.
For example, if a procedure was performed 3 hours after admit time, then its time column should say 3, and if it was performed 30 hours after admit time, then its time column should say 30.
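
A minimal sketch of the requested column, assuming the procedures table already carries the admission time next to each procedure timestamp. The column names `proc_time` and `admittime` and the toy rows are assumptions for illustration, not the pipeline's actual schema.

```python
import pandas as pd

# Toy rows (invented values); proc_time is a hypothetical column name.
proc = pd.DataFrame({
    "hadm_id": [1, 1],
    "admittime": pd.to_datetime(["2150-01-01 08:00", "2150-01-01 08:00"]),
    "proc_time": pd.to_datetime(["2150-01-01 11:00", "2150-01-02 14:00"]),
})

# Whole hours elapsed between admission and the procedure.
elapsed = proc["proc_time"] - proc["admittime"]
proc["time"] = (elapsed.dt.total_seconds() // 3600).astype(int)

print(proc[["hadm_id", "time"]])
```

With these rows the time column holds 3 and 30, matching the example in the request.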

create a function to preprocess data

Inputs:

  1. Whether to group codes.
  2. Outlier removal.
  3. Time series smoothing.
  4. Whether all admission data is needed, or only the last 24 or last 48 hours of data.

Output:
Stored files and summary
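
One possible shape for such a function is sketched below. Everything in it is an assumption made for illustration: the function name `preprocess`, the long-format columns (`hadm_id`, `code`, `time`, `value`), grouping by the first three characters of a code, percentile-based outlier removal, and a rolling mean for smoothing. Writing the result to disk is left out.

```python
import pandas as pd

def preprocess(df, group_codes=False, remove_outliers=False,
               smooth=False, last_hours=None):
    """Hypothetical sketch; df has columns hadm_id, code, time (hours), value."""
    out = df.sort_values(["hadm_id", "time"]).copy()
    if group_codes:
        # Assumed grouping rule: keep the first three characters of each code.
        out["code"] = out["code"].str[:3]
    if remove_outliers:
        # Assumed rule: drop values outside the 1st-99th percentile per code.
        low = out.groupby("code")["value"].transform(lambda s: s.quantile(0.01))
        high = out.groupby("code")["value"].transform(lambda s: s.quantile(0.99))
        out = out[out["value"].between(low, high)]
    if smooth:
        # Assumed smoothing: 3-point rolling mean within each admission and code.
        out["value"] = out.groupby(["hadm_id", "code"])["value"].transform(
            lambda s: s.rolling(3, min_periods=1).mean())
    if last_hours is not None:
        # Keep only rows within the final `last_hours` of each admission.
        last = out.groupby("hadm_id")["time"].transform("max")
        out = out[out["time"] > last - last_hours]
    summary = {"rows": len(out), "admissions": out["hadm_id"].nunique()}
    return out, summary

demo = pd.DataFrame({
    "hadm_id": [1, 1, 1, 1],
    "code": ["N180", "N181", "N183", "N189"],
    "time": [0, 10, 40, 47],
    "value": [1.0, 2.0, 3.0, 4.0],
})
result, summary = preprocess(demo, group_codes=True, last_hours=24)
print(summary)
```

Keeping each option as an independent keyword argument means the notebook widgets could map onto the call one to one.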

TypeError: preprocess_features_icu() missing 1 required positional argument: 'left_thresh'

Dear all,

When I ran the following code from 'mainPipeline.ipynb', I got an error in 'preprocess_features_icu'. It seems that 'left_thresh' was not passed. What value is required for 'left_thresh'?

if data_icu:
    if diag_flag:
        group_diag=radio_input4.value
    preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
else:
    if diag_flag:
        group_diag=radio_input4.value
    if med_flag:
        group_med=radio_input5.value
    if proc_flag:
        group_proc=radio_input6.value
    preprocess_features_hosp(cohort_output, diag_flag,proc_flag,med_flag,False,group_diag,group_med,group_proc,False,False,0)


TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_867107/3036563163.py in
      5     if diag_flag:
      6         group_diag=radio_input4.value
----> 7     preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
      8 else:
      9     if diag_flag:

TypeError: preprocess_features_icu() missing 1 required positional argument: 'left_thresh'
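
The error means the function definition has one more required parameter than the notebook call supplies. The sketch below reproduces this with a hypothetical stand-in: only the names `preprocess_features_icu` and `left_thresh` come from the traceback, while the other parameter names, the return value, and the choice of 0 for `left_thresh` are assumptions. The real signature should be checked in the pipeline source.

```python
# Hypothetical stand-in for the pipeline function (8 required parameters).
def preprocess_features_icu(cohort_output, diag_flag, group_diag,
                            opt_a, opt_b, opt_c, thresh, left_thresh):
    return {"thresh": thresh, "left_thresh": left_thresh}

# The notebook call passes only 7 positional arguments, so Python complains.
try:
    preprocess_features_icu("cohort", True, "group", False, False, False, 0)
    error = None
except TypeError as err:
    error = str(err)
print(error)

# Supplying the eighth argument (value chosen arbitrarily here) resolves it.
result = preprocess_features_icu("cohort", True, "group", False, False, False, 0, 0)
print(result)
```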


Using hospital labs for ICU prediction tasks?

Hi,

Thank you for this useful pipeline.
One suggestion: as you may know, many papers use hospital lab measurements to predict outcomes of ICU stays. For example, this paper uses many items in the "hosp/labevents.csv.gz" file as features for the ICU stay (linking via the hospital admission ID, "hadm_id", which is a column in the ICU stay table). However, I noticed that your pipeline does not natively allow the user to include lab events if they select the ICU flag.

It shouldn't be too hard to support that with the code you've already written, so just wanted to flag this point. Let me know if I'm misunderstanding something and this is already an option. Thanks!
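
A rough sketch of the linkage described above, assuming pandas and toy tables that reuse the MIMIC-IV column names `hadm_id`, `stay_id`, `intime`, `outtime`, `itemid`, `charttime`, and `valuenum`. The values are invented, and restricting labs to the ICU stay window is one possible design choice, not something the pipeline is known to do.

```python
import pandas as pd

# Toy ICU stays and hospital lab events (invented values).
icu = pd.DataFrame({
    "stay_id": [10, 11],
    "hadm_id": [100, 101],
    "intime": pd.to_datetime(["2150-01-01 10:00", "2150-02-01 00:00"]),
    "outtime": pd.to_datetime(["2150-01-03 10:00", "2150-02-02 00:00"]),
})
labs = pd.DataFrame({
    "hadm_id": [100, 100, 101],
    "itemid": [50912, 50912, 50971],
    "charttime": pd.to_datetime(
        ["2150-01-01 06:00", "2150-01-02 06:00", "2150-02-01 12:00"]),
    "valuenum": [1.1, 1.3, 4.2],
})

# Attach each lab event to the ICU stay(s) of the same hospital admission.
merged = labs.merge(icu, on="hadm_id", how="inner")

# Optionally keep only labs charted while the patient was in the ICU.
in_stay = merged[(merged["charttime"] >= merged["intime"])
                 & (merged["charttime"] <= merged["outtime"])]
print(in_stay[["stay_id", "itemid", "charttime", "valuenum"]])
```

An admission with several ICU stays would yield one merged row per stay for each lab event, which is why the time-window filter is applied afterwards.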

Option to load raw notes?

Thanks for releasing this wonderful pipeline! I am interested in using this pipeline, but also adding features extracted from the raw text of the notes to help prediction.

Is it possible to optionally load the raw text of the note alongside the current features?

Thanks!

System hardware configuration suggested for the pipeline?

Hi,

Thank you all for creating this useful and flexible pipeline.

Could you please suggest a system hardware configuration for the pipeline, and roughly how long it takes to preprocess the data for version 1? Thanks!

Issue of prediction window

Hi there,

Thanks for the amazing work!

When selecting the prediction window in the Jupyter notebook (mainPipeline.ipynb), it seems that the code in 'section 7. Time-Series Representation', cell 2 should be:

if (radio_input6.value=='Custom'):
    predW=int(text3.value)
else:
    predW=int(radio_input6.value[0].strip())

The current code seems to mix up the inputs of the prediction window and the bucket.

Best wishes

Create Function to extract data

Please refer to the UserInterface file to understand this.
We need to create a function such as extract() which takes the following inputs:

  1. Type of data - ICU or Non-ICU
  2. Prediction task - 30-day, 60-day readmission or mortality

Output is data files stored according to chosen options and summary of data.

Add time for labs

For the labs data in data\long_format\labs\long_labs_units_cleaned_norm.csv.gz:
Add a time column that gives the hour of the admission at which each lab was taken.
For example, if a lab was performed 3 hours after admit time, then its time column should say 3, and if it was performed 30 hours after admit time, then its time column should say 30.
Also, for labs with a missing hadm_id, find the hadm_id from the lab's charttime, and then calculate the time for the labs in each admission.
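
One way to approach both steps is sketched here with pandas. It assumes an admissions table with `subject_id`, `hadm_id`, `admittime`, and `dischtime`, and assigns a lab with a missing hadm_id to the admission of the same patient whose window contains its charttime. The toy values and the decision to leave unmatched labs empty are assumptions.

```python
import pandas as pd

adm = pd.DataFrame({
    "subject_id": [1, 1],
    "hadm_id": [100, 101],
    "admittime": pd.to_datetime(["2150-01-01 08:00", "2150-03-01 08:00"]),
    "dischtime": pd.to_datetime(["2150-01-05 08:00", "2150-03-04 08:00"]),
})
labs = pd.DataFrame({
    "subject_id": [1, 1, 1],
    "hadm_id": [100, None, None],   # the last two rows lack an admission ID
    "charttime": pd.to_datetime(
        ["2150-01-01 11:00", "2150-03-02 14:00", "2150-02-01 00:00"]),
})

# Pair every lab row with every admission of the same patient.
cand = labs.reset_index().merge(adm, on="subject_id", suffixes=("", "_adm"))

# Keep the pair if the IDs already agree, or if the ID is missing and the
# charttime falls between admission and discharge.
known = cand["hadm_id"] == cand["hadm_id_adm"]
in_window = cand["hadm_id"].isna() & cand["charttime"].between(
    cand["admittime"], cand["dischtime"])
match = cand[known | in_window].drop_duplicates(subset="index").set_index("index")

# Fill the missing IDs and compute whole hours since admission.
labs["hadm_id"] = labs["hadm_id"].fillna(match["hadm_id_adm"])
labs["time"] = (match["charttime"] - match["admittime"]).dt.total_seconds() // 3600
print(labs)
```

The third lab falls outside every admission window, so it keeps an empty hadm_id and time here; how such rows should be treated is left open.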

Outlier Detection

Check the previous pipeline to replicate the outlier detection tasks in our pipeline.

How to get exact item code for specific disease

I noticed that for specific diseases your project sets icd_code to particular values (for example, N18 for CKD). Do you provide the exact item IDs for each disease, and if so, where can I find them? Alternatively, could you explain how you define kidney disease? Thanks.
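
For illustration only, selecting admissions by an ICD-10 prefix such as N18 could look like the sketch below. The toy table and its column names (`icd_code`, `icd_version`) follow MIMIC-IV conventions, but whether the pipeline defines the CKD cohort exactly this way is an assumption.

```python
import pandas as pd

# Toy diagnoses (invented rows); codes are stored without the dot.
diag = pd.DataFrame({
    "hadm_id": [1, 2, 3],
    "icd_code": ["N183", "I10", "N189"],
    "icd_version": [10, 10, 10],
})

# Keep ICD-10 rows whose code starts with the N18 prefix.
ckd = diag[(diag["icd_version"] == 10)
           & diag["icd_code"].str.startswith("N18")]
print(ckd["hadm_id"].tolist())
```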

Add time to Medications

For the medications data in data\long_format\meds\long_med_nonproprietaryname_norm.csv.gz:
Add a time column that gives the hour of the admission at which each med was given.
For example, if a med was given 3 hours after admit time, then its time column should say 3, and if it was given 30 hours after admit time, then its time column should say 30.
Similarly, create a time_end column to hold the end time of each med.
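
A small sketch of the two columns, assuming each medication row has a start and stop timestamp alongside the admission time. The column names `starttime`, `stoptime`, and `admittime`, the helper `hours_since`, and the toy values are assumptions.

```python
import pandas as pd

meds = pd.DataFrame({
    "admittime": pd.to_datetime(["2150-01-01 08:00", "2150-01-01 08:00"]),
    "starttime": pd.to_datetime(["2150-01-01 11:00", "2150-01-02 14:00"]),
    "stoptime": pd.to_datetime(["2150-01-01 20:00", "2150-01-03 08:00"]),
})

def hours_since(timestamps, reference):
    """Whole hours elapsed from reference to timestamps (hypothetical helper)."""
    return ((timestamps - reference).dt.total_seconds() // 3600).astype(int)

meds["time"] = hours_since(meds["starttime"], meds["admittime"])
meds["time_end"] = hours_since(meds["stoptime"], meds["admittime"])
print(meds[["time", "time_end"]])
```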
