sunlabuiuc / pyhealth

A Deep Learning Python Toolkit for Healthcare Applications.

Home Page: https://pyhealth.readthedocs.io

License: MIT License

Python 94.58% Jupyter Notebook 3.57% Cython 1.80% Dockerfile 0.06%

healthcare data-mining deep-learning preprocessing clinical-data clinical-research electronic-medical-record medical-code electronic-health-record

pyhealth's Introduction

Welcome to PyHealth!

Citing PyHealth 🤝

Yang, Chaoqi, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. 2023. “PyHealth: A Deep Learning Toolkit for Healthcare Applications.” In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5788–89. KDD ’23. New York, NY, USA: Association for Computing Machinery.

@inproceedings{pyhealth2023yang,
    author = {Yang, Chaoqi and Wu, Zhenbang and Jiang, Patrick and Lin, Zhen and Gao, Junyi and Danek, Benjamin and Sun, Jimeng},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Applications},
    url = {https://github.com/sunlabuiuc/PyHealth},
    booktitle = {Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023},
    year = {2023}
}

Check out our KDD'23 tutorial: https://sunlabuiuc.github.io/PyHealth/

PyHealth is a comprehensive deep learning toolkit for supporting clinical predictive modeling, which is designed for both ML researchers and medical practitioners. We can make your healthcare AI applications easier to deploy and more flexible and customizable. [Tutorials]

[News!] We are continuously implementing good papers and benchmarks in PyHealth; check out the [planned List]. You are welcome to pick one from the list and send us a PR, or to add more influential and new papers to the list.

1. Installation 🚀

You can install from PyPI:

pip install pyhealth

or from the GitHub source (run from the root of a clone of the repository):

pip install .

2. Introduction 📖

pyhealth provides these functionalities (we are still enriching some modules):

You can use the following functions independently:

Dataset: MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, customized EHR datasets, etc.
Tasks: diagnosis-based drug recommendation, patient hospitalization and mortality prediction, length-of-stay forecasting, etc.
ML models: CNN, LSTM, GRU, RETAIN, SafeDrug, Deepr, etc.

Building a healthcare AI pipeline can be as short as 10 lines of code in PyHealth.

3. Build ML Pipelines 🏆

All healthcare tasks in our package follow a five-stage pipeline:

Module 1: <pyhealth.datasets>

pyhealth.datasets provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV and eICU, etc. The output (mimic3base) is a multi-level dictionary structure (see illustration below).

from pyhealth.datasets import MIMIC3Dataset

mimic3base = MIMIC3Dataset(
    # root directory of the dataset
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/", 
    # raw CSV table name
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes to ATC codes in these tables
    code_mapping={"NDC": "ATC"},
)
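The multi-level structure can be pictured as nested dictionaries: patients at the top, visits beneath them, and per-table code lists beneath each visit. The sketch below uses plain Python dicts with made-up IDs and an assumed layout; `toy_dataset` and `count_events` are hypothetical names, and the real objects are PyHealth classes rather than raw dicts.

```python
# Hypothetical, simplified picture of a patient -> visit -> table hierarchy.
# IDs and codes are illustrative; the real objects are PyHealth classes.
toy_dataset = {
    "patient_175": {
        "visit_100183": {
            "DIAGNOSES_ICD": ["5990", "4280"],
            "PROCEDURES_ICD": ["0040"],
            "PRESCRIPTIONS": ["N06DA02", "B01AB01"],
        },
    },
}


def count_events(dataset):
    """Count the codes stored under every patient, visit, and table."""
    return sum(
        len(codes)
        for visits in dataset.values()
        for tables in visits.values()
        for codes in tables.values()
    )


print(count_events(toy_dataset))
```

Keeping the hierarchy independent of any task means the same parsed structure can be reused for several downstream tasks.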

Module 2: <pyhealth.tasks>

pyhealth.tasks defines how to process each patient's data into a set of samples for the tasks. In the package, we provide several task examples, such as drug recommendation and length of stay prediction. It is easy to customize your own tasks following our template.

from pyhealth.tasks import readmission_prediction_mimic3_fn

mimic3sample = mimic3base.set_task(task_fn=readmission_prediction_mimic3_fn) # use default task
mimic3sample.samples[0] # show the information of the first sample
"""
{
    'visit_id': '100183',
    'patient_id': '175',
    'conditions': ['5990', '4280', '2851', '4240', '2749', '9982', 'E8499', '42831', '34600'],
    'procedures': ['0040', '3931', '7769'],
    'drugs': ['N06DA02', 'V06DC01', 'B01AB01', 'A06AA02', 'R03AC02', 'H03AA01', 'J01FA09'],
    'label': 0
}
"""

from pyhealth.datasets import split_by_patient, get_dataloader

train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
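Splitting by patient rather than by sample keeps all visits of one patient in the same subset, which avoids leaking a patient's history from training into evaluation. The following is a minimal plain-Python sketch of that idea; `toy_split_by_patient` is a hypothetical helper, and its shuffling and rounding choices are assumptions rather than the library's exact behaviour.

```python
import random


def toy_split_by_patient(samples, ratios, seed=0):
    """Split samples so that all samples of one patient land in the same subset."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)
    n = len(patient_ids)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    groups = (
        set(patient_ids[:train_end]),
        set(patient_ids[train_end:val_end]),
        set(patient_ids[val_end:]),
    )
    return tuple([s for s in samples if s["patient_id"] in g] for g in groups)


# 10 toy patients with 3 samples each.
toy_samples = [{"patient_id": f"p{i % 10}", "visit_id": f"v{i}"} for i in range(30)]
train, val, test = toy_split_by_patient(toy_samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))
```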

Module 3: <pyhealth.models>

pyhealth.models provides different ML models with very similar argument configs.

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

Module 4: <pyhealth.trainer>

pyhealth.trainer can specify training arguments, such as epochs, optimizer, learning rate, etc. The trainer will automatically save the best model and output the path in the end.

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)
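Keeping the best model amounts to comparing a monitored validation metric across epochs and remembering where it peaked. A minimal sketch of that bookkeeping follows; `best_epoch` is a hypothetical helper and the metric values are made up.

```python
def best_epoch(history, monitor, higher_is_better=True):
    """Return (epoch_index, value) of the best monitored validation score."""
    best_idx, best_val = None, None
    for idx, scores in enumerate(history):
        value = scores[monitor]
        improved = best_val is None or (
            value > best_val if higher_is_better else value < best_val
        )
        if improved:
            # A real trainer would save a checkpoint at this point.
            best_idx, best_val = idx, value
    return best_idx, best_val


# Made-up validation scores for three epochs.
toy_history = [
    {"pr_auc": 0.41, "loss": 0.69},
    {"pr_auc": 0.55, "loss": 0.61},
    {"pr_auc": 0.52, "loss": 0.58},
]
print(best_epoch(toy_history, "pr_auc"))
print(best_epoch(toy_history, "loss", higher_is_better=False))
```

The monitored name has to be one of the metrics actually computed for the task, otherwise the lookup `scores[monitor]` fails.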

Module 5: <pyhealth.metrics>

pyhealth.metrics provides several common evaluation metrics (refer to the documentation to see which are available).

# method 1
trainer.evaluate(test_loader)

# method 2
from pyhealth.metrics.binary import binary_metrics_fn

y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])
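To make concrete what a score such as "roc_auc" measures, the sketch below computes ROC AUC by hand as the fraction of (positive, negative) pairs in which the positive sample gets the higher probability. `toy_roc_auc` is a hypothetical helper written for illustration and is not how the library computes the metric.

```python
def toy_roc_auc(y_true, y_prob):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


print(toy_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```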

4. Medical Code Map 🏥

pyhealth.medcode provides two core functionalities. This module can be used independently.

For code ontology lookup within one medical coding system (e.g., name, category, sub-concept);

from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# `Congestive heart failure, unspecified`
icd9cm.get_ancestors("428.0")
# ['428', '420-429.99', '390-459.99', '001-999.99']

atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# `ibuprofen, combinations`
atc.lookup("M01AE51", "drugbank_id")
# `DB01050`
atc.lookup("M01AE51", "description")
# Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) derived ...
atc.lookup("M01AE51", "indication")
# Ibuprofen is the most commonly used and prescribed NSAID. It is very common over the ...

For code mapping between two coding systems (e.g., ICD9CM to CCSCM).

from pyhealth.medcode import CrossMap

codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("428.0")
# ['108']

codemap = CrossMap.load("NDC", "RxNorm")
codemap.map("50580049698")
# ['209387']

codemap = CrossMap.load("NDC", "ATC")
codemap.map("50090539100")
# ['A10AC04', 'A10AD04', 'A10AB04']
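Both functionalities can be pictured with ordinary dictionaries: an ontology is a table of parent links that is walked upward, and a cross map is a one-to-many lookup. The sketch below uses only the ICD-9-CM values quoted above. The names `toy_parent`, `toy_get_ancestors`, `toy_cross_map`, and `toy_map` are hypothetical, and returning an empty list for an unmapped code is an assumption.

```python
# Toy parent table built from the ancestor chain shown for ICD-9-CM code 428.0.
toy_parent = {
    "428.0": "428",
    "428": "420-429.99",
    "420-429.99": "390-459.99",
    "390-459.99": "001-999.99",
}


def toy_get_ancestors(code, parent):
    """Follow parent links upward until a code with no parent is reached."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


# Toy one-to-many cross map, using the ICD9CM -> CCSCM pair quoted above.
toy_cross_map = {"428.0": ["108"]}


def toy_map(code, cross_map):
    """Return all target codes for `code`, or an empty list if unmapped (assumed)."""
    return cross_map.get(code, [])


print(toy_get_ancestors("428.0", toy_parent))
print(toy_map("428.0", toy_cross_map))
```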

5. Medical Code Tokenizer 💬

pyhealth.tokenizer is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used independently.

from pyhealth.tokenizer import Tokenizer

# Example: we use a list of ATC3 code as the token
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D',
        'A03E', 'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B',
        'A07C', 'A07D', 'A07E', 'A07F', 'A07X', 'A08A', 'A09A', 'A12B', 'A12C',
        'A13A', 'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])

# 2d encode 
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens) 
# [[8, 9, 10, 11], [12, 1, 1, 0]]

# 2d decode 
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices)
# [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]

# 3d encode
tokens = [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], \
    [['A04A', 'B035', 'C129']]]
indices = tokenizer.batch_encode_3d(tokens)
# [[[8, 9, 10, 11], [24, 25, 0, 0]], [[12, 1, 1, 0], [0, 0, 0, 0]]]

# 3d decode
indices = [[[8, 9, 10, 11], [24, 25, 0, 0]], \
    [[12, 1, 1, 0], [0, 0, 0, 0]]]
tokens = tokenizer.batch_decode_3d(indices)
# [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], [['A04A', '<unk>', '<unk>']]]
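The padding and unknown-token behaviour in these examples can be reproduced in a few lines of plain Python. The sketch below is a toy re-implementation for illustration only: `toy_batch_encode_2d`, `toy_batch_decode_2d`, and the five-token vocabulary are hypothetical and do not reflect the library's internals.

```python
def toy_batch_encode_2d(batch, vocab, pad="<pad>", unk="<unk>"):
    """Map tokens to indices and right-pad each row to the longest row."""
    width = max(len(row) for row in batch)
    encoded = []
    for row in batch:
        ids = [vocab.get(tok, vocab[unk]) for tok in row]
        encoded.append(ids + [vocab[pad]] * (width - len(ids)))
    return encoded


def toy_batch_decode_2d(batch, vocab, pad="<pad>"):
    """Map indices back to tokens, dropping padding."""
    inverse = {idx: tok for tok, idx in vocab.items()}
    return [[inverse[i] for i in row if inverse[i] != pad] for row in batch]


# Special tokens come first, so "<pad>" is 0 and "<unk>" is 1.
toy_vocab = {tok: i for i, tok in enumerate(["<pad>", "<unk>", "A01A", "A02A", "A03C"])}

ids = toy_batch_encode_2d([["A01A", "A03C", "A02A"], ["A02A", "Z99Z"]], toy_vocab)
print(ids)
print(toy_batch_decode_2d(ids, toy_vocab))
```

Decoding is lossy for out-of-vocabulary tokens: "Z99Z" comes back as "<unk>", just as 'B035' and 'C129' do in the examples above.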

6. Tutorials 🧑‍🏫

Tutorial 0: Introduction to pyhealth.data [Video]

Tutorial 1: Introduction to pyhealth.datasets [Video]

Tutorial 2: Introduction to pyhealth.tasks [Video]

Tutorial 3: Introduction to pyhealth.models [Video]

Tutorial 4: Introduction to pyhealth.trainer [Video]

Tutorial 5: Introduction to pyhealth.metrics [Video]

Tutorial 6: Introduction to pyhealth.tokenizer [Video]

Tutorial 7: Introduction to pyhealth.medcode [Video]

The following tutorials will help users build their own task pipelines.

Pipeline 1: Drug Recommendation [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 2: Length of Stay Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 3: Readmission Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 4: Mortality Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 5: Sleep Staging [Video]

We also provide advanced tutorials for more specialized needs.

Advanced Tutorial 1: Fit your dataset into our pipeline [Video]

Advanced Tutorial 2: Define your own healthcare task

Advanced Tutorial 3: Adopt customized model into pyhealth [Video]

Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models [Video]

7. Datasets 🏔️

We provide the processing files for the following open EHR datasets:

Dataset	Module	Year	Information
MIMIC-III	`pyhealth.datasets.MIMIC3Dataset`	2016	MIMIC-III Clinical Database
MIMIC-IV	`pyhealth.datasets.MIMIC4Dataset`	2020	MIMIC-IV Clinical Database
eICU	`pyhealth.datasets.eICUDataset`	2018	eICU Collaborative Research Database
OMOP	`pyhealth.datasets.OMOPDataset`		OMOP-CDM schema based dataset
SleepEDF	`pyhealth.datasets.SleepEDFDataset`	2018	Sleep-EDF dataset
SHHS	`pyhealth.datasets.SHHSDataset`	2016	Sleep Heart Health Study dataset
ISRUC	`pyhealth.datasets.ISRUCDataset`	2016	ISRUC-SLEEP dataset

8. Machine/Deep Learning Models and Benchmarks ✈️

Model Name	Type	Module	Year	Summary	Reference
Multi-layer Perceptron	deep learning	`pyhealth.models.MLP`	1986	MLP treats each feature as static	Backpropagation: theory, architectures, and applications
Convolutional Neural Network (CNN)	deep learning	`pyhealth.models.CNN`	1989	CNN runs on the conceptual patient-by-visit grids	Handwritten Digit Recognition with a Back-Propagation Network
Recurrent Neural Nets (RNN)	deep learning	`pyhealth.models.RNN`	2011	RNN (includes LSTM and GRU) can run on any sequential level (e.g., visit by visit sequences)	Recurrent neural network based language model
Transformer	deep learning	`pyhealth.models.Transformer`	2017	Transformer can run on any sequential level (e.g., visit by visit sequences)	Attention Is All You Need
RETAIN	deep learning	`pyhealth.models.RETAIN`	2016	RETAIN uses two RNN to learn patient embeddings while providing feature-level and visit-level importance.	RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism
GAMENet	deep learning	`pyhealth.models.GAMENet`	2019	GAMENet uses memory networks, used only for drug recommendation task	GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination
MICRON	deep learning	`pyhealth.models.MICRON`	2021	MICRON predicts the future drug combination by instead predicting the changes w.r.t. the current combination, used only for drug recommendation task	Change Matters: Medication Change Prediction with Recurrent Residual Networks
SafeDrug	deep learning	`pyhealth.models.SafeDrug`	2021	SafeDrug encodes drug molecule structures by graph neural networks, used only for drug recommendation task	SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations
MoleRec	deep learning	`pyhealth.models.MoleRec`	2023	MoleRec encodes drug molecule in a substructure level as well as the patient's information into a drug combination representation, used only for drug recommendation task	MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning
Deepr	deep learning	`pyhealth.models.Deepr`	2017	Deepr is based on 1D CNN. General purpose.	Deepr: A Convolutional Net for Medical Records
ContraWR Encoder (STFT+CNN)	deep learning	`pyhealth.models.ContraWR`	2021	ContraWR encoder uses short time Fourier transform (STFT) + 2D CNN, used for biosignal learning	Self-supervised EEG Representation Learning for Automatic Sleep Staging
SparcNet (1D CNN)	deep learning	`pyhealth.models.SparcNet`	2023	SparcNet is based on 1D CNN, used for biosignal learning	Development of Expert-level Classification of Seizures and Rhythmic and Periodic Patterns During EEG Interpretation
TCN	deep learning	`pyhealth.models.TCN`	2018	TCN is based on dilated 1D CNN. General purpose	An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
AdaCare	deep learning	`pyhealth.models.AdaCare`	2020	AdaCare uses CNNs with dilated filters to learn enriched patient embedding. It uses feature calibration module to provide the feature-level and visit-level interpretability	AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration
ConCare	deep learning	`pyhealth.models.ConCare`	2020	ConCare uses transformers to learn patient embedding and calculate inter-feature correlations.	ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context
StageNet	deep learning	`pyhealth.models.StageNet`	2020	StageNet uses stage-aware LSTM to conduct clinical predictive tasks while learning patient disease progression stage change unsupervisedly	StageNet: Stage-Aware Neural Networks for Health Risk Prediction
Dr. Agent	deep learning	`pyhealth.models.Agent`	2020	Dr. Agent uses two reinforcement learning agents to learn patient embeddings by mimicking clinical second opinions	Dr. Agent: Clinical predictive model via mimicked second opinions
GRASP	deep learning	`pyhealth.models.GRASP`	2021	GRASP uses graph neural network to identify latent patient clusters and uses the clustering information to learn patient	GRASP: Generic Framework for Health Status Representation Learning Based on Incorporating Knowledge from Similar Patients

Check the interactive map on benchmark EHR predictive tasks.


pyhealth's Issues

How to generate x_data as the data in datasets?

Hi, thanks for your great work! I am trying to run your code using the mimic-iii-demo dataset. The problem I met is that I don't know how to generate the x_data as the data in datasets (mimic or cms). I followed your instructions but only got the y_data after running generate_mortality_prediction_mimic_demo.py. Is this because the data in the datasets folder used the full variables of mimic-iii data while only a few of them existed in mimic-iii-demo data? Thanks!

The results obtained with pyhealth are much lower than in the paper.

Thanks for your great job!

But I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. According to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.

Below are the results reported in the SafeDrug paper,

KeyError: 'logit'

Dear Sir/Madam,

I got an error when I ran the following codes (data is from Pipeline 5: Sleep Staging):

cal_model = HistogramBinning(model)

cal_model = KCal(model)

cal_model = TemperatureScaling(model)

cal_model.calibrate(cal_dataset=val_dataset)
from pyhealth.trainer import Trainer
print(Trainer(model=cal_model, metrics=['cwECEt_adapt', 'accuracy']).evaluate(test_loader))

Any advice? Thank you.

pyhealth.calib.calibration.

Dear Sir/Madam,

I got an error in the following commands:

from pyhealth.calib.calibration.hb import HistogramBinning
cal_model = HistogramBinning(model)
cal_model.calibrate(cal_dataset=val_dataset)

val_dataset is a torch Subset of a BaseEHRDataset. Could you give some advice?

a question about metrics

Is there a way to parameterize DDI rate as metrics in Trainer? And output the final DDI rate in trainer.evaluate().

Add contributor list (tmp3)

@all-contributors please add @ycq091044 for code

ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets'

Hi,

When I was testing the example in https://pyhealth.readthedocs.io/en/latest/examples_bak.html#step-3-build-deep-learning-models, I found I could not load the dataset. Could you help look into this issue? Thank you!

> python test_retain.py
Traceback (most recent call last):
  File "test_retain.py", line 1, in <module>
    from pyhealth.datasets import MIMIC3BaseDataset
ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets' (/Users/anaconda3/envs/ehr/lib/python3.8/site-packages/pyhealth/datasets/__init__.py)

I also tried to search the class MIMIC3BaseDataset over this repo but could not find it. Any help would be appreciated!

Question about MIMIC-iii dataset

Hi, I found that in the MoleRec paper, the processed MIMIC-III dataset has 6,350 patients and 14,995 visits. However, I only got 5,449 patients and 14,141 visits when I used PyHealth to process this dataset. Here is my screenshot.

Entering deadlock when parsing prescriptions

I have tested some basic code from the tutorial with the MIMIC-IV dataset, but the process hung. I pressed Ctrl-C to exit the program, and it gave the following call stacks. It seems that parallel_apply runs into a deadlock or something similar when parsing prescriptions.

reproducing code

import logging
from pyhealth.datasets import MIMIC4Dataset

logger = logging.getLogger("pyhealth")
logger.setLevel(logging.DEBUG)

dataset = MIMIC4Dataset(
    "/home/featurize/data/mimic-iv-2.2/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)

Callstacks after ctrl-C

Loaded NDC->ATC mapping from /home/featurize/.cache/pyhealth/medcode/NDC_to_ATC.pkl                                                                                                                                
Loaded NDC code from /home/featurize/.cache/pyhealth/medcode/NDC.pkl                                     
Loaded ATC code from /home/featurize/.cache/pyhealth/medcode/ATC.pkl                                                                                                                                               
Processing MIMIC4Dataset base dataset...            
INFO: Pandarallel will run on 6 workers.                                                                 
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.                                                                                                               
finish basic patient information parsing : 80.05470561981201s                                            
finish parsing diagnoses_icd : 134.23406291007996s                                                       
finish parsing procedures_icd : 57.97325396537781s                                                       
                                                    
^CTraceback (most recent call last):                                                                                                                                                                               
  File "main.py", line 7, in <module>
    dataset = MIMIC4Dataset(                                                                             
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 130, in __init__
    patients = self.parse_tables()                                                                                                                                                                                 
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 190, in parse_tables
    patients = getattr(self, f"parse_{table.lower()}")(patients)                                                                                                                                                   
  File "/home/featurize/work/PyHealth/pyhealth/datasets/mimic4.py", line 307, in parse_prescriptions
    group_df = group_df.parallel_apply(                                                                                                                                                                            
  File "/environment/miniconda3/envs/py38/lib/python3.8/site-packages/pandarallel/core.py", line 307, in closure
Process ForkPoolWorker-28:                                                                               
Process ForkPoolWorker-31:        
Process ForkPoolWorker-30:                                                                                                                                                                                         
Process ForkPoolWorker-32:          
Process ForkPoolWorker-33:                                                                                                                                                                                         
Process ForkPoolWorker-29:                   
    message: Tuple[int, WorkerStatus, Any] = master_workers_queue.get()                                                                                                                                            
  File "<string>", line 2, in get   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod  
Traceback (most recent call last):
Traceback (most recent call last):                                                                                                                                                                                 
Traceback (most recent call last):  
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap                                                                                                       
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                               
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run                                                                                                              
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker                                                                                                              
    task = get()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker    
    task = get()                                                                                                                                                                                                   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker            
    task = get()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

Use Pretrained Model

Hi,
How can I do training from a pre-trained model?

For example, instead of using:

from pyhealth.models import Transformer, RNN, RETAIN

model = Transformer(
    dataset=mimic3_task_ds,
    # look up what are available for "feature_keys" and "label_keys" in dataset.samples[0]
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

I would like to use a pre-trained Transformer or a pre-trained BERT model instead.
Is it possible?
@pat-jj

RxNorm codes hierarchy

Hi, I noticed something that might seem strange in the hierarchy of RxNorm (and perhaps other vocabularies).
For instance, the code 1000001 in RxNorm doesn't have any parents or children in the PyHealth hierarchy.

However, according to Athena, this code has parents and children:

This isn't specific to this code alone; it applies to many others as well. I just used 1000001 as an example.
I would like to use PyHealth for getting the hierarchy of RxNorm .
Can you please check this? How can I get the hierarchy of RxNorm correctly?
Thank you!
@pat-jj

The Demo for LSTM on Phenotyping Prediction with GPU

cur_dataset = expdata_generator(exp_id=exp_id)
should change to
cur_dataset = expdata_generator(expdata_id=expdata_id)
Thanks~

HALO and EHR synthetic task

I found that HALO is combined in the branch 'main Halo 2'. Will it be added to the main branch along with the EHR synthetic task? #278

Do models support text?

Dear Sir or Madam,

I have fed text reports as features into the Transformer and it works, but I don't know whether it is meaningful to do so. Can it learn from the text as language models do?

I followed case 2 for the Transformer model, where each code is a radiology report.

"case 2. [[code1, code2]] or [[code1, code2], [code3, code4, code5], …]"

pip install version

Hello, I recently installed PyHealth using the command `pip install pyhealth`, and it installed version 1.1.4. However, I noticed some discrepancies between this version and the latest code available on GitHub. For example, the multilabel_metrics_fn seems to be different.

Bug in GAMENet

Dear Sir/Madam,

When I run 'drug_recommendation_mimic4_gamenet.py' in tutorials, I get an error.

Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]queries shape torch.Size([64, 10, 128])
prev_drugs shape torch.Size([64, 10, 147])
curr_drugs shape torch.Size([64, 147])
a_s shape torch.Size([64, 9])
DM_values shape torch.Size([64, 10, 147])
Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 103, in <module>
    model, trainer = train_gamenet(data, train_loader, val_loader)
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 77, in train_gamenet
    trainer.train(
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/trainer.py", line 195, in train
    output = self.model(**data)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 410, in forward
    loss, y_prob = self.gamenet(queries, prev_drugs, curr_drugs, mask)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 211, in forward
    a_m = torch.einsum("bv,bvz->bz", a_s, DM_values.float())
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): subscript v has size 10 for operand 1 which does not broadcast with previously seen size 9

It seems it can be addressed like in the following figure. Please have a look.
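The error message says that the contraction `bv,bvz->bz` needs the `v` axis of both operands to have the same length, while the printed shapes show 9 for `a_s` and 10 for `DM_values`. The pure-Python sketch below reproduces that rule on nested lists. `toy_einsum_bv_bvz` is a hypothetical helper that only illustrates the contraction; it is neither the GAMENet code nor a proposed fix.

```python
def toy_einsum_bv_bvz(a_s, dm_values):
    """Compute out[b][z] = sum_v a_s[b][v] * dm_values[b][v][z] on nested lists."""
    out = []
    for weights, memory in zip(a_s, dm_values):
        if len(weights) != len(memory):
            raise ValueError(
                f"axis v mismatch: {len(weights)} weights vs {len(memory)} memory rows"
            )
        n_z = len(memory[0])
        out.append([
            sum(w * row[z] for w, row in zip(weights, memory))
            for z in range(n_z)
        ])
    return out


memory = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]  # shape (1, 3, 2)
print(toy_einsum_bv_bvz([[0.5, 0.25, 0.25]], memory))  # v = 3 on both sides

try:
    toy_einsum_bv_bvz([[0.5, 0.5]], memory)  # v = 2 vs 3, analogous to 9 vs 10
except ValueError as err:
    print(err)
```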

Descriptive info on the example data needed

I found your PyHealth package a valuable resource. I am trying the test_sequence_data.ipynb notebook with the example dataset. The CSV files in the /datasets/mimic/y_data/ folder are clear because their column names are self-explanatory, but the ones in the /datasets/mimic/x_data/ folder have no column names. I've read the README files and the online documentation but couldn't find anything. Can you help me with this?

BTW, it would help a lot if you could add some minimal description on the data, data processing or training steps in the notebook. That would help the users a lot, because they don’t have to spend a lot of time finding the info everywhere.

question about MIMIC-III in drug recommendation task

The MIMIC-III dataset used in many of the papers (e.g., SafeDrug, GAMENet, MoleRec) consists of 50,206 medical encounter records. After filtering out the patients with only one visit, it contains 14,995 visits and 6,350 patients. The code of drug_recommendation_mimic3_fn appears to implement the same task as in the papers, but using "mimic3_ds = mimic3_ds.set_task(task_fn=drug_recommendation_mimic3_fn)" produces only 911 patients and 1,858 visits. Why is this?
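The filtering step mentioned here, dropping patients who have only one visit, can be sketched on toy data as follows. `keep_multi_visit` and the (patient_id, visit_id) pairs are hypothetical; the sketch shows only the counting logic and makes no claim about how either the papers or PyHealth preprocess MIMIC-III.

```python
from collections import Counter


def keep_multi_visit(visits, min_visits=2):
    """Keep only visits belonging to patients with at least `min_visits` visits."""
    counts = Counter(patient_id for patient_id, _ in visits)
    return [(p, v) for p, v in visits if counts[p] >= min_visits]


# Toy (patient_id, visit_id) pairs; not MIMIC-III data.
toy_visits = [
    ("p1", "v1"), ("p1", "v2"),
    ("p2", "v3"),
    ("p3", "v4"), ("p3", "v5"), ("p3", "v6"),
]
kept = keep_multi_visit(toy_visits)
print(len({p for p, _ in kept}), "patients,", len(kept), "visits")
```

Comparing patient and visit counts before and after each such filter is one way to locate where two preprocessing pipelines diverge.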

API doc link seems to be broken

https://pyhealth.readthedocs.io/en/latest/pyhealth.html

How to use any of model for mimic-iii code prediction?

I am trying the MIMIC-III code prediction task. How can I feed in the data, pretrained embeddings, vocabulary, etc.? Also, how do I use the models after that? A simple minimal example would be helpful.

Thank you!

Bug in SafeDrug

The program crashes when I run SafeDrug for drug recommendation with the real MIMIC-III dataset. Looking into the source code, I found that the bug occurs in the function generate_molecule_info(), specifically at adjacency = Chem.GetAdjacencyMatrix(mol). https://github.com/sunlabuiuc/PyHealth/blob/5592d437abf6a06df7d41204cf56971f45e98a47/pyhealth/models/safedrug.py#L529C20-L529C20

I don't know what happened. Please provide some suggestions. Thank you.
Here is a case that triggers the bug: smile = '[F-].[Na+]'

wrong code in pyhealth\metrics\drug_recommendation.py

Hello, when you updated the code, you introduced an error in the DDI calculation; the original code was correct. It bothered me all morning. ^ ^

if ddi_matrix[i, j] == 1 or ddi_matrix[j, i] == 1: # wrong code
if ddi_matrix[med_i, med_j] == 1 or ddi_matrix[med_j, med_i] == 1: # Old and correct code
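
The two lines differ in what is used to index the matrix: `i` and `j` read like positions inside a prediction, while `med_i` and `med_j` are the drug indices themselves. A minimal sketch of a DDI rate in plain Python, assuming a symmetric 0/1 matrix and one list of predicted drug indices per visit; the function name `ddi_rate` and the toy data are hypothetical, and this is not PyHealth's implementation.

```python
from itertools import combinations


def ddi_rate(predicted_sets, ddi_matrix):
    """Fraction of predicted drug pairs that are flagged as interacting.

    predicted_sets: list of lists of drug indices, one list per visit.
    ddi_matrix: square 0/1 matrix indexed by drug index (toy input).
    """
    total_pairs = 0
    ddi_pairs = 0
    for meds in predicted_sets:
        # Iterate over the drug indices themselves, not their positions in the list.
        for med_i, med_j in combinations(meds, 2):
            total_pairs += 1
            if ddi_matrix[med_i][med_j] == 1 or ddi_matrix[med_j][med_i] == 1:
                ddi_pairs += 1
    return ddi_pairs / total_pairs if total_pairs else 0.0


# Toy example (hypothetical): only drugs 0 and 3 interact.
toy_ddi = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
toy_predictions = [[0, 3], [1, 2, 3]]
rate = ddi_rate(toy_predictions, toy_ddi)
print(rate)
```

In this toy case, indexing by list position instead would look up entry (0, 1) for the first visit and miss the interaction between drugs 0 and 3.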

Does the InnerMap contain all the ICD code IDs?

I use the following code to get all the icd tokens.

from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import drug_recommendation_mimic4_fn
from pyhealth.tokenizer import Tokenizer

base_dataset2 = MIMIC4Dataset(
    root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # 2.2 does not work well
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
    dev=False,
    refresh_cache=False,  # use True on the first run
)
sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)
tokenizer2 = Tokenizer(
    tokens=sample_dataset2.get_all_tokens(key='conditions'),
    special_tokens=["<pad>", "<unk>"],
)
tokens2 = list(tokenizer2.vocabulary.idx2token.values())
print(tokens2)
# get_stand_system is the reporter's own helper; its definition is not shown here
diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')

But when I try to find their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    print(icd9cm.lookup("H4011X0"))
    print(icd10cm.lookup("H4011X0"))
```

sequential_drug_recommendation

Dear sir/Madam,

In 'Advanced Case 2: Work on customized healthcare task', I have questions about 'sequential_drugs' and 'drugs' (see the following code). It seems 'sequential_drugs' is always empty. What is 'sequential_drugs[-1] = drugs' in the final line used for?

def sequential_drug_recommendation(patient):
    samples = []

    sequential_conditions = []
    sequential_procedures = []
    sequential_drugs = []  # does not include the current visit's drugs yet
    for visit in patient:

        # step 1: obtain feature information
        conditions = visit.get_code_list(table="DIAGNOSES_ICD")
        procedures = visit.get_code_list(table="PROCEDURES_ICD")
        drugs = visit.get_code_list(table="PRESCRIPTIONS")

        sequential_conditions.append(conditions)
        sequential_procedures.append(procedures)
        sequential_drugs.append([])

        # step 2: exclusion criteria: visits without drug
        if len(drugs) == 0:
            sequential_drugs[-1] = drugs
            continue

        # step 3: assemble the samples
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                # the following keys can be the "feature_keys" or "label_key" for initializing downstream ML model
                "sequential_conditions": sequential_conditions.copy(),
                "sequential_procedures": sequential_procedures.copy(),
                "sequential_drugs": sequential_drugs.copy(),
                "label": drugs,
            }
        )
        # after the sample is built, expose this visit's drugs to later visits
        sequential_drugs[-1] = drugs

    return samples
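
To see what the drug history holds at each step, the loop can be reduced to a toy version in plain Python. This is a minimal sketch, assuming each visit is represented only by its list of drug codes; the helper name `build_drug_history` and the toy codes are hypothetical, and the PyHealth Patient and Visit objects are not used.

```python
def build_drug_history(visit_drugs):
    """Return one sample per visit that has drugs: the history seen as input, plus the label.

    visit_drugs: list of drug-code lists, one per visit (toy stand-in for a patient).
    """
    samples = []
    history = []
    for drugs in visit_drugs:
        # Placeholder for the current visit: its drugs are the label, so they
        # must not appear in the input history yet.
        history.append([])
        if len(drugs) == 0:
            continue
        samples.append(
            {"sequential_drugs": [list(v) for v in history], "label": drugs}
        )
        # After the sample is emitted, reveal this visit's drugs to later visits.
        history[-1] = drugs
    return samples


toy = build_drug_history([["A"], ["B", "C"], ["D"]])
for sample in toy:
    print(sample)
```

Under these assumptions only the last slot of the history is empty in each sample; earlier slots carry the drugs of previous visits, which is what the final assignment provides.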

a question about ddi rate

Hi! sorry for bothering you again.

I added the DDI rate as a metric and checked it, but the value was much smaller than it should be. Besides, it seems that the DDI matrix of PyHealth is different from the ones in SafeDrug and GAMENet. Could you please give me some help with this?

DDI of eICU dataset and omop dataset

Hi. Can we use PyHealth to calculate the DDI rate on these two datasets?

Performance of SafeDrug and Molerec

Hello, esteemed author. I noticed that when running the safedrug and molerec algorithms from your library, their performance falls far short of what is claimed in your papers and benchmarks. I would like to inquire whether you used any specific parameters or techniques during testing. Thank you for your response.

drugrec for OMOP datasets doesn't work

from pyhealth.datasets import OMOPDataset
omop_base = OMOPDataset(
    root="https://storage.googleapis.com/pyhealth/synpuf1k_omop_cdm_5.2.2",
    tables=["condition_occurrence", "procedure_occurrence"],
    code_mapping={},
)
from pyhealth.tasks import drug_recommendation_omop_fn
omop_sample = omop_base.set_task(drug_recommendation_eicu_fn)

Getting error while loading OMOP dataset

While loading the OMOP dataset I am getting the following error.

question about eicu in drug recommendation task

Thank you so much for your work!
When I use eICU data for drug recommendation, I get the following error:
Key drugs has mixed nested list levels across samples.

Could you please tell me how to solve this problem?

Thanks in advance
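
The message suggests that the value under "drugs" is nested to different depths in different samples, for example a flat list of codes in one sample and a list of lists in another. A minimal sketch in plain Python for locating such samples, assuming samples are dicts; the helpers `nesting_depth` and `depths_by_key` are hypothetical and not part of PyHealth.

```python
def nesting_depth(value):
    """Depth of list nesting: 0 for a scalar, 1 for a flat list, 2 for a list of lists."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


def depths_by_key(samples, key):
    """Map each nesting depth found under `key` to the indices of the samples that have it."""
    found = {}
    for index, sample in enumerate(samples):
        found.setdefault(nesting_depth(sample[key]), []).append(index)
    return found


# Toy samples (hypothetical): the second one is nested one level deeper.
toy_samples = [
    {"drugs": ["A", "B"]},
    {"drugs": [["A"], ["B", "C"]]},
    {"drugs": ["C"]},
]
report = depths_by_key(toy_samples, "drugs")
print(report)
```

If the report contains more than one depth, the listed indices point to the samples whose "drugs" field needs to be flattened or wrapped so that all samples agree.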

How-to-contribute section is still missing

https://github.com/yzhao062/PyHealth#how-to-contribute

too keen on contributing 😂

Hello, a question about the "Pandarallel" package

When I run prediction tasks on the MIMIC-IV dataset, my code always ends up in a deadlock because of the size of MIMIC-IV, as in the earlier issue.

I would like to know the specific versions of 'pandas' and 'pandarallel' to use. Thanks!

no such thing SampleDataset

In pyhealth version 1.1.4 there is no such thing as SampleDataset.
I think you should update either your documentation or the code.

The results of SafeDrug model differ significantly from those in the paper.

Hi! sorry for bothering you again.
I ran the code for GAMENet, SafeDrug and MoleRec locally. The results of the three models are as follows:

Here is the problem: the jaccard_samples of my local SafeDrug only reaches about 0.33. Theoretically, the jaccard_samples of SafeDrug should be similar to that of GAMENet. Why is there such a big gap?
Additionally, why are the results obtained with PyHealth lower than those in the paper? I note that the sample dataset contains 14,142 visits and 5,449 patients, which differs from the papers, which report 6,350 patients and 14,995 visits. Is this the reason?

Looking forward to and thank you for your reply!

Add contributor list (tmp2)

@all-contributors
please add @ycq091044 for code.
please add @zzachw for code.
please add @pat-jj for code.
please add @zlin7 for code.
please add @v1xerunt for code.
please add @BPDanek for code.
please add @solarsys for code.

Cannot download 'https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv'

When initializing the MIMIC3Dataset() class, I get a urllib.error.URLError. Checking the call stack, I found that the problem lies in the function download_and_read_csv() of the CrossMap class. I think it is caused by my own Internet connection. Could you make it possible to download these files manually and have the code fall back to reading the local copy when the network download fails?
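
The requested behavior could look roughly like the following: read a local copy if one exists, and only otherwise try the network. This is a minimal standard-library sketch, not PyHealth's download_and_read_csv(); the function name `read_csv_rows` and the `local_dir` parameter are hypothetical.

```python
import csv
import os
import urllib.error
import urllib.request


def read_csv_rows(url, local_dir):
    """Read a CSV from local_dir if a copy exists, otherwise download it there first."""
    local_path = os.path.join(local_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        os.makedirs(local_dir, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, local_path)
        except urllib.error.URLError as err:
            # Tell the user where a manually downloaded file should be placed.
            raise FileNotFoundError(
                f"Could not download {url}; place the file manually at {local_path}"
            ) from err
    with open(local_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

With this layout, a user behind a restricted network can fetch NDC_to_ATC.csv by other means, drop it into the chosen directory, and the network is never touched.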
