
pyhealth's Introduction

Welcome to PyHealth!

Citing PyHealth 🤝

Yang, Chaoqi, Zhenbang Wu, Patrick Jiang, Zhen Lin, Junyi Gao, Benjamin P. Danek, and Jimeng Sun. 2023. “PyHealth: A Deep Learning Toolkit for Healthcare Applications.” In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5788–89. KDD ’23. New York, NY, USA: Association for Computing Machinery.

@inproceedings{pyhealth2023yang,
    author = {Yang, Chaoqi and Wu, Zhenbang and Jiang, Patrick and Lin, Zhen and Gao, Junyi and Danek, Benjamin and Sun, Jimeng},
    title = {{PyHealth}: A Deep Learning Toolkit for Healthcare Predictive Modeling},
    url = {https://github.com/sunlabuiuc/PyHealth},
    booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2023},
    year = {2023}
}

Check out our KDD'23 tutorial: https://sunlabuiuc.github.io/PyHealth/

PyHealth is a comprehensive deep learning toolkit for clinical predictive modeling, designed for both ML researchers and medical practitioners. It makes healthcare AI applications easier to deploy, more flexible, and more customizable. [Tutorials]

[News!] We are continuously implementing good papers and benchmarks in PyHealth; check out the [planned List]. You are welcome to pick one from the list and send us a PR, or to add more influential and new papers to the plan list.

1. Installation 🚀

  • You can install from PyPI:
pip install pyhealth
  • or from the GitHub source (run from the root of a local clone):
pip install .

2. Introduction 📖

PyHealth provides the following functionalities, each of which can be used independently (we are still enriching some modules):

  • Dataset: MIMIC-III, MIMIC-IV, eICU, OMOP-CDM, customized EHR datasets, etc.
  • Tasks: diagnosis-based drug recommendation, patient hospitalization and mortality prediction, length-of-stay forecasting, etc.
  • ML models: CNN, LSTM, GRU, RETAIN, SafeDrug, Deepr, etc.

Building a healthcare AI pipeline can be as short as 10 lines of code in PyHealth.

3. Build ML Pipelines 🏆

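All healthcare tasks in our package follow a five-stage pipeline, with one module per stage: pyhealth.datasets, pyhealth.tasks, pyhealth.models, pyhealth.trainer, and pyhealth.metrics.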

Module 1: <pyhealth.datasets>

pyhealth.datasets provides a clean structure for the dataset, independent from the tasks. We support MIMIC-III, MIMIC-IV and eICU, etc. The output (mimic3base) is a multi-level dictionary structure (see illustration below).

from pyhealth.datasets import MIMIC3Dataset

mimic3base = MIMIC3Dataset(
    # root directory of the dataset
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/", 
    # raw CSV table name
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    # map all NDC codes to ATC codes in these tables
    code_mapping={"NDC": "ATC"},
)

[Figure: structure of the mimic3base output]
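
To make the "multi-level dictionary" idea concrete, here is a minimal sketch in plain Python. The field names (`visits` and the table keys used as dictionary keys) and the helper `codes_for` are assumptions chosen for illustration; they are not PyHealth's actual classes or attributes. The sketch only shows the patient, visit, table, codes nesting.

```python
# Hypothetical, simplified stand-in for the nested structure:
# patient_id -> visits -> visit_id -> table name -> list of codes.
patients = {
    "175": {
        "visits": {
            "100183": {
                "DIAGNOSES_ICD": ["5990", "4280", "2851"],
                "PROCEDURES_ICD": ["0040", "3931"],
                "PRESCRIPTIONS": ["N06DA02", "B01AB01"],
            },
        },
    },
}


def codes_for(patients, patient_id, visit_id, table):
    """Walk the three levels and return the code list (empty if absent)."""
    visit = patients.get(patient_id, {}).get("visits", {}).get(visit_id, {})
    return visit.get(table, [])


print(codes_for(patients, "175", "100183", "PROCEDURES_ICD"))
```

Keeping the structure independent of any task means the same parsed data can later be turned into samples for several different prediction tasks.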

Module 2: <pyhealth.tasks>

pyhealth.tasks defines how to process each patient's data into a set of samples for the tasks. In the package, we provide several task examples, such as drug recommendation and length of stay prediction. It is easy to customize your own tasks following our template.

from pyhealth.tasks import readmission_prediction_mimic3_fn

mimic3sample = mimic3base.set_task(task_fn=readmission_prediction_mimic3_fn) # use default task
mimic3sample.samples[0] # show the information of the first sample
"""
{
    'visit_id': '100183',
    'patient_id': '175',
    'conditions': ['5990', '4280', '2851', '4240', '2749', '9982', 'E8499', '42831', '34600'],
    'procedures': ['0040', '3931', '7769'],
    'drugs': ['N06DA02', 'V06DC01', 'B01AB01', 'A06AA02', 'R03AC02', 'H03AA01', 'J01FA09'],
    'label': 0
}
"""
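
As a rough sketch of what "processing each patient's data into a set of samples" can look like, the function below works on a plain dictionary rather than on PyHealth's patient objects. The function name `readmission_task`, the fields `admit_day` and `discharge_day`, and the 15-day window are all assumptions made for illustration, not the library's template.

```python
def readmission_task(patient, window_days=15):
    """Hypothetical task function: one sample per visit that has a successor.

    `patient` is a plain dict (an assumption for illustration) whose visits
    are sorted by time. The label is 1 if the next visit starts within
    `window_days` of the current visit's discharge.
    """
    samples = []
    visits = patient["visits"]
    for current, nxt in zip(visits, visits[1:]):
        if not current["conditions"]:
            continue  # skip visits with no usable features
        gap = nxt["admit_day"] - current["discharge_day"]
        samples.append({
            "patient_id": patient["patient_id"],
            "visit_id": current["visit_id"],
            "conditions": current["conditions"],
            "label": int(gap < window_days),
        })
    return samples


patient = {
    "patient_id": "175",
    "visits": [
        {"visit_id": "a", "admit_day": 0, "discharge_day": 5, "conditions": ["4280"]},
        {"visit_id": "b", "admit_day": 12, "discharge_day": 20, "conditions": ["5990"]},
        {"visit_id": "c", "admit_day": 90, "discharge_day": 95, "conditions": []},
    ],
}
samples = readmission_task(patient)
print([(s["visit_id"], s["label"]) for s in samples])
```

The last visit never produces a sample here because there is no later visit to derive a label from.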

from pyhealth.datasets import split_by_patient, get_dataloader

train_ds, val_ds, test_ds = split_by_patient(mimic3sample, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)
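
The point of splitting by patient is that all samples from one patient land in the same split, so the model is never evaluated on a patient it saw during training. The sketch below illustrates that idea on plain dictionaries; `split_samples_by_patient` is a hypothetical helper written for this example and is not the implementation behind `split_by_patient`.

```python
import random


def split_samples_by_patient(samples, ratios, seed=0):
    """Assign whole patients to train/val/test so no patient spans two splits."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)
    n_train = int(len(patient_ids) * ratios[0])
    n_val = int(len(patient_ids) * ratios[1])
    groups = (
        set(patient_ids[:n_train]),
        set(patient_ids[n_train:n_train + n_val]),
        set(patient_ids[n_train + n_val:]),
    )
    return tuple([s for s in samples if s["patient_id"] in g] for g in groups)


# Ten toy patients with two visits each.
toy = [{"patient_id": str(i // 2), "visit_id": str(i)} for i in range(20)]
train, val, test = split_samples_by_patient(toy, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))
```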

Module 3: <pyhealth.models>

pyhealth.models provides different ML models with very similar argument configs.

from pyhealth.models import Transformer

model = Transformer(
    dataset=mimic3sample,
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

Module 4: <pyhealth.trainer>

pyhealth.trainer lets you specify training arguments, such as epochs, optimizer, learning rate, etc. The trainer automatically saves the best model and outputs its path at the end.

from pyhealth.trainer import Trainer

trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)
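
The "save the best model" behavior comes down to tracking the monitored validation metric across epochs. Here is a minimal sketch of that bookkeeping, with the training itself left out; `train_with_monitor` and its `criterion` argument are hypothetical names for this example, and a real trainer would write a checkpoint where the comment indicates.

```python
def train_with_monitor(val_scores, criterion="max"):
    """Toy loop: `val_scores` stands in for the monitored metric per epoch."""
    if criterion not in ("max", "min"):
        raise ValueError("criterion must be 'max' or 'min'")
    best_score, best_epoch = None, None
    for epoch, score in enumerate(val_scores):
        # ... one epoch of training and validation would happen here ...
        improved = (
            best_score is None
            or (criterion == "max" and score > best_score)
            or (criterion == "min" and score < best_score)
        )
        if improved:
            # A real trainer would save a checkpoint at this point.
            best_score, best_epoch = score, epoch
    return best_epoch, best_score


print(train_with_monitor([0.61, 0.70, 0.68, 0.72, 0.71]))
```

A metric such as PR-AUC is maximized, while a loss would be monitored with the "min" criterion.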

Module 5: <pyhealth.metrics>

pyhealth.metrics provides several common evaluation metrics (refer to the documentation to see which are available).

# method 1
trainer.evaluate(test_loader)

# method 2
from pyhealth.metrics.binary import binary_metrics_fn

y_true, y_prob, loss = trainer.inference(test_loader)
binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"])
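
For readers unfamiliar with what a metric such as "roc_auc" measures, the sketch below computes it from scratch as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties count as one half). It is a plain-Python illustration written for this section, not the code used by `binary_metrics_fn`.

```python
def roc_auc(y_true, y_prob):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of samples, so it is only suitable for small illustrations.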

4. Medical Code Map 🏥

pyhealth.medcode provides two core functionalities. This module can be used independently.

  • For code ontology lookup within one medical coding system (e.g., name, category, sub-concept);
from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# `Congestive heart failure, unspecified`
icd9cm.get_ancestors("428.0")
# ['428', '420-429.99', '390-459.99', '001-999.99']

atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# `ibuprofen, combinations`
atc.lookup("M01AE51", "drugbank_id")
# `DB01050`
atc.lookup("M01AE51", "description")
# Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) derived ...
atc.lookup("M01AE51", "indication")
# Ibuprofen is the most commonly used and prescribed NSAID. It is very common over the ...
  • For code mapping between two coding systems (e.g., ICD9CM to CCSCM).
from pyhealth.medcode import CrossMap

codemap = CrossMap.load("ICD9CM", "CCSCM")
codemap.map("428.0")
# ['108']

codemap = CrossMap.load("NDC", "RxNorm")
codemap.map("50580049698")
# ['209387']

codemap = CrossMap.load("NDC", "ATC")
codemap.map("50090539100")
# ['A10AC04', 'A10AD04', 'A10AB04']
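
Conceptually, the two functionalities are a walk up a parent table and a lookup in a one-to-many mapping. The sketch below shows both with toy tables; the codes mirror the ICD9CM example above, but the dictionaries `PARENT` and `CROSS` and the two helper functions are assumptions for illustration, not how pyhealth.medcode stores or exposes its data.

```python
# Toy parent table: each code points to its direct parent.
PARENT = {
    "428.0": "428",
    "428": "420-429.99",
    "420-429.99": "390-459.99",
    "390-459.99": "001-999.99",
}

# Toy cross-system table: one source code may map to several target codes.
CROSS = {"428.0": ["108"]}


def get_ancestors(code, parent=PARENT):
    """Follow parent links until the root, nearest ancestor first."""
    ancestors = []
    while code in parent:
        code = parent[code]
        ancestors.append(code)
    return ancestors


def map_code(code, table=CROSS):
    """Return the mapped codes, or an empty list for an unknown code."""
    return list(table.get(code, []))


print(get_ancestors("428.0"))
print(map_code("428.0"))
```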

5. Medical Code Tokenizer 💬

pyhealth.tokenizer is used for transformations between string-based tokens and integer-based indices, based on the overall token space. We provide flexible functions to tokenize 1D, 2D and 3D lists. This module can be used independently.

from pyhealth.tokenizer import Tokenizer

# Example: we use a list of ATC3 codes as the tokens
token_space = ['A01A', 'A02A', 'A02B', 'A02X', 'A03A', 'A03B', 'A03C', 'A03D', 'A03E', \
        'A03F', 'A04A', 'A05A', 'A05B', 'A05C', 'A06A', 'A07A', 'A07B', 'A07C', \
        'A07D', 'A07E', 'A07F', 'A07X', 'A08A', 'A09A', 'A12B', 'A12C', 'A13A', \
        'A14A', 'A14B', 'A16A']
tokenizer = Tokenizer(tokens=token_space, special_tokens=["<pad>", "<unk>"])

# 2d encode 
tokens = [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', 'B035', 'C129']]
indices = tokenizer.batch_encode_2d(tokens) 
# [[8, 9, 10, 11], [12, 1, 1, 0]]

# 2d decode 
indices = [[8, 9, 10, 11], [12, 1, 1, 0]]
tokens = tokenizer.batch_decode_2d(indices)
# [['A03C', 'A03D', 'A03E', 'A03F'], ['A04A', '<unk>', '<unk>']]

# 3d encode
tokens = [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], \
    [['A04A', 'B035', 'C129']]]
indices = tokenizer.batch_encode_3d(tokens)
# [[[8, 9, 10, 11], [24, 25, 0, 0]], [[12, 1, 1, 0], [0, 0, 0, 0]]]

# 3d decode
indices = [[[8, 9, 10, 11], [24, 25, 0, 0]], \
    [[12, 1, 1, 0], [0, 0, 0, 0]]]
tokens = tokenizer.batch_decode_3d(indices)
# [[['A03C', 'A03D', 'A03E', 'A03F'], ['A08A', 'A09A']], [['A04A', '<unk>', '<unk>']]]
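
The encode and decode behavior above (special tokens first, unknown tokens mapped to `<unk>`, short rows padded with `<pad>`) can be sketched in a few lines of plain Python. `ToyTokenizer` is a hypothetical class written for this illustration and uses a much smaller vocabulary; it is not the pyhealth.tokenizer implementation.

```python
class ToyTokenizer:
    """Minimal token <-> index mapping with padding and unknown handling."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        # Special tokens come first, so <pad> is 0 and <unk> is 1.
        self.idx2token = list(special_tokens) + list(tokens)
        self.token2idx = {t: i for i, t in enumerate(self.idx2token)}
        self.pad = self.token2idx["<pad>"]
        self.unk = self.token2idx["<unk>"]

    def batch_encode_2d(self, batch):
        """Encode a list of token lists, padding rows to the longest row."""
        width = max(len(row) for row in batch)
        return [
            [self.token2idx.get(t, self.unk) for t in row]
            + [self.pad] * (width - len(row))
            for row in batch
        ]

    def batch_decode_2d(self, batch):
        """Decode indices back to tokens, dropping padding."""
        return [[self.idx2token[i] for i in row if i != self.pad] for row in batch]


tok = ToyTokenizer(["A03C", "A03D", "A04A"])
encoded = tok.batch_encode_2d([["A03C", "A03D", "A04A"], ["A04A", "B035"]])
decoded = tok.batch_decode_2d(encoded)
print(encoded)
print(decoded)
```

Note that decoding is lossy for out-of-vocabulary tokens: `B035` comes back as `<unk>`.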

6. Tutorials 🧑‍🏫

Tutorial 0: Introduction to pyhealth.data [Video]

Tutorial 1: Introduction to pyhealth.datasets [Video]

Tutorial 2: Introduction to pyhealth.tasks [Video]

Tutorial 3: Introduction to pyhealth.models [Video]

Tutorial 4: Introduction to pyhealth.trainer [Video]

Tutorial 5: Introduction to pyhealth.metrics [Video]

Tutorial 6: Introduction to pyhealth.tokenizer [Video]

Tutorial 7: Introduction to pyhealth.medcode [Video]

The following tutorials will help users build their own task pipelines.

Pipeline 1: Drug Recommendation [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 2: Length of Stay Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 3: Readmission Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 4: Mortality Prediction [Video] <https://www.youtube.com/watch?v=GGP3Dhfyisc&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=12>

Pipeline 5: Sleep Staging [Video]

We also provide advanced tutorials for more specialized needs.

Advanced Tutorial 1: Fit your dataset into our pipeline [Video]

Advanced Tutorial 2: Define your own healthcare task

Advanced Tutorial 3: Adopt customized model into pyhealth [Video]

Advanced Tutorial 4: Load your own processed data into pyhealth and try out our ML models [Video]

7. Datasets 🏔️

We provide the processing files for the following open EHR datasets:

| Dataset | Module | Year | Information |
| --- | --- | --- | --- |
| MIMIC-III | pyhealth.datasets.MIMIC3Dataset | 2016 | MIMIC-III Clinical Database |
| MIMIC-IV | pyhealth.datasets.MIMIC4Dataset | 2020 | MIMIC-IV Clinical Database |
| eICU | pyhealth.datasets.eICUDataset | 2018 | eICU Collaborative Research Database |
| OMOP | pyhealth.datasets.OMOPDataset | | OMOP-CDM schema based dataset |
| SleepEDF | pyhealth.datasets.SleepEDFDataset | 2018 | Sleep-EDF dataset |
| SHHS | pyhealth.datasets.SHHSDataset | 2016 | Sleep Heart Health Study dataset |
| ISRUC | pyhealth.datasets.ISRUCDataset | 2016 | ISRUC-SLEEP dataset |

8. Machine/Deep Learning Models and Benchmarks ✈️

| Model Name | Type | Module | Year | Summary | Reference |
| --- | --- | --- | --- | --- | --- |
| Multi-layer Perceptron | deep learning | pyhealth.models.MLP | 1986 | MLP treats each feature as static | Backpropagation: theory, architectures, and applications |
| Convolutional Neural Network (CNN) | deep learning | pyhealth.models.CNN | 1989 | CNN runs on the conceptual patient-by-visit grids | Handwritten Digit Recognition with a Back-Propagation Network |
| Recurrent Neural Nets (RNN) | deep learning | pyhealth.models.RNN | 2011 | RNN (includes LSTM and GRU) can run on any sequential level (e.g., visit by visit sequences) | Recurrent neural network based language model |
| Transformer | deep learning | pyhealth.models.Transformer | 2017 | Transformer can run on any sequential level (e.g., visit by visit sequences) | Attention Is All You Need |
| RETAIN | deep learning | pyhealth.models.RETAIN | 2016 | RETAIN uses two RNN to learn patient embeddings while providing feature-level and visit-level importance. | RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism |
| GAMENet | deep learning | pyhealth.models.GAMENet | 2019 | GAMENet uses memory networks, used only for drug recommendation task | GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination |
| MICRON | deep learning | pyhealth.models.MICRON | 2021 | MICRON predicts the future drug combination by instead predicting the changes w.r.t. the current combination, used only for drug recommendation task | Change Matters: Medication Change Prediction with Recurrent Residual Networks |
| SafeDrug | deep learning | pyhealth.models.SafeDrug | 2021 | SafeDrug encodes drug molecule structures by graph neural networks, used only for drug recommendation task | SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations |
| MoleRec | deep learning | pyhealth.models.MoleRec | 2023 | MoleRec encodes drug molecule in a substructure level as well as the patient's information into a drug combination representation, used only for drug recommendation task | MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning |
| Deepr | deep learning | pyhealth.models.Deepr | 2017 | Deepr is based on 1D CNN. General purpose. | Deepr : A Convolutional Net for Medical Records |
| ContraWR Encoder (STFT+CNN) | deep learning | pyhealth.models.ContraWR | 2021 | ContraWR encoder uses short time Fourier transform (STFT) + 2D CNN, used for biosignal learning | Self-supervised EEG Representation Learning for Automatic Sleep Staging |
| SparcNet (1D CNN) | deep learning | pyhealth.models.SparcNet | 2023 | SparcNet is based on 1D CNN, used for biosignal learning | Development of Expert-level Classification of Seizures and Rhythmic and Periodic Patterns During EEG Interpretation |
| TCN | deep learning | pyhealth.models.TCN | 2018 | TCN is based on dilated 1D CNN. General purpose | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| AdaCare | deep learning | pyhealth.models.AdaCare | 2020 | AdaCare uses CNNs with dilated filters to learn enriched patient embedding. It uses feature calibration module to provide the feature-level and visit-level interpretability | AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration |
| ConCare | deep learning | pyhealth.models.ConCare | 2020 | ConCare uses transformers to learn patient embedding and calculate inter-feature correlations. | ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context |
| StageNet | deep learning | pyhealth.models.StageNet | 2020 | StageNet uses stage-aware LSTM to conduct clinical predictive tasks while learning patient disease progression stage change unsupervisedly | StageNet: Stage-Aware Neural Networks for Health Risk Prediction |
| Dr. Agent | deep learning | pyhealth.models.Agent | 2020 | Dr. Agent uses two reinforcement learning agents to learn patient embeddings by mimicking clinical second opinions | Dr. Agent: Clinical predictive model via mimicked second opinions |
| GRASP | deep learning | pyhealth.models.GRASP | 2021 | GRASP uses graph neural network to identify latent patient clusters and uses the clustering information to learn patient | GRASP: Generic Framework for Health Status Representation Learning Based on Incorporating Knowledge from Similar Patients |

pyhealth's Issues

How to generate x_data as the data in datasets?

Hi, thanks for your great work! I am trying to run your code using the mimic-iii-demo dataset. The problem I met is that I don't know how to generate the x_data as the data in datasets (mimic or cms). I followed your instructions but only got the y_data after running generate_mortality_prediction_mimic_demo.py. Is this because the data in the datasets folder used the full variables of mimic-iii data while only a few of them existed in mimic-iii-demo data? Thanks!

The results obtained with pyhealth are much lower than in the paper.

Thanks for your great work!

But I wonder why the results reported on the PyHealth homepage are much lower than those reported in the SafeDrug paper. Also, according to the results reported by PyHealth, GAMENet performs better than SafeDrug, which contradicts the paper's results.
[Screenshot 2023-08-23 19:33:22]

Below are the results reported in the SafeDrug paper:

[Screenshot 2023-08-23 19:46:27]

KeyError: 'logit'

Dear Sir/Madam,

I got an error when I ran the following codes (data is from Pipeline 5: Sleep Staging):

cal_model = HistogramBinning(model)

cal_model = KCal(model)

cal_model = TemperatureScaling(model)

cal_model.calibrate(cal_dataset=val_dataset)
from pyhealth.trainer import Trainer
print(Trainer(model=cal_model, metrics=['cwECEt_adapt', 'accuracy']).evaluate(test_loader))

image
image

Any advice? Thank you.

pyhealth.calib.calibration.

Dear Sir/Madam,

I got an error in the following commands:

from pyhealth.calib.calibration.hb import HistogramBinning
cal_model = HistogramBinning(model)
cal_model.calibrate(cal_dataset=val_dataset)

val_dataset is from torch subset from BaseEHRDataset. Could you give some advice?
image
image

a question about metrics

Is there a way to parameterize DDI rate as metrics in Trainer? And output the final DDI rate in trainer.evaluate().

ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets'

Hi,

When I was testing the example in https://pyhealth.readthedocs.io/en/latest/examples_bak.html#step-3-build-deep-learning-models, I found I could not load the dataset. Could you help look into this issue? Thank you!

> python test_retain.py
Traceback (most recent call last):
  File "test_retain.py", line 1, in <module>
    from pyhealth.datasets import MIMIC3BaseDataset
ImportError: cannot import name 'MIMIC3BaseDataset' from 'pyhealth.datasets' (/Users/anaconda3/envs/ehr/lib/python3.8/site-packages/pyhealth/datasets/__init__.py)

I also tried to search the class MIMIC3BaseDataset over this repo but could not find it. Any help would be appreciated!

Question about MIMIC-iii dataset

Hi, I found that in the MoleRec paper, the processed MIMIC-III dataset has 6,350 patients and 14,995 visits. However, I only got 5,449 patients and 14,141 visits when I used PyHealth to process this dataset. Here is my screenshot.
image

Entering deadlock when parsing prescriptions

I tested some basic code from the tutorial with the MIMIC-IV dataset, but the process hung. I pressed Ctrl-C to exit the program, and it gave the following call stack. It seems that parallel_apply runs into a deadlock or something similar when parsing prescriptions.

reproducing code

import logging
from pyhealth.datasets import MIMIC4Dataset

logger = logging.getLogger("pyhealth")
logger.setLevel(logging.DEBUG)

dataset = MIMIC4Dataset(
    "/home/featurize/data/mimic-iv-2.2/hosp",
    tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
    code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
)

Callstacks after ctrl-C

Loaded NDC->ATC mapping from /home/featurize/.cache/pyhealth/medcode/NDC_to_ATC.pkl                                                                                                                                
Loaded NDC code from /home/featurize/.cache/pyhealth/medcode/NDC.pkl                                     
Loaded ATC code from /home/featurize/.cache/pyhealth/medcode/ATC.pkl                                                                                                                                               
Processing MIMIC4Dataset base dataset...            
INFO: Pandarallel will run on 6 workers.                                                                 
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.                                                                                                               
finish basic patient information parsing : 80.05470561981201s                                            
finish parsing diagnoses_icd : 134.23406291007996s                                                       
finish parsing procedures_icd : 57.97325396537781s                                                       
                                                    
^CTraceback (most recent call last):                                                                                                                                                                               
  File "main.py", line 7, in <module>
    dataset = MIMIC4Dataset(                                                                             
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 130, in __init__
    patients = self.parse_tables()                                                                                                                                                                                 
  File "/home/featurize/work/PyHealth/pyhealth/datasets/base_ehr_dataset.py", line 190, in parse_tables
    patients = getattr(self, f"parse_{table.lower()}")(patients)                                                                                                                                                   
  File "/home/featurize/work/PyHealth/pyhealth/datasets/mimic4.py", line 307, in parse_prescriptions
    group_df = group_df.parallel_apply(                                                                                                                                                                            
  File "/environment/miniconda3/envs/py38/lib/python3.8/site-packages/pandarallel/core.py", line 307, in closure
Process ForkPoolWorker-28:                                                                               
Process ForkPoolWorker-31:        
Process ForkPoolWorker-30:                                                                                                                                                                                         
Process ForkPoolWorker-32:          
Process ForkPoolWorker-33:                                                                                                                                                                                         
Process ForkPoolWorker-29:                   
    message: Tuple[int, WorkerStatus, Any] = master_workers_queue.get()                                                                                                                                            
  File "<string>", line 2, in get   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod  
Traceback (most recent call last):
Traceback (most recent call last):                                                                                                                                                                                 
Traceback (most recent call last):  
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap                                                                                                       
    self.run()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()                               
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run                                                                                                              
    self._target(*self._args, **self._kwargs)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker                                                                                                              
    task = get()                    
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker    
    task = get()                                                                                                                                                                                                   
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 114, in worker            
    task = get()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 356, in get
    res = self._reader.recv_bytes()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/environment/miniconda3/envs/py38/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

Use Pretrained Model

Hi,
How can I do training from a pre-trained model?

For example, instead of using:

from pyhealth.models import Transformer, RNN, RETAIN

model = Transformer(
    dataset=mimic3_task_ds,
    # look up what are available for "feature_keys" and "label_keys" in dataset.samples[0]
    feature_keys=["conditions", "procedures", "drugs"],
    label_key="label",
    mode="binary",
)

I would like to use a pre-trained Transformer or pre-trained bert model instead.
Is it possible?
@pat-jj

RxNorm codes hierarchy

Hi, I noticed something that might seem strange in the hierarchy of RxNorm (and perhaps other vocabularies).
For instance, the code 1000001 in RxNorm doesn't have any parents or children in the PyHealth hierarchy.

image
However, according to Athena, this code appears to have parents and children:
image

This isn't specific to this code alone; it applies to many others as well. I just used 1000001 as an example.
I would like to use PyHealth to get the RxNorm hierarchy.
Can you please check this? How can I get the hierarchy of RxNorm correctly?
Thank you!
@pat-jj

HALO and EHR synthetic task

I found that HALO is combined in the branch 'main Halo 2'. Will it be added to the main branch along with the EHR synthetic task? #278

Do the models support text?

Dear Sir or Madam,

I have fed text reports as features into the Transformer, and it works. But I don't know whether it is meaningful to do so. Can it learn from the text as language models do?

I follow case 2 for the Transformer model, where each code is a radiology report.

"case 2. [[code1, code2]] or [[code1, code2], [code3, code4, code5], …]"

pip install version

Hello, I recently installed PyHealth using the command `pip install pyhealth`, and it installed version 1.1.4. However, I noticed some discrepancies between this version and the latest code available on GitHub. For example, multilabel_metrics_fn seems to be different.
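
When comparing a PyPI release against the GitHub code, it helps to first confirm which version is actually installed in the active environment. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pyhealth"))
```

If the release lags behind the repository, pip can also install directly from a Git URL (for example `pip install git+https://github.com/sunlabuiuc/PyHealth.git`), assuming the repository's default branch is installable as-is.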

Bug in GAMENet

Dear Sir/Madam,

When I run 'drug_recommendation_mimic4_gamenet.py' in tutorials, I get an error.

Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]queries shape torch.Size([64, 10, 128])
prev_drugs shape torch.Size([64, 10, 147])
curr_drugs shape torch.Size([64, 147])
a_s shape torch.Size([64, 9])
DM_values shape torch.Size([64, 10, 147])
Epoch 0 / 20:   0%|                                                                                               | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 103, in <module>
    model, trainer = train_gamenet(data, train_loader, val_loader)
  File "/home/featurize/PyHealth/examples/drug_recommendation_mimic4_gamenet.py", line 77, in train_gamenet
    trainer.train(
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/trainer.py", line 195, in train
    output = self.model(**data)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 410, in forward
    loss, y_prob = self.gamenet(queries, prev_drugs, curr_drugs, mask)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/featurize/work/py38/lib/python3.8/site-packages/pyhealth/models/gamenet.py", line 211, in forward
    a_m = torch.einsum("bv,bvz->bz", a_s, DM_values.float())
  File "/home/featurize/work/py38/lib/python3.8/site-packages/torch/functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): subscript v has size 10 for operand 1 which does not broadcast with previously seen size 9

It seems this can be addressed as shown in the following figure. Please take a look.
[screenshot]
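
For context on the error message: in the equation "bv,bvz->bz" the subscript v is shared by both operands, so the second dimension of a_s (9 in the log) must match the second dimension of DM_values (10 in the log). The sketch below is a small plain-Python shape checker that reproduces this consistency rule; `check_einsum_shapes` is a hypothetical helper for illustration only, and it ignores the size-1 broadcasting that torch.einsum permits.

```python
def check_einsum_shapes(equation, *shapes):
    """Return {subscript: size}; raise if one subscript is bound to two sizes."""
    inputs = equation.split("->")[0].split(",")
    if len(inputs) != len(shapes):
        raise ValueError("number of operands does not match the equation")
    sizes = {}
    for subscripts, shape in zip(inputs, shapes):
        if len(subscripts) != len(shape):
            raise ValueError(f"operand '{subscripts}' expects {len(subscripts)} dims")
        for letter, size in zip(subscripts, shape):
            if sizes.setdefault(letter, size) != size:
                raise ValueError(
                    f"subscript {letter} has size {size}, "
                    f"previously seen size {sizes[letter]}"
                )
    return sizes


# Consistent shapes: both operands agree on v.
print(check_einsum_shapes("bv,bvz->bz", (64, 10), (64, 10, 147)))
```

Calling it with the shapes from the log, `(64, 9)` and `(64, 10, 147)`, raises a `ValueError` for subscript v.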

Descriptive info on the example data needed

I found your PyHealth package a valuable resource. I am trying the test_sequence_data.ipynb notebook with the example dataset. The CSV files in the /datasets/mimic/y_data/ folder are clear because the column names are self-explanatory, but the ones in the /datasets/mimic/x_data/ folder have no column names. I've read the README files and the online documentation but couldn't find anything. Can you help me with this?

BTW, it would help a lot if you could add some minimal description on the data, data processing or training steps in the notebook. That would help the users a lot, because they don’t have to spend a lot of time finding the info everywhere.

question about MIMIC-III in drug recommendation task

The MIMIC-III dataset used in many of the papers (e.g., SafeDrug, GAMENet, MoleRec) consists of 50,206 medical encounter records. After filtering out the patients with only one visit, it contains 14,995 visits and 6,350 patients. The code of drug_recommendation_mimic3_fn appears to define the same task as in the papers, but using "mimic3_ds = mimic3_ds.set_task(task_fn=drug_recommendation_mimic3_fn)" produces only 911 patients and 1,858 visits. Why is this?
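
For reference, the filtering step described above (dropping patients with only one visit, then counting what remains) can be sketched as follows. The input format, a list of (patient_id, visit_id) pairs, and the helper name are assumptions for illustration; the sketch does not explain the discrepancy, it only makes the counting procedure explicit so numbers can be compared step by step.

```python
from collections import defaultdict


def keep_multi_visit_patients(visits, min_visits=2):
    """Group visits by patient and keep patients with at least `min_visits`."""
    by_patient = defaultdict(list)
    for patient_id, visit_id in visits:
        by_patient[patient_id].append(visit_id)
    kept = {p: v for p, v in by_patient.items() if len(v) >= min_visits}
    n_visits = sum(len(v) for v in kept.values())
    return kept, len(kept), n_visits


visits = [
    ("p1", "v1"), ("p1", "v2"),
    ("p2", "v3"),
    ("p3", "v4"), ("p3", "v5"), ("p3", "v6"),
]
kept, n_patients, n_visits = keep_multi_visit_patients(visits)
print(n_patients, n_visits)
```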

Bug in SafeDrug

The program crashes when I run SafeDrug for drug recommendation on the real MIMIC-III dataset. Looking into the source code, I found that the bug occurs in the function generate_molecule_info(), specifically at adjacency = Chem.GetAdjacencyMatrix(mol). https://github.com/sunlabuiuc/PyHealth/blob/5592d437abf6a06df7d41204cf56971f45e98a47/pyhealth/models/safedrug.py#L529C20-L529C20

I don't know what happened. Please provide some suggestions. Thank you.
Here is a failing case: smile = '[F-].[Na+]'

wrong code in pyhealth\metrics\drug_recommendation.py

Hello, when you updated the code, you introduced a bug in the DDI calculation; the original code was correct. It bothered me all morning. ^ ^

if ddi_matrix[i, j] == 1 or ddi_matrix[j, i] == 1: # wrong code
if ddi_matrix[med_i, med_j] == 1 or ddi_matrix[med_j, med_i] == 1: # Old and correct code
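
To illustrate why the two lines differ, here is a minimal sketch of a DDI rate computed over drug indices. It assumes the DDI matrix is a square 0/1 structure indexed by drug label (a nested list here, rather than the array indexing used in the snippet above), and that each prediction is a list of drug label indices. The function name ddi_rate is hypothetical and this is not PyHealth's implementation.

```python
from itertools import combinations


def ddi_rate(predicted_sets, ddi_matrix):
    """Fraction of predicted drug pairs that are flagged as interacting.

    predicted_sets: one list of drug label indices per visit.
    ddi_matrix: square nested list of 0/1 flags, indexed by drug label.
    """
    total_pairs = 0
    ddi_pairs = 0
    for meds in predicted_sets:
        # Iterate over the drug labels themselves (med_i, med_j), not over
        # their positions (i, j) inside the prediction list.
        for med_i, med_j in combinations(meds, 2):
            total_pairs += 1
            if ddi_matrix[med_i][med_j] == 1 or ddi_matrix[med_j][med_i] == 1:
                ddi_pairs += 1
    return ddi_pairs / total_pairs if total_pairs else 0.0


# Toy 4-drug matrix in which only drugs 2 and 3 interact.
toy_ddi = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
rate = ddi_rate([[2, 3], [0, 1]], toy_ddi)
```

In the toy data, looking up positions 0 and 1 instead of labels 2 and 3 would read `toy_ddi[0][1]` and miss the interaction, which is the kind of undercount the positional version would produce.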

Does the inner map contain all the ICD code IDs?

I use the following code to get all the ICD tokens.

base_dataset2 = MIMIC4Dataset(
       root="/home/czhaobo/KnowHealth/data/physionet.org/files/mimiciv/2.0/hosp",  # version 2.2 does not work well
       tables=["diagnoses_icd", "procedures_icd", "prescriptions", "labevents"],
       code_mapping={"NDC": ("ATC", {"target_kwargs": {"level": 3}})},
       dev=False,
       refresh_cache=False, # use True on the first run
   )
   sample_dataset2 = base_dataset2.set_task(drug_recommendation_mimic4_fn)
   tokenizer2 = Tokenizer(
       tokens=sample_dataset2.get_all_tokens(key='conditions'),
       special_tokens=["<pad>", "<unk>"],
   )
   tokens2 = list(tokenizer2.vocabulary.idx2token.values())
   print(tokens2)
   diag_sys1, proc_sys1, med_sys1 = get_stand_system('MIMIC-III')
   diag_sys2, proc_sys2, med_sys2 = get_stand_system('MIMIC-IV')

However, when I try to look up their names via InnerMap.lookup, I always get a KeyError. For example, H4011X0 is an ID in tokens2:
```python
from pyhealth.medcode import InnerMap

if __name__ == "__main__":
    icd9cm = InnerMap.load("ICD9CM")
    icd10cm = InnerMap.load("ICD10CM")
    print(icd9cm.lookup('H4011X0'))
    print(icd10cm.lookup('H4011X0'))
```

sequential_drug_recommendation

Dear Sir/Madam,

In 'Advanced Case 2: Work on customized healthcare task', I have questions about 'sequential_drugs' and 'drugs' (see the following code). It seems 'sequential_drugs' is always empty. What is 'sequential_drugs[-1] = drugs' on the final line used for?

def sequential_drug_recommendation(patient):
    samples = []

    sequential_conditions = []
    sequential_procedures = []
    sequential_drugs = []  # does not include the current visit's drugs yet
    for visit in patient:

        # step 1: obtain feature information
        conditions = visit.get_code_list(table="DIAGNOSES_ICD")
        procedures = visit.get_code_list(table="PROCEDURES_ICD")
        drugs = visit.get_code_list(table="PRESCRIPTIONS")

        sequential_conditions.append(conditions)
        sequential_procedures.append(procedures)
        sequential_drugs.append([])

        # step 2: exclusion criteria: visits without drug
        if len(drugs) == 0:
            sequential_drugs[-1] = drugs
            continue

        # step 3: assemble the samples
        samples.append(
            {
                "visit_id": visit.visit_id,
                "patient_id": patient.patient_id,
                # the following keys can be the "feature_keys" or "label_key" for initializing downstream ML model
                "sequential_conditions": sequential_conditions.copy(),
                "sequential_procedures": sequential_procedures.copy(),
                "sequential_drugs": sequential_drugs.copy(),
                "label": drugs,
            }
        )
        sequential_drugs[-1] = drugs

    return samples
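
One reading of the code above: the empty list appended for each visit is a placeholder that hides the current visit's drugs from the features, and the final assignment fills that placeholder in only after the sample has been stored, so the drugs become history for later visits. The sketch below reproduces that pattern on plain tuples; the function name build_samples and the (conditions, drugs) visit format are hypothetical stand-ins for PyHealth's Patient and Visit objects.

```python
def build_samples(visits):
    """visits: list of (conditions, drugs) tuples for a single patient."""
    samples = []
    seq_conditions = []
    seq_drugs = []
    for conditions, drugs in visits:
        seq_conditions.append(conditions)
        seq_drugs.append([])  # placeholder: current drugs are the label, not a feature

        if len(drugs) == 0:
            continue  # visits without drugs produce no sample

        samples.append(
            {
                "sequential_conditions": [list(c) for c in seq_conditions],
                "sequential_drugs": [list(d) for d in seq_drugs],
                "label": list(drugs),
            }
        )
        # Only now reveal the drugs, so the NEXT visit sees them as history.
        seq_drugs[-1] = drugs
    return samples


toy_visits = [(["c1"], ["d1"]), (["c2"], ["d2", "d3"])]
toy_samples = build_samples(toy_visits)
```

Under this reading, the drug history is empty only for a patient's first visit; the second sample in the toy data carries the first visit's drugs followed by an empty placeholder for the current visit.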

a question about ddi rate

Hi! Sorry for bothering you again.

I added the DDI rate as a metric, but the value was much smaller than it should be. In addition, the DDI matrix in PyHealth seems to differ from the ones used by SafeDrug and GAMENet. Could you please help me with this?

Performance of SafeDrug and Molerec

Hello, esteemed author. I noticed that when I run the SafeDrug and MoleRec models from your library, their performance falls far short of what is reported in your papers and benchmarks. Did you use any specific parameters or techniques during testing? Thank you for your response.

drugrec for OMOP datasets doesn't work

from pyhealth.datasets import OMOPDataset
omop_base = OMOPDataset(
    root="https://storage.googleapis.com/pyhealth/synpuf1k_omop_cdm_5.2.2",
    tables=["condition_occurrence", "procedure_occurrence"],
    code_mapping={},
)
from pyhealth.tasks import drug_recommendation_omop_fn
omop_sample = omop_base.set_task(drug_recommendation_eicu_fn)

question about eicu in drug recommendation task

Thank you so much for your work!
When I use eICU data for drug recommendation, I get the following error:
Key drugs has mixed nested list levels across samples.

Could you please tell me how to solve this problem?

Thanks in advance
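
As a diagnostic for this kind of error, here is a minimal sketch that tallies how deeply the lists under one key are nested across samples. It assumes samples are plain dicts whose values are (possibly nested) Python lists; the helper names nesting_depth and depths_by_key are hypothetical and not part of PyHealth.

```python
from collections import Counter


def nesting_depth(value):
    """Depth of list nesting: 0 for a non-list, 1 for a flat list, and so on.

    An empty list counts as depth 1, so [] and [[]] are reported differently.
    """
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


def depths_by_key(samples, key):
    """Count how many samples have each nesting depth under the given key."""
    return Counter(nesting_depth(sample[key]) for sample in samples)


# Toy samples: one flat drug list and one list of per-visit drug lists.
toy_samples = [
    {"drugs": ["a", "b"]},
    {"drugs": [["a"], ["b", "c"]]},
]
depth_counts = depths_by_key(toy_samples, "drugs")
```

If the resulting counter has more than one key, the samples mix flat and nested lists for that field, which matches the wording of the error message.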

Question about the "Pandarallel" dependency

When I run prediction tasks on the MIMIC-IV dataset, my code always deadlocks because of the size of MIMIC-IV, as in the earlier issue.

Which specific versions of 'pandas' and 'pandarallel' should I use? Thanks!
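
When reporting a problem like this, it can help to include the versions installed locally. Here is a minimal sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(installed_versions(["pandas", "pandarallel"]))
```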

The results of SafeDrug model differ significantly from those in the paper.

Hi! Sorry for bothering you again.
I ran the code for GAMENet, SafeDrug, and MoleRec locally. The results of the three models are as follows:
394bb702054f7c354d1e4074b664a4a
Here is the problem: the jaccard_samples of my local SafeDrug run only reaches about 0.33. In theory, the jaccard_samples of SafeDrug should be similar to that of GAMENet. Why is there such a big gap?
Additionally, why are the results obtained with PyHealth lower than those in the paper? I note that the sample dataset contains 14,142 visits and 5,449 patients, whereas the papers report 6,350 patients and 14,995 visits. Is that the reason?
image
I look forward to your reply. Thank you!
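
For reference when comparing numbers, here is a minimal sketch of a sample-averaged Jaccard score, which is the usual meaning of a "samples" average in multilabel evaluation. It assumes each visit's true and predicted drugs are given as Python sets and that a visit where both sets are empty scores 0; that convention is an assumption, and this is not PyHealth's implementation.

```python
def jaccard_samples(true_sets, predicted_sets):
    """Mean over visits of |true & predicted| / |true | predicted|."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true_sets and predicted_sets must have equal length")
    scores = []
    for true, predicted in zip(true_sets, predicted_sets):
        union = true | predicted
        # Assumption: an empty union (nothing true, nothing predicted) scores 0.
        scores.append(len(true & predicted) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Two toy visits, each with half of the union in common.
score = jaccard_samples(
    [{"a", "b"}, {"c"}],
    [{"a"}, {"c", "d"}],
)
```

Because the score is averaged per visit, a different number of visits or a different label vocabulary (for example, another ATC level) can shift it even when the model is unchanged.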

Cannot download 'https://storage.googleapis.com/pyhealth/resource/NDC_to_ATC.csv'

When initializing the MIMIC3Dataset() class, I get urllib.error.URLError. Checking the call stack, I found that the problem lies in the download_and_read_csv() function of the CrossMap class. I think it is caused by my own Internet connection, but I hope you can make these files available for local download and let the loader fall back to reading a local file when the network download is unavailable.
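
The requested behaviour could look roughly like the sketch below: use a local copy if one exists, and download only when it is missing. The function name fetch_with_local_fallback, the cache_dir argument, and the injectable download parameter are all hypothetical; this is not how PyHealth's download_and_read_csv() is implemented.

```python
import os
import urllib.request


def fetch_with_local_fallback(url, cache_dir, download=urllib.request.urlretrieve):
    """Return a local path for the file at url, downloading only if needed.

    A file placed manually in cache_dir under the URL's base name is used
    as-is, so no network access is required when it is already present.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        download(url, local_path)
    return local_path
```

With this pattern, a user behind a restricted network could download NDC_to_ATC.csv by other means, drop it into the cache directory, and the loader would never attempt the network request.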
