DRG-LLaMA

This repository contains the code and implementation instructions for "DRG-LLaMA: Tuning LLaMA Model to Predict Diagnosis-related Group for Hospitalized Patients".

Local setup

Install dependencies. We used a conda environment.

conda env create -f environment.yml

Activate conda environment.

conda activate DRG-LLaMA

MIMIC-IV pre-processing

  1. You must first obtain access to the MIMIC-IV database: https://physionet.org/content/mimiciv/.
  2. Download "discharge.csv" and "drgcodes.csv" from MIMIC-IV and update "dc_summary_path" in paths.json to point to the file locations. A mapping rule file is provided in the data folder ("my_mapping_path").
  3. "DRG_34.csv" in the data folder contains the official MS-DRG v34.0 codes (https://www.cms.gov/icd10m/version34-fullcode-cms/fullcode_cms/P0372.html).
  4. "DRG34_Mapping.csv" is a mapping rule that unifies MS-DRGs from different years into a single version, MS-DRG v34.0. Details of the method can be found in Supplemental Method 1 of the paper.
  5. In your terminal, navigate to the project directory, then run the following command:
python -m data.MIMIC_Preprocessing

The script generates the files named by "train_set_path", "test_set_path" and "id2label_path" in paths.json. These are used for single-label DRG prediction.
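
The mapping rule from step 4 can be pictured as a lookup table applied to every DRG code before training. The sketch below is only an illustration of that idea: the column names (`source_drg`, `drg_v34`) and all rows are made-up placeholders, not the contents of DRG34_Mapping.csv, and the actual procedure is the one described in Supplemental Method 1 of the paper.

```python
import csv
import io

# Hypothetical mapping table. Column names and rows are placeholders
# for illustration and are NOT taken from DRG34_Mapping.csv.
MAPPING_CSV = """source_drg,drg_v34
OLD_A,NEW_1
OLD_B,NEW_1
OLD_C,NEW_2
"""


def load_mapping(text):
    """Read a two-column CSV into a dict: source code -> unified code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_drg"]: row["drg_v34"] for row in reader}


def unify(codes, mapping):
    """Map each code to the unified version; codes without a rule become None."""
    return [mapping.get(code) for code in codes]


mapping = load_mapping(MAPPING_CSV)
unified = unify(["OLD_A", "OLD_C", "OLD_Z"], mapping)
print(unified)
```

Returning `None` for codes without a rule keeps unmapped records visible, so they can be counted or dropped explicitly instead of being silently mislabeled.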

  6. Then run the pre-processing script for two-label DRG prediction:
python -m data.Two_Label_DRG_Preprocessing

The script will generate files in "multi_train_set_path", "multi_test_set_path" and "drg_34_dissection_path".
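
Because both pre-processing scripts read their locations from paths.json, it can help to check the file before running them. The following is a minimal sketch, assuming paths.json is a flat JSON object whose keys are the names quoted in this README; the real file may contain additional keys, and the helper `missing_keys` and the demo path are hypothetical.

```python
import json
import os
import tempfile

# Key names quoted in this README; the real paths.json may hold more.
EXPECTED_KEYS = [
    "dc_summary_path",
    "my_mapping_path",
    "train_set_path",
    "test_set_path",
    "id2label_path",
    "multi_train_set_path",
    "multi_test_set_path",
    "drg_34_dissection_path",
]


def missing_keys(paths_file, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent or empty in paths_file."""
    with open(paths_file, encoding="utf-8") as fh:
        paths = json.load(fh)
    return [key for key in expected if not paths.get(key)]


# Demo on a throwaway file with a placeholder path (not a real location).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "paths.json")
    with open(demo_file, "w", encoding="utf-8") as fh:
        json.dump({"dc_summary_path": "/placeholder/discharge.csv"}, fh)
    demo_missing = missing_keys(demo_file)

print(demo_missing)
```

In the project directory, `missing_keys("paths.json")` would list any of these entries that still need to be filled in.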

Running the models

We provide llama_single.py and llama_two.py, which implement fine-tuning of LLaMA with LoRA for the single-label and two-label approaches to DRG prediction, respectively. We largely adopted the framework from https://github.com/tloen/alpaca-lora.

Example usage:

python -m llama_single --base_model 'decapoda-research/llama-7b-hf' --model_size '7b'

Hyperparameters can be adjusted, for example:

python -m llama_single \
    --base_model 'decapoda-research/llama-7b-hf' \
    --model_size '7b' \
    --micro_batch_size 8 \
    --num_epochs 3 \
    --learning_rate 2e-4 \
    --cutoff_len 1024 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05
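
To get a feel for what `--lora_r` and `--lora_alpha` control: for a weight matrix of shape d_out x d_in, LoRA trains two small matrices of shapes d_out x r and r x d_in, and scales their product by lora_alpha / lora_r. The arithmetic below is a rough sketch under stated assumptions: LLaMA-7B dimensions (hidden size 4096, 32 layers) and adapters on two square projection matrices per layer, which is the alpaca-lora default (`q_proj`, `v_proj`). The modules actually targeted by llama_single.py are not stated in this README.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, lora_r):
    """Factor applied to the low-rank update B @ A."""
    return lora_alpha / lora_r


# Assumptions (not taken from this repository's code):
HIDDEN_SIZE = 4096      # LLaMA-7B hidden size
NUM_LAYERS = 32         # LLaMA-7B transformer layers
ADAPTED_PER_LAYER = 2   # e.g. q_proj and v_proj, the alpaca-lora default

per_matrix = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, r=8)
total = per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS
scale = lora_scaling(lora_alpha=16, lora_r=8)

print(per_matrix, total, scale)
```

Under these assumptions, r = 8 adds about 4.2 million trainable parameters, a small fraction of the 7B base model. Doubling `--lora_r` doubles that count and, with `--lora_alpha` unchanged, halves the scaling factor.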

clinicalBERT_single.py implements fine-tuning of ClinicalBERT for single-label DRG prediction. It can be run as:

python -m clinicalBERT_single --base_model "emilyalsentzer/Bio_ClinicalBERT"

Please refer to https://github.com/JHLiu7/EarlyDRGPrediction for the implementation of CAML. We adopted the evaluation functions from CAML to compute performance metrics (utils/eval_utils.py). Details on inferring the MS-DRG from the predicted base DRG and CC/MCC status (function 'map_rule' in eval_utils.py) can be found in Supplemental Method 2 of the paper.
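
Conceptually, the two-label approach predicts a base DRG and a CC/MCC status, and the final MS-DRG is then looked up from that pair. The sketch below illustrates the idea with a toy table; it is not the repository's `map_rule`, and the table layout, the severity labels, and the function name `infer_ms_drg` are hypothetical. The three rows use the heart failure and shock family (MS-DRGs 291 to 293) purely as an example.

```python
# Toy lookup: (base DRG label, severity) -> MS-DRG code.
# Layout and labels are hypothetical; see Supplemental Method 2 of the
# paper and 'map_rule' in eval_utils.py for the actual rule.
TOY_RULES = {
    ("HEART FAILURE & SHOCK", "MCC"): 291,
    ("HEART FAILURE & SHOCK", "CC"): 292,
    ("HEART FAILURE & SHOCK", "none"): 293,
}


def infer_ms_drg(base_drg, severity, rules=TOY_RULES):
    """Return the MS-DRG for a (base DRG, severity) pair, or None if unknown."""
    return rules.get((base_drg, severity))


with_mcc = infer_ms_drg("HEART FAILURE & SHOCK", "MCC")
without_cc = infer_ms_drg("HEART FAILURE & SHOCK", "none")
unknown = infer_ms_drg("NOT IN TABLE", "CC")
print(with_mcc, without_cc, unknown)
```

Not every base DRG is split into all three severity levels, so a real rule has to handle pairs that do not exist as separate codes. This sketch simply returns `None` in that case rather than guessing a fallback.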
