
CSE 635 Project - LLMs in Health Sciences

Team Name - Context Clan

Team Members

  • Leela Srija Alla
    • UBIT Name: lalla
  • Vishnu Teja Jampala
    • UBIT Name: vjampala

Dataset

link

Example Training Data CTR

{
    "Clinical Trial ID": "NCT01537029",
    "Intervention": [
        "INTERVENTION 1: ",
        "  Doxorubicin and Cyclophosphamide",
        "  Doxorubicin: Dosed by the patient's treating physician according to local standard of care.",
        "  Cyclophosphamide: dosage form: IV, Dosage, frequency, and duration: According to local standard of care"
    ],
    "Eligibility": [
        "Inclusion Criteria:",
        "  WHO performance status 0 or 1",
        "Exclusion Criteria:",
        "  Participants unwilling to comply with study procedures.",
        "  CrCl < 10 ml/min"
    ],
    "Results": [
        "Outcome Measurement: ",
        "  Clearance (Cl) for Doxorubicin and Cyclophosphamide",
        "  Time frame: 0-48 hours",
        "Results 1: ",
        "  Arm/Group Title: Doxorubicin and Cyclophosphamide",
        "  Arm/Group Description: Doxorubicin: Dosed by the patient's treating physician according to local standard of care."
        
    ],
    "Adverse Events": [
        "Adverse Events 1:",
        "  Total: 0/15 (0.00%)"
    ]
}
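
A minimal sketch of loading one such CTR file, assuming the corpus is extracted under ./data/CT json/ (the path is an assumption; adjust it to your layout):

import json

# Hypothetical path to a single CTR file from the training corpus
with open("./data/CT json/NCT01537029.json") as f:
    ctr = json.load(f)

print(ctr["Clinical Trial ID"])   # NCT01537029
for line in ctr["Eligibility"]:   # each section is a list of text lines
    print(line)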

Training Data Labels

Single

   "40f1d3ce-2ff8-4177-9b11-0bf10b7f6591": {
        "Type": "Single",
        "Section_id": "Results",
        "Primary_id": "NCT00259090",
        "Statement": "the primary trial studies the impact of Fulvestrant, Anastrozole on Oestrogen Receptor H-score.",
        "Label": "Entailment",
        "Primary_evidence_index": [
            1
        ]
    }

Comparison

  "20545360-b2a1-4be9-997a-97040866b239": {
        "Type": "Comparison",
        "Section_id": "Eligibility",
        "Primary_id": "NCT00880464",
        "Secondary_id": "NCT00458237",
        "Statement": "Patients with AIDS are eligible for both the secondary trial and the primary trial.",
        "Label": "Contradiction",
        "Primary_evidence_index": [
            14,
            18
        ],
        "Secondary_evidence_index": [
            12,
            26
        ]
    }
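
Each label entry points back into a CTR section by sentence index. A minimal sketch of resolving an entry to its evidence sentences (the train.json path and the load_ctr helper are assumptions for illustration):

import json

with open("./data/train.json") as f:  # assumed path to the labels file
    labels = json.load(f)

def load_ctr(trial_id):
    # Hypothetical helper: returns the CTR dict for a trial ID
    with open(f"./data/CT json/{trial_id}.json") as f:
        return json.load(f)

for uuid, entry in labels.items():
    section = entry["Section_id"]            # e.g. "Results" or "Eligibility"
    primary = load_ctr(entry["Primary_id"])
    evidence = [primary[section][i] for i in entry["Primary_evidence_index"]]
    if entry["Type"] == "Comparison":        # comparison statements also cite a second trial
        secondary = load_ctr(entry["Secondary_id"])
        evidence += [secondary[section][i] for i in entry["Secondary_evidence_index"]]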

Architecture of Model

(Figure: overall model architecture)

Evidence Selection

(Figure: evidence selection pipeline)

Entailment Task

(Figure: entailment task pipeline)

Code Execution

Baseline Model Implementations

We considered three Large Language Models (LLMs) and the Sebis models as our baselines. The files are provided in the directory ./src/code/milestone_2.

LLM Notebooks

  • biobert-base-cased-v1-2-results.ipynb
  • deberta-v3-small-mnli-fever-docnli-ling-2c-results.ipynb
  • debertav3small-nli4ct_results.ipynb

These notebooks utilize CSV files located in /data/Training Data csv.zip.
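
For a sense of how these NLI baselines score a statement against evidence, here is a minimal sketch; the Hugging Face model ID and the example premise/hypothesis are assumptions, and this 2-class head outputs 0 for entailment and 1 for not-entailment:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hugging Face ID for the 2-class DeBERTa NLI baseline
model_id = "MoritzLaurer/DeBERTa-v3-small-mnli-fever-docnli-ling-2c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "Total: 0/15 (0.00%)"  # an evidence line from a CTR
hypothesis = "No adverse events occurred in the primary trial."

inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()  # 0 = entailment, 1 = not entailment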

CSV File Paths

The paths specified in the above notebooks need to be updated when reading the following files:

  • train_hypothesis_evidences.csv
  • dev_hypothesis_evidences.csv
  • Numerical_Statements_hypothesis_evidences.csv
  • Non_Numerical_Statements_hypothesis_evidences.csv
  • Single_hypothesis_evidences.csv
  • Comparison_hypothesis_evidences.csv
  • AdverseEvents_hypothesis_evidences.csv
  • Results_hypothesis_evidences.csv
  • Eligibility_hypothesis_evidences.csv
  • Intervention_hypothesis_evidences.csv

Additionally, the paths to save .pt files in these notebooks should also be changed accordingly.
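
For example, the path cells at the top of each notebook would be edited along these lines (the ./data and ./checkpoints locations are assumptions; substitute your own):

import pandas as pd

# Assumed extraction location of "Training Data csv.zip"
DATA_DIR = "./data"

train_df = pd.read_csv(f"{DATA_DIR}/train_hypothesis_evidences.csv")
dev_df = pd.read_csv(f"{DATA_DIR}/dev_hypothesis_evidences.csv")

# Assumed destination for the .pt checkpoints the notebooks write
CHECKPOINT_PATH = "./checkpoints/baseline.pt"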

Sebis Notebooks

  • Pipeline Model - notfinetuned-pipeline-whole_results.ipynb
  • Joint Model - sebis-joint-debertav3.ipynb

These notebooks use files located in Training DATA json.zip.

JSON File Paths

The paths to .json files in the above two notebooks should be updated as needed.

Error Analysis

  • Error analysis was done for the Sebis pipeline, as it gave the best results; the code is provided in ./src/code/milestone_2/error-analysis-milestone2.ipynb.
  • Save model.safetensors and config.json after running ./src/code/milestone_2/notfinetuned-pipeline-whole_results.ipynb, then use that model in ./src/code/milestone_2/error-analysis-milestone2.ipynb by changing the path model_nli_path:
# trained model checkpoint
model_nli_path = "/kaggle/input/sebis-not-finetuned"
DEV_PATH = "/kaggle/input/data-json/data/dev.json"

# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_nli_path, model_max_length=1024)
  • The sebis-not-finetuned directory should contain model.safetensors and config.json.
  • The paths to .json files in this notebook should be updated as needed.
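
For completeness, the same checkpoint directory also feeds the model itself; a minimal sketch, assuming the notebook loads it with a plain from_pretrained call:

from transformers import AutoModelForSequenceClassification

# Load the saved checkpoint (model.safetensors + config.json) for error analysis
model = AutoModelForSequenceClassification.from_pretrained(model_nli_path)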

Fine-tuned Models

The finetuned models are provided in the directory ./src/code/milestone_3.

For fine-tuning:

  • Use the code in ./src/code/milestone_3/finetuned-deberta-v3-small-drop.ipynb.
  • Save model.safetensors and config.json after running it, then use that model in Sebis. Here's the updated train function, which changes the path used to load the model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def train(model_name):
    # Load the tokenizer and model. Adjust max sequence length to fit your machine.
    tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=1024, use_safetensors=True)
    # Point this at the directory containing model.safetensors and config.json
    model = AutoModelForSequenceClassification.from_pretrained('./path/to/finetuned-model',
                                                               num_labels=2, ignore_mismatched_sizes=True)
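
The function is then invoked with the base checkpoint whose tokenizer was used during fine-tuning, e.g. (the model name is an assumption):

# Hypothetical invocation; the base checkpoint supplies the tokenizer
train("microsoft/deberta-v3-small")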

Sebis finetuned Notebooks

  • Finetuned with Dataset 1 - finetuned-pipeline-results-small-dataset.ipynb
  • Finetuned with Dataset 2 - finetuned-pipeline-results-large-dataset.ipynb

These notebooks use files located in Training DATA json.zip.

JSON File Paths

The paths to .json files in the above two notebooks should be updated as needed.

Error Analysis

  • Error analysis was done for the Sebis pipeline fine-tuned on Dataset 2, as it gave the best results; the code is provided in ./src/code/milestone_3/error-analysis-milestone3.ipynb.
  • Save model.safetensors and config.json after running ./src/code/milestone_3/finetuned-pipeline-results-large-dataset.ipynb, then use that model in ./src/code/milestone_3/error-analysis-milestone3.ipynb by changing the path model_nli_path:
# trained model checkpoint
model_nli_path = "/kaggle/input/sebis-final-model"
DEV_PATH = "/kaggle/input/data-json/data/dev.json"

# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_nli_path, model_max_length=1024)
  • The paths to .json files in this notebook should be updated as needed.
