Thesis-project

This repository contains R code written for my thesis project.

The thesis project consists of a case-control study with several sensitivity analyses using RAMQ administrative data. For a brief introduction to the project, please read the Abstract.

This document explains the order in which the files manipulate and analyze the data.

Data Cleaning & Manipulation

  1. Data import: read_files.R. This file reads the raw csv files, namely the demo, intervention, and diagnostic datasets and the code books for ICD and CCI. The age range of the study population was defined from the demo table.

  2. Case selection: Case_selection.R. This file defines the cases for the case-control study in the main analysis. Cases were defined as individuals who were diagnosed with AS during hospitalization or who underwent a SAVR procedure. Patients with a previous history of AS, mitral stenosis, congenital AS, or rheumatic AS were excluded from the study population. After running this script, you will have a file called case_final.csv and another one called complete_set.csv. These two files are the input to a SAS algorithm developed by Michael for case-control matching.
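
The case definition and exclusions described above can be sketched roughly as follows. This is only an illustration on a toy table, not the code in Case_selection.R: the data frame `patients` and all of its flag columns are hypothetical names, and the real script derives these flags from the diagnostic and intervention data.

```r
# Toy patient-level table; every column name here is hypothetical.
patients <- data.frame(
  id                = 1:6,
  as_dx_in_hospital = c(TRUE,  FALSE, TRUE,  FALSE, TRUE,  FALSE),
  savr_procedure    = c(FALSE, TRUE,  FALSE, FALSE, TRUE,  FALSE),
  prior_as          = c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE),
  mitral_stenosis   = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  congenital_as     = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  rheumatic_as      = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Apply the exclusion criteria first.
excluded <- with(patients,
                 prior_as | mitral_stenosis | congenital_as | rheumatic_as)
eligible <- patients[!excluded, ]

# A case has an in-hospital AS diagnosis or a SAVR procedure.
eligible$case <- eligible$as_dx_in_hospital | eligible$savr_procedure
case_final <- eligible[eligible$case, ]

print(case_final$id)
```

In this toy table, patients 3 and 6 are removed by the exclusion criteria and patient 4 remains eligible but is not a case.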

Once the study cohort was created, covariates predefined in the study protocol were ascertained from in-hospital diagnostic data and prescription data.

  1. Load and clean prescription data: 2A.clean_pharma.R. This file reads all 10 years of prescription data and the cdenom code book. Prescription history was also defined by ahf code.

  2. Ascertain prescription history for study cohort: 2B.covariate_pharma.R. This file contains a function written to ascertain whether individuals in the study cohort have been exposed to certain classes of drugs before cohort entry.

  3. Ascertain disease diagnosis history for study cohort: 2C.covariate_diagnostic.R. This file contains a script to define disease history for individuals in the study cohort using in-hospital diagnostic data.

  4. Ascertain covariates: confounding_ascertainment.R. This script combines information from diagnostic data and prescription history for each individual in the study cohort to ascertain covariate status for the logistic regression model.

  5. Calculate Charlson index: charlson_index.R. This script prepares the data for Charlson index calculation for each individual in the study cohort using Lyne's SAS script. The returned dataset is the cleaned dataset for analysis.
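
To give an idea of what ascertaining prescription history before cohort entry involves, here is a minimal sketch in base R. It is not the function in 2B.covariate_pharma.R: the function name `exposed_before_entry`, the tables `cohort` and `pharma`, and their columns are all assumptions made for illustration, with `drug_class` standing in for a class derived from the ahf code.

```r
# Hypothetical helper: TRUE for each cohort member with at least one
# dispensation of `drug_class` strictly before their cohort entry date.
exposed_before_entry <- function(cohort, pharma, drug_class) {
  rx <- pharma[pharma$drug_class == drug_class, c("id", "dispense_date")]
  merged <- merge(cohort[, c("id", "entry_date")], rx, by = "id")
  prior_ids <- unique(merged$id[merged$dispense_date < merged$entry_date])
  cohort$id %in% prior_ids
}

# Toy data; names and values are illustrative only.
cohort <- data.frame(
  id         = 1:3,
  entry_date = as.Date(c("2005-01-01", "2006-06-01", "2007-03-15"))
)
pharma <- data.frame(
  id            = c(1L, 1L, 2L, 3L),
  drug_class    = c("statin", "ltra", "statin", "statin"),
  dispense_date = as.Date(c("2004-05-01", "2006-01-01",
                            "2006-07-01", "2007-01-10"))
)

cohort$statin <- exposed_before_entry(cohort, pharma, "statin")
cohort$ltra   <- exposed_before_entry(cohort, pharma, "ltra")
print(cohort)
```

Dispensations on or after the entry date are ignored, so patient 2 (statin dispensed after entry) and patient 1 (LTRA dispensed after entry) are not counted as exposed for those classes.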

Data Exploration and Analysis

  1. EDA_case_control.Rmd is an R Markdown file that serves as a template for exploratory data analysis and regression modelling for the main analysis and some of the sensitivity analyses. The reports generated from this file include a descriptive analysis of the study cohort and results from the conditional logistic regression model. Note that the script for exposure definition is also included in this R Markdown file.
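
For readers unfamiliar with the model, a conditional logistic regression on matched sets can be sketched as below. This runs on simulated data, not the study data. The column names (`set_id`, `case`, `exposed`), the 1:4 matching ratio, and the use of `clogit()` from the survival package are assumptions for illustration and may differ from what the R Markdown template actually does.

```r
library(survival)

set.seed(1)
n_sets <- 200

# Simulated matched sets: one case and four controls per set (assumed ratio).
matched <- data.frame(
  set_id = rep(seq_len(n_sets), each = 5),
  case   = rep(c(1, 0, 0, 0, 0), times = n_sets)
)

# Simulated binary exposure, slightly more common among cases.
matched$exposed <- rbinom(nrow(matched), 1,
                          ifelse(matched$case == 1, 0.3, 0.2))

# Conditional logistic regression, conditioning on the matched set.
fit <- clogit(case ~ exposed + strata(set_id), data = matched)

# Odds ratio with a 95% confidence interval.
print(exp(cbind(OR = coef(fit), confint(fit))))
```

Covariates such as drug history or the Charlson index would enter the formula as additional terms next to the exposure.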

Sensitivity analysis

For the thesis project, a number of sensitivity analyses were conducted to test the robustness of the study results.

Sensitivity analysis 1

In this analysis, the case definition was altered to include information from physician billing codes (i.e., outpatient visits were taken into consideration to define cases).

load_billing_data.R reads all physician billing data into R.

Cases were defined in AS_bill_selection.R. All subsequent data manipulation steps were the same as in the main analysis.

Sensitivity analysis 2&3

In these analyses, death due to AS or death from any cause was included in the case definition. The goal was to capture cases who died of AS. Since cause of death is not thoroughly reported, another analysis that included death from all causes was conducted to investigate the extent of the missing reports.

Death cases were ascertained from the death registry in deces_as.R. Cases were defined using the script AS&death.R.

For reporting, Competing_risk_as.Rmd was used as the template.

Sensitivity analysis 4

Aortic stenosis features a long period of time during which a patient remains asymptomatic. A patient could have developed AS long before being diagnosed. To study the long-term effect of LTRAs on AS development, we re-defined cases as those with a longer follow-up time.

In addition to case_selection.R, define_case_by_fu(sensitivity).R constructs the study cohort for this analysis.

Sensitivity analysis 5

Exploratory data analysis of the main analysis revealed an over-sampling issue in our case-control cohort. Because the source population of the study is relatively old, many of our cases had a short follow-up time in the study, so controls who entered close to the study end date were over-sampled when matching on study follow-up time.

To study the effect, we conducted another sensitivity analysis where cases and controls were matched on study entry year and follow-up time. The case-control matching algorithm can be found here. All other data manipulation steps were the same as in the main analysis.

The reporting template for this analysis can be found in EDA_case_control-match on calendar time.Rmd.
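
As a rough picture of what matching on entry year and follow-up time can look like, here is a small base R sketch. It is not the matching algorithm used in the thesis, which is implemented separately. The function `match_controls`, the column names, and the matching rule (same entry year, follow-up at least as long as the case's, up to `k` controls per case, controls reusable across sets) are all assumptions chosen for illustration.

```r
# Hypothetical matcher: for each case, sample up to k candidates with the
# same entry year and a follow-up time at least as long as the case's.
match_controls <- function(cases, candidates, k = 2) {
  sets <- lapply(seq_len(nrow(cases)), function(i) {
    pool <- which(candidates$entry_year == cases$entry_year[i] &
                  candidates$fu_days >= cases$fu_days[i])
    if (length(pool) == 0) return(NULL)  # no eligible control for this case
    picked <- pool[sample.int(length(pool), min(k, length(pool)))]
    data.frame(set_id = cases$id[i], control_id = candidates$id[picked])
  })
  do.call(rbind, sets)
}

# Toy data; names and values are illustrative only.
cases <- data.frame(
  id = c(101L, 102L), entry_year = c(2001, 2003), fu_days = c(400, 900)
)
candidates <- data.frame(
  id         = 1:6,
  entry_year = c(2001, 2001, 2001, 2003, 2003, 2002),
  fu_days    = c(500, 300, 800, 1000, 100, 2000)
)

set.seed(42)
matched_sets <- match_controls(cases, candidates, k = 2)
print(matched_sets)
```

Restricting the pool to the same entry year is what keeps late entrants from being drawn for cases who entered early, which is the over-sampling concern described above.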
