glenmartin31 / predrupdate

R package to test/validate the predictive performance of an existing prediction/prognostic model, and to apply model updating and aggregation methods

Home Page: https://glenmartin31.github.io/predRupdate/

License: Other

R 100.00%

predrupdate's Introduction

predRupdate

The goal of predRupdate is to provide a suite of functions for validating an existing (i.e. previously developed) prediction/prognostic model, and for applying model updating methods to said model, according to an available dataset.

Installation

The package can be installed from CRAN as follows:

install.packages("predRupdate")

Development version

You can install the development version of predRupdate from GitHub with:

# install.packages("devtools")
devtools::install_github("GlenMartin31/predRupdate")

Example

One main use of this package is to externally validate an existing (previously developed) prediction model. This can be achieved with the following code:

# create a data.frame of the model coefficients, with columns being variables
coefs_table <- data.frame("Intercept" = -3.4,
                          "SexM" = 0.306,
                          "Smoking_Status" = 0.628,
                          "Diabetes" = 0.499,
                          "Creatinine" = 0.538)

# pass this into pred_input_info()
Existing_Logistic_Model <- pred_input_info(model_type = "logistic",
                                           model_info = coefs_table)
summary(Existing_Logistic_Model)

# validate this model against an available dataset
pred_validate(x = Existing_Logistic_Model,
              new_data = SYNPM$ValidationData,
              binary_outcome = "Y")

Getting help

If you encounter a bug, please file an issue with a minimal reproducible example on GitHub.

predrupdate's People

Forkers

david-a-jenkins danielegiardiello edbonneville

predrupdate's Issues

Create pm_update()

Overview

Create functions to allow one to apply model updating methods to an existing prediction model

Intended Outcome

This function should take an existing clinical prediction model (could be of class "pminfo" or simply a vector of linear predictors and vector of outcomes in the validation/updating dataset), allow the user to specify the type of update (intercept update, recalibration, individual term revision, new terms, full model refit), and then output the new prediction model.

Tasks

Create individual functions that apply different types of model updating
Create a wrapper function, that takes two possible methods: either an object of class "pminfo" or a vector of linear predictors and vector of outcomes in the validation/updating dataset (the latter assuming the user has calculated the linear predictor in the validation/update dataset themselves)
Output the updated prediction model (coefficients, parameter names, and possibly predicted risks from the new model)
Define the output as class "pminfo" such that print/summary methods for that class apply to the updated model
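
The first two update types listed above (intercept update and recalibration) can be sketched in base R when only a vector of linear predictors and a vector of outcomes are available. This is a minimal illustration on simulated data, not the package's implementation; all object names (`lp`, `y`, `fit_int`, `fit_recal`) are hypothetical.

```r
set.seed(1)
n <- 500

# Simulated updating dataset (hypothetical): lp is the existing model's
# linear predictor, y is the observed binary outcome
lp <- rnorm(n, mean = -1, sd = 1)
y <- rbinom(n, size = 1, prob = plogis(0.5 + 0.8 * lp))

# Intercept update: estimate an intercept correction, slope fixed at 1
fit_int <- glm(y ~ 1 + offset(lp), family = binomial())

# Logistic recalibration: re-estimate both intercept and slope of the LP
fit_recal <- glm(y ~ lp, family = binomial())

# Apply each update to the original linear predictor
lp_int <- coef(fit_int)[["(Intercept)"]] + lp
lp_recal <- coef(fit_recal)[["(Intercept)"]] + coef(fit_recal)[["lp"]] * lp

# Predicted risks from the updated models
risk_int <- plogis(lp_int)
risk_recal <- plogis(lp_recal)
```

After either update the mean predicted risk matches the observed event proportion in the updating data, which is a quick sanity check on the result.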

Testing the external validation of a model containing stratification

Describe the bug
Dear {predRupdate} team,

I have a prediction model that was created containing four predictors and another variable which was used as a stratification variable. My aim is to investigate the external validation of this model using a new sample. I have included all the predictors of the new sample as the new_data, including the stratification variable (that is, pain; see the screenshot below). However, I'm not quite sure how to include this stratification variable in the {pred_input_info} or in the {pred_validate}. These were the commands that I used without the stratification variable:

coefs_table <- data.frame("change" = 0.31644,
                          "previous" = -0.07831,
                          "duration" = -0.12918,
                          "depressao" = -0.16246)
Existing_TTE_Model <- pred_input_info(model_type = "survival",
                                      model_info = coefs_table,
                                      cum_hazard = NULL)
data <- read.csv("external_data.csv", sep = ",")
validation_results <- pred_validate(x = Existing_TTE_Model,
                                    new_data = data,
                                    survival_time = "time",
                                    event_indicator = "event",
                                    time_horizon = 3)

Expected behavior
I was able to obtain the calibration slope for this model, but I believe that the results would be different when including the stratification variable:

Calibration Measures

                        Estimate Std. Err Lower 95% Confidence Interval Upper 95% Confidence Interval
Observed:Expected Ratio       NA       NA                            NA                            NA
Calibration Slope        -0.5715   0.2906                       -1.1411                        -0.002

Also examine the calibration plot, if produced.

Discrimination Measures

          Estimate Std. Err Lower 95% Confidence Interval Upper 95% Confidence Interval
Harrell C   0.4387   0.0312                        0.3775                        0.4999

Also examine the histogram of predicted risks.

Screenshots
This is the screenshot of the dataframe (new_data):

Additional context
Thanks for the help.
Cheers.

Release predRupdate 0.2.0

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
devtools::submit_cran()
Approve email

Wait for CRAN...

Create Model Aggregation Functions

Overview

Create functions that apply model aggregation (based on stacked regression and Martin et al.)

Intended Outcome

This function should take a set/list of existing models (e.g. a list where each element is of class "pminfo"), and allow the user to apply stacked regression to these existing models to combine them. As an advanced feature, the function should also allow multiple-model updating as outlined by Martin et al. (https://onlinelibrary.wiley.com/doi/full/10.1002/sim.7586).

Tasks

Create stacked regression function
Extend to allow for multiple-model updating (https://onlinelibrary.wiley.com/doi/full/10.1002/sim.7586)
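
As a rough illustration of the stacked regression task, the sketch below regresses the observed outcome on the linear predictors of two existing models and uses the fitted coefficients as aggregation weights. It uses simulated data and base R only; the two "existing models" and every object name are hypothetical, and the sketch does not impose the non-negativity constraint on the weights that stacked regression is sometimes given.

```r
set.seed(2)
n <- 1000

# Simulated data available for aggregation (hypothetical)
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * x1 + 0.5 * x2))

# Linear predictors from two hypothetical existing logistic models
lp1 <- -0.8 + 0.9 * x1
lp2 <- -1.2 + 0.6 * x2

# Stacked regression: the outcome is regressed on the existing models'
# linear predictors; the coefficients act as weights for each model
stack_fit <- glm(y ~ lp1 + lp2, family = binomial())
stack_weights <- coef(stack_fit)

# Aggregated linear predictor and predicted risk
lp_stacked <- stack_weights[["(Intercept)"]] +
  stack_weights[["lp1"]] * lp1 +
  stack_weights[["lp2"]] * lp2
risk_stacked <- plogis(lp_stacked)
```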

Advance the pm_input_info function

Overview

The current version of the pm_input_info() function can take a (named) vector of coefficients, a functional form of the existing prediction model, and a new dataset, and standardise the inputs into a common "blueprint" for further evaluation in other functions in the package. We now need to advance this key function in several ways.

Intended Outcome

A full version of the pm_input_info() function that can handle both logistic regression and time-to-event existing prediction models. It also needs to correctly handle any non-linear/spline terms within the existing model (or variable transformations), such that the function ensures compatibility between all the inputs, in the most user-friendly way possible.

Tasks

Using model terms in pred_input_info() + extension to competing risks

Dear {predRupdate} team,

Thank you for developing this (much needed) package!

Not an issue but more of a suggestion (sorry if this is the wrong place..): as part of a paper on validating competing risks prediction models, we also thought about this question of validating an existing model/how one would go about sharing an existing model (without sharing the data the model was developed on).. so we made a short script showing how this could be done.

The idea is pretty much the same as the one in this package, with our model_info list analogous to a pred_input_info object, and predictRisk_shared_CSC() is basically what you would need inside pred_predict() if you were to extend to models based on cause-specific proportional hazards. We also thought that sharing the terms attribute of a model was better than sharing the formula, since the former is much more useful when splines are used in the model (it has the knots locations, and is easy to use with model.matrix on new data).

I guess it is a bit early in the development of the package, but hopefully some of this is useful if you are eventually considering extending to accommodate competing risks.

Thanks!
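
The point about sharing the terms attribute rather than the formula can be sketched with base R and the splines package. This is an illustration only: it uses a logistic model on simulated data instead of a cause-specific hazards model, and the `shared` list is a hypothetical stand-in for the model_info list described in the post.

```r
library(splines)

set.seed(4)
# Simulated development data (hypothetical)
dev_data <- data.frame(age = runif(200, min = 30, max = 80))
dev_data$y <- rbinom(200, size = 1, prob = plogis(-2 + 0.03 * dev_data$age))

# Hypothetical development model with a natural spline term
fit <- glm(y ~ ns(age, df = 3), data = dev_data, family = binomial())

# What would be shared: coefficients plus the terms object.
# The terms object records the knot locations in its "predvars" attribute.
shared <- list(coefs = coef(fit), terms = delete.response(terms(fit)))

# On the validation side: rebuild the design matrix using the original knots
new_data <- data.frame(age = c(35, 55, 75))
X_new <- model.matrix(shared$terms, model.frame(shared$terms, data = new_data))

# Linear predictor for the new individuals
lp_new <- as.vector(X_new %*% shared$coefs)
```

Sharing only the formula would recompute the spline knots from the new data, so the design matrix would no longer match the published coefficients.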

Add survival option to pm_predict()

Overview

The current version of pm_predict() can make predictions (given a supplied object of class pminfo) where the model_type="logistic". Need to extend this to also make predictions for a time-to-event based prediction model.

Intended Outcome

The output from the function should be a vector of predicted risks - given the time-to-event prediction model specified in a pminfo object - for each observation at a given follow-up time. If multiple prediction horizons (times at which S(t) should be calculated) are supplied, then the returned output should be a matrix, with n (observation) rows and t (time points) columns; each cell is then S(t) for that individual at time=t.

Tasks

Create a function pm_calculate_survival to calculate S(t) given an object of class pminfo with model_type="survival"
Output the predictions from pm_predict() - currently embedded in the pminfo object on output, which is not ideal; consider outputting as a data.frame, together with any "true" outcomes, so that testing performance is easier (the outcomes might need to be passed into pm_input_info as an optional argument to pull from newdata - would need to make clear that such outcomes are not used for predictions)
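
The n-by-t output described above can be sketched in base R, assuming a proportional hazards model in which S(t | x) = exp(-H0(t) * exp(LP)). The baseline cumulative hazard values and linear predictors below are invented for illustration, and the column names of `cum_hazard` are an assumption rather than a confirmed package convention.

```r
# Hypothetical baseline cumulative hazard H0(t) at the requested horizons
cum_hazard <- data.frame(time = c(1, 2, 5),
                         hazard = c(0.05, 0.12, 0.30))

# Hypothetical linear predictors for three individuals
lp <- c(-0.5, 0, 0.8)

# S(t | x) = exp(-H0(t) * exp(lp)) under proportional hazards.
# outer() returns an n x t matrix: rows are individuals, columns are times.
surv <- exp(-outer(exp(lp), cum_hazard$hazard))
dimnames(surv) <- list(NULL, paste0("t=", cum_hazard$time))

# Predicted risk of the event by each horizon
risk <- 1 - surv
```

Each row of `surv` decreases from left to right, and individuals with a larger linear predictor have lower survival at every horizon.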

Release predRupdate 0.1.1

Prepare for release:

Submit to CRAN:

usethis::use_version('patch')
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
usethis::use_dev_version(push = TRUE)

Release predRupdate 0.1.0

First release:

usethis::use_cran_comments()
Update (aspirational) install instructions in README
Proofread Title: and Description:
Check that all exported functions have @return and @examples
Check that Authors@R: includes a copyright holder (role 'cph')
Check licensing of included files
Review https://github.com/DavisVaughan/extrachecks

Prepare for release:

Submit to CRAN:

usethis::use_version('minor')
Final devtools::check(remote = TRUE, manual = TRUE) check
devtools::submit_cran()
Approve email

Wait for CRAN...

Create function to 'input' information about an existing prediction model

Overview

The first function needed is one that takes a vector of coefficients (e.g. as published by the existing model) and a new dataset (the local data on which one wishes to apply the previous/existing prediction model) to calculate the predicted risks

Intended Outcome

Input the relevant information of an existing prediction model (e.g. coefficients), and the ‘new data’, in such a way that the two are compatible with each other, and use these inputs to output each individual's (in the new data) predicted risk and linear predictor.

Tasks

Create function outline
Create tests within the function that ensure inputs are compatible
Add matrix multiplication of coefficients and new data to calculate the linear predictor; convert the LP into predicted risk and output both of these quantities
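
The tasks above can be sketched in a few lines of base R for a logistic model: check that the coefficient names match the new data, multiply the design matrix by the coefficients, and convert the linear predictor to a risk. The coefficient values, variable names, and data are hypothetical.

```r
# Hypothetical coefficients of an existing logistic model (illustrative values)
coefs <- c("(Intercept)" = -3.4, Age = 0.02, SexM = 0.3)

# Hypothetical new data; column names must match the coefficient names
new_data <- data.frame(Age = c(50, 65, 80), SexM = c(0, 1, 1))

# Compatibility check: every non-intercept coefficient needs a matching column
predictors <- setdiff(names(coefs), "(Intercept)")
missing_vars <- setdiff(predictors, names(new_data))
if (length(missing_vars) > 0) {
  stop("new_data is missing: ", paste(missing_vars, collapse = ", "))
}

# Design matrix with a leading column of 1s for the intercept
X <- cbind("(Intercept)" = 1, as.matrix(new_data[, predictors, drop = FALSE]))

# Linear predictor by matrix multiplication, then inverse-logit for the risk
lp <- as.vector(X %*% coefs[colnames(X)])
risk <- plogis(lp)
```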

pm_input_info outstanding tasks

Overview

Suggest the following tasks for pm_input_info()

The current implementation of pm_input_info() assumes that the functional form of the existing prediction model is a linear sum of the columns of the model_info input. Suggest providing an optional argument (e.g., formula) that allows users to specify a different functional form
Currently, model_type is either "logistic" or "survival" - the latter is a bit too generic; consider adapting such that users can state the parametric form of the model (e.g. Weibull baseline) - the baseline hazard could then be defined using parametric estimates rather than users needing to define a data.frame with set times and H_0(t).
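
Both suggestions can be illustrated with base R. The sketch below assumes one common Weibull parameterisation, H_0(t) = lambda * t^gamma, and uses a one-sided formula with model.matrix() to express a functional form that is not a plain linear sum. The parameter values, the helper `weibull_cum_hazard()`, and the variable names are hypothetical.

```r
# Hypothetical Weibull baseline parameters (illustrative values only)
lambda <- 0.02  # scale
gamma <- 1.3    # shape

# Cumulative baseline hazard under this Weibull form: H0(t) = lambda * t^gamma
weibull_cum_hazard <- function(t, lambda, gamma) lambda * t^gamma

# Evaluate at any horizon without a lookup table of times and H0(t)
H0 <- weibull_cum_hazard(c(1, 3, 5), lambda, gamma)

# A one-sided formula can describe a functional form with a squared term
# and an interaction, and model.matrix() builds the matching columns
new_data <- data.frame(Age = c(50, 65), SexM = c(0, 1))
X <- model.matrix(~ Age + I(Age^2) + Age:SexM, data = new_data)
```

The column names of `X` then give the names that the coefficient input would need to carry for the two to be matched up.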

Create pm_validate()

Overview

Create a function that can take, as inputs, a vector of predicted risks and a vector of outcomes, to calculate the predictive performance (calibration, discrimination, overall accuracy) of the model. Do this initially for logistic regression, then extend to performance measures for time-to-event based prediction models.

Intended Outcome

Function called pm_validate() that takes predicted risks and observed outcomes (in the validation data) as inputs, and outputs a table with all the performance metrics relevant to the underlying analytical model (e.g. logistic regression or time-to-event).

Tasks

Create separate functions for each of calibration (calibration plot, calibration-in-the-large, calibration slope), discrimination (area under ROC curve) and overall accuracy (Brier score, R-squared) - focus initially on logistic regression based prediction models
Create a wrapper function for each of the separate performance metric functions - it is this wrapper that will be exported in the package
Set a class for the output of the function, and create print and summary options
Add calibration plot to validate_probabilities() function, with options for user to specify if the plot is created or not; consider both binned and flexible (spline-based) calibration plot
Extend the function to time-to-event based prediction models - create a separate wrapper function called pm_validate_survival()
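
For the logistic regression case, the performance measures named in the tasks can be sketched in base R as follows. This is an illustration on simulated data rather than the package's code; the AUC uses the rank (Mann-Whitney) formulation, and all object names are hypothetical.

```r
set.seed(3)
n <- 800

# Simulated validation data (hypothetical)
lp <- rnorm(n, mean = -1, sd = 1)             # existing model's linear predictor
risk <- plogis(lp)                            # predicted risks
y <- rbinom(n, size = 1, prob = plogis(0.3 + lp))  # observed outcomes

# Calibration-in-the-large: intercept estimated with the LP as an offset
citl <- coef(glm(y ~ 1 + offset(lp), family = binomial()))[["(Intercept)"]]

# Calibration slope: coefficient of the LP in a logistic regression
cal_slope <- coef(glm(y ~ lp, family = binomial()))[["lp"]]

# Discrimination: AUC via the rank (Mann-Whitney) formulation
r <- rank(risk)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Overall accuracy: Brier score (mean squared error of the predicted risks)
brier <- mean((risk - y)^2)
```

A wrapper in the spirit of the task list would collect `citl`, `cal_slope`, `auc`, and `brier` into one classed object with print and summary methods.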

Add functionality for linear models

Overview

Add functionality for linear (continuous outcome) models

Tasks

Update all package functions to handle prediction models for continuous outcomes
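
As a rough idea of what validation could look like for a continuous outcome, the sketch below computes a calibration intercept and slope, the mean squared error, and R-squared from a vector of predictions and a vector of observed values. The data are simulated and the object names are hypothetical; this is not a description of how the package handles linear models.

```r
set.seed(5)
n <- 300

# Simulated validation data (hypothetical)
pred <- rnorm(n, mean = 10, sd = 2)   # predictions from an existing linear model
y <- 1 + 0.9 * pred + rnorm(n)        # observed continuous outcomes

# Calibration intercept and slope from regressing observed on predicted
cal_fit <- lm(y ~ pred)
cal_intercept <- coef(cal_fit)[["(Intercept)"]]
cal_slope <- coef(cal_fit)[["pred"]]

# Overall accuracy: mean squared error and R-squared of the predictions
mse <- mean((y - pred)^2)
r_squared <- 1 - sum((y - pred)^2) / sum((y - mean(y))^2)
```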
