
predrupdate's Introduction

predRupdate


The goal of predRupdate is to provide a suite of functions for validating an existing (i.e. previously developed) prediction/prognostic model, and for applying model updating methods to that model, using an available dataset.

Installation

The package can be installed from CRAN as follows:

install.packages("predRupdate")

Development version

You can install the development version of predRupdate from GitHub with:

# install.packages("devtools")
devtools::install_github("GlenMartin31/predRupdate")

Example

One main use of this package is to externally validate an existing (previously developed) prediction model. This can be achieved with the following code:

# load the package (also provides the SYNPM example data used below)
library(predRupdate)

# create a data.frame of the model coefficients, with columns being variables
coefs_table <- data.frame("Intercept" = -3.4,
                          "SexM" = 0.306,
                          "Smoking_Status" = 0.628,
                          "Diabetes" = 0.499,
                          "Creatinine" = 0.538)

#pass this into pred_input_info()
Existing_Logistic_Model <- pred_input_info(model_type = "logistic",
                                           model_info = coefs_table)
summary(Existing_Logistic_Model)

#validate this model against an available dataset
pred_validate(x = Existing_Logistic_Model,
              new_data = SYNPM$ValidationData,
              binary_outcome = "Y")

Getting help

If you encounter a bug, please file an issue with a minimal reproducible example on GitHub.

predrupdate's People

Contributors

glenmartin31, david-a-jenkins, mattsperrin


predrupdate's Issues

Create pm_update()

Overview

Create functions to allow one to apply model updating methods to an existing prediction model

Intended Outcome

This function should take an existing clinical prediction model (could be of class "pminfo" or simply a vector of linear predictors and vector of outcomes in the validation/updating dataset), allow the user to specify the type of update (intercept update, recalibration, individual term revision, new terms, full model refit), and then output the new prediction model.

Tasks

  • Create individual functions that apply different types of model updating
  • Create a wrapper function that accepts two possible inputs: either an object of class "pminfo", or a vector of linear predictors and a vector of outcomes in the validation/updating dataset (the latter assuming the user has calculated the linear predictor in the validation/update dataset themselves)
  • Output the updated prediction model (coefficients, parameter names, and possibly predicted risks from the new model)
  • Define the output as class "pminfo" such that print/summary methods for that class apply to the updated model
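As a rough illustration of two of the update types listed above (intercept update and recalibration), the following base R sketch works from a vector of linear predictors and a vector of outcomes. The simulated data and all object names are assumptions made for illustration; this is not the package's implementation.

```r
# Simulated updating dataset (illustrative only; not from the package)
set.seed(1)
n  <- 500
lp <- rnorm(n, mean = -1, sd = 1)           # linear predictor from the existing model
y  <- rbinom(n, 1, plogis(0.5 + 0.8 * lp))  # outcomes, deliberately miscalibrated

# Intercept update: re-estimate the intercept only, slope fixed at 1 via an offset
fit_int <- glm(y ~ offset(lp), family = binomial())
lp_int  <- unname(coef(fit_int)[1]) + lp

# Logistic recalibration: re-estimate the intercept and an overall slope
fit_recal <- glm(y ~ lp, family = binomial())
lp_recal  <- unname(coef(fit_recal)[1]) + unname(coef(fit_recal)[2]) * lp

# Updated predicted risks under each method
risk_int   <- plogis(lp_int)
risk_recal <- plogis(lp_recal)
```

Both updates force the mean predicted risk to match the observed event proportion in the updating data; recalibration additionally rescales the spread of the predictions.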

Testing the external validation of a model containing stratification

Describe the bug
Dear {predRupdate} team,

I have a prediction model that was developed with four predictors and one additional variable used as a stratification variable. My aim is to externally validate this model in a new sample. I have included all the predictors in new_data, along with the stratification variable (pain; see the screenshot below). However, I'm not sure how to include this stratification variable in pred_input_info() or pred_validate(). These are the commands that I used without the stratification variable:

coefs_table <- data.frame("change" = 0.31644,
                          "previous" = -0.07831,
                          "duration" = -0.12918,
                          "depressao" = -0.16246)
Existing_TTE_Model <- pred_input_info(model_type = "survival",
                                      model_info = coefs_table,
                                      cum_hazard = NULL)
data <- read.csv("external_data.csv", sep = ",")
validation_results <- pred_validate(x = Existing_TTE_Model,
                                    new_data = data,
                                    survival_time = "time",
                                    event_indicator = "event",
                                    time_horizon = 3)

Expected behavior
I was able to obtain the calibration slope for this model, but I believe that the results would be different if the stratification variable were included:

Calibration Measures

                         Estimate  Std. Err  Lower 95% Confidence Interval  Upper 95% Confidence Interval
Observed:Expected Ratio        NA        NA                             NA                             NA
Calibration Slope         -0.5715    0.2906                        -1.1411                         -0.002

Also examine the calibration plot, if produced.

Discrimination Measures

           Estimate  Std. Err  Lower 95% Confidence Interval  Upper 95% Confidence Interval
Harrell C    0.4387    0.0312                         0.3775                         0.4999

Also examine the histogram of predicted risks.

Screenshots
This is the screenshot of the dataframe (new_data):

image


Additional context
Thanks for the help.
Cheers.

Release predRupdate 0.2.0

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • Finish & publish blog post
  • Add link to blog post in pkgdown news menu
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)
  • Tweet

Create Model Aggregation Functions

Overview

Create functions that apply model aggregation (based on stacked regression and Martin et al)

Intended Outcome

This function should take a set/list of existing models (e.g. a list where each element is of class "pminfo"), and allow the user to apply stacked regression to these existing models to combine them. As an advanced feature, the function should also allow multiple model updating as outlined by Martin et al https://onlinelibrary.wiley.com/doi/full/10.1002/sim.7586
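A minimal sketch of the stacked regression idea, assuming two hypothetical existing logistic models whose linear predictors have already been calculated in the new data. The simulated data and coefficients are invented for illustration, and the weights are left unconstrained here, whereas stacked regression is often fitted with non-negativity constraints.

```r
# Simulated data and two hypothetical existing models (illustrative only)
set.seed(2)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.7 * x1 + 0.4 * x2))

lp_a <- -0.3 + 0.9 * x1   # linear predictor of hypothetical existing model A
lp_b <- -0.6 + 0.5 * x2   # linear predictor of hypothetical existing model B

# Stacked regression: regress the outcome on the existing linear predictors;
# the fitted coefficients act as weights for combining the models
stack_fit     <- glm(y ~ lp_a + lp_b, family = binomial())
stack_weights <- coef(stack_fit)

# Linear predictor and predicted risk of the aggregated model
lp_stacked   <- predict(stack_fit, type = "link")
risk_stacked <- plogis(lp_stacked)
```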

Tasks

Advance the pm_input_info function

Overview

The current version of the pm_input_info() function can take a (named) vector of coefficients, a functional form of the existing prediction model, and a new dataset, and standardise the inputs into a common "blueprint" for further evaluation in other functions in the package. We now need to advance this key function in several ways.

Intended Outcome

A full version of the pm_input_info() function that can handle both logistic regression and time-to-event existing prediction models. It also needs to correctly handle any non-linear/spline terms within the existing model (or variable transformations), such that the function ensures compatibility between all the inputs, in the most user friendly way possible.

Tasks

  • Update to deal with factor variables correctly - i.e. create dummy variables to match naming given in existing coefficients
  • Update to allow for variable transformations/ splines within the model, while the function checks compatibility
  • Update to deal with variable interactions
  • Update to allow for time-to-event based prediction models
  • Add rigorous error-checking statements at the start of the function to ensure all inputs are entered as required (while also ensuring that the format of the inputs is as user-friendly as possible)
  • Add testthat - check error messages and some simple outputs
  • Change the options for input "model_type" to be more informative for the time-to-event based prediction models (i.e. current option of "survival" is too generic)
  • Within the input "baselinehazard", allow an option for a parametric distribution to be specified along with values for the associated parameters
  • input "event_indicator" could be extended to account for competing risks?
  • The input of "survivival_time" has a typo, which needs correcting
  • Change the handling of factor variables; more logical to handle this within pre_processing input, rather than users needing to do this separately
  • Create internal function to do the pre_processing steps to remove clutter in the main pm_input_info function - will make this part easier to maintain
  • Think about how best to generalise for multiple models as inputs. Either multiple pm_input_info() calls, one for each model, or change the code such that the inputs for this function become lists with an element for each pre-existing model?
  • Change the class of the output to depend on model type rather than a generic "pminfo" class. Specifically, have child classes for each kind of model (i.e. survival, logistic) that are then handled as appropriate by pm_predict's UseMethod dispatch (and ultimately similar for pm_validate etc). This will make it easier to add additional methods over time
  • Create internal function to do input checks for pm_input_info(), to help readability of main code. Also create internal functions per model type, rather than repeated if statements - will make it easier to add modelling methods later if the code is more modular
  • Add handling of "+." in formula such that all variables of newdata are considered in the functional form of the regression model. Need to think how this will interact with pre_processing
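For the factor-variable task in the list above, one possible approach is to let base R's model.matrix() create the dummy variables and then match the resulting column names against the names of the existing coefficients. The data, variable names and coefficient names below are assumptions for illustration only.

```r
# Illustrative new data with a factor predictor (values are invented)
new_data <- data.frame(
  Sex            = factor(c("F", "M", "M")),
  Smoking_Status = c(1, 0, 1)
)

# Names of the coefficients as reported for a hypothetical existing model
coef_names <- c("Intercept", "SexM", "Smoking_Status")

# model.matrix() expands factors into dummy columns (here "SexM", with "F"
# as the reference level) and adds an "(Intercept)" column
X <- model.matrix(~ Sex + Smoking_Status, data = new_data)
colnames(X)[colnames(X) == "(Intercept)"] <- "Intercept"

# Check compatibility, then order the columns to match the coefficients
missing_terms <- setdiff(coef_names, colnames(X))
if (length(missing_terms) > 0) {
  stop("Coefficients without a matching column: ",
       paste(missing_terms, collapse = ", "))
}
X <- X[, coef_names, drop = FALSE]
```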

Using model terms in pred_input_info() + extension to competing risks

Dear {predRupdate} team,

Thank you for developing this (much needed) package!

Not an issue but more of a suggestion (sorry if this is the wrong place): as part of a paper on validating competing risks prediction models, we also thought about this question of validating an existing model, and how one would go about sharing an existing model without sharing the data it was developed on. We made a short script showing how this could be done.

The idea is pretty much the same as the one in this package, with our model_info list analogous to a pred_input_info object, and predictRisk_shared_CSC() is basically what you would need inside pred_predict() if you were to extend to models based on cause-specific proportional hazards. We also thought that sharing the terms attribute of a model was better than sharing the formula, since the former is much more useful when splines are used in the model (it has the knots locations, and is easy to use with model.matrix on new data).
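To illustrate the point about sharing the terms attribute, the following sketch uses an ordinary linear model with a natural spline rather than a cause-specific hazards model, purely to keep it short. The data are simulated, and the `shared` list is a hypothetical stand-in, not the model_info list from the script mentioned above.

```r
library(splines)  # ships with R; provides ns()

# Simulated development data (illustrative only)
set.seed(3)
dev_data   <- data.frame(x = runif(200, 0, 10))
dev_data$y <- sin(dev_data$x) + rnorm(200, sd = 0.2)

fit <- lm(y ~ ns(x, df = 3), data = dev_data)

# What would be shared: the terms object (which stores the knot locations in
# its "predvars" attribute) and the coefficients, but not the development data
shared <- list(
  terms = delete.response(terms(fit)),
  coefs = coef(fit)
)

# Applying the shared model to new data using only `shared`
new_data <- data.frame(x = c(1, 5, 9))
mf       <- model.frame(shared$terms, data = new_data)
X        <- model.matrix(shared$terms, data = mf)
lp_new   <- drop(X %*% shared$coefs)
```

Because the spline basis is rebuilt from the stored knots, `lp_new` should agree with `predict(fit, newdata = new_data)`.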

I guess it is a bit early in the development of the package, but hopefully some of this is useful if you are eventually considering extending to accommodate competing risks.

Thanks!

Add survival option to pm_predict()

Overview

The current version of pm_predict() can make predictions (given a supplied object of class pminfo) where model_type="logistic". We need to extend this to also make predictions for a time-to-event based prediction model.

Intended Outcome

The output from the function should be a vector of predicted risks - given the time-to-event prediction model specified in a pminfo object - for each observation at a given follow-up time. If multiple prediction horizons (times at which S(t) should be calculated) are supplied, then the returned output should be a matrix with n (observations) rows and t (time points) columns; each cell is then S(t) for that individual at time=t.

Tasks

  • Create a function pm_calculate_survival to calculate S(t) given an object of class pminfo with model_type="survival"
  • Output the predictions from pm_predict() - currently embedded in the pminfo object on output, which is not ideal; consider outputting as a data.frame, with any "true" outcomes, to make it easier when testing performance (for outcomes, might need to pass these into pm_input_info as an optional argument to pull from newdata - would need to make clear that such outcomes are not used for predictions)
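The S(t) calculation described above can be sketched as follows, assuming a proportional hazards model so that S(t | x) = exp(-H0(t) * exp(LP)). The function name, the linear predictors and the baseline cumulative hazard values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: survival probabilities for n individuals at t horizons,
# assuming proportional hazards: S(t | x) = exp(-H0(t) * exp(lp))
calc_survival_sketch <- function(lp, H0) {
  # outer() gives an n x t matrix whose [i, j] entry is exp(lp[i]) * H0[j]
  exp(-outer(exp(lp), H0))
}

# Illustrative inputs (values are invented)
lp         <- c(-0.2, 0.0, 0.5)
cum_hazard <- data.frame(time   = c(1, 2, 3),
                         hazard = c(0.05, 0.12, 0.20))

S <- calc_survival_sketch(lp, cum_hazard$hazard)
colnames(S) <- paste0("t=", cum_hazard$time)

# Predicted risk of the event by each horizon is the complement of survival
risk <- 1 - S
```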

Release predRupdate 0.1.1

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • urlchecker::url_check()
  • devtools::build_readme()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version(push = TRUE)

Release predRupdate 0.1.0

First release:

Prepare for release:

  • git pull
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • devtools::check_mac_release()
  • rhub::check_for_cran()
  • Draft blog post
  • git push

Submit to CRAN:

  • usethis::use_version('minor')
  • Final devtools::check(remote = TRUE, manual = TRUE) check
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Create function to 'input' information about an existing prediction model

Overview

The first function needed is one that takes a vector of coefficients (e.g. as published by the existing model) and a new dataset (the local data on which one wishes to apply the previous/existing prediction model) to calculate the predicted risks

Intended Outcome

Input the relevant information of an existing prediction model (e.g. coefficients), and the ‘new data’, in such a way that the two are compatible with each other, and use these inputs to output the predicted risk and linear predictor for each individual in the new data.

Tasks

  • Create function outline
  • Create tests within the function that ensure inputs are compatible
  • Add matrix multiplication of coefficients and new data to calculate the linear predictor; convert the LP into predicted risk and output both of these quantities
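The final task above can be sketched in base R as follows. The coefficients are those of the example logistic model in the README; the two rows of new data are invented for illustration, and the code is not the package's implementation.

```r
# Coefficients of the existing logistic model (from the README example)
coefs <- c(Intercept      = -3.4,
           SexM           = 0.306,
           Smoking_Status = 0.628,
           Diabetes       = 0.499,
           Creatinine     = 0.538)

# Illustrative new data (values are invented)
new_data <- data.frame(SexM           = c(1, 0),
                       Smoking_Status = c(0, 1),
                       Diabetes       = c(0, 1),
                       Creatinine     = c(0.9, 1.4))

# Compatibility check: every non-intercept coefficient needs a matching column
predictors <- setdiff(names(coefs), "Intercept")
stopifnot(all(predictors %in% names(new_data)))

# Design matrix with an intercept column, ordered to match the coefficients
X <- cbind(Intercept = 1, as.matrix(new_data[, predictors]))

# Linear predictor by matrix multiplication, then inverse logit for the risk
lp   <- drop(X %*% coefs[colnames(X)])
risk <- plogis(lp)
```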

pm_input_info outstanding tasks

Overview

Suggest the following tasks for pm_input_info()

  • The current implementation of pm_input_info() assumes that the functional form of the existing prediction model is a linear sum of each column of model_info input. Suggest to provide an optional argument (e.g., formula) that allows users to specify a different functional form
  • Currently, model_type is either "logistic" or "survival"; the latter is a bit too generic. Consider adapting this so that users can specify the parametric form of the model (e.g. Weibull baseline); the baseline hazard could then be defined using parametric estimates, rather than users needing to define a data.frame with set times and H_0(t).
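As an illustration of the Weibull suggestion, the sketch below builds a baseline cumulative hazard table from two parameters, assuming the parameterisation H_0(t) = lambda * t^gamma. The function name, the parameter values and the column names of the resulting data.frame are all assumptions for illustration.

```r
# Hypothetical helper: Weibull baseline cumulative hazard, using the
# parameterisation H0(t) = lambda * t^gamma (an assumption; others exist)
weibull_cum_hazard <- function(t, lambda, gamma) {
  lambda * t^gamma
}

# Illustrative parameter values (invented)
times <- c(1, 2, 3)
H0    <- weibull_cum_hazard(times, lambda = 0.05, gamma = 1.2)

# Same information in the "set times and H_0(t)" tabular form
cum_hazard <- data.frame(time = times, hazard = H0)

# Cross-check against base R's Weibull distribution, where
# shape = gamma and scale = lambda^(-1 / gamma)
H0_check <- -pweibull(times, shape = 1.2, scale = 0.05^(-1 / 1.2),
                      lower.tail = FALSE, log.p = TRUE)
```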

Create pm_validate()

Overview

Create a function that can take, as inputs, a vector of predicted risks and a vector of outcomes, to calculate the predictive performance (calibration, discrimination, overall accuracy) of the model. Do this initially for logistic regression, then extend to performance measures for time-to-event based prediction models.

Intended Outcome

Function called pm_validate() that takes predicted risks and observed outcomes (in the validation data) as inputs, and outputs a table with all the performance metrics relevant to the underlying analytical model (e.g. logistic regression or time-to-event).

Tasks

  • Create separate functions for each of calibration (calibration plot, calibration-in-the-large, calibration slope), discrimination (area under ROC curve) and overall accuracy (Brier score, R-squared) - focus initially on logistic regression based prediction models
  • Create a wrapper function for each of the separate performance metric functions - it is this wrapper that will be exported in the package
  • Set a class for the output of the function, and create print and summary options
  • Add calibration plot to validate_probabilities() function, with options for user to specify if the plot is created or not; consider both binned and flexible (spline-based) calibration plot
  • Extend the function to time-to-event based prediction models - create a separate wrapper function called pm_validate_survival()
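For the logistic regression case, the performance measures named in the tasks above can be sketched in base R from a vector of predicted risks and a vector of outcomes. The simulated data and object names are assumptions for illustration; this is not the package's implementation.

```r
# Simulated validation data (illustrative only)
set.seed(4)
n  <- 1000
lp <- rnorm(n, mean = -1, sd = 1)  # linear predictor of the existing model
p  <- plogis(lp)                   # predicted risks
y  <- rbinom(n, 1, p)              # observed binary outcomes

# Calibration-in-the-large: intercept with the linear predictor as an offset
citl <- unname(coef(glm(y ~ offset(lp), family = binomial()))[1])

# Calibration slope: coefficient of the linear predictor
cal_slope <- unname(coef(glm(y ~ lp, family = binomial()))[2])

# Discrimination: area under the ROC curve via the rank (Mann-Whitney) formula
r   <- rank(p)
n1  <- sum(y == 1)
n0  <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Overall accuracy: Brier score
brier <- mean((p - y)^2)

performance <- data.frame(
  Measure  = c("Calibration-in-the-large", "Calibration slope", "AUC", "Brier score"),
  Estimate = c(citl, cal_slope, auc, brier)
)
```

Because the outcomes here are simulated from the predicted risks themselves, the calibration-in-the-large should be close to 0 and the calibration slope close to 1.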

Add functionality for linear models

Overview

Add functionality for linear (continuous outcome) models

Tasks

  • Update all package functions to handle prediction models for continuous outcomes
