wxwx1993 / gpsmatching

R Package for "Matching on generalized propensity scores with continuous exposures". An innovative approach for estimating causal effects using observational data in settings with continuous exposures, and a new framework for GPS caliper matching that jointly matches on both the estimated GPS and exposure levels to fully adjust for confounding bias.

Home Page: https://www.tandfonline.com/doi/full/10.1080/01621459.2022.2144737

CausalGPS

Matching on generalized propensity scores with continuous exposures

Summary

An R package for implementing matching on generalized propensity scores (GPS) with continuous exposures. We developed an innovative approach for estimating causal effects using observational data in settings with continuous exposures, and introduced a new framework for GPS caliper matching that jointly matches on both the estimated GPS and exposure levels to fully adjust for confounding bias.

Installation

  • Installing from source
library("devtools")
install_github("NSAPH-Software/CausalGPS")
library("CausalGPS")
  • Installing from CRAN
install.packages("CausalGPS")
  • Setting up docker environment

A Docker image for development can be downloaded from Docker Hub. See docker_singularity for more details.

Usage

Input parameters:

Y: A vector of the observed outcome variable.
w: A vector of the observed continuous exposure variable.
c: A data.frame or matrix of the observed covariates.
ci_appr: The causal inference approach. Possible values are:

  • "matching": Matching by GPS
  • "weighting": Weighting by GPS

gps_model: The model type used for estimating GPS values, either parametric (default) or non-parametric.
use_cov_transform: If TRUE, the function uses transformers to meet the covariate balance.
transformers: A list of transformers. Each transformer should be a unary function. You can pass the name of a custom function in quotes. Available transformers:

  • pow2: to the power of 2
  • pow3: to the power of 3

bin_seq: Sequence of w (treatment) values used to generate the pseudo population. If NULL is passed, the default value is used, which is seq(min(w)+delta_n/2,max(w), by=delta_n).
trim_quantiles: A numeric vector of length two giving the trim quantile levels. Both numbers should be in the range [0,1] and in increasing order (default: c(0.01,0.99)).
optimized_compile: If TRUE, uses counts to keep track of the number of replicates in the pseudo population.
params: A list of parameters used internally. Unrelated parameters are ignored.
sl_lib: A vector of prediction algorithms.
nthread: An integer giving the number of threads to be used by internal packages.
...: Additional arguments passed to different models.
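
To make the bin_seq default and trim_quantiles concrete, here is a minimal base R sketch on simulated data. The exposure vector and the value delta_n = 1 are made up for illustration, and the trimming rule shown (keep observations whose exposure lies between the two quantiles) is an assumption about the behaviour, not the package's internal code.

```r
set.seed(1)
w <- runif(200, min = 0, max = 10)   # hypothetical exposure values
delta_n <- 1                         # hypothetical caliper

# Default bin_seq as documented in the parameter list
bin_seq <- seq(min(w) + delta_n / 2, max(w), by = delta_n)

# Assumed trimming rule: keep exposures inside the [1%, 99%] quantile range
trim_quantiles <- c(0.01, 0.99)
bounds <- quantile(w, probs = trim_quantiles)
w_trimmed <- w[w >= bounds[1] & w <= bounds[2]]

length(bin_seq)    # number of exposure levels to match on
length(w_trimmed)  # observations left after trimming
```

A smaller delta_n gives a finer grid of exposure levels but fewer candidate units near each level.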

Additional parameters

Causal Inference Approach (ci_appr)

  • if ci_appr = 'matching':

    • matching_fun: Matching function. Available options:
      • matching_l1: Manhattan distance matching
    • delta_n: Caliper parameter.
    • scale: A scale parameter that controls the relative weight given to the distance measure of the exposure versus that of the GPS.
    • covar_bl_method: Covariate balance method. Available options:
      • 'absolute'
    • covar_bl_trs: Covariate balance threshold.
    • covar_bl_trs_type: Covariate balance type (mean, median, maximal).
    • max_attempt: Maximum number of attempts to satisfy covariate balance.
    • See create_matching() for more details about the parameters and default values.
  • if ci_appr = 'weighting':

    • covar_bl_method: Covariate balance method.
    • covar_bl_trs: Covariate balance threshold.
    • max_attempt: Maximum number of attempts to satisfy covariate balance.
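
The roles of delta_n and scale in Manhattan distance matching can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: match_l1 is a hypothetical helper, the caliper is taken as a half-width of delta_n / 2 around the target exposure (suggested by the default bin_seq spacing), scale is assumed to weight the GPS distance with 1 - scale on the exposure distance, and the standardization of both quantities is omitted.

```r
# Hypothetical helper, not a CausalGPS function.
# Returns the index of the best match for one target (w_target, gps_target).
match_l1 <- function(w, gps, w_target, gps_target, delta_n, scale) {
  # Candidates must lie inside the exposure caliper (assumed half-width).
  in_caliper <- which(abs(w - w_target) <= delta_n / 2)
  if (length(in_caliper) == 0) return(NA_integer_)
  # Weighted L1 (Manhattan) distance on GPS and exposure.
  d <- scale * abs(gps[in_caliper] - gps_target) +
    (1 - scale) * abs(w[in_caliper] - w_target)
  in_caliper[which.min(d)]
}

w   <- c(1.0, 1.3, 1.4, 3.0)       # hypothetical observed exposures
gps <- c(0.30, 0.10, 0.22, 0.21)   # hypothetical estimated GPS values

# scale = 1: only the GPS distance matters within the caliper
match_l1(w, gps, w_target = 1.1, gps_target = 0.2, delta_n = 1, scale = 1)
# scale = 0: only the exposure distance matters
match_l1(w, gps, w_target = 1.1, gps_target = 0.2, delta_n = 1, scale = 0)
```

In this toy data the fourth unit has a close GPS value but falls outside the caliper, so it is never selected.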
  • Generating Pseudo Population

pseudo_pop <- generate_pseudo_pop(Y,
                                  w,
                                  c,
                                  ci_appr = "matching",
                                  gps_model = "parametric",
                                  use_cov_transform = TRUE,
                                  transformers = list("pow2", "pow3"),
                                  sl_lib = c("m_xgboost"),
                                  params = list(xgb_nrounds = 50,
                                                xgb_max_depth = 6,
                                                xgb_eta = 0.3,
                                                xgb_min_child_weight = 1),
                                  nthread = 1,
                                  covar_bl_method = "absolute",
                                  covar_bl_trs = 0.1,
                                  trim_quantiles = c(0.01,0.99),
                                  optimized_compile = TRUE,
                                  max_attempt = 1,
                                  matching_fun = "matching_l1",
                                  delta_n = 1,
                                  scale = 1)

matching_l1 is the Manhattan distance matching approach. For the prediction model, we use the SuperLearner package, which supports different machine learning methods and packages. params is a list of hyperparameters that users can pass to the third-party libraries in the SuperLearner package. All hyperparameters go into the params list, and prefixes are used to distinguish parameters for different libraries. The following table shows the external package names, the equivalent names to use in sl_lib, the prefixes to use for their hyperparameters in the params list, and the available hyperparameters.

Package name | sl_lib name | Prefix | Available hyperparameters
XGBoost      | m_xgboost   | xgb_   | nrounds, eta, max_depth, min_child_weight
ranger       | m_ranger    | rgr_   | num.trees, write.forest, replace, verbose, family
nthread is the number of available threads (cores). XGBoost needs OpenMP installed on the system to parallelize processing. use_cov_transform activates transforming covariates in order to achieve covariate balance. Users can pass the names of custom functions in a list to be included in the processing. At each iteration, up to the number of iterations set by the user with max_attempt, the column that provides the worst covariate balance is transformed.
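
As a rough sketch of one transformation step, the base R code below defines a custom unary transformer, finds the covariate with the worst balance, and transforms only that column. The data, the pow4 function, and the balance measure (absolute correlation between exposure and covariate, unweighted) are assumptions made for the example and are not taken from the package.

```r
set.seed(2)
covars <- data.frame(cf1 = rnorm(100), cf2 = rnorm(100))  # hypothetical covariates
w <- 2 * covars$cf2 + rnorm(100)                          # hypothetical exposure

# A custom transformer is any unary function; it is referred to by name.
pow4 <- function(x) x^4
transformers <- list("pow2", "pow3", "pow4")

# Assumed balance measure: absolute correlation of each covariate with w.
abs_corr <- sapply(covars, function(col) abs(cor(w, col)))
worst <- names(which.max(abs_corr))

# Look the transformer up by name and apply it to the worst column only.
transform_fun <- match.fun(transformers[[3]])
covars[[worst]] <- transform_fun(covars[[worst]])

worst
```

Repeating this step, re-estimating the GPS each time, is what max_attempt bounds.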

  • Estimating GPS
data_with_gps <- estimate_gps(Y,
                              w,
                              c,
                              gps_model = "parametric",
                              internal_use = FALSE,
                              params = list(xgb_nrounds = 50,
                                            xgb_max_depth = 6,
                                            xgb_eta = 0.3,
                                            xgb_min_child_weight = 1),
                              nthread = 1,                                
                              sl_lib = c("m_xgboost")
                              )

If internal_use is set to TRUE, the program returns additional vectors to be used by the selected causal inference approach to generate a pseudo population. See ?estimate_gps for more details.
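
For intuition about what a parametric GPS is, the following base R sketch fits a model for the exposure mean and evaluates a normal density at the observed exposure. It is a simplified stand-in: it uses lm on simulated data instead of SuperLearner, and the variable names and the normal-residual assumption are choices made for the example.

```r
set.seed(3)
n <- 500
covars <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))      # hypothetical covariates
w <- 1 + 0.5 * covars$cf1 - 0.3 * covars$cf2 + rnorm(n)   # hypothetical exposure

# Model the conditional mean of the exposure given the covariates.
fit <- lm(w ~ cf1 + cf2, data = covars)
w_hat <- fitted(fit)
resid_sd <- sd(residuals(fit))

# Parametric GPS: normal density of the observed exposure around its fitted mean.
gps <- dnorm(w, mean = w_hat, sd = resid_sd)

summary(gps)
```

Units whose observed exposure is far from what their covariates predict receive a small GPS value.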

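  • Estimating Exposure Response Function
# matched_Y, matched_w, w_vals, and nthread must be supplied by the caller.
erf <- estimate_npmetric_erf(matched_Y,
                             matched_w,
                             matched_counter = NULL,
                             bw_seq = seq(0.2, 2, 0.2),
                             w_vals = w_vals,
                             nthread = nthread)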
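
The idea of a nonparametric exposure response curve can be sketched with base R's kernel smoother. This example is illustrative only: the matched data are simulated, a single fixed bandwidth is used rather than a search over bw_seq, and stats::ksmooth stands in for whatever smoother the package uses internally.

```r
set.seed(4)
matched_w <- runif(300, min = 0, max = 10)               # hypothetical matched exposures
matched_Y <- sin(matched_w / 2) + rnorm(300, sd = 0.2)   # hypothetical matched outcomes
w_vals <- seq(1, 9, by = 0.5)                            # exposure levels to evaluate

# Gaussian kernel regression of outcome on exposure at a fixed bandwidth.
erf_hat <- ksmooth(matched_w, matched_Y,
                   kernel = "normal",
                   bandwidth = 1,
                   x.points = w_vals)

head(data.frame(w = erf_hat$x, erf = erf_hat$y))
```

A larger bandwidth gives a smoother curve at the cost of more bias near sharp changes in the response.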
  • Generating Synthetic Data
syn_data <- generate_syn_data(sample_size=1000,
                              outcome_sd = 10,
                              gps_spec = 1,
                              cova_spec = 1)

Contribution

For more information about reporting bugs and contributing, please read the contribution page on the package website.

References

  1. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association. (https://www.tandfonline.com/doi/full/10.1080/01621459.2022.2144737)
