
medshift's Introduction

hello, i'm Nima

I'm an academic (bio)statistician working at the interface of causal inference, machine learning, and non- and semi-parametric statistics. I'm passionate about building open-source software tools to improve the accessibility of modern, model-agnostic and assumption-lean methods for statistical inference and causal machine learning, and I'm especially excited by the applications of statistics to the biomedical and public health sciences.

Are you looking for open source software for targeted causal machine learning? Maybe you should check out the tlverse project and browse our free open-source handbook!


medshift's People

Contributors

jeremyrcoyle, nhejazi


medshift's Issues

test indexing approaches

A test should be written to ensure that indexing is done correctly for Dy; that is, A * component and (1 - A) * component should be sufficient for indexing.
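A minimal sketch (using testthat) of the sort of check intended here: for binary A, multiplying by A and (1 - A) should select the same elements as explicit subsetting. All object names below are illustrative rather than the package's internals.

library(testthat)

test_that("multiplicative indexing matches explicit subsetting for Dy", {
  set.seed(42)
  n <- 100
  A <- rbinom(n, 1, 0.5)
  component_1 <- rnorm(n)  # hypothetical EIF piece for the A = 1 arm
  component_0 <- rnorm(n)  # hypothetical EIF piece for the A = 0 arm

  # indexing via multiplication by the treatment indicator
  dy_mult <- A * component_1 + (1 - A) * component_0

  # indexing via explicit subsetting
  dy_sub <- numeric(n)
  dy_sub[A == 1] <- component_1[A == 1]
  dy_sub[A == 0] <- component_0[A == 0]

  expect_equal(dy_mult, dy_sub)
})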

Functionality for true continuous treatment

Hey Nima,

Is there work underway to make this package work for a truly continuous treatment? I've tested it in simple continuous A settings and get errors related to eif_component_names for the onestep estimator and an estimator_args[["max_iter"]] error for the tmle estimator.

propensity score truncation

In the case of a binary intervention, both of the nuisance parameters G = P(A | W) and E = P(A | Z, W) are propensity scores, which may be susceptible to instability in the case of (near) practical violations of the assumption of positivity. It would be best to implement a flexible approach to automatically truncate estimated propensity scores, perhaps by default to the range (0.01, 0.99).
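A minimal sketch of such a truncation helper, assuming the default range of (0.01, 0.99) suggested above; the function name and bounds argument are illustrative, not part of the package's API.

truncate_propensity <- function(pscore, bounds = c(0.01, 0.99)) {
  # clip estimated propensity scores to the specified bounds
  pmin(pmax(pscore, bounds[1]), bounds[2])
}

# example usage on hypothetical estimated propensity scores
g_hat <- c(0.001, 0.2, 0.7, 0.999)
truncate_propensity(g_hat)
#> [1] 0.01 0.20 0.70 0.99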

Scaling transformation of outcome variable

It's generally the case that a continuous-valued outcome variable is re-scaled to fall in the interval [0, 1] via the transformation Y_scaled = (Y - min(Y)) / (max(Y) - min(Y)) for the purposes of estimation. Upon completion of the estimation procedure, the results should then be back-transformed to the original scale. A pair of functions for performing this transformation should be implemented. Note that, once this change is made, it will be necessary to edit the sl3_Task objects for each nuisance parameter regression task to manually specify family = "quasibinomial" in order to indicate that the Y values are not truly binary (i.e., taking values in {0, 1}) but rather simply fall in the unit interval.
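A minimal sketch of the pair of transformation helpers described above; the function names are illustrative.

scale_to_unit <- function(y) {
  # map a continuous outcome onto [0, 1]
  (y - min(y)) / (max(y) - min(y))
}

scale_from_unit <- function(y_scaled, y_min, y_max) {
  # back-transform estimates to the original outcome scale
  y_scaled * (y_max - y_min) + y_min
}

# example round trip
y <- rnorm(50, mean = 10, sd = 3)
y_star <- scale_to_unit(y)
all.equal(scale_from_unit(y_star, min(y), max(y)), y)
#> [1] TRUE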

Check that CV-folds is greater than 1

There should be a check, implemented with assertthat::assert_that(), to make sure that the number of folds specified for cross-validation is greater than 1. In the current implementation, origami::cross_validate will fail for V = 1 with an ambiguous/confusing error message, due to the way in which make_folds generates the structure of class folds. There's really no good reason that the case V = 1 needs to be supported, since using cross-validation / cross-fitting in constructing the AIPW estimator has theoretical benefits anyway.
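A minimal sketch of the intended check, assuming the number of folds is available as an argument named cv_folds inside the estimation routine:

assertthat::assert_that(
  cv_folds > 1,
  msg = "Cross-validation requires at least two folds (cv_folds > 1)."
)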

allow ensemble learning for phi

Allow an option for fitting the nuisance parameter Phi via arbitrary algorithms, as is already implemented for the other nuisance parameters.
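A hypothetical sketch of what this would allow: passing an sl3 super learner (rather than a single algorithm) as phi_learners, mirroring the other nuisance arguments. The particular library of learners below is only an example.

phi_sl <- sl3::Lrnr_sl$new(
  learners = list(
    sl3::Lrnr_glm$new(),
    sl3::Lrnr_xgboost$new(),
    sl3::Lrnr_mean$new()
  ),
  metalearner = sl3::Lrnr_nnls$new()
)
# this object could then be passed as phi_learners = phi_sl in the call to medshift()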

Multiple Mediators

Hi Nima,

First of all, thank you for your amazing work!
I am trying to decompose the effect of multiple mediators, ideally using xgboost. In the documentation, I see that medshift should be able to work with multiple mediators, but I keep getting an estimate for only one mediator, even when using the default example.

Best wishes,
Ahmed


library(medshift)
library(data.table)

# simulate a simple mediation data structure with a single mediator Z
make_simple_mediation_data <- function(n_obs = 1000) {
  W <- rbinom(n_obs, 1, prob = 0.50)
  A <- as.numeric(rbinom(n_obs, 1, prob = W / 4 + 0.1))
  z1_prob <- 1 - plogis((A^2 + W) / (A + W^3 + 0.5))
  Z <- rbinom(n_obs, 1, prob = z1_prob)
  Y <- Z + A - 0.1 * W + rnorm(n_obs, mean = 0, sd = 0.25)

  data <- as.data.table(cbind(Y, Z, A, W))
  setnames(data, c("Y", "Z", "A", "W"))
  return(data)
}

set.seed(75681)
example_data <- make_simple_mediation_data()

# add a second, independently generated mediator
example_data$ZZ <- sample(c(0, 1), nrow(example_data), replace = TRUE)

# one-step estimation with both mediators passed via Z
os_medshift <- medshift(
  W = example_data$W, A = example_data$A,
  Z = cbind(example_data$Z, example_data$ZZ), Y = example_data$Y,
  delta = 3,
  g_learners = sl3::Lrnr_xgboost$new(),
  e_learners = sl3::Lrnr_xgboost$new(),
  m_learners = sl3::Lrnr_xgboost$new(),
  phi_learners = sl3::Lrnr_xgboost$new(),
  estimator = "onestep",
  estimator_args = list(cv_folds = 3)
)
summary(os_medshift)


utility function for IP weights

The computation of IP weights is used in both the re-weighted estimator and the efficient estimator; it should be factored out into a utility function rather than computed manually in each place.
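A minimal sketch of such a utility, assuming estimated nuisance values g_hat = P(A | W) and e_hat = P(A | Z, W) are already available; the function name is illustrative.

ip_weights <- function(g_hat, e_hat) {
  # inverse probability weights shared by the re-weighted and efficient estimators
  g_hat / e_hat
}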

Arbitrary fold structures for one-step estimator

Currently, this line in the function est_onestep, which implements the one-step estimator, restricts how origami::make_folds is used. This should be generalized so that the number of folds and the specific fold function can be set arbitrarily by the user. A sensible default might be folds <- origami::make_folds(data, fold_fun = folds_vfold, V = 10).
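A hypothetical sketch of the generalization; make_onestep_folds is an illustrative name, with the fold function and its arguments simply passed through to origami.

make_onestep_folds <- function(data, fold_fun = origami::folds_vfold, ...) {
  # let the user control both the fold function and its arguments (e.g., V)
  origami::make_folds(data, fold_fun = fold_fun, ...)
}

# default behavior matching the suggestion above (data is any data.frame)
folds <- make_onestep_folds(data.frame(id = 1:100), V = 10)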

clearer documentation and naming

The documentation and organization of some parts of the package leave something to be desired. In particular, several new functions require documentation. Also, the naming of the estimators should be slightly revised for clarity, e.g., "reweighted" -> "ipw" and "efficient" -> "onestep".

weight stabilization

Note that the expectation of the weights g / e is equal to one. A good way to stabilize the AIPW estimator is to divide the weights by their empirical sample average. We should implement this.
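A minimal sketch of the stabilization described above: since the weights g / e have mean one in expectation, dividing by their empirical average re-centers them at one. The function name is illustrative.

stabilize_weights <- function(ip_weights) {
  # divide the IP weights by their empirical sample average
  ip_weights / mean(ip_weights)
}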

TODO

  • TMLE for the binary intervention case: this should be easy, since the exponential tilt defining D_A gives it a nice form (no integration needed, unlike the case of continuous A). This should be done using tmle3.
  • Re-organization of package contents to begin accommodating continuous interventions; some machinery can be easily borrowed from txshift.

missing outcome support

We should implement a procedure that estimates the full data parameter in the presence of a censoring process, e.g., a data structure like O = (W, A, Z, C, CY), for censoring indicator C. Such an approach would be based on the joint intervention setting C = 1 and the joint intervention on {A, Z} that defines our causal parameters. The estimation procedures would then simply incorporate an extra set of IP weights, specifically to address this intervention, i.e., 1/g(C = 1 | ...).
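A hypothetical sketch of how the extra censoring weights might enter, assuming estimated probabilities of remaining uncensored, pi_C = g(C = 1 | ...), are available; the names below are illustrative.

censoring_weights <- function(C, pi_C) {
  # weights are nonzero only for uncensored observations (C = 1)
  C / pi_C
}

# the existing IP weights would then simply be multiplied by these, e.g.,
# full_weights <- ip_weights * censoring_weights(C, pi_C)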

Support for observation-level IDs

In some applications, it may be useful to support the presence of hierarchical structures in which individual units belong to clusters (e.g., families, hospitals, schools). To support valid estimation and inference in such settings, observation-level IDs must be passed to nuisance regression estimators (such that cross-validation respects these) and to the inferential machinery (averaging EIF estimates at the cluster level).
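A minimal sketch of the two places the IDs would enter, assuming a vector of cluster-level IDs (ids) and a vector of estimated EIF values (eif_est); all names below are illustrative.

library(origami)
library(data.table)

# (1) cross-validation that respects the clustering
ids <- rep(1:100, each = 5)    # hypothetical cluster memberships
folds <- make_folds(length(ids), cluster_ids = ids)

# (2) inference based on cluster-level averages of the EIF estimates
eif_est <- rnorm(length(ids))  # hypothetical EIF values
eif_dt <- data.table(id = ids, eif = eif_est)
eif_by_cluster <- eif_dt[, .(eif = mean(eif)), by = id]
se_est <- sqrt(var(eif_by_cluster$eif) / nrow(eif_by_cluster))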

utility function for Dzw/substitution

The procedures to compute the Dzw component of the EIF and the substitution estimator are nearly identical (the substitution estimator simply requires that an empirical mean be taken over the Dzw component of the EIF values). This should be abstracted into a single utility function to be used in both situations.
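A hypothetical sketch of the shared utility; the name and arguments are illustrative. The Dzw vector would be computed once, with the substitution estimator obtained by simply averaging it.

summarize_dzw <- function(dzw, type = c("eif", "substitution")) {
  type <- match.arg(type)
  if (type == "substitution") {
    # substitution estimator: empirical mean over the Dzw vector
    return(mean(dzw))
  }
  # EIF use: return the observation-level Dzw vector itself
  dzw
}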

TML estimator for binary interventions

TMLE for the binary intervention case should be easy since the exponential tilt defining D_A gives it a nice form (no integration needed, unlike the case for continuous A). We should implement this using the framework exposed in tmle3.

Nuisance parameter phi should use training data for one-step

Currently, computing the nuisance parameter phi does not make use of the training-validation split necessary for computing a cross-fitted one-step estimator. That is, phi is computed on only a single data set (https://github.com/nhejazi/medshift/blob/master/R/fit_mechanisms.R#L280-L299). In computing the cross-fitted one-step estimator, only the validation data is used for phi (https://github.com/nhejazi/medshift/blob/master/R/estim_helpers.R#L163-L167).
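A hypothetical sketch of cross-fitting phi within each origami fold; cv_fit_phi is an illustrative name, and the glm() regression below is only a stand-in for however phi is actually fit in the package.

cv_fit_phi <- function(fold, data) {
  # split the data according to the current fold
  train_data <- origami::training(data)
  valid_data <- origami::validation(data)

  # fit phi on the training split only (glm as a placeholder for the real fit)
  phi_fit <- glm(Z ~ W, data = train_data, family = "binomial")

  # evaluate phi on the validation split, as the cross-fitted one-step requires
  list(phi_pred = predict(phi_fit, newdata = valid_data, type = "response"))
}

# folds <- origami::make_folds(data, V = 10)
# phi_cv <- origami::cross_validate(cv_fit_phi, folds, data)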
