

ngreifer commented on July 28, 2024

Coarsened exact matching is a method of stratification. That means that the entire dataset is carved up based on the coarsened covariates. Any stratum without both a treated and control unit is discarded, leaving strata that have both treated and control units with the same value of the coarsened covariates. No pairing is done. It doesn't make sense to talk about replacement because no units are "used up" and need to be replaced. They are simply assigned to the stratum they fall in. This is the default use of method = "cem". You could implement this yourself by coarsening the covariates, creating an interaction between all the covariates (e.g., using interaction()), and discarding any units that are in strata without both a treated and control unit. Here's an example of how you could do that:

# Coarsen two covariates into bins
X1c <- cut(data$X1, 4)
X2c <- cut(data$X2, 3)

# Each combination of coarsened values defines a stratum
strata <- interaction(X1c, X2c)

# Identify strata that contain both a treated and a control unit
strata_with_both <- intersect(strata[data$treat == 1], strata[data$treat == 0])

# Units in strata without both groups are discarded
strata[!strata %in% strata_with_both] <- NA

This is how coarsened exact matching is implemented in MatchIt. (Exact matching is implemented the same way, just without coarsening the covariates.) Seeing it this way may help you understand the method.
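For reference, a roughly equivalent call using MatchIt itself might look like the sketch below (same hypothetical data, treat, X1, and X2 as above; the cutpoints argument controls how each covariate is coarsened, though the exact breakpoints MatchIt chooses may differ from those produced by cut()):

```r
library(MatchIt)

# Coarsened exact matching with 4 bins for X1 and 3 bins for X2;
# covariates not listed in `cutpoints` use MatchIt's default binning
m.out <- matchit(treat ~ X1 + X2, data = data,
                 method = "cem",
                 cutpoints = list(X1 = 4, X2 = 3))
summary(m.out)
```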

An optional second step is to perform matching within the strata, which you can do by setting k2k = TRUE. All this does is discard data. It is not recommended except when you need to discard data (e.g., because it's too expensive to collect outcome data on all units). What k2k = TRUE does is drop units in the strata until the number of treated units is equal to the number of control units within each stratum. It chooses the ones to discard by running nearest neighbor matching without replacement and discarding the units that are not matched. The pairs that are kept are returned as matched pairs.
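As a sketch (same hypothetical data as above), the optional pruning step is requested like this:

```r
library(MatchIt)

# CEM followed by 1:1 nearest-neighbor pruning within each stratum;
# units left unmatched inside their stratum are discarded
m.k2k <- matchit(treat ~ X1 + X2, data = data,
                 method = "cem", k2k = TRUE)
```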

It doesn't make sense to talk about coarsened exact matching with replacement because the purpose of the second stage matching is to prune units from the strata, not to create optimally matched pairs (which is the purpose of nearest neighbor matching). This is why replace is ignored with method = "cem".

You can do nearest neighbor matching with replacement with strata of the coarsened variables by creating the coarsened version of the variables yourself and supplying them to the exact argument with method = "nearest". For example, you could run

matchit(treat ~ X1 + X2 + X3, data = data, method = "nearest",
        replace = TRUE, exact = ~ cut(X1, 4) + cut(X2, 3))

This would run nearest neighbor propensity score matching with replacement within strata of coarsened versions of X1 and X2. With nearest neighbor matching, the pairs are the primary output, and the coarsened exact matching is used to limit which units can be paired with each other. In general, it makes more sense to place a caliper on the variables you want close matches on rather than using exact, e.g., caliper = c(X1 = .05, X2 = .1).
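A sketch of that caliper alternative (using the same hypothetical data and covariates as before; the caliper widths here are purely illustrative, not recommendations):

```r
library(MatchIt)

# Nearest neighbor propensity score matching with replacement,
# requiring matches to be within .05 on X1 and .1 on X2.
# std.caliper = FALSE means the widths are in raw covariate units
# rather than standard deviations.
m.cal <- matchit(treat ~ X1 + X2 + X3, data = data,
                 method = "nearest", replace = TRUE,
                 caliper = c(X1 = .05, X2 = .1),
                 std.caliper = FALSE)
```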

from matchit.

yceny commented on July 28, 2024

Thank you so much for the detailed explanation. One more question about 1 to 1 matching. In CEM, k2k = TRUE means 1 to 1 matching, right? If I would like to implement 1 to many matching, shall I set k2k = FALSE? How about in nearest neighbor method in terms of 1 to 1/many matching?


ngreifer commented on July 28, 2024

With method = "cem", you cannot implement one-to-many matching. As I mentioned, no pairing takes place in CEM with k2k = FALSE. If you don't want to drop many units, you should just use the CEM output as-is. There is no reason to additionally do pairing after the stratification. I see almost no reason to set k2k = TRUE.

Using method = "nearest", the ratio argument determines the number of control units paired with each treated unit. This is explained in the ?matchit and ?method_nearest documentation.
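For example (hypothetical data as before), 1:3 matching would be requested as:

```r
library(MatchIt)

# Nearest neighbor matching with 3 control units per treated unit
m.ratio <- matchit(treat ~ X1 + X2 + X3, data = data,
                   method = "nearest", ratio = 3)
```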


yceny commented on July 28, 2024

Got you. How does method = "nearest" deal with categorical variables?


ngreifer commented on July 28, 2024

The default is to do propensity score matching. The covariates are included in a logistic regression of the treatment on the covariates, and the predicted values are used as the propensity scores. The difference between two units' propensity scores is the distance between the units. So covariates don't feature directly in nearest neighbor matching, since only the propensity score is used. The fact that a covariate is categorical has no bearing on how it is used; it is simply a covariate in the logistic regression model for the propensity score, and logistic regression handles categorical covariates as all regression models do. Propensity score matching is agnostic to the covariates used in the propensity score.
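A minimal base-R sketch of that idea (hypothetical data with a categorical covariate cat stored as a factor):

```r
# Logistic regression of treatment on covariates; a factor covariate
# is expanded into dummy variables automatically by the model
fit <- glm(treat ~ X1 + X2 + cat, data = data, family = binomial)

# The fitted probabilities are the propensity scores
ps <- fitted(fit)

# The distance between two units is the difference in their scores
d_12 <- abs(ps[1] - ps[2])
```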

Categorical variables can be supplied to the exact argument to do exact matching on them. They can also feature in the Mahalanobis distance if requested.


yceny commented on July 28, 2024

Thanks. Also, in CEM, does the dependent variable have to be 0 and 1? Can the dependent variable be like 0, 1, 2, 3?


ngreifer commented on July 28, 2024

The cem package can handle non-binary treatments, but MatchIt cannot.

