

ngreifer commented on July 28, 2024

Coarsened exact matching is a method of stratification. That means that the entire dataset is carved up based on the coarsened covariates. Any stratum without both a treated and control unit is discarded, leaving strata that have both treated and control units with the same value of the coarsened covariates. No pairing is done. It doesn't make sense to talk about replacement because no units are "used up" and need to be replaced. They are simply assigned to the stratum they fall in. This is the default use of method = "cem". You could implement this yourself by coarsening the covariates, creating an interaction between all the covariates (e.g., using interaction()), and discarding any units that are in strata without both a treated and control unit. Here's an example of how you could do that:

# Coarsen two covariates into bins
X1c <- cut(data$X1, 4)
X2c <- cut(data$X2, 3)

# Each combination of coarsened values defines a stratum
strata <- interaction(X1c, X2c)

# Identify strata that contain both a treated and a control unit
strata_with_both <- intersect(strata[data$treat == 1], strata[data$treat == 0])

# Units in strata without both groups are discarded
strata[!strata %in% strata_with_both] <- NA

This is how coarsened exact matching is implemented in MatchIt. (Exact matching is implemented the same way, just without coarsening the covariates.) Seeing it this way may help you understand the method.
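For reference, a roughly equivalent call using MatchIt itself might look like the sketch below (same hypothetical data, treat, X1, and X2 as above; the cutpoints argument controls how each covariate is coarsened, though the exact breakpoints MatchIt chooses may differ from those produced by cut()):

```r
library(MatchIt)

# Coarsened exact matching with 4 bins for X1 and 3 bins for X2;
# covariates not listed in `cutpoints` use MatchIt's default binning
m.out <- matchit(treat ~ X1 + X2, data = data,
                 method = "cem",
                 cutpoints = list(X1 = 4, X2 = 3))
summary(m.out)
```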

An optional second step is to perform matching within the strata, which you can do by setting k2k = TRUE. All this does is discard data. It is not recommended except when you need to discard data (e.g., because it's too expensive to collect outcome data on all units). What k2k = TRUE does is drop units in the strata until the number of treated units is equal to the number of control units within each stratum. It chooses the ones to discard by running nearest neighbor matching without replacement and discarding the units that are not matched. The pairs that are kept are returned as matched pairs.
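As a sketch (same hypothetical data as above), the optional pruning step is requested like this:

```r
library(MatchIt)

# CEM followed by 1:1 nearest-neighbor pruning within each stratum;
# units left unmatched inside their stratum are discarded
m.k2k <- matchit(treat ~ X1 + X2, data = data,
                 method = "cem", k2k = TRUE)
```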

It doesn't make sense to talk about coarsened exact matching with replacement because the purpose of the second stage matching is to prune units from the strata, not to create optimally matched pairs (which is the purpose of nearest neighbor matching). This is why replace is ignored with method = "cem".

You can do nearest neighbor matching with replacement with strata of the coarsened variables by creating the coarsened version of the variables yourself and supplying them to the exact argument with method = "nearest". For example, you could run

matchit(treat ~ X1 + X2 + X3, data = data, method = "nearest",
        replace = TRUE, exact = ~ cut(X1, 4) + cut(X2, 3))

This would run nearest neighbor propensity score matching with replacement within strata of coarsened versions of X1 and X2. With nearest neighbor matching, the pairs are the primary output, and the coarsened exact matching is used to limit which units can be paired with each other. In general, it makes more sense to place a caliper on the variables you want close matches on rather than using exact, e.g., caliper = c(X1 = .05, X2 = .1).
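A sketch of that caliper alternative (using the same hypothetical data and covariates as before; the caliper widths here are purely illustrative, not recommendations):

```r
library(MatchIt)

# Nearest neighbor propensity score matching with replacement,
# requiring matches to be within .05 on X1 and .1 on X2.
# std.caliper = FALSE means the widths are in raw covariate units
# rather than standard deviations.
m.cal <- matchit(treat ~ X1 + X2 + X3, data = data,
                 method = "nearest", replace = TRUE,
                 caliper = c(X1 = .05, X2 = .1),
                 std.caliper = FALSE)
```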

from matchit.

yceny commented on July 28, 2024

Thank you so much for the detailed explanation. One more question about 1 to 1 matching. In CEM, k2k = TRUE means 1 to 1 matching, right? If I would like to implement 1 to many matching, shall I set k2k = FALSE? How about in nearest neighbor method in terms of 1 to 1/many matching?


ngreifer commented on July 28, 2024

With method = "cem", you cannot implement one-to-many matching. As I mentioned, no pairing takes place in CEM with k2k = FALSE. If you don't want to drop many units, you should just use the CEM output as-is. There is no reason to additionally do pairing after the stratification. I see almost no reason to set k2k = TRUE.

Using method = "nearest", the ratio argument determines the number of control units paired with each treated unit. This is explained in the ?matchit and ?method_nearest documentation.
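For example (hypothetical data as before), 1:3 matching would be requested as:

```r
library(MatchIt)

# Nearest neighbor matching with 3 control units per treated unit
m.ratio <- matchit(treat ~ X1 + X2 + X3, data = data,
                   method = "nearest", ratio = 3)
```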


yceny commented on July 28, 2024

Got you. How does method = "nearest" deal with categorical variables?


ngreifer commented on July 28, 2024

The default is to do propensity score matching. The covariates are included in a logistic regression of the treatment on the covariates, and the predicted values are used as the propensity scores. The difference between two units' propensity scores is the distance between the units. So covariates don't feature directly in nearest neighbor matching, since only the propensity score is used. The fact that a covariate is categorical has no bearing on how it is used; it is simply a covariate in the logistic regression model for the propensity score, and logistic regression handles categorical covariates as all regression models do. Propensity score matching is agnostic to the covariates used in the propensity score.
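A minimal base-R sketch of that idea (hypothetical data with a categorical covariate cat stored as a factor):

```r
# Logistic regression of treatment on covariates; a factor covariate
# is expanded into dummy variables automatically by the model
fit <- glm(treat ~ X1 + X2 + cat, data = data, family = binomial)

# The fitted probabilities are the propensity scores
ps <- fitted(fit)

# The distance between two units is the difference in their scores
d_12 <- abs(ps[1] - ps[2])
```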

Categorical variables can be supplied to the exact argument to do exact matching on them. They can also feature in the Mahalanobis distance if requested.


yceny commented on July 28, 2024

Thanks. Also, in CEM, does the dependent variable have to be 0 and 1? Can the dependent variable be like 0, 1, 2, 3?


ngreifer commented on July 28, 2024

The cem package can handle non-binary treatments, but MatchIt cannot.

