Comments (7)
Coarsened exact matching is a method of stratification. That means that the entire dataset is carved up based on the coarsened covariates. Any stratum without both a treated and control unit is discarded, leaving strata that have both treated and control units with the same value of the coarsened covariates. No pairing is done. It doesn't make sense to talk about replacement because no units are "used up" and need to be replaced. They are simply assigned to the stratum they fall in. This is the default use of method = "cem".
You could implement this yourself by coarsening the covariates, creating an interaction between all the covariates (e.g., using interaction()), and discarding any units that are in strata without both a treated and control unit. Here's an example of how you could do that:
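# Coarsen each covariate into equal-width bins
X1c <- cut(data$X1, 4)
X2c <- cut(data$X2, 3)
# Cross the bins to form the strata
strata <- interaction(X1c, X2c)
# Keep only strata that contain both treated and control units
strata_with_both <- intersect(strata[data$treat == 1], strata[data$treat == 0])
strata[!strata %in% strata_with_both] <- NA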
This is how coarsened exact matching is implemented in MatchIt. (Exact matching is implemented the same way without coarsening the covariates.) Seeing it this way might help you understand the method.
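To see the stratification run end to end, here is a minimal sketch on simulated data. It uses only base R; the data-generating values are made up for illustration, and the variable names (X1, X2, treat) simply mirror the example above. It is not MatchIt's internal code.

```r
# Simulated data (hypothetical values; names mirror the example above)
set.seed(1)
n <- 200
data <- data.frame(X1 = rnorm(n), X2 = runif(n))
data$treat <- rbinom(n, 1, plogis(0.5 * data$X1))

# Coarsen each covariate into equal-width bins and cross the bins into strata
X1c <- cut(data$X1, 4)
X2c <- cut(data$X2, 3)
strata <- interaction(X1c, X2c, drop = TRUE)

# Keep only strata containing at least one treated and one control unit
tab <- table(strata, data$treat)
strata_with_both <- rownames(tab)[tab[, "0"] > 0 & tab[, "1"] > 0]
keep <- strata %in% strata_with_both

# Treated and control counts per retained stratum
table(droplevels(strata[keep]), data$treat[keep])
```

Every row of the final table has a nonzero count in both columns, which is exactly the condition for a stratum to be retained. Units with keep equal to FALSE are the ones that would be discarded.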
An optional second step is to perform matching within the strata, which you can do by setting k2k = TRUE. All this does is discard data. It is not recommended except when you need to discard data (e.g., because it's too expensive to collect outcome data on all units). What k2k = TRUE does is drop units in the strata until the number of treated units is equal to the number of control units within each stratum. It chooses the ones to discard by running nearest neighbor matching without replacement and discarding the units that are not matched. The pairs that are kept are returned as matched pairs.
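The pruning idea can be sketched in base R as follows. This is an illustration under simplifying assumptions, not MatchIt's implementation: it uses a single simulated covariate X1, and it stands in a greedy nearest neighbor search on the absolute difference in X1 for whatever distance the package actually uses. The name matched_to is hypothetical.

```r
# Simulated data (hypothetical values)
set.seed(2)
n <- 120
data <- data.frame(X1 = rnorm(n), treat = rbinom(n, 1, 0.4))
strata <- cut(data$X1, 4)

# Row index of each unit's partner; NA means the unit is pruned
matched_to <- rep(NA_integer_, n)

for (s in levels(strata)) {
  t_idx <- which(strata == s & data$treat == 1)
  c_idx <- which(strata == s & data$treat == 0)
  # Greedy 1:1 nearest neighbor matching without replacement inside the stratum
  for (i in t_idx) {
    if (length(c_idx) == 0) break
    j <- c_idx[which.min(abs(data$X1[c_idx] - data$X1[i]))]
    matched_to[i] <- j
    matched_to[j] <- i
    c_idx <- setdiff(c_idx, j)  # a used control cannot be matched again
  }
}

# Units without a partner are dropped
keep <- !is.na(matched_to)
table(strata[keep], data$treat[keep])
```

After pruning, each stratum has the same number of treated and control units, and the only effect relative to plain stratification is that some units have been thrown away.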
It doesn't make sense to talk about coarsened exact matching with replacement because the purpose of the second stage matching is to prune units from the strata, not to create optimally matched pairs (which is the purpose of nearest neighbor matching). This is why replace is ignored with method = "cem".
You can do nearest neighbor matching with replacement within strata of the coarsened variables by creating the coarsened version of the variables yourself and supplying them to the exact argument with method = "nearest". For example, you could run
matchit(treat ~ X1 + X2 + X3, data = data, method = "nearest", replace = TRUE,
exact = ~cut(X1, 4) + cut(X2, 3))
This would run nearest neighbor propensity score matching with replacement within strata of coarsened versions of X1 and X2. With nearest neighbor matching, the pairs are the primary output, and the coarsened exact matching is used to limit which units can be paired with each other. In general, it makes more sense to place a caliper on the variables you want close matches on rather than using exact, e.g., caliper = c(X1 = .05, X2 = .1).
from matchit.
Thank you so much for the detailed explanation. One more question about 1-to-1 matching. In CEM, k2k = TRUE means 1-to-1 matching, right? If I would like to implement 1-to-many matching, should I set k2k = FALSE? And how does the nearest neighbor method handle 1-to-1 versus 1-to-many matching?
from matchit.
With method = "cem", you cannot implement one-to-many matching. As I mentioned, no pairing takes place in CEM with k2k = FALSE. If you don't want to drop many units, you should just use the CEM output as-is. There is no reason to additionally do pairing after the stratification. I see almost no reason to set k2k = TRUE.
Using method = "nearest", the ratio argument determines the number of control units paired with each treated unit. This is explained in the ?matchit and ?method_nearest documentation.
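As a rough illustration of the ratio argument, the sketch below assumes the MatchIt package is installed and uses its bundled lalonde dataset. The choice of covariates is arbitrary and only for demonstration.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1:1 matching is the default (ratio = 1)
m1 <- matchit(treat ~ age + educ + race, data = lalonde, method = "nearest")

# 2:1 matching: up to two control units for each treated unit
m2 <- matchit(treat ~ age + educ + race, data = lalonde,
              method = "nearest", ratio = 2)

# match.matrix has one row per treated unit and one column per matched control
dim(m1$match.matrix)
dim(m2$match.matrix)
```

The number of columns in match.matrix equals the requested ratio, so comparing the two dimensions shows what changing ratio does.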
from matchit.
Got it. How does method = "nearest" deal with categorical variables?
from matchit.
The default is to do propensity score matching. The covariates are included in a logistic regression of the treatment on the covariates, and the predicted values are used as the propensity scores. The difference between two units' propensity scores is the distance between the units. So the covariates don't feature directly in nearest neighbor matching, since only the propensity score is used. The fact that a covariate is categorical has no bearing on how it is used; it is simply a covariate in the logistic regression model for the propensity score, and logistic regression handles categorical covariates as all regression models do. Propensity score matching is agnostic to the covariates used in the propensity score.
Categorical variables can be supplied to the exact argument to do exact matching on them. They can also feature in the Mahalanobis distance if requested.
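The role of a categorical covariate in the propensity score can be sketched in base R. The data are simulated, and the factor name race and its levels "a", "b", "c" are hypothetical; this shows the general idea, not MatchIt's internal code.

```r
# Simulated data with one numeric and one categorical covariate (hypothetical values)
set.seed(3)
n <- 300
data <- data.frame(
  X1   = rnorm(n),
  race = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
data$treat <- rbinom(n, 1, plogis(0.4 * data$X1 + 0.5 * (data$race == "b")))

# Logistic regression of treatment on the covariates; the factor is dummy-coded automatically
ps_fit <- glm(treat ~ X1 + race, data = data, family = binomial)
ps <- fitted(ps_fit)

# Distance from one treated unit to every control: absolute propensity score difference
treated_1 <- which(data$treat == 1)[1]
controls  <- which(data$treat == 0)
d <- abs(ps[controls] - ps[treated_1])
nearest_control <- controls[which.min(d)]

# The nearest control need not share the treated unit's category
data[c(treated_1, nearest_control), ]
```

Because the distance depends only on the fitted propensity scores, two units in different categories can be each other's closest match. Supplying the variable to exact is what forces matches to agree on it.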
from matchit.
Thanks. Also, in cem, does the dependent variable have to be 0 and 1? Can it take values like 0, 1, 2, 3?
from matchit.
The cem package can handle non-binary treatments, but MatchIt cannot.
from matchit.