Comments (10)
Hello,
Can you be more specific about which functions in MatchIt are running slowly? I can think of a few reasons why certain functions might be slow, and some ways to speed them up, but without more information there is little I can do.
If you are estimating propensity scores within matchit(), the estimation could be slow because of the fixed effects. By default, propensity scores are estimated by glm(), so it may be glm() that is running slowly. One way around this is to estimate the propensity scores outside matchit() using a package specifically designed to handle fixed effects quickly, such as the fixest package, and then supply those propensity scores to matchit() with the distance argument. For example, if your fixed-effect variable is called cl (i.e., for cluster), you could run the following:
fefit <- fixest::feglm(treat ~ X1 + X2 | cl, data = data, family = binomial)
ps <- fefit$fitted
m.out <- matchit(treat ~ X1 + X2, data = data, distance = ps)
Other propensity score estimation methods, like cbps, may simply be incompatible with fixed effects, so you should avoid using them.
If you are performing Mahalanobis distance or genetic matching, matchit() may need to invert and multiply huge matrices when there are many fixed effects and many units. This cannot be avoided except by excluding the fixed effects from the calculation of the Mahalanobis distance.
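As a sketch of that workaround (assuming covariates X1 and X2 and a fixed-effect variable cl, as in the example above), one option is to compute the Mahalanobis distance from the substantive covariates only and force matches to occur within the fixed-effect groups via exact, so the group dummies never enter the distance matrix:

```r
library(MatchIt)

# Mahalanobis distance is computed from X1 and X2 only; cl is kept out of
# the distance calculation and is instead used as an exact-matching
# constraint, so matches are formed within levels of cl.
m.out <- matchit(treat ~ X1 + X2, data = data,
                 distance = "mahalanobis", exact = ~cl)
```
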
If summary() is running slowly after you include fixed effects in the matchit() model formula, that is because summary() needs to compute balance on every level of each fixed effect individually, which can take a long time. You can avoid this by using the first method I recommended, so the fixed effects enter the propensity score but not the matchit() object, or by using cobalt to assess balance instead of MatchIt, since cobalt offers finer control over which covariates are included.
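For instance, with cobalt's data-frame interface you can compute balance on just the substantive covariates, skipping the fixed-effect dummies entirely (X1, X2, and the matchit output m.out here are carried over from the earlier example and are assumptions):

```r
library(cobalt)

# Balance is assessed only on X1 and X2; the matching weights come from
# the matchit object, and the fixed-effect dummies are never tabulated.
bal.tab(data[c("X1", "X2")], treat = data$treat,
        weights = m.out$weights, method = "matching")
```
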
Let me know if any of this helped, or please provide more detail so I can better address the problem.
- Noah
from matchit.
I would suggest using exact-matching restrictions. That is, match within the groups that define the fixed effects. The idea of fixed effects is essentially within-group comparison, and matching exactly on the groups is usually a better strategy. See this paper and this one, which show the equivalence (or lack thereof) between fixed effects and matching; the first paper is about one-way fixed effects, while the other is about two-way fixed effects.
Thanks so much! The fixest method fixed my problem! I have a follow-up question: how can I specify that, for each of my treated observations, the matched observation should be in the same group (such as industry, year, etc.) and that the matched distance cannot be higher than 0.1? If some treated observations do not have a matched observation satisfying these requirements, they should be dropped from the treated group.
Use the exact argument to request exact matching on those characteristics, i.e., exact = ~industry + year. This ensures that each treated unit's match is within the same industry and year. Use the caliper argument to restrict the distance between matches. By default, the caliper is in standard deviation units of the distance measure (i.e., the propensity score). Use the std.caliper argument to control whether the caliper should instead be in raw units. For example, caliper = .1, std.caliper = FALSE ensures that each treated unit's match has a propensity score within .1 of the treated unit's propensity score. You can also place calipers on individual covariates in addition to the propensity score. Any treated units that don't have matches satisfying the exact and caliper restrictions will be dropped.
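Putting those pieces together (assuming, as in the earlier example, that ps holds the externally estimated propensity scores and that industry and year are columns of data):

```r
library(MatchIt)

# Matches must share industry and year, and the matched propensity scores
# must differ by no more than 0.1 on the raw (unstandardized) scale.
# Treated units with no match satisfying both constraints are dropped.
m.out <- matchit(treat ~ X1 + X2, data = data, distance = ps,
                 exact = ~industry + year,
                 caliper = .1, std.caliper = FALSE)
```
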
Thanks so much. I just want to make sure that the variables supplied to the exact argument do not have to be among the variables used for matching. For example, matchit(y ~ x, exact = ~z + h, data = data) would work.
My understanding is that you aren't using any variables for matching except the propensity score, which is supplied to distance. The variables in the main formula are used solely for balance checking with summary() but will not affect the match if you provide already-estimated propensity scores to the distance argument (unless you're using genetic matching). The variables in exact and caliper just need to be in the dataset supplied to data and don't need to be specified anywhere else, so the example you provided should work fine as long as z and h are in data.
OK. Thanks for the clarification.
It seems that when I supply matchit()'s distance argument with fitted values from a feglm model, setting caliper to 0.1 and std.caliper to FALSE does not drop matched observations whose distance differs by more than 0.1. Is this a bug?
You need to provide more information for me to help you. Please provide your code and the results you think are in error, and I can try to assess them.
My bad, I mistyped it. Thanks again for the great package and help!