Comments (10)

ngreifer commented on July 28, 2024

Hello,

Can you be more specific about what functions in MatchIt are running slow? I can think of a few reasons why certain functions might run slow and some ways to speed them up, but without further information, there is little I can do.

If you are estimating propensity scores within matchit(), then the estimation of the propensity scores could be slow due to the fixed effects. By default, the estimation of propensity scores is performed by glm(), so it may be that glm() is running slowly. One way to get around this would be to estimate the propensity scores outside matchit() using a package specifically designed to handle fixed effects quickly, such as the fixest package, and then supply those propensity scores to matchit() with the distance argument. For example, if your fixed effect variable is called cl (i.e., for cluster), you could run the following:

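# Estimate the propensity score with fixest, absorbing the fixed effect cl
fefit <- fixest::feglm(treat ~ X1 + X2 | cl, data = data, family = binomial)
ps <- fitted(fefit)  # fitted probabilities on the response scale
# Supply the pre-estimated scores through the distance argument
m.out <- MatchIt::matchit(treat ~ X1 + X2, data = data, distance = ps)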

Other propensity score-estimation methods may simply be incompatible with fixed effects, like cbps, which you should therefore avoid using.

If you are performing Mahalanobis distance or genetic matching, matchit() may need to invert and multiply huge matrices if there are lots of fixed effects and many units. This cannot be avoided except by excluding the fixed effects from the calculation of the Mahalanobis distance.
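A minimal sketch of that workaround, assuming hypothetical toy data with covariates X1 and X2 and a cluster variable cl (none of these come from the original poster's data). The fixed effect is simply left out of the matching formula, so the Mahalanobis distance is computed from X1 and X2 only:

```r
library(MatchIt)

# Hypothetical toy data: two covariates and a 50-level cluster variable
set.seed(123)
n <- 500
data <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  cl = factor(sample(1:50, n, replace = TRUE))
)
data$treat <- rbinom(n, 1, plogis(-1 + 0.5 * data$X1))

# cl is deliberately absent from the formula, so the distance is built
# from two covariates rather than from two covariates plus cluster dummies
m.maha <- matchit(treat ~ X1 + X2, data = data, distance = "mahalanobis")
m.maha
```

The trade-off is that matches are no longer encouraged to come from the same cluster unless that is requested separately.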

If summary() is running slowly after you include fixed effects in the matchit() model formula, that is because summary() needs to compute balance on every fixed effect level individually, which can take a long time. You can avoid this by using the first method I recommended, so that the fixed effects are included in the propensity score but not in the matchit() object, or by using cobalt to assess balance instead of MatchIt, since cobalt offers finer control over which covariates are included.

Let me know if any of this helped, or please provide more detail so I can better address the problem.

- Noah

from matchit.

kosukeimai commented on July 28, 2024

I would suggest exact matching on the groups, that is, matching within the groups that define the fixed effects. The idea of fixed effects is basically within-group comparison, and matching exactly on groups is usually a better strategy. This paper and this one show the equivalence (or lack thereof) between fixed effects and matching. The first paper is about one-way fixed effects, while the other is about two-way fixed effects.
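As a rough illustration of matching within groups, the sketch below uses hypothetical toy data with a cluster variable cl and passes it to the exact argument of matchit(). The data and variable names are assumptions made for the example only:

```r
library(MatchIt)

# Hypothetical toy data with five clusters
set.seed(42)
n <- 600
data <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  cl = sample(letters[1:5], n, replace = TRUE)
)
data$treat <- rbinom(n, 1, plogis(-1.5 + 0.4 * data$X1))

# Nearest-neighbor matching on a glm() propensity score,
# restricted so that matches are only formed within the same cluster
m.exact <- matchit(treat ~ X1 + X2, data = data, exact = ~ cl)

md <- match.data(m.exact)
# TRUE if every matched pair lies within a single cluster
same_cl <- all(tapply(md$cl, md$subclass, function(g) length(unique(g)) == 1))
same_cl
```

Here the cluster enters only as a matching restriction, so it does not add any terms to the propensity score model.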

waynelapierre commented on July 28, 2024

Thanks so much! The fixest method fixed my problem! I have a follow-up question: how can I specify that, for every treated observation, the matched observation must be in the same group (such as industry, year, etc.) and the matched distance cannot be higher than 0.1? If some treated observations do not have a matched observation that satisfies these requirements, they should be dropped from the treated group.

ngreifer commented on July 28, 2024

Use the exact argument to request exact matching on those characteristics, i.e., exact = ~industry+year. This ensures that each treated unit's match is within the same industry and year. Use the caliper argument to restrict the distance between matches. By default, the caliper is in standard deviation units of the distance measure (i.e., propensity score). Use the std.caliper argument to control whether the caliper should be in raw units. For example, caliper = .1, std.caliper = FALSE ensures that each treated unit's match has a propensity score within .1 of the treated unit's propensity score. You can also place calipers on individual covariates in addition to the propensity score. Any treated units that don't have matches that satisfy the exact and caliper restrictions will be dropped.
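Put together, the call might look like the sketch below. The toy data are hypothetical, and glm() stands in for feglm() only so that the example needs no package other than MatchIt. The final line reports the largest propensity score gap within any matched pair, which should not exceed the caliper of 0.1:

```r
library(MatchIt)

# Hypothetical toy data; column names follow the thread
set.seed(1)
n <- 400
data <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  industry = sample(c("A", "B", "C"), n, replace = TRUE),
  year = sample(2019:2020, n, replace = TRUE)
)
data$treat <- rbinom(n, 1, plogis(-1 + 0.5 * data$X1 - 0.3 * data$X2))

# Propensity scores estimated outside matchit() (glm() used as a stand-in)
ps <- fitted(glm(treat ~ X1 + X2 + factor(industry),
                 data = data, family = binomial))

# Exact matching on industry and year, plus a raw-unit caliper of 0.1
# on the supplied propensity score
m.out <- matchit(treat ~ X1 + X2, data = data, distance = ps,
                 exact = ~ industry + year,
                 caliper = 0.1, std.caliper = FALSE)

md <- match.data(m.out)
# Largest within-pair difference in propensity score
max_gap <- max(tapply(md$distance, md$subclass, function(d) diff(range(d))))
max_gap
```

Treated units without an eligible control in the same industry and year, and within the caliper, are left unmatched and do not appear in the output of match.data().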

waynelapierre commented on July 28, 2024

Thanks so much. I just want to confirm that the variables supplied to the exact argument do not have to be among the variables used for matching. For example, will matchit(y ~ x, exact = ~z + h, data = data) work?

ngreifer commented on July 28, 2024

My understanding is that you aren't using any variables for matching except the propensity score, which is supplied to distance. The variables in the main formula are used solely for balance checking with summary() but will not affect the match if you provide already-estimated propensity scores to the distance argument (unless you're using genetic matching).

The variables in exact and caliper just need to be in the dataset supplied to data and don't need to be specified anywhere else, so that example you provided should work fine as long as z and h are in data.

waynelapierre commented on July 28, 2024

OK. Thanks for the clarification.

waynelapierre commented on July 28, 2024

It seems that when I supply matchit's distance argument with the fitted values from a feglm model, setting caliper to 0.1 and std.caliper to FALSE does not drop the matched observations with a distance higher than 0.1. Is this a bug?

ngreifer commented on July 28, 2024

You need to provide more information for me to help you. Please provide your code and the results that you think are in error, and I can try to assess them.

waynelapierre commented on July 28, 2024

My bad, I mistyped it. Thanks again for the great package and help!
