Two-Stage Cluster Bootstrap and Causal Cluster Variance for Stata

This repository contains a Stata implementation of the Two-Stage Cluster Bootstrap (TSCB) estimator and the Causal Cluster Variance (CCV) estimator described in Abadie et al. (2023). These programs return standard errors for a regression of an outcome on a treatment of interest, using either simple OLS or fixed effects models, while accounting for clustering by group. Unlike standard cluster-robust inference, these estimators follow a design-based approach, in which uncertainty arises from the sampling process and the treatment assignment mechanism. Because the goal in this setting is to estimate an average treatment effect for a particular population, inference is conducted with regard to the finite population of interest (for example, all states in a country), rather than an infinite super-population flowing from some data-generating process. In many empirical situations, a considerable proportion of clusters, or even all clusters, may be sampled in the data, and in these cases standard errors based on TSCB or CCV can be considerably smaller than traditional model-based cluster-robust standard errors (i.e. Stata's vce(cluster clustvar) implementation).

Uncertainty related to estimated regression parameters in this setting owes to the following elements:

- The sampling process:
  - The proportion of clusters sampled (may be 100%). Referred to as qk below.
  - The proportion of individuals sampled within each cluster (may be 100%). Referred to as pk below.
- The treatment assignment mechanism: the proportion of each cluster assigned to receive treatment.
- The heterogeneity in treatment effects across clusters.

For example, consider the following two circumstances, based on US Census data and the returns to college education discussed in Abadie et al. (2023). These are based on two specific simulations: the top panel considers inference on the full sample of individuals in the 2000 US Census data (hence pk=1), in which all states are observed (hence qk=1). The second panel considers the same case, except that only 26 of 52 clusters are sampled (qk=0.5). In both cases, the variation of the treatment effect by cluster is plotted in the left-hand panels, and the variation of treatment assignment by cluster is plotted in the right-hand panels. Variation in these elements drives the variance of the estimated treatment effect.
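As a rough illustration of how a value such as qk=0.5 might be obtained, the sketch below counts the distinct clusters present in a dataset and divides by the number of clusters in the population. The variable name state and the population total of 52 are assumptions taken from the example above; neither is computed by the programs in this repository.

```stata
* Sketch only: assumes the cluster identifier is called state and that the
* population is known to contain 52 clusters (both are assumptions).
local total_clusters = 52
egen byte one_per_cluster = tag(state)   // 1 for a single observation in each cluster
quietly count if one_per_cluster == 1    // number of distinct clusters in the data
local qk = r(N) / `total_clusters'
display "Clusters in data: " r(N) "   implied qk: " `qk'
drop one_per_cluster
```

The resulting local could then be supplied to the estimators as qk(`qk'). An analogous calculation for pk would need the population size of each cluster, which usually has to come from an external source rather than from the sample itself.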

The implications of this variation for standard errors can be considerable. In the plot below, a range of point estimates and 95% confidence intervals are displayed, corresponding to the simulations described in section VI of Abadie et al. (2023), where we additionally vary the proportion of clusters sampled. In these cases, confidence intervals based on TSCB and CCV estimates achieve good coverage with regard to the asymptotic variance of interest. They are also considerably shorter than confidence intervals based on traditional (model-based) cluster-robust standard errors, particularly when not all clusters are sampled.

The Causal Cluster Variance estimator is a closed-form variance estimator for treatment effects, based on a refinement of the standard cluster-robust variance estimator. The computational implementation of this estimator follows equation (13) of Abadie et al. (2023) in cases where all clusters are sampled, or equation (14) in cases where not all clusters are sampled. The Two-Stage Cluster Bootstrap estimator is a bootstrap-based variance estimator for treatment effects in which bootstrap resamples have treatment assignment probabilities that differ from those in the original sample, while allowing for the case where a large fraction of clusters is observed. The computational implementation of this estimator follows Algorithm 1 of Abadie et al. (2023).

Both estimators also accommodate fixed effects models, following section V of Abadie et al. (2023). Details on code installation and implementation are given below, with full documentation available in the help files installed with the programs (type help tscb or help ccv in Stata).

TSCB: Two-Stage Cluster Bootstrap

tscb.ado - Implements the Two-Stage Cluster Bootstrap variance, reporting standard errors for OLS or fixed effects models. It additionally reports typical cluster-robust and heteroscedasticity-robust standard errors. This code follows Algorithm 1 of Abadie et al. (2023). Additional details can be found after installation by typing help tscb in Stata.

To install directly into Stata:

net install tscb, from("https://raw.githubusercontent.com/daniel-pailanir/tscb-ccv/master") replace

Syntax

tscb Y W M [if] [in], qk() seed() reps() fe

Where Y is an outcome variable, W a binary treatment variable and M a variable indicating the group over which clustering is calculated. The option qk() is required, and must take a value between 0 and 1 indicating the proportion of clusters sampled (1 implies all clusters are sampled). We provide an example based on the 2000 US Census, discussed in the introduction of Abadie et al. (2023).

OLS Estimates

webuse set www.damianclarke.net/stata/

webuse "census2000_5pc.dta", clear

* run TSCB
tscb ln_earnings college state, qk(1) seed(2022) reps(150)

The code returns the following results

Two-Stage Cluster Bootstrap replications (150).
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................     50
..................................................     100
..................................................     150

OLS regression with Two-Stage Cluster Bootstrap Variance
                                                Number of obs     =  2,632,838
                                                R-squared         =     0.0567

------------------------------------------------------------------------------
 ln_earnings | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     college |      0.466      0.003   133.59   0.000        0.459       0.472
-------------+----------------------------------------------------------------
 Robust SE   |                 0.001   400.76   0.000        0.463       0.468
 Cluster SE  |                 0.027    17.16   0.000        0.412       0.519
------------------------------------------------------------------------------
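The syntax above also lists an fe option for fixed effects models. A minimal sketch of the same call with that option added is shown here, assuming the census data loaded above are still in memory; no output is reproduced because none is given for this case.

```stata
* Same specification as the OLS example, with the fe option from the
* syntax line added to request the fixed effects version.
* Assumes census2000_5pc.dta is already loaded as in the example above.
tscb ln_earnings college state, qk(1) seed(2022) reps(150) fe
```

Keeping seed() and reps() fixed across the two calls means any difference in the reported standard errors reflects the change of model rather than a different set of bootstrap draws.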

CCV: Causal Cluster Variance

ccv.ado - Implements the Causal Cluster Variance, reporting standard errors for OLS or fixed effects models. It additionally reports typical cluster-robust and heteroscedasticity-robust standard errors. This code implements equations (13), (14), or (20) of Abadie et al. (2023), depending on whether all clusters are sampled and on whether OLS or FE models are desired. Additional details can be found after installation by typing help ccv in Stata.

To install directly into Stata:

net install ccv, from("https://raw.githubusercontent.com/daniel-pailanir/tscb-ccv/master") replace

Syntax

ccv Y W M [if] [in], qk() pk() seed() reps() fe

Where Y is an outcome variable, W a binary treatment variable and M a variable indicating the group over which clustering is calculated. The options qk() and pk() are required, and must take values between 0 and 1 indicating, respectively, the proportion of clusters sampled (1 implies all clusters are sampled) and the proportion of individuals sampled within clusters. Below, an example is provided using the 5% sample of the 2000 US Census, described in the introduction of Abadie et al. (2023).

OLS Estimates

webuse set www.damianclarke.net/stata/

webuse "census2000_5pc.dta", clear

* run CCV
ccv ln_earnings college state, pk(0.05) qk(1) seed(2022) reps(8)

The code returns the following results

Causal Cluster Variance with (8) sample splits.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
........
OLS regression with Causal Cluster Variance
                                                Number of obs     =  2,632,838
                                                R-squared         =     0.0567

------------------------------------------------------------------------------
 ln_earnings | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     college |      0.466      0.003   137.32   0.000        0.459       0.472
-------------+----------------------------------------------------------------
 Robust SE   |                 0.001   400.76   0.000        0.463       0.468
 Cluster SE  |                 0.027    17.16   0.000        0.412       0.519
------------------------------------------------------------------------------
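As with tscb, the ccv syntax lists an fe option. The following is a minimal sketch of the fixed effects variant of the call above, assuming the same data are in memory; the resulting output is not shown because it is not documented here.

```stata
* Same specification as the OLS example, with the fe option from the
* syntax line added to request the fixed effects version.
* Assumes census2000_5pc.dta is already loaded as in the example above.
ccv ln_earnings college state, pk(0.05) qk(1) seed(2022) reps(8) fe
```

Here pk(0.05) reflects the 5% census sample and qk(1) reflects that all states are observed, exactly as in the OLS example.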

References

When Should You Adjust Standard Errors for Clustering?, Alberto Abadie, Susan Athey, Guido W Imbens, Jeffrey M Wooldridge, The Quarterly Journal of Economics, 138(1):1-35, 2023.
