
bashtage commented on August 10, 2024

You shouldn't have a constant in the endogenous. Constants are always exogenous.

import numpy as np
from linearmodels.iv import IVLIML

# Simulate an endogenous regressor x, an error e correlated with x, and an
# instrument z that is correlated with x but not with e.
x,e,z = np.random.multivariate_normal([0,0,0],[[1,0.5,0.5],[0.5,1,0],[0.5,0,1]],size=(1,500)).T
y = 1 + x + e
const = np.ones_like(x)

# The constant enters as exogenous; x is endogenous, z is the instrument.
IVLIML(y,const,x,z).fit()
Out[1]:
                          IV-LIML Estimation Summary
==============================================================================
Dep. Variable:              dependent   R-squared:                      0.6537
Estimator:                    IV-LIML   Adj. R-squared:                 0.6530
No. Observations:                 500   F-statistic:                    81.456
Date:                Tue, Apr 23 2024   P-value (F-stat)                0.0000
Time:                        08:27:28   Distribution:                  chi2(1)
Cov. Estimator:                robust

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exog           0.9909     0.0453     21.853     0.0000      0.9021      1.0798
endog          0.9744     0.1080     9.0253     0.0000      0.7628      1.1860
==============================================================================

Endogenous: endog
Instruments: instruments
Robust Covariance (Heteroskedastic)
Debiased: False
Kappa: 1.000
IVResults, id: 0x158414c7d10


bashtage commented on August 10, 2024

Closing as answered, but feel free to comment if anything is not clear.


mlondschien commented on August 10, 2024

Thanks.

How do I specify whether the first stage has a constant or not?


bashtage commented on August 10, 2024

Exogenous variables are always included in the first stage.


bashtage commented on August 10, 2024

In shorthand, the first stage is W = X + Z, where W are the endogenous variables, X the exogenous variables and Z the instruments. If you don't want a constant in the 1st stage, just use

IVLIML(y,None,x,z)

This worked in your code above.
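
As a small sketch, reusing y, const, x and z from the snippet above, the two specifications are:

# Constant included as exogenous: it enters both the first and the second stage.
with_const = IVLIML(y, const, x, z).fit()

# No exogenous at all: no constant in either stage.
no_const = IVLIML(y, None, x, z).fit()

print(with_const.params)
print(no_const.params)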


mlondschien commented on August 10, 2024

Thanks. But this gives the wrong result, since the true model does have a constant in the second stage. How do I fit a model with a constant in the second stage but none in the first? I realize this is a bit pedantic.


bashtage commented on August 10, 2024

What you are trying to do is statistically unsound. Why do you think you need to do this?

When you include the constant as exogenous (which it is), then the two models fit are

W = X + Z
Y = X + W-hat

W-hat is a linear combination of X and Z, but X is still there in the second stage.

This is identical to fitting

X + W = X + Z
Y = X-hat + W-hat

In this contrived example, X-hat is trivially equal to X, since the best exogenous predictor of X is just X (see the sketch below).
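
A minimal numpy sketch of this point, re-simulating the same design as in the snippet above with a seed for reproducibility; this is not the linearmodels implementation, and with a single instrument the model is just identified, so kappa is 1 and LIML coincides with 2SLS here:

import numpy as np

rng = np.random.default_rng(0)
x, e, z = rng.multivariate_normal(
    [0, 0, 0], [[1, 0.5, 0.5], [0.5, 1, 0], [0.5, 0, 1]], 500
).T
y = 1 + x + e
const = np.ones_like(x)

# First stage: project both the constant and x on [const, z].
Z1 = np.column_stack([const, z])
coef = np.linalg.lstsq(Z1, np.column_stack([const, x]), rcond=None)[0]
const_hat, x_hat = (Z1 @ coef).T

# The fitted constant is exactly the constant again: X-hat = X.
print(np.allclose(const_hat, const))  # True

# Second stage: regress y on the fitted values; the point estimates are close
# to [1, 1], matching the summary above. (The naive second-stage standard
# errors would not be correct, so use the library for inference.)
beta = np.linalg.lstsq(np.column_stack([const_hat, x_hat]), y, rcond=None)[0]
print(beta)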


mlondschien commented on August 10, 2024

I've thought and enquired about this a bit more. Here are two relevant references:

Stata FAQ
StackExchange

The Stata FAQ states that in a triangular simultaneous equation model as in IV, not including all exogenous regressors in the first stage is unbiased but not supported by ivregress. The StackExchange answer states that excluding exogenous regressors $X$ in the first stage $Z \rightarrow T$ leads to bias in the causal parameter $T \rightarrow Y$ if there is a link (confounding) $Z \leftrightarrow X$ and $T \leftrightarrow X$.

That is, there are settings where it is statistically sound to exclude exogenous regressors from the first stage, given prior (domain) knowledge. The setting where the exogenous regressor is an intercept is one such setting.


bashtage commented on August 10, 2024

The Stata FAQ states that in a triangular simultaneous equation model as in IV, not including all exogenous regressors in the first stage is unbiased but not supported by ivregress.

This is correct, but it is statistically unsound* to do this. The reason is that using only the instruments in the first stage, rather than both the instruments and the exogenous regressors, leads to a larger error variance and a lower R2 in the first stage. This in turn produces a second-stage fit that is less correlated with the true effect, and so larger standard errors (a small sketch of the R2 point follows below).
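
A rough illustration of the first-stage R2 point; the data and the exogenous regressor w here are purely illustrative, and statsmodels is used only to run the two first-stage regressions:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
w = rng.standard_normal(n)               # exogenous regressor that helps predict x
z = rng.standard_normal(n)               # instrument
x = 1 + w + z + rng.standard_normal(n)   # endogenous regressor

# First stage with and without the exogenous regressor.
full = sm.OLS(x, sm.add_constant(np.column_stack([w, z]))).fit()
restricted = sm.OLS(x, sm.add_constant(z)).fit()
print(full.rsquared, restricted.rsquared)  # the restricted first stage has the lower R2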

The StackExchange answer states that excluding exogenous regressors $X$ in the first stage $Z \rightarrow T$ leads to bias in the causal parameter $T \rightarrow Y$ if there is a link (confounding) $Z \leftrightarrow X$ and $T \leftrightarrow X$.

I don't think this supports excluding them. The conclusion is that they either need to be included or can optionally be excluded (under additional orthogonality assumptions, which coincides with the triangular explanation from Stata, but you might as well include them).

In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).

  • When I say statistically unsound I mean in the case where the number of instrumental variables is small relative to the number of observations. IV can be badly biased in situations with many instruments. However, there are simple workarounds, such as PCA on the IVs or using LASSO to select variables for the first stage (but then estimating using a standard IV estimator on the selected variables); a sketch of the PCA route follows below.
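
A sketch of the PCA workaround mentioned above. Everything here is illustrative: the instruments share a common factor so that a few principal components capture most of their information, and scikit-learn is used for the PCA step.

import numpy as np
from sklearn.decomposition import PCA
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n, k = 500, 50                           # many instruments relative to n
f = rng.standard_normal(n)               # common factor behind the instruments
z_many = f[:, None] + rng.standard_normal((n, k))
e = rng.standard_normal(n)
x = f + 0.5 * e                          # endogenous regressor
y = 1 + x + e
const = np.ones_like(x)

# Compress the instruments to a few components, then use a standard IV estimator.
z_pcs = PCA(n_components=3).fit_transform(z_many)
print(IV2SLS(y, const, x, z_pcs).fit().params)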


mlondschien commented on August 10, 2024

The reason is that using only the instruments leads to a larger error variance than when using both the instruments and the exogenous regressors are included

Do you mean the error variance of the estimate of the causal parameter? If so, would you happen to have a reference for this statement?

I don't think this supports excluding them.

Yes, this does not make a statement about whether excluding them is better than including them.

In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).

To my understanding, if the regressors are exogenous, including them should never lead to inconsistency, correct? Are there settings where including or excluding them in the first stage improves or worsens the (asymptotic) efficiency of the estimator (assuming consistency)? There are settings where excluding exogenous regressors from the first stage does improve asymptotic efficiency.

