Comments (11)
You shouldn't include a constant in the endogenous variables. Constants are always exogenous.
```python
import numpy as np
from linearmodels.iv import IVLIML

# Simulate: x is endogenous (correlated with the error e), z is a valid instrument
x, e, z = np.random.multivariate_normal(
    [0, 0, 0],
    [[1, 0.5, 0.5], [0.5, 1, 0], [0.5, 0, 1]],
    size=500,
).T
y = 1 + x + e
const = np.ones_like(x)

# Pass the constant as an exogenous regressor, not as endogenous
IVLIML(y, const, x, z).fit()
```
Out[1]:
```
                          IV-LIML Estimation Summary
==============================================================================
Dep. Variable:              dependent   R-squared:                      0.6537
Estimator:                    IV-LIML   Adj. R-squared:                 0.6530
No. Observations:                 500   F-statistic:                    81.456
Date:                Tue, Apr 23 2024   P-value (F-stat)                0.0000
Time:                        08:27:28   Distribution:                  chi2(1)
Cov. Estimator:                robust

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exog           0.9909     0.0453     21.853     0.0000      0.9021      1.0798
endog          0.9744     0.1080     9.0253     0.0000      0.7628      1.1860
==============================================================================

Endogenous: endog
Instruments: instruments
Robust Covariance (Heteroskedastic)
Debiased: False
Kappa: 1.000
IVResults, id: 0x158414c7d10
```
Closing as answered, but feel free to comment if anything is unclear.
Thanks.
How do I specify whether the first stage has a constant or not?
Exogenous variables are always included in the first stage.
In shorthand, the first stage is W = X + Z, where W are the endogenous variables, X the exogenous variables, and Z the instruments. If you don't want a constant in the first stage, just use

```python
IVLIML(y, None, x, z)
```

This works with your code above.
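For concreteness, here is a plain-NumPy sketch of what `IVLIML(y, None, x, z)` computes. With one instrument for one endogenous regressor the model is just-identified, so LIML's kappa is 1 and it coincides with 2SLS (matching the `Kappa: 1.000` line in the summary above). The seed and the `Generator`-based rewrite of the DGP are mine, for reproducibility:

```python
import numpy as np

# Same DGP as the example above, with a seed for reproducibility.
rng = np.random.default_rng(0)
x, e, z = rng.multivariate_normal(
    [0, 0, 0],
    [[1, 0.5, 0.5], [0.5, 1, 0], [0.5, 0, 1]],
    size=500,
).T
y = 1 + x + e

# IVLIML(y, None, x, z): no exogenous block, so no constant in either
# stage.  Just-identified LIML is 2SLS, which here is two regressions
# through the origin.
x_hat = z * (z @ x) / (z @ z)          # first stage: x on z alone
beta = (x_hat @ y) / (x_hat @ x_hat)   # second stage: y on x_hat alone
print(beta)
```

In this particular DGP z has mean zero, so dropping the constant mostly costs precision rather than shifting the slope estimate away from the true value of 1.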
Thanks. But this gives the wrong result, as the true model has a constant in the second stage. How do I fit a model with a constant in the second stage but none in the first? I realize this is a bit pedantic.
What you are trying to do is statistically unsound. Why do you think you need to do this?
When you include the constant as exogenous (which it is), then the two models fit are
W = X + Z
Y = X + W-hat
W-hat is a linear combination of X and Z, but X is still there in the second stage.
This is identical to fitting
X + W = X + Z
Y = X-hat + W-hat
In this contrived example, X-hat is trivially equal to X, since the best exogenous predictor of X is just X.
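The "X-hat is trivially X" point can be checked directly: when the first-stage design contains the exogenous regressors themselves, their fitted values are exact. A minimal NumPy check with an intercept as the exogenous regressor (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)       # instrument
const = np.ones(n)               # the exogenous regressor (an intercept)

# First-stage design = exogenous regressors plus instruments.
W = np.column_stack([const, z])
coef, *_ = np.linalg.lstsq(W, const, rcond=None)
const_hat = W @ coef

# const lies in the column space of [const, z], so its projection
# is itself: the "first stage" leaves the exogenous regressor unchanged.
print(np.allclose(const_hat, const))   # True
```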
I've thought and enquired about this a bit more. Here are two relevant references:
- The Stata FAQ states that in a triangular simultaneous-equation model, as in IV, not including all exogenous regressors in the first stage is unbiased, but is not supported by ivregress.
- The Stackexchange answer states that excluding exogenous regressors from the first stage can lead to bias in the causal parameter if there is confounding.
That is, there are settings where it is statistically sound to exclude exogenous regressors from the first stage, given prior (domain) knowledge. The setting where the exogenous regressor is an intercept is one such setting.
> The Stata FAQ states that in a triangular simultaneous equation model as in IV, not including all exogenous regressors in the first stage is unbiased but not supported by ivregress.
This is correct, but it is statistically unsound* to do so. The reason is that using only the instruments leads to a larger first-stage error variance than using both the instruments and the exogenous regressors, i.e., a lower R² in the first stage. This in turn makes the second-stage fit less correlated with the true effect, and so produces larger standard errors.
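The R² claim follows from least-squares mechanics: adding regressors to the first stage can never increase the sum of squared residuals. A small NumPy check (the DGP here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x_exog = rng.standard_normal(n)                       # exogenous regressor
z = rng.standard_normal(n)                            # instrument
w = 0.5 * x_exog + 0.5 * z + rng.standard_normal(n)   # endogenous regressor

def ssr(design, target):
    """Sum of squared residuals from OLS of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.sum((target - design @ coef) ** 2))

# Instruments-only first stage vs. instruments + exogenous regressors:
# the larger design always fits at least as well (higher first-stage R^2).
print(ssr(z[:, None], w), ssr(np.column_stack([x_exog, z]), w))
```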
> The Stackexchange answer states that excluding exogenous regressors in the first stage leads to bias in the causal parameter if there is a link (confounding)
I don't think this supports excluding them. The conclusion is that they either need to be included or can optionally be excluded (under additional orthogonality assumptions, which coincides with the triangular explanation from Stata), but you might as well include them.
In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).
* When I say statistically unsound, I mean in the case where the number of instrumental variables is small relative to the number of observations. IV can be badly biased in situations with many instruments. However, there are simple workarounds, such as PCA on the IVs, or using LASSO to select variables for the first stage (but then estimating with a standard IV estimator on the selected variables).
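As a sketch of the PCA workaround (the factor DGP and all names below are my own, for illustration): compress many correlated instruments into a few principal components, then run ordinary 2SLS on the reduced instrument set.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 40
f = rng.standard_normal((n, 2))                      # 2 latent factors
load = rng.standard_normal((2, k))
Z_many = f @ load + 0.1 * rng.standard_normal((n, k))  # 40 noisy instruments
e = rng.standard_normal(n)
x = f @ np.array([1.0, -1.0]) + 0.5 * e + rng.standard_normal(n)  # endogenous
y = 1 + x + e                                        # true slope is 1

# PCA on the instruments: keep the first two principal components.
Zc = Z_many - Z_many.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
pcs = Zc @ Vt[:2].T

# Standard 2SLS with the compressed instrument set.
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), pcs])
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
print(beta)   # consistent for [intercept, slope] = [1, 1]
```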
> using only the instruments leads to a larger error variance than when using both the instruments and the exogenous regressors
Do you mean the error variance of the estimate of the causal parameter? If so, would you happen to have a reference for this statement?
> I don't think this supports excluding them.
Agreed, this does not make a statement about whether excluding them is better than including them.
> In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).
To my understanding, if the regressors are exogenous, including them should never lead to inconsistency, correct? And are there settings where including or excluding them in the first stage improves or worsens the (asymptotic) efficiency of the estimator (assuming consistency)? There are settings where excluding exogenous regressors from the first stage does improve asymptotic efficiency.