
bashtage commented on August 10, 2024

You shouldn't have a constant in the endogenous. Constants are always exogenous.

import numpy as np
from linearmodels.iv import IVLIML

# Simulate an endogenous regressor x, an error e correlated with x, and an
# instrument z that is correlated with x but not with e.
x,e,z = np.random.multivariate_normal([0,0,0],[[1,0.5,0.5],[0.5,1,0],[0.5,0,1]],size=(1,500)).T
y = 1 + x + e
const = np.ones_like(x)

# The constant enters as exogenous; x is endogenous, z is the instrument.
IVLIML(y,const,x,z).fit()
Out[1]:
                          IV-LIML Estimation Summary
==============================================================================
Dep. Variable:              dependent   R-squared:                      0.6537
Estimator:                    IV-LIML   Adj. R-squared:                 0.6530
No. Observations:                 500   F-statistic:                    81.456
Date:                Tue, Apr 23 2024   P-value (F-stat)                0.0000
Time:                        08:27:28   Distribution:                  chi2(1)
Cov. Estimator:                robust

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exog           0.9909     0.0453     21.853     0.0000      0.9021      1.0798
endog          0.9744     0.1080     9.0253     0.0000      0.7628      1.1860
==============================================================================

Endogenous: endog
Instruments: instruments
Robust Covariance (Heteroskedastic)
Debiased: False
Kappa: 1.000
IVResults, id: 0x158414c7d10


bashtage commented on August 10, 2024

Closing as answered, but feel free to comment if anything is not clear.


mlondschien commented on August 10, 2024

Thanks.

How do I specify whether the first stage has a constant or not?


bashtage commented on August 10, 2024

Exogenous variables are always included in the first stage.


bashtage commented on August 10, 2024

In shorthand, the first stage is W = X + Z, where W are the endogenous variables, X the exogenous variables and Z the instruments. If you don't want a constant in the 1st stage, just use

IVLIML(y,None,x,z)

This worked in your code above.
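
As a small sketch, reusing y, const, x and z from the snippet above, the two specifications are:

# Constant included as exogenous: it enters both the first and the second stage.
with_const = IVLIML(y, const, x, z).fit()

# No exogenous at all: no constant in either stage.
no_const = IVLIML(y, None, x, z).fit()

print(with_const.params)
print(no_const.params)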


mlondschien commented on August 10, 2024

Thanks. But this gives the wrong result, since the true model does have a constant in the second stage. How do I fit a model with a constant in the second stage but none in the first? I realize this is a bit pedantic.


bashtage commented on August 10, 2024

What you are trying to do is statistically unsound. Why do you think you need to do this?

When you include the constant as exogenous (which it is), then the two models fit are

W = X + Z
Y = X + W-hat

W-hat is a linear combination of X and Z, but X is still there in the second stage.

This is identical to fitting

X + W = X + Z
Y = X-hat + W-hat

In this contrived example, X-hat is trivially equal to X, since the best exogenous predictor of X is just X (see the sketch below).
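
A minimal numpy sketch of this point, re-simulating the same design as in the snippet above with a seed for reproducibility; this is not the linearmodels implementation, and with a single instrument the model is just identified, so kappa is 1 and LIML coincides with 2SLS here:

import numpy as np

rng = np.random.default_rng(0)
x, e, z = rng.multivariate_normal(
    [0, 0, 0], [[1, 0.5, 0.5], [0.5, 1, 0], [0.5, 0, 1]], 500
).T
y = 1 + x + e
const = np.ones_like(x)

# First stage: project both the constant and x on [const, z].
Z1 = np.column_stack([const, z])
coef = np.linalg.lstsq(Z1, np.column_stack([const, x]), rcond=None)[0]
const_hat, x_hat = (Z1 @ coef).T

# The fitted constant is exactly the constant again: X-hat = X.
print(np.allclose(const_hat, const))  # True

# Second stage: regress y on the fitted values; the point estimates are close
# to [1, 1], matching the summary above. (The naive second-stage standard
# errors would not be correct, so use the library for inference.)
beta = np.linalg.lstsq(np.column_stack([const_hat, x_hat]), y, rcond=None)[0]
print(beta)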


mlondschien commented on August 10, 2024

I've thought and enquired about this a bit more. Here are two relevant references:

Stata FAQ
StackExchange

The Stata FAQ states that in a triangular simultaneous equation model as in IV, not including all exogenous regressors in the first stage is unbiased but not supported by ivregress. The StackExchange answer states that excluding exogenous regressors $X$ in the first stage $Z \rightarrow T$ leads to bias in the causal parameter $T \rightarrow Y$ if there is a link (confounding) $Z \leftrightarrow X$ and $T \leftrightarrow X$.

That is, there are settings where it is statistically sound to exclude exogenous regressors from the first stage, given prior (domain) knowledge. The setting where the exogenous regressor is an intercept is one such setting.


bashtage commented on August 10, 2024

The Stata FAQ states that in a triangular simultaneous equation model as in IV, not including all exogenous regressors in the first stage is unbiased but not supported by ivregress.

This is correct, but it is statistically unsound* to do this. The reason is that using only the instruments in the first stage, rather than both the instruments and the exogenous regressors, leads to a larger error variance and a lower R2 in the first stage. This in turn produces a second-stage fit that is less correlated with the true effect, and so larger standard errors (a small sketch of the R2 point follows below).
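
A rough illustration of the first-stage R2 point; the data and the exogenous regressor w here are purely illustrative, and statsmodels is used only to run the two first-stage regressions:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
w = rng.standard_normal(n)               # exogenous regressor that helps predict x
z = rng.standard_normal(n)               # instrument
x = 1 + w + z + rng.standard_normal(n)   # endogenous regressor

# First stage with and without the exogenous regressor.
full = sm.OLS(x, sm.add_constant(np.column_stack([w, z]))).fit()
restricted = sm.OLS(x, sm.add_constant(z)).fit()
print(full.rsquared, restricted.rsquared)  # the restricted first stage has the lower R2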

The StackExchange answer states that excluding exogenous regressors $X$ in the first stage $Z \rightarrow T$ leads to bias in the causal parameter $T \rightarrow Y$ if there is a link (confounding) $Z \leftrightarrow X$ and $T \leftrightarrow X$.

I don't think this supports excluding them. The conclusion is that they either need to be included or can optionally be excluded (under additional orthogonality assumptions, which coincides with the triangular explanation from Stata, but you might as well include them).

In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).

  • When I say statistically unsound I mean in the case where the number of instrumental variables is small relative to the number of observations. IV can be badly biased in situations with many instruments. However, there are simple workarounds, such as PCA on the IVs or using LASSO to select variables for the first stage (but then estimating using a standard IV estimator on the selected variables); a sketch of the PCA route follows below.
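
A sketch of the PCA workaround mentioned above. Everything here is illustrative: the instruments share a common factor so that a few principal components capture most of their information, and scikit-learn is used for the PCA step.

import numpy as np
from sklearn.decomposition import PCA
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n, k = 500, 50                           # many instruments relative to n
f = rng.standard_normal(n)               # common factor behind the instruments
z_many = f[:, None] + rng.standard_normal((n, k))
e = rng.standard_normal(n)
x = f + 0.5 * e                          # endogenous regressor
y = 1 + x + e
const = np.ones_like(x)

# Compress the instruments to a few components, then use a standard IV estimator.
z_pcs = PCA(n_components=3).fit_transform(z_many)
print(IV2SLS(y, const, x, z_pcs).fit().params)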


mlondschien commented on August 10, 2024

The reason is that using only the instruments leads to a larger error variance than when using both the instruments and the exogenous regressors are included

Do you mean the error variance of the estimate of the causal parameter? If so, would you happen to have a reference for this statement?

I don't think this supports excluding them.

Yes, this does not make a statement about whether excluding them is better than including them.

In general, if you are in a case where including exogenous regressors leads to bias (really inconsistency, since we don't have unbiasedness in IV in general), then it becomes a different problem (one of limiting bias, rather than having valid IV estimates).

To my understanding, if the regressors are exogenous, including them should never lead to inconsistency, correct? Are there settings where including or excluding them in the first stage improves or worsens the (asymptotic) efficiency of the estimator (assuming consistency)? There are settings where excluding exogenous regressors from the first stage does improve asymptotic efficiency.

