Endogeneity

In this section, we focus on linear regression when $E[XU] \neq 0$. This condition implies $E[U \mid X] \neq 0$, though the converse need not hold, so the two conditions are closely related rather than strictly equivalent.

Both conditions are discussed in turn below.
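One direction of the link between the two conditions follows from the law of iterated expectations. The short derivation below uses only standard notation; the counterexample after it is an illustrative assumption, not taken from the text.

$$
\begin{aligned}
E[XU] &= E\big[E[XU \mid X]\big] = E\big[X\,E[U \mid X]\big], \\
E[U \mid X] = 0 \ &\Longrightarrow\ E[XU] = 0, \\
E[XU] \neq 0 \ &\Longrightarrow\ E[U \mid X] \neq 0 \text{ with positive probability.}
\end{aligned}
$$

The reverse direction needs more care. For instance, if $X_1 \sim N(0,1)$ and $U = X_1^2 - 1$, then $E[U] = 0$ and $E[X_1 U] = E[X_1^3] - E[X_1] = 0$, yet $E[U \mid X] = X_1^2 - 1 \neq 0$.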

Condition 1: E[XU] not equal to 0

Let $(Y, X, U)$ be a random vector where $Y$ and $U$ take values in $\mathbf{R}$ and $X \in \mathbf{R}^{k+1}$. Assume further that $X=\left(X_0, X_1, \ldots, X_k\right)^{\prime}$ with $X_0=1$, and let $\beta=\left(\beta_0, \beta_1, \ldots, \beta_k\right)^{\prime} \in \mathbf{R}^{k+1}$ be such that

$$Y=X^{\prime} \beta+U.$$

Note that we no longer assume $E[XU]=0$. Any $X_j$ such that $E\left[X_j U\right]=0$ is said to be exogenous; any $X_j$ such that $E\left[X_j U\right] \neq 0$ is said to be endogenous. Normalizing $\beta_0$ if necessary, we view $X_0$ as exogenous.
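The definitions above can be made concrete with sample moments. The following is a minimal sketch on simulated data: the helper `sample_cross_moment`, the coefficient 0.8, and the data-generating process are all hypothetical choices made for illustration. In real data $U$ is unobserved, so this check is only possible in a simulation.

```python
import random

def sample_cross_moment(x, u):
    """Sample analogue of E[X_j U]: the average of x_i * u_i."""
    return sum(xi * ui for xi, ui in zip(x, u)) / len(x)

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 50_000

# Hypothetical data-generating process (an assumption for illustration).
u = [rng.gauss(0, 1) for _ in range(n)]          # error term with E[U] = 0
x0 = [1.0] * n                                   # constant regressor X_0 = 1
x1 = [rng.gauss(0, 1) for _ in range(n)]         # drawn independently of u
x2 = [0.8 * ui + rng.gauss(0, 1) for ui in u]    # built from u, so E[X_2 U] = 0.8

m0 = sample_cross_moment(x0, u)   # close to 0: X_0 is exogenous
m1 = sample_cross_moment(x1, u)   # close to 0: X_1 is exogenous
m2 = sample_cross_moment(x2, u)   # close to 0.8: X_2 is endogenous

print(f"E[X0 U] ~ {m0:.3f}, E[X1 U] ~ {m1:.3f}, E[X2 U] ~ {m2:.3f}")
```

Only `x2` is constructed from the error term, so it is the only regressor whose cross moment with `u` stays away from zero as the sample grows.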

Here is an example:

Recall the Cobb-Douglas production function, a fundamental concept in macroeconomics.

$$Y=A K^\alpha L^\beta$$

where we have:

  • $Y$: Output, or total production of goods and services in an economy.

  • $A$: Total factor productivity

  • $K$: Capital input

  • $L$: Labor input

  • $\alpha$ and $\beta$: These are the output elasticities of capital and labor, respectively.

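Taking logarithms turns this production function into

$$\log Y=\log A+\alpha \log K+\beta \log L.$$

To run a regression on this function, we can further rewrite it as

$$y=\beta_0+\beta_1 K+\beta_2 L+u,$$

where, with a slight abuse of notation, $y$, $K$, and $L$ now denote the logarithms of output, capital, and labor, $\beta_1=\alpha$ and $\beta_2=\beta$, and unobserved log productivity is written as $\log A=\beta_0+u$.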
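The log-linearization can be checked numerically. The sketch below uses arbitrary parameter values (assumptions, not taken from the text) and also shows one way to split log productivity into an intercept and an error term, namely taking $\beta_0$ as the average of $\log A$ across a few hypothetical economies.

```python
import math

def cobb_douglas(A, K, L, alpha, beta):
    """Output Y = A * K**alpha * L**beta."""
    return A * K**alpha * L**beta

# Illustrative values only (assumptions, not taken from the text).
A, K, L = 1.5, 40.0, 120.0
alpha, beta = 0.3, 0.7

# Left- and right-hand sides of the log-linear form.
lhs = math.log(cobb_douglas(A, K, L, alpha, beta))
rhs = math.log(A) + alpha * math.log(K) + beta * math.log(L)
print(math.isclose(lhs, rhs))

# Splitting log A into an intercept and an error term: with beta0 taken as
# the average of log A over hypothetical economies, u is the deviation from
# that average and sums to zero by construction.
log_A = [math.log(a) for a in (1.2, 1.5, 1.9)]
beta0 = sum(log_A) / len(log_A)
u = [la - beta0 for la in log_A]
print(beta0, u)
```

The error term here is simply the part of productivity the econometrician does not observe, which is why anything correlated with both productivity and the inputs makes the inputs endogenous.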

This model is likely to suffer from endogeneity: other macroeconomic factors that are correlated with capital and labor are omitted from the model and therefore end up in $u$, so that

$$\mathbb{E}(K u) \neq 0, \quad \mathbb{E}(L u) \neq 0.$$

Note that this is a structural model: it is based on economic theory and designed to capture the underlying mechanisms and relationships between the variables, so it focuses on the causal relationship. If we instead treat it as a projection model, the parameters $\beta_0$, $\beta_1$, and $\beta_2$ will in general differ from those of the structural model, because a projection model imposes $\mathbb{E}(K u)=0$ and $\mathbb{E}(L u)=0$ by construction.

This raises a question: what happens to the OLS estimator when $E[XU] \neq 0$?

The OLS estimator converges to the projection coefficient rather than to the structural $\beta$, which leads to the following inconsistency problem.

$$
\begin{aligned}
\hat{\beta} & =\left(\sum_{i=1}^n X_i X_i^{\prime}\right)^{-1} \sum_{i=1}^n X_i Y_i \\
& =\left(\frac{1}{n} \sum_{i=1}^n X_i X_i^{\prime}\right)^{-1} \frac{1}{n} \sum_{i=1}^n X_i Y_i \\
& \stackrel{P}{\longrightarrow}\left(\mathbb{E}\left[X_i X_i^{\prime}\right]\right)^{-1} \mathbb{E}\left[X_i Y_i\right] \\
& =\left(\mathbb{E}\left[X_i X_i^{\prime}\right]\right)^{-1} \mathbb{E}\left[X_i\left(X_i^{\prime} \beta+u_i\right)\right] \\
& =\beta+\left(\mathbb{E}\left[X_i X_i^{\prime}\right]\right)^{-1} \mathbb{E}\left[X_i u_i\right] \\
& \neq \beta \quad \text{ if } \mathbb{E}\left[X_i u_i\right] \neq 0 .
\end{aligned}
$$

Therefore, $\operatorname{plim}_{n \rightarrow \infty} \hat{\beta} \neq \beta$: the OLS estimator is inconsistent and, in general, biased.
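A small simulation can illustrate this result. The sketch below assumes a hypothetical one-regressor design, $Y=\beta_0+\beta_1 X+U$ with $X=V+\gamma U$ and $V$, $U$ independent standard normals; the function names and parameter values are illustrative choices. In this design $\operatorname{Cov}(X,U)=\gamma$ and $\operatorname{Var}(X)=1+\gamma^2$, so the OLS slope should settle near $\beta_1+\gamma/(1+\gamma^2)$ rather than $\beta_1$.

```python
import random

def ols_slope(x, y):
    """OLS slope in a regression of y on a constant and x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate_slope(n, beta0=1.0, beta1=2.0, gamma=0.5, seed=0):
    """Draw n observations from the hypothetical design and return the OLS slope."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)          # structural error
        v = rng.gauss(0, 1)          # exogenous part of the regressor
        xi = v + gamma * u           # regressor is correlated with u
        x.append(xi)
        y.append(beta0 + beta1 * xi + u)
    return ols_slope(x, y)

beta1, gamma = 2.0, 0.5
plim_slope = beta1 + gamma / (1 + gamma ** 2)   # analytical limit, 2.4 here

for n in (100, 10_000, 100_000):
    print(n, round(simulate_slope(n, beta1=beta1, gamma=gamma), 3))
print("analytical limit:", plim_slope, "structural beta1:", beta1)
```

Increasing `n` tightens the estimate around the analytical limit, not around the structural coefficient, which is what inconsistency means here. Setting `gamma=0` removes the endogeneity and the two values coincide.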

Condition 2: E[U|X] not equal to 0

For regression model
