TSLS Estimator

Two-Stage Least Squares

Two-Stage Least Squares (TSLS) is used when the number of instrumental variables exceeds the number of explanatory variables, i.e., in the following condition:

$$\text{Over-identified case: } l>k$$

The expressions we derived for $\beta$ in this case, such as

$$\beta=E\left[\Pi^{\prime} E\left[Z X^{\prime}\right]\right]^{-1} \Pi^{\prime} E[Z Y],$$

all involve the matrix $\Pi$, where

$$\operatorname{BLP}(X \mid Z)=\Pi^{\prime} Z$$

An estimate of $\Pi$ can be obtained by OLS. Since $\Pi=E\left[Z Z^{\prime}\right]^{-1} E\left[Z X^{\prime}\right]$, a natural estimator of $\Pi$ is

$$\hat{\Pi}=\left(\frac{1}{n} \sum_i Z_i Z_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum_i Z_i X_i^{\prime}\right).$$
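
In code, this first stage is an ordinary OLS regression of each component of $X_i$ on $Z_i$. Here is a minimal NumPy sketch (the function name `first_stage` and the array layout are illustrative assumptions, not part of the source):

```python
import numpy as np

def first_stage(Z, X):
    """OLS estimate of Pi, where Z is n x l and X is n x (k+1).

    Solves (Z'Z) Pi = Z'X, so the returned Pi_hat is l x (k+1)
    and Pi_hat' Z_i is the fitted value of X_i.
    """
    return np.linalg.solve(Z.T @ Z, Z.T @ X)
```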

Write $X_i=\hat{\Pi}^{\prime} Z_i+\hat{V}_i$. With the above estimator of $\Pi$, a natural estimator of $\beta$ is simply

$$\hat{\beta}=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i Y_i\right]$$

Proof:

$$\begin{aligned} \hat{\beta}&=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime}\frac{1}{n} \sum_{i=1}^n Z_i Y_i\right] \\ & =\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i\left(X_i^{\prime} \beta+U_i\right)\right] \\ & \stackrel{p}{\rightarrow} \left(\Pi^{\prime}\mathbb{E}\left[Z X^{\prime}\right]\right)^{-1}\left(\Pi^{\prime}\mathbb{E}\left[Z X^{\prime}\right]\beta+\Pi^{\prime}\mathbb{E}[Z U]\right) \\ & =\beta+\left(\Pi^{\prime}\mathbb{E}\left[Z X^{\prime}\right]\right)^{-1} \Pi^{\prime}\mathbb{E}[Z U] \quad\text{by instrument exogeneity: } \mathbb{E}[Z U]=0\\ & =\beta \end{aligned}$$

Note that $\hat{\beta}$ satisfies

$$\frac{1}{n} \sum_i \hat{\Pi}^{\prime} Z_i\left(Y_i-X_i^{\prime} \hat{\beta}\right)=0 .$$

In particular, $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta}$ satisfies

$$\frac{1}{n} \sum_i \hat{\Pi}^{\prime} Z_i \hat{U}_i=0$$

This implies that $\hat{U}_i$ is orthogonal in sample to the fitted values $\hat{\Pi}^{\prime} Z_i$ and, in particular, to any instrument that is itself an exogenous regressor; it need not be orthogonal to the other regressors.

It is termed the TSLS estimator because it may be obtained in the following way:

  1. Regress (each component of) $X_i$ on $Z_i$ to obtain $\hat{X}_i=\hat{\Pi}^{\prime} Z_i$

  2. Regress $Y_i$ on $\hat{X}_i$ to obtain $\hat{\beta}$. However, to obtain correct standard errors, it is recommended to compute the estimator in one step, as in the sketch below.
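
A minimal NumPy sketch of the one-step computation (the name `tsls` and the data layout are illustrative assumptions; it presumes $l \geq k+1$ and a full-rank first-stage design):

```python
import numpy as np

def tsls(Z, X, Y):
    """TSLS estimate of beta, computed in one pass.

    Z: (n, l) instruments, X: (n, k+1) regressors, Y: (n,) outcome,
    with l >= k + 1 (over- or just-identified).
    """
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage
    X_hat = Z @ Pi_hat                           # fitted regressors
    # One-step formula: (X_hat' X)^{-1} X_hat' Y.  Residuals for standard
    # errors should be Y - X @ beta_hat, not Y - X_hat @ beta_hat.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
```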

Matrix Notation

This estimator may be expressed more compactly using matrix notation. Define

$$\begin{aligned} \mathbb{Z} & =\left(Z_1, \ldots, Z_n\right)^{\prime} \\ \mathbb{X} & =\left(X_1, \ldots, X_n\right)^{\prime} \\ \mathbb{Y} & =\left(Y_1, \ldots, Y_n\right)^{\prime} \\ \hat{\mathbb{X}} & =\left(\hat{X}_1, \ldots, \hat{X}_n\right)^{\prime}=\mathbb{P}_Z \mathbb{X}, \end{aligned}$$

where

$$\mathbb{P}_Z=\mathbb{Z}\left(\mathbb{Z}^{\prime} \mathbb{Z}\right)^{-1} \mathbb{Z}^{\prime}$$

is the projection matrix onto the column space of $\mathbb{Z}$. In this notation, we have

$$\begin{aligned} \hat{\beta} & =\left(\hat{\mathbb{X}}^{\prime} \mathbb{X}\right)^{-1}\left(\hat{\mathbb{X}}^{\prime} \mathbb{Y}\right) \\ & =\left(\hat{\mathbb{X}}^{\prime} \hat{\mathbb{X}}\right)^{-1}\left(\hat{\mathbb{X}}^{\prime} \mathbb{Y}\right) \\ & =\left(\mathbb{X}^{\prime} \mathbb{P}_Z \mathbb{X}\right)^{-1}\left(\mathbb{X}^{\prime} \mathbb{P}_Z \mathbb{Y}\right) \end{aligned}$$
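
As a sanity check, the equality of the three expressions can be verified numerically on simulated data; the data-generating values below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, k1 = 500, 3, 2                          # over-identified: l > k + 1
Z = rng.normal(size=(n, l))
X = Z @ rng.normal(size=(l, k1)) + rng.normal(size=(n, k1))
Y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # projection onto col(Z)
X_hat = P_Z @ X                               # first-stage fitted values

b1 = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
b2 = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)
b3 = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ Y)
assert np.allclose(b1, b2) and np.allclose(b1, b3)   # all three coincide
```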

Properties of Two-Stage Least Squares

Let $(Y, X, U)$ be a random vector where $Y$ and $U$ take values in $\mathbf{R}$ and $X$ takes values in $\mathbf{R}^{k+1}$. Assume further that the first component of $X$ is constant and equal to one, i.e., $X=\left(X_0, X_1, \ldots, X_k\right)^{\prime}$ with $X_0=1$. Let $\beta=\left(\beta_0, \beta_1, \ldots, \beta_k\right)^{\prime} \in \mathbf{R}^{k+1}$ be such that

$$Y=X^{\prime} \beta+U$$

If $E[X U] \neq 0$, the OLS estimator is biased and inconsistent; this motivates instrumenting $X$ with a vector $Z$ of $l$ instrumental variables.

We assume:

  1. $E[Z U]=0$: Exclusion condition; the instruments must be valid (exogenous)

  2. $E\left[Z X^{\prime}\right]<\infty$: Regularity condition

  3. $E\left[Z Z^{\prime}\right]<\infty$: Regularity condition

  4. There is no perfect collinearity in $Z$

  5. The rank of $E\left[Z X^{\prime}\right]$ is $k+1$: Relevance condition

Let $\left(Y_1, X_1, Z_1\right), \ldots,\left(Y_n, X_n, Z_n\right)$ be an i.i.d. sequence of random variables with distribution $P$.

Under these assumptions the TSLS estimator is consistent for $\beta$, and under the additional requirement that $\operatorname{Var}[Z U]<\infty$, it is asymptotically normal with limiting variance

$$\mathbb{V}=\left[E\left(\Pi^{\prime} Z Z^{\prime} \Pi\right)\right]^{-1} \Pi^{\prime} \operatorname{Var}[Z U] \Pi\left[E\left(\Pi^{\prime} Z Z^{\prime} \Pi\right)\right]^{-1}$$

Consistency of TSLS

The natural TSLS estimator $\hat{\beta}$ of $\beta$ satisfies

$$\hat{\beta}=\left[\hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow} \beta \text { as } n \rightarrow \infty .$$

Proof:

Since $\hat{\Pi}=\left(\frac{1}{n} \sum_i Z_i Z_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum_i Z_i X_i^{\prime}\right) \stackrel{P}{\longrightarrow} \Pi=E\left[Z Z^{\prime}\right]^{-1} E\left[Z X^{\prime}\right]$ and $\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime} \stackrel{P}{\longrightarrow} \mathbb{E}\left[Z_i X_i^{\prime}\right]$, by Slutsky's theorem and the Continuous Mapping Theorem (applied to $f(X)=X^{-1}$) we have, for the first factor:

$$\left(\hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum Z_i X_i^{\prime}\right)\right)^{-1} \stackrel{P}{\longrightarrow}\left(\Pi^{\prime} \mathbb{E}\left[Z_i X_i^{\prime}\right]\right)^{-1}$$

For the second factor, similarly:

$$\begin{aligned} \frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i&=\frac{1}{n} \sum_{1 \leq i \leq n} Z_i\left(X_i^{\prime} \beta+U_i\right) \\ & =\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_{i}^{\prime} \beta+\frac{1}{n} \sum_{1 \leq i \leq n} Z_i U_i \\ & \stackrel{P}{\longrightarrow}\mathbb{E}\left[Z_i X_{i}^{\prime}\right] \beta+\mathbb{E}\left[Z_i U_i\right] \\ & =\mathbb{E}\left[Z_i X_{i}^{\prime}\right] \beta+0=\mathbb{E}\left[Z_i X_{i}^{\prime}\right] \beta \end{aligned}$$

Therefore,

$$\hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow}\Pi^{\prime} \mathbb{E}\left[Z_i X_i^{\prime}\right]\beta$$

Combining the two limits completes the proof:

$$\hat{\beta}=\left[\hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow} \beta \text { as } n \rightarrow \infty .$$
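
This convergence can be illustrated with a small Monte Carlo experiment; the sketch below uses an invented design in which $x_1$ is endogenous ($E[XU] \neq 0$) while the instrument satisfies $E[ZU]=0$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])

for n in (100, 10_000, 1_000_000):
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x1 = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
    X = np.column_stack([np.ones(n), x1])          # constant + x1
    Z = np.column_stack([np.ones(n), z])           # constant + instrument
    Y = X @ beta + u

    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
    X_hat = Z @ Pi_hat
    beta_hat = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
    print(n, beta_hat)   # approaches (1.0, 2.0); plain OLS would not
```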

Asymptotic Normality of TSLS

Assume that $\operatorname{Var}[Z U]=E\left[Z Z^{\prime} U^2\right]<\infty$. Then, as $n \rightarrow \infty$,

$$\sqrt{n}(\hat{\beta}-\beta) \stackrel{d}{\rightarrow} N(0, \mathbb{V})$$

Substituting $Y_i=X_i^{\prime} \beta+U_i$ into the definition of $\hat{\beta}$ gives:

$$\hat{\beta}-\beta=\left[\hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}_n^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i U_i\right)$$

By the CLT,

$$\sqrt{n} \cdot \frac{1}{n} \sum_{i=1}^n Z_i U_i \stackrel{d}{\longrightarrow} N\left(0, \operatorname{Var}\left(Z_i U_i\right)\right) .$$

Substituting this in and applying Slutsky's theorem:

$$\begin{aligned} \sqrt{n}\left(\hat{\beta}-\beta\right)&=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1} \hat{\Pi}^{\prime}\left(\sqrt{n} \cdot \frac{1}{n} \sum_{i=1}^n Z_i U_i\right)\\ &\stackrel{d}{\rightarrow}\underbrace{\left(\left[\Pi^{\prime} \mathbb{E}\left[Z X^{\prime}\right]\right]^{-1} \Pi^{\prime}\right)}_A \underbrace{N\left(0, \operatorname{Var}\left(Z_i U_i\right)\right)}_W \end{aligned}$$

Since $A$ is a constant (nonrandom) matrix, we have:

$$\begin{aligned} \operatorname{Var}(A W) & =\mathbb{E}\left[(A W-\mathbb{E}[A W])(A W-\mathbb{E}[A W])^{\prime}\right] \\ & =\mathbb{E}\left[A(W-\mathbb{E}[W])(W-\mathbb{E}[W])^{\prime} A^{\prime}\right] \\ & =A\, \mathbb{E}\left[(W-\mathbb{E}[W])(W-\mathbb{E}[W])^{\prime}\right] A^{\prime} \\ & =A \operatorname{Var}(W) A^{\prime} \end{aligned}$$

Hence $\mathbb{V}=A \operatorname{Var}(W) A^{\prime}$. Since $\Pi^{\prime} \mathbb{E}\left[Z X^{\prime}\right]$ is symmetric (as shown below), $A^{\prime}=\Pi\left[\Pi^{\prime} \mathbb{E}\left[Z X^{\prime}\right]\right]^{-1}$, so

$$\mathbb{V}=\left[\Pi^{\prime} \mathbb{E}\left[Z X^{\prime}\right]\right]^{-1} \Pi^{\prime} \operatorname{Var}(Z U)\, \Pi\left[\Pi^{\prime} \mathbb{E}\left[Z X^{\prime}\right]\right]^{-1}$$

Since $X=\Pi^{\prime} Z+e$, where $e$ is the first-stage projection error with $E\left[Z e^{\prime}\right]=0$, we have

$$E\left[Z_i X_i^{\prime}\right]=E\left[Z_i Z_i^{\prime}\right] \Pi+E\left[Z_i e_i^{\prime}\right]=E\left[Z_i Z_i^{\prime}\right] \Pi$$

so $\Pi^{\prime} E\left[Z X^{\prime}\right]=\Pi^{\prime} E\left[Z Z^{\prime}\right] \Pi$ is indeed symmetric. Substituting this in, we obtain:

$$\mathbb{V}=\left[\Pi^{\prime} E\left[Z Z^{\prime}\right] \Pi\right]^{-1} \Pi^{\prime} \operatorname{Var}(Z U)\, \Pi\left[\Pi^{\prime} E\left[Z Z^{\prime}\right] \Pi\right]^{-1}$$

Estimation of $\mathbb{V}$:

A natural estimator of $\mathbb{V}$ is given by

$$\hat{\mathbb{V}}_n=\left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime}\right) \hat{\Pi}\right]^{-1} \hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime} \hat{U}_i^2\right) \hat{\Pi}\left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime}\right) \hat{\Pi}\right]^{-1}$$

where $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta}$.
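
In code, this is a sandwich estimator built from $\hat{\Pi}$ and the TSLS residuals; here is a NumPy sketch under the same illustrative conventions as the earlier sketches (the helper name `tsls_variance` is our own):

```python
import numpy as np

def tsls_variance(Z, X, Y, beta_hat, Pi_hat):
    """Plug-in estimate of V and the implied standard errors."""
    n = len(Y)
    U_hat = Y - X @ beta_hat                       # residuals vs original X
    bread = np.linalg.inv(Pi_hat.T @ (Z.T @ Z / n) @ Pi_hat)
    # (1/n) sum_i Z_i Z_i' U_hat_i^2, computed by row-scaling Z
    meat = Pi_hat.T @ (Z.T @ (Z * U_hat[:, None] ** 2) / n) @ Pi_hat
    V_hat = bread @ meat @ bread
    se = np.sqrt(np.diag(V_hat) / n)               # SEs for beta_hat
    return V_hat, se
```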

The primary difficulty in establishing the consistency of this estimator lies in showing that

$$\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime} \hat{U}_i^2 \stackrel{P}{\rightarrow} \operatorname{Var}[Z U]$$

as $n \rightarrow \infty$. The complication lies in the fact that we do not observe $U_i$ and therefore have to use $\hat{U}_i$. Note that

$$\operatorname{Var}(Z U)=E\left[Z U \cdot U Z^{\prime}\right]=E\left[Z Z^{\prime} U^2\right] \text { since } E[Z U]=0 .$$

However, note that $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta} \neq Y_i-\hat{X}_i^{\prime} \hat{\beta}$, where $\hat{X}_i$ is the regressor used in the second-stage regression. A manual second-stage OLS computes its residuals from $\hat{X}_i$, so the standard errors from two separate applications of OLS will be incorrect. To run the two-stage regression correctly in Stata, use the one-step command ivregress.
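
To see the difference concretely, the following sketch (with an invented data-generating process) compares the two residual series:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x1 = 0.8 * z + 0.5 * u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
Z = np.column_stack([np.ones(n), z])
Y = X @ np.array([1.0, 2.0]) + u

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_hat = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)

U_correct = Y - X @ beta_hat      # residual that valid standard errors need
U_naive = Y - X_hat @ beta_hat    # residual a manual second-stage OLS reports
print(U_correct.std(), U_naive.std())   # the two generally differ
```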
