Two-Stage Least Squares
Two-Stage Least Squares (TSLS) is used when the number of instrumental variables is greater than the number of explanatory variables, i.e., in the over-identified case:

$$\text{Over-identified case: } l > k$$

The expressions we derived for $\beta$ in this case, such as
$$\beta=\left(\Pi^{\prime} E\left[Z X^{\prime}\right]\right)^{-1} \Pi^{\prime} E[Z Y],$$

all involve the matrix $\Pi$, where
$$\operatorname{BLP}(X \mid Z)=\Pi^{\prime} Z$$

An estimate of $\Pi$ can be obtained by OLS. Since $\Pi=E\left[Z Z^{\prime}\right]^{-1} E\left[Z X^{\prime}\right]$, a natural estimator of $\Pi$ is
$$\hat{\Pi}=\left(\frac{1}{n} \sum_i Z_i Z_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum_i Z_i X_i^{\prime}\right).$$

Write $X_i=\hat{\Pi}^{\prime} Z_i+\hat{V}_i$. With this estimator of $\Pi$, a natural estimator of $\beta$ is simply
$$\hat{\beta}=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i Y_i\right]$$

Proof:
$$\begin{aligned} \hat{\beta}&=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime}\frac{1}{n} \sum_{i=1}^n Z_i Y_i\right] \\ & =\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1}\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i\left(X_i^{\prime} \beta+U_i\right)\right] \\ & \stackrel{p}{\rightarrow} \left(\Pi^{\prime} E\left[Z X^{\prime}\right]\right)^{-1}\left(\Pi^{\prime} E\left[Z X^{\prime}\right] \beta+\Pi^{\prime} E[Z U]\right) \\ & =\beta+\left(\Pi^{\prime} E\left[Z X^{\prime}\right]\right)^{-1} \Pi^{\prime} E[Z U] \\ & =\beta \qquad \text{by instrument exogeneity: } E[Z U]=0 \end{aligned}$$

Note that $\hat{\beta}$ satisfies
$$\frac{1}{n} \sum_i \hat{\Pi}^{\prime} Z_i\left(Y_i-X_i^{\prime} \hat{\beta}\right)=0 .$$

In particular, $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta}$ satisfies
$$\frac{1}{n} \sum_i \hat{\Pi}^{\prime} Z_i \hat{U}_i=0$$

This implies that $\hat{U}_i$ is orthogonal to the first-stage fitted values $\hat{\Pi}^{\prime} Z_i$, and in particular to any instruments that also appear as exogenous regressors, but it need not be orthogonal to the endogenous regressors themselves.
It is termed the TSLS estimator because it may be obtained in the following way:
1. Regress (each component of) $X_i$ on $Z_i$ to obtain $\hat{X}_i=\hat{\Pi}^{\prime} Z_i$.
2. Regress $Y_i$ on $\hat{X}_i$ to obtain $\hat{\beta}$.

However, in order to obtain correct standard errors, it is recommended to compute the estimator in one step; see the discussion at the end of this section and the simulation sketch below.
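To make the two-stage recipe concrete, here is a minimal simulation sketch in Python with NumPy. The data-generating process (three excluded instruments, one endogenous regressor, and all coefficient values) is invented for illustration and is not taken from the text.

```python
# A minimal TSLS sketch under an illustrative, made-up DGP.
import numpy as np

rng = np.random.default_rng(0)
n = 2000                                 # sample size
beta_true = np.array([2.0, 1.5])         # (intercept, slope)

# Instruments: a constant plus three excluded instruments (over-identified).
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
V = rng.normal(size=n)                   # first-stage error
U = 0.8 * V + rng.normal(size=n)         # structural error, correlated with V
X1 = Z[:, 1:] @ np.array([1.0, 0.5, -0.5]) + V   # endogenous regressor
X = np.column_stack([np.ones(n), X1])    # X_i = (1, X_1i)'
Y = X @ beta_true + U                    # E[XU] != 0, so OLS is inconsistent

# Stage 1: OLS of X on Z gives Pi_hat and fitted values X_hat = Z Pi_hat.
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat

# Stage 2: OLS of Y on X_hat gives the TSLS estimate.
beta_hat = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)
print(beta_hat)                          # close to beta_true

# First-order condition: residuals are orthogonal to the fitted values ...
U_hat = Y - X @ beta_hat
print(X_hat.T @ U_hat / n)               # ~ 0 up to floating point
# ... but not to the endogenous regressor itself (second entry nonzero).
print(X.T @ U_hat / n)
```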
Matrix Notation
This estimator may be expressed more compactly using matrix notation. Define
$$\begin{aligned} \mathbb{Z} & =\left(Z_1, \ldots, Z_n\right)^{\prime} \\ \mathbb{X} & =\left(X_1, \ldots, X_n\right)^{\prime} \\ \mathbb{Y} & =\left(Y_1, \ldots, Y_n\right)^{\prime} \\ \hat{\mathbb{X}} & =\left(\hat{X}_1, \ldots, \hat{X}_n\right)^{\prime} =\mathbb{P}_Z \mathbb{X}, \end{aligned}$$

where
$$\mathbb{P}_Z=\mathbb{Z}\left(\mathbb{Z}^{\prime} \mathbb{Z}\right)^{-1} \mathbb{Z}^{\prime}$$

is the projection matrix onto the column space of $\mathbb{Z}$. In this notation, we have
$$\begin{aligned} \hat{\beta} & =\left(\hat{\mathbb{X}}^{\prime} \mathbb{X}\right)^{-1}\left(\hat{\mathbb{X}}^{\prime} \mathbb{Y}\right) \\ & =\left(\hat{\mathbb{X}}^{\prime} \hat{\mathbb{X}}\right)^{-1}\left(\hat{\mathbb{X}}^{\prime} \mathbb{Y}\right) \\ & =\left(\mathbb{X}^{\prime} \mathbb{P}_Z \mathbb{X}\right)^{-1}\left(\mathbb{X}^{\prime} \mathbb{P}_Z \mathbb{Y}\right) \end{aligned}$$
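Reusing the arrays from the sketch above (so Z, X, X_hat, and Y are the names defined there), one can verify numerically that the three expressions coincide:

```python
# Continuing the sketch above: the three matrix forms of beta_hat agree.
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection onto col(Z)
b1 = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)   # (X_hat' X)^{-1} X_hat' Y
b2 = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)
b3 = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ Y)
assert np.allclose(b1, b2) and np.allclose(b2, b3)
```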
Properties of Two-Stage Least Squares

Let $(Y, X, U)$ be a random vector where $Y$ and $U$ take values in $\mathbf{R}$ and $X$ takes values in $\mathbf{R}^{k+1}$. Assume further that the first component of $X$ is constant and equal to one, i.e., $X=\left(X_0, X_1, \ldots, X_k\right)^{\prime}$ with $X_0=1$. Let $\beta=\left(\beta_0, \beta_1, \ldots, \beta_k\right)^{\prime} \in \mathbf{R}^{k+1}$ be such that
$$Y=X^{\prime} \beta+U$$

OLS estimation is biased and inconsistent if $E[X U] \neq 0$.
We assume:
- $E[Z U]=0$: exclusion restriction ($Z$ must be a valid instrument)
- $E\left[Z X^{\prime}\right]<\infty$: regularity condition
- $E\left[Z Z^{\prime}\right]<\infty$: regularity condition
- There is no perfect collinearity in $Z$
- The rank of $E\left[Z X^{\prime}\right]$ is $k+1$: relevance condition
Let $\left(Y_1, X_1, Z_1\right), \ldots,\left(Y_n, X_n, Z_n\right)$ be an i.i.d. sequence of random variables with distribution $P$.
Under these assumptions the TSLS estimator is consistent for $\beta$, and, under the additional requirement that $\operatorname{Var}[Z U]<\infty$, it is asymptotically normal with limiting variance
$$\mathbb{V}=\left[E\left(\Pi^{\prime} Z Z^{\prime} \Pi\right)\right]^{-1} \Pi^{\prime} \operatorname{Var}[Z U] \Pi\left[E\left(\Pi^{\prime} Z Z^{\prime} \Pi\right)\right]^{-1}$$

Consistency of TSLS
The TSLS estimator $\hat{\beta}$ satisfies
$$\hat{\beta}=\left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow} \beta \text { as } n \rightarrow \infty .$$

Proof:
Since $\hat{\Pi}=\left(\frac{1}{n} \sum_i Z_i Z_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum_i Z_i X_i^{\prime}\right) \stackrel{P}{\longrightarrow} \Pi=E\left[Z Z^{\prime}\right]^{-1} E\left[Z X^{\prime}\right]$ and $\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime} \stackrel{P}{\longrightarrow} E\left[Z_i X_i^{\prime}\right]$, Slutsky's theorem and the continuous mapping theorem (CMT, applied to $f(X)=X^{-1}$) give, for the first factor:
$$\left(\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum Z_i X_i^{\prime}\right)\right)^{-1} \stackrel{P}{\longrightarrow}\left(\Pi^{\prime} E\left[Z_i X_i^{\prime}\right]\right)^{-1}$$

For the second factor, similarly:
$$\begin{aligned} \frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i&=\frac{1}{n} \sum_{1 \leq i \leq n} Z_i\left(X_i^{\prime} \beta+U_i\right) \\ & =\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_{i}^{\prime} \beta+\frac{1}{n} \sum_{1 \leq i \leq n} Z_i U_i \\ & \stackrel{P}{\longrightarrow} E\left[Z_i X_{i}^{\prime}\right] \beta+E\left[Z_i U_i\right] = E\left[Z_i X_{i}^{\prime}\right] \beta \end{aligned}$$

Therefore,
$$\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow}\Pi^{\prime} E\left[Z_i X_i^{\prime}\right]\beta$$

Combining the two limits completes the proof:
$$\hat{\beta}=\left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Y_i\right) \stackrel{P}{\rightarrow} \beta \text { as } n \rightarrow \infty .$$
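A small Monte Carlo sketch of this result, under the same illustrative data-generating process as before: the estimate should drift toward the true $\beta=(2.0, 1.5)^{\prime}$ as $n$ grows.

```python
# Consistency sketch: TSLS at increasing sample sizes (illustrative DGP).
def tsls_sim(n_obs, rng):
    Z = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, 3))])
    V = rng.normal(size=n_obs)
    U = 0.8 * V + rng.normal(size=n_obs)
    X = np.column_stack([np.ones(n_obs),
                         Z[:, 1:] @ np.array([1.0, 0.5, -0.5]) + V])
    Y = X @ np.array([2.0, 1.5]) + U
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage
    X_hat = Z @ Pi_hat
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ Y)

rng2 = np.random.default_rng(1)
for n_obs in [100, 1_000, 10_000, 100_000]:
    print(n_obs, tsls_sim(n_obs, rng2))          # converges toward (2.0, 1.5)
```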
Asymptotic Normality of TSLS

Assume that $\operatorname{Var}[Z U]=E\left[Z Z^{\prime} U^2\right]<\infty$. Then, as $n \rightarrow \infty$,
$$\sqrt{n}(\hat{\beta}-\beta) \stackrel{d}{\rightarrow} N(0, \mathbb{V})$$

From the expression for $\hat{\beta}$ and the model $Y_i=X_i^{\prime}\beta+U_i$, we have:
$$\hat{\beta}-\beta=\left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i X_i^{\prime}\right)\right]^{-1} \hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i U_i\right)$$

By the CLT:
$$\sqrt{n} \cdot \frac{1}{n} \sum_{i=1}^n Z_i U_i \stackrel{d}{\longrightarrow} N\left(0, \operatorname{Var}\left(Z_i U_i\right)\right) .$$

Substituting this in and applying Slutsky's theorem:
$$\begin{aligned} \sqrt{n}\left(\hat{\beta}-\beta\right)&=\left[\hat{\Pi}^{\prime} \frac{1}{n} \sum_{i=1}^n Z_i X_i^{\prime}\right]^{-1} \hat{\Pi}^{\prime}\left(\sqrt{n} \cdot \frac{1}{n} \sum_{i=1}^n Z_i U_i\right)\\ &\stackrel{d}{\rightarrow}\underbrace{\left(\Pi^{\prime} E\left[Z X^{\prime}\right]\right)^{-1} \Pi^{\prime}}_A \underbrace{N\left(0, \operatorname{Var}\left(Z_i U_i\right)\right)}_W \end{aligned}$$

Since $A$ is a constant matrix (the probability limit of the sample analogue), we have:
$$\begin{aligned} \operatorname{Var}(A W) & =E\left[(A W-E[A W])(A W-E[A W])^{\prime}\right] \\ & =E\left[A(W-E[W])(W-E[W])^{\prime} A^{\prime}\right] \\ & =A\, E\left[(W-E[W])(W-E[W])^{\prime}\right] A^{\prime} \\ & =A \operatorname{Var}(W) A^{\prime} \end{aligned}$$

Now we can write $\mathbb{V}$ as
$$\mathbb{V}= \left[\Pi^{\prime} E\left[Z X^{\prime}\right] \right]^{-1} \Pi^{\prime}\operatorname{Var}(W)\,\Pi \left[E\left[X Z^{\prime}\right]\Pi \right]^{-1}$$

Since $X=\Pi^{\prime} Z+e$, i.e., $X^{\prime}=Z^{\prime} \Pi+e^{\prime}$, where $e$ is the error of the best linear predictor,
we have

$$E\left[Z X^{\prime}\right] =E\left[Z Z^{\prime}\right] \Pi +E\left[Z e^{\prime}\right] =E\left[Z Z^{\prime}\right] \Pi,$$

because the BLP error satisfies $E\left[Z e^{\prime}\right]=0$ by construction. Substituting, we get:
$$\mathbb{V}= \left[\Pi^{\prime} E\left[Z Z^{\prime}\right] \Pi\right]^{-1} \Pi^{\prime}\operatorname{Var}(W)\,\Pi \left[\Pi^{\prime} E\left[Z Z^{\prime}\right] \Pi\right]^{-1}$$

which matches the limiting variance stated above.

Estimation of V
A natural estimator of V \mathbb{V} V is given by
$$\hat{\mathbb{V}}_n= \left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime}\right) \hat{\Pi}\right]^{-1} \hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime} \hat{U}_i^2\right) \hat{\Pi} \left[\hat{\Pi}^{\prime}\left(\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime}\right) \hat{\Pi}\right]^{-1}$$

where $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta}$.
The primary difficulty in establishing the consistency of this estimator lies in showing that
$$\frac{1}{n} \sum_{1 \leq i \leq n} Z_i Z_i^{\prime} \hat{U}_i^2 \stackrel{P}{\rightarrow} \operatorname{Var}[Z U]$$

as $n \rightarrow \infty$. The complication lies in the fact that we do not observe $U_i$ and therefore have to use $\hat{U}_i$. Note that
$$\operatorname{Var}(Z U)=E\left[Z U \cdot U Z^{\prime}\right] \quad \text{since } E[Z U]=0$$
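As a sketch, $\hat{\mathbb{V}}_n$ and the implied standard errors can be computed from the arrays of the first simulation (the names Z, X, Y, Pi_hat, beta_hat, and n are the ones defined there):

```python
# Sandwich estimator V_hat_n, reusing the first simulation's arrays.
U_hat = Y - X @ beta_hat                  # residuals use the ORIGINAL X, not X_hat
bread = np.linalg.inv(Pi_hat.T @ (Z.T @ Z / n) @ Pi_hat)
meat = Pi_hat.T @ ((Z.T * U_hat**2) @ Z / n) @ Pi_hat   # (1/n) sum Z_i Z_i' U_i^2
V_hat = bread @ meat @ bread
se = np.sqrt(np.diag(V_hat) / n)          # standard errors for beta_hat
print(se)
```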
However, note that $\hat{U}_i=Y_i-X_i^{\prime} \hat{\beta} \neq Y_i-\hat{X}_i^{\prime} \hat{\beta}$, where $\hat{X}_i$ is the regressor used in the second-stage regression. If the two stages are run as two separate OLS regressions, the second stage computes its residuals from $\hat{X}_i$ (this is what Stata does by default when you simply regress twice), so the resulting standard errors are incorrect. To carry out TSLS correctly in Stata, use the ivregress command, which computes the estimator and its standard errors in one step.
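The pitfall is easy to see numerically. Continuing the simulation from above (a sketch reusing X, X_hat, Y, and beta_hat), the residuals based on $X_i$ and those based on $\hat{X}_i$ differ, so standard errors built from the naive second-stage residuals will be off:

```python
# Correct TSLS residuals vs. what a naive second-stage OLS would report.
U_correct = Y - X @ beta_hat        # residuals used by one-step IV routines
U_naive = Y - X_hat @ beta_hat      # residuals from the manual second stage
print(U_correct.std(), U_naive.std())   # different scales -> different SEs
```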