Instrumental Variables

In order to overcome the difficulty associated with $E[XU] \neq 0$ (endogeneity), we assume that there is an additional random vector $Z$ taking values in $\mathbf{R}^{l+1}$ with $l+1 \geq k+1$ such that $E[ZU] = 0$.

Any exogenous component of $X$ is contained in $Z$ (the so-called included instruments). In particular, we assume the first component of $Z$ is constant equal to one, i.e., $Z = (Z_0, Z_1, \ldots, Z_l)'$ with $Z_0 = 1$.

We also assume that $E[ZX'] < \infty$, $E[ZZ'] < \infty$, and that there is no perfect collinearity in $Z$.

In summary, we assume:

  1. $E[ZU] = 0$: Instrument Exogeneity

  2. $E[ZX'] < \infty$

  3. $E[ZZ'] < \infty$

  4. There is no perfect collinearity in $Z$

  5. We further assume the rank of $E[ZX']$ is $k+1$. This is termed Instrument Relevance or the Rank Condition.

    A necessary condition for 5 to be true is $l \geq k$. This is referred to as the Order Condition.

To further understand the IV estimator, we need to know the following:

For the regression

$$Y = X'\beta + U$$

  • $X$: dimension is $(k+1) \times 1$

  • $Z$: dimension is $(l+1) \times 1$, and $l \geq k$

  • $Z$ influences $Y$ only through $X$, not through $U$; that is, $Z$ is correlated with $X$ but uncorrelated with $U$.
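To make these conditions concrete, here is a minimal simulation sketch (the design, variable names, and coefficient values are all illustrative assumptions, not from the notes): $X$ is endogenous because it loads on $U$, while $Z$ is independent of $U$ but shifts $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                       # structural error
z = rng.normal(size=n)                       # instrument: independent of U
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor: loads on U
y = 1.0 + 2.0 * x + u                        # structural equation, true slope = 2

print("corr(X, U):", round(np.corrcoef(x, u)[0, 1], 3))  # far from 0: endogeneity
print("corr(Z, U):", round(np.corrcoef(z, u)[0, 1], 3))  # ~ 0: exogeneity
print("corr(Z, X):", round(np.corrcoef(z, x)[0, 1], 3))  # != 0: relevance
```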

Here is an example of using IV in real-world empirical analysis:

To analyze the relationship between the education level of people out of military service (e.g., years of schooling) and proximity to school (e.g., distance to the nearest school or a measure of school accessibility), we can choose draft lottery status (e.g., whether an individual was likely to be drafted for military service) as an instrumental variable. We now check the relevance and exogeneity of draft lottery status as an IV.

  • Relevance: The draft lottery status could affect an individual's proximity to school, as those with a higher likelihood of being drafted might choose to stay in school longer or closer to educational institutions.

  • Exogeneity: The draft lottery status is presumably random and thus not correlated with the unobserved factors affecting education levels directly.

Therefore, draft lottery status can serve as a good IV.

Solving For Beta

Using that $U = Y - X'\beta$ and $E[ZU] = 0$, we see that $\beta$ solves the system of equations

$$E[ZY] = E[ZX']\beta$$

Proof:

$$E[ZY] = E[Z(U + X'\beta)] = 0 + E[ZX']\beta = E[ZX']\beta$$

Note that the invertibility of $E[ZX']$ is not guaranteed: since $l+1 \geq k+1$, the matrix need not even be square, and this may be an over-determined system of equations. There is more information than we need, as can be seen in the following:

$$Z=\left(\begin{array}{c} 1 \\ Z_1 \\ \vdots \\ Z_l \end{array}\right)_{(l+1) \times 1} \quad X=\left(\begin{array}{c} 1 \\ X_1 \\ \vdots \\ X_k \end{array}\right)_{(k+1) \times 1} \quad \beta=\left(\begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{array}\right)_{(k+1) \times 1}$$

Therefore, in order to solve for $\beta$, we introduce the following lemma:

Lemma

Suppose there is no perfect collinearity in $Z$ and let $\Pi$ be such that $BLP(X \mid Z) = \Pi'Z$. Then $E[ZX']$ has rank $k+1$ if and only if $\Pi$ has rank $k+1$. Moreover, in that case the matrix $\Pi' E[ZX']$ is invertible.

Note that if some $X_j$ are exogenous, then we do not need IVs for them.

Solve for Beta

Since $\beta$ solves $E[ZY] = E[ZX']\beta$, it also solves $\Pi' E[ZY] = \Pi' E[ZX']\beta$. Using the previous lemma together with $\Pi = E[ZZ']^{-1} E[ZX']$, we can derive three formulae for $\beta$.

Since we have $X = (1, X_1, \ldots, X_k)'$ and $Z = (1, Z_1, \ldots, Z_l)'$, and $\Pi = [\Pi_0, \Pi_1, \cdots, \Pi_k]$, the shape of $\Pi$ is $(l+1) \times (k+1)$.

First, regress $X$ on the instrument vector $Z$:

$$\begin{aligned} X_0 &= Z'\Pi_0 + U_0 = 1 \\ X_1 &= Z'\Pi_1 + U_1 \\ X_2 &= Z'\Pi_2 + U_2 \\ &\vdots \\ X_k &= Z'\Pi_k + U_k \end{aligned}$$

Then we can read off the BLP coefficients $\Pi_j$ from these equations:

$$\begin{aligned} \Pi_1 &= \left(E[ZZ']\right)^{-1} E[Z X_1] \\ \Pi_2 &= \left(E[ZZ']\right)^{-1} E[Z X_2] \\ &\vdots \\ \Pi_k &= \left(E[ZZ']\right)^{-1} E[Z X_k] \end{aligned}$$

Note that if $X_1$ is exogenous, then $Z_1 = X_1$. Projecting $X_1$ on $Z$, we obtain

$$\Pi_1=\left(\begin{array}{c} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array}\right),$$

where the coefficient equal to one is the one on $Z_1 = X_1$.

This is because, in the regression of $X_1$ on $Z$, we have $X_1 = \gamma_0 + \gamma_1 X_1 + \gamma_2 Z_2 + \cdots + U$, and in this regression $X_1$ perfectly explains itself, so $\gamma_1 = 1$ and all other coefficients are zero.
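A quick numerical check of this point (a minimal sketch; the design and numbers are illustrative assumptions): project an exogenous regressor on an instrument vector that contains it, and the projection coefficients come out as the unit vector selecting that regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)                     # exogenous regressor, included as Z_1
z2 = rng.normal(size=n)                     # another instrument
Z = np.column_stack([np.ones(n), x1, z2])   # Z = (1, X_1, Z_2)'

# Sample analog of Pi_1 = (E[ZZ'])^{-1} E[Z X_1]
pi1 = np.linalg.solve(Z.T @ Z, Z.T @ x1)
print(pi1.round(6))  # ~ (0, 1, 0): X_1 perfectly explains itself
```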

Equation 1: IV Estimation

By $\Pi' E[ZY] = \Pi' E[ZX']\beta$, we have that

$$\beta = \left[\Pi' E[ZX']\right]^{-1} \Pi' E[ZY]$$

Substituting $\Pi = E[ZZ']^{-1} E[ZX']$, and if $l = k$ (so that $E[ZX']$ is square and invertible), we obtain:

$$\begin{aligned} \beta &= \left[E[ZX']' \left(E[ZZ']\right)^{-1} E[ZX']\right]^{-1} E[ZX']' \left(E[ZZ']\right)^{-1} E[ZY] \\ &= E[ZX']^{-1} E[ZZ'] \left[E[ZX']'\right]^{-1} E[ZX']' E[ZZ']^{-1} E[ZY] \\ &= E[ZX']^{-1} E[ZY] \end{aligned}$$
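In the just-identified case this formula has a direct sample analog. A minimal sketch on simulated data (the design is an illustrative assumption, with true coefficients $(1, 2)$), comparing IV to OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + u                        # true beta = (1, 2)

Z = np.column_stack([np.ones(n), z])         # l = k = 1: just identified
X = np.column_stack([np.ones(n), x])

beta_iv = np.linalg.solve(Z.T @ X / n, Z.T @ y / n)   # (E[ZX'])^{-1} E[ZY]
beta_ols = np.linalg.solve(X.T @ X / n, X.T @ y / n)  # for comparison
print("IV: ", beta_iv.round(3))   # ~ (1.0, 2.0)
print("OLS:", beta_ols.round(3))  # slope biased by endogeneity
```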

Equation 2: TSLS Version 1

We can write:

$$X=\left(\begin{array}{c} 1 \\ X_1 \\ \vdots \\ X_k \end{array}\right)=\left(\begin{array}{c} Z'\Pi_0 \\ Z'\Pi_1 \\ \vdots \\ Z'\Pi_k \end{array}\right)+\left(\begin{array}{c} U_0 \\ U_1 \\ \vdots \\ U_k \end{array}\right)=\Pi'Z + U.$$

Since $\Pi'Z$ is the BLP of $X$ given $Z$, the projection errors satisfy $E[U] = 0$ and $E[ZU'] = 0$. Substituting this into the formula $\beta = \left[\Pi' E[ZX']\right]^{-1} \Pi' E[ZY]$, we have:

$$\begin{aligned} \beta &= \left[\Pi' E\left[Z(\Pi'Z + U)'\right]\right]^{-1} \Pi' E[ZY] \\ &= \left[\Pi' E[ZZ']\Pi + \Pi' E[ZU']\right]^{-1} \Pi' E[ZY] \\ &= \left[\Pi' E[ZZ']\Pi\right]^{-1} \Pi' E[ZY] \end{aligned}$$
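Here is a minimal sample-analog sketch of this formula in an over-identified design ($l = 2 > k = 1$; the data-generating process and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)         # two instruments
x = 0.8 * z1 + 0.4 * z2 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u                                   # true beta = (1, 2)

Z = np.column_stack([np.ones(n), z1, z2])               # (l+1) = 3 columns
X = np.column_stack([np.ones(n), x])                    # (k+1) = 2 columns

Pi = np.linalg.solve(Z.T @ Z, Z.T @ X)                  # first stage: (E[ZZ'])^{-1} E[ZX']
A = Pi.T @ (Z.T @ Z / n) @ Pi                           # Pi' E[ZZ'] Pi
b = Pi.T @ (Z.T @ y / n)                                # Pi' E[ZY]
print(np.linalg.solve(A, b).round(3))                   # ~ (1.0, 2.0)
```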

Equation 3: TSLS Version 2

Since $\Pi'$ is a matrix of constants, we can move it inside the expectation. So $\beta$ can be rewritten as

$$\beta = \left[E\left[(\Pi'Z)(\Pi'Z)'\right]\right]^{-1} E\left[(\Pi'Z) Y\right]$$

Denote $W = \Pi'Z$, the linear combination of instruments (the fitted value of $X$ from its projection on $Z$). Substituting this into the formula for $\beta$, we have

$$\beta = \left(E[WW']\right)^{-1} E[WY]$$
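This version makes the "two stages" explicit: build $W$ from the first stage, then run OLS of $Y$ on $W$. A minimal sketch on the same kind of illustrative over-identified design as above; it reproduces the Version 1 estimate exactly, since $W'W = \Pi'Z'Z\Pi$ and $W'Y = \Pi'Z'Y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.8 * z1 + 0.4 * z2 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

Z = np.column_stack([np.ones(n), z1, z2])
X = np.column_stack([np.ones(n), x])

Pi = np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage
W = Z @ Pi                               # W_i = Pi'Z_i: fitted values of X_i
# second stage: OLS of Y on W, i.e., (E[WW'])^{-1} E[WY]
print(np.linalg.solve(W.T @ W / n, W.T @ y / n).round(3))  # ~ (1.0, 2.0)
```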

Interpreting The Rank Condition (Instrument Relevance)

  • The rank condition for IV estimation is a technical requirement that ensures the IV or set of IVs provides enough information to identify the model.

  • Essentially, it requires that the matrix of instruments $Z$ should have sufficient rank so that the projection matrix $\Pi$ adequately captures the relevant information in the endogenous variables.

Interpretation: Consider the case where $k = l$ and only $X_k$ is endogenous. Let $Z_j = X_j$ for all $0 \leq j \leq k-1$. In this case,

$$\Pi'=\left(\begin{array}{ccccc} 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \ldots & 1 & 0 \\ \pi_0 & \pi_1 & \ldots & \pi_{l-1} & \pi_l \end{array}\right)$$

The rank condition therefore requires $\pi_l \neq 0$: the instrument $Z_l$ must be "correlated with $X_k$ after controlling for $X_0, X_1, \ldots, X_{k-1}$."
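In practice this is checked with the first-stage regression. A minimal sketch (the design and numbers are illustrative assumptions): regress the endogenous $X_k$ on all instruments and inspect the coefficient on the excluded instrument $Z_l$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)
x1 = rng.normal(size=n)                                  # exogenous, so Z_1 = X_1
zl = rng.normal(size=n)                                  # excluded instrument Z_l
xk = 0.3 * x1 + 0.8 * zl + 0.5 * u + rng.normal(size=n)  # endogenous regressor X_k

Z = np.column_stack([np.ones(n), x1, zl])
pi = np.linalg.solve(Z.T @ Z, Z.T @ xk)  # first-stage coefficients (pi_0, pi_1, pi_l)
print(pi.round(3))                       # last entry ~ 0.8, so pi_l != 0: relevant
```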

  1. Strong IV:

    • A strong IV is highly correlated with the endogenous explanatory variable.

    • This strong correlation ensures that the IV effectively captures the variation in the endogenous variable that is not related to the error term in the regression model.

    • Strong IVs lead to more reliable and precise estimates in IV regression.

  2. Weak IV:

    • A weak IV has a weak correlation with the endogenous explanatory variable.

    • This weak correlation means that the IV does not effectively capture the variation in the endogenous variable, making it less effective in dealing with endogeneity.

    • Weak IVs can lead to biased estimates and poor inference because they do not provide a good substitute for the endogenous variable.

In our scenario,

  • If $\pi_l$ is close to zero, $Z_l$ is a weak IV because it does not add much explanatory power beyond what is already captured by $X_0, X_1, \ldots, X_{k-1}$.

  • If $\pi_l$ is significantly different from zero, $Z_l$ is considered a strong IV, as it is meaningfully correlated with the endogenous variable $X_k$, independent of the other explanatory variables.

  • If $\Pi$ is near-singular, the instruments explain $X$ poorly and we have a weak-IV problem, which leads to:

    • Large Variance: When the instrument is weak, the variance of the IV estimator becomes very large. This leads to wide confidence intervals, making it difficult to draw precise inferences about the parameters.

    • Biased Estimation: In small samples, a weak instrument can lead to biased estimates, and these biases can be as bad or even worse than the OLS estimates that suffer from endogeneity.

    • Slow Convergence: In theory, the IV estimator is consistent as the sample size approaches infinity. However, with a weak instrument, the convergence to the true parameter value can be very slow, leading to practical issues in estimation even with large samples.

In summary, if $Z$ does not satisfy the rank condition (relevance condition), the consequences are:

  • Weak Instrument Problem: If the IV is weakly correlated (or not correlated) with the endogenous variable, it results in a weak instrument problem. This can lead to biased and inconsistent estimates, similar to or even worse than the original OLS estimates that were affected by endogeneity.

  • Inefficiency: The estimates may also have large standard errors, leading to inefficiency and making it difficult to draw reliable inferences.

  • Identification Issue: In the extreme case where there is no correlation at all, the model becomes unidentified, meaning that you cannot reliably estimate the coefficients of the endogenous variables.
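A small Monte Carlo sketch of the weak-IV problem (the design and coefficient values are illustrative assumptions): with a near-zero first-stage coefficient, the spread of the IV slope estimates blows up. The interquartile range is reported instead of the standard deviation because the just-identified IV estimator can have no finite moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 2_000

def iv_slope(pi):
    """One draw of the just-identified IV slope with first-stage coefficient pi."""
    u = rng.normal(size=n)
    z = rng.normal(size=n)
    x = pi * z + 0.5 * u + rng.normal(size=n)
    y = 1.0 + 2.0 * x + u                    # true slope = 2
    zd = z - z.mean()
    return (zd @ y) / (zd @ x)               # slope of Y on Z over slope of X on Z

for pi in (0.8, 0.02):                       # strong vs. weak first stage
    est = np.array([iv_slope(pi) for _ in range(reps)])
    q25, q75 = np.percentile(est, [25, 75])
    print(f"pi = {pi}: median = {np.median(est):.2f}, IQR = {q75 - q25:.2f}")
```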

Interpreting The Exogeneity Condition

Consequence of Violation:

  • Biased and Inconsistent Estimates: If the IV is correlated with the error term, the estimates will be biased and inconsistent. This is because the instrument is not isolating only the exogenous variation in the endogenous explanatory variable; it is also capturing some of the effects that should be in the error term.

  • Invalid Inferences: Any inferences made about the effect of the endogenous explanatory variable on the dependent variable would be invalid because they are contaminated by the correlation with the error term.

Partition of Beta: Endogenous Components

Note that an IV can also be the variable itself. Consider the following regression:

$$Y = X_1'\beta_1 + X_2'\beta_2 + U$$

  • If $E[X_1 U] \neq 0$: we choose to find an IV $Z_1$ for $X_1$ such that $E[Z_1 U] = 0$

  • If $E[X_2 U] = 0$: $X_2$ itself can be viewed as an IV for $X_2$, i.e., $Z_2 = X_2$

Here, we partition $X$ into $X_1$ and $X_2$, where $X_2$ is exogenous. Partition $Z$ into $Z_1$ and $Z_2$ and $\beta$ into $\beta_1$ and $\beta_2$ analogously.

We have that, in this model:

  • $Z_2 = X_2$ are included instruments

  • $Z_1$ are excluded instruments

We can conveniently re-write this by projecting (taking the BLP) on $Z_2 = X_2$. Consider the case $k = l$:

$$BLP(Y \mid Z_2) = BLP(X_1 \mid Z_2)'\beta_1 + X_2'\beta_2.$$

Define $Y^* = Y - BLP(Y \mid Z_2)$ and $X_1^* = X_1 - BLP(X_1 \mid Z_2)$, so that

$$E[Z_1 Y^*] = E[Z_1 X_1^{*\prime}]\beta_1 + E[Z_1 U]$$

Since $E[Z_1 U] = 0$, it follows that

$$\beta_1 = \left(E[Z_1 X_1^{*\prime}]\right)^{-1} E[Z_1 Y^*]$$
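A minimal numerical sketch of this partialled-out formula (the design is an illustrative assumption): residualize $Y$ and $X_1$ on the exogenous $X_2$, then apply the IV formula with the excluded instrument $Z_1$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)
x2 = rng.normal(size=n)                                   # exogenous (included instrument)
z1 = rng.normal(size=n)                                   # excluded instrument
x1 = 0.5 * x2 + 0.8 * z1 + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x1 + 1.5 * x2 + u                         # true beta_1 = 2

Z2 = np.column_stack([np.ones(n), x2])
def residualize(v):
    """Residual from the sample projection of v on (1, X_2)."""
    return v - Z2 @ np.linalg.solve(Z2.T @ Z2, Z2.T @ v)

y_star, x1_star = residualize(y), residualize(x1)
beta1 = (z1 @ y_star) / (z1 @ x1_star)   # sample analog of (E[Z_1 X_1*'])^{-1} E[Z_1 Y*]
print(round(beta1, 3))                   # ~ 2.0
```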

IV Estimator

The IV estimator is used when the number of instruments equals the number of explanatory variables, i.e., in the following case:

$$\text{Just-identified case: } l = k$$

Denote by $P$ the marginal distribution of $(Y, X, Z)$. Let $(Y_1, X_1, Z_1), \ldots, (Y_n, X_n, Z_n)$ be an i.i.d. sequence of random variables with distribution $P$.

By analogy with $\beta = \left(E[ZX']\right)^{-1} E[ZY]$, the natural estimator of $\beta$ is simply

$$\hat{\beta}=\left(\frac{1}{n} \sum_i Z_i X_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum_i Z_i Y_i\right).$$

This estimator is called the instrumental variables (IV) estimator of $\beta$. Note that $\hat{\beta}$ satisfies

$$\frac{1}{n} \sum_i Z_i\left(Y_i-X_i^{\prime} \hat{\beta}\right)=0$$

In particular, $\hat{U}_i = Y_i - X_i'\hat{\beta}$ satisfies

$$\frac{1}{n} \sum_i Z_i \hat{U}_i=0.$$

Insight on the IV estimator: assume $X_0 = 1$ and $X_1 \in \mathbf{R}$. An interesting interpretation of the IV estimator $\hat{\beta}_1$ is obtained by multiplying and dividing by $\frac{1}{n} \sum_{i=1}^n (Z_{1,i} - \bar{Z}_{1,n})^2$, i.e.,

$$\begin{aligned} \hat{\beta}_1 & =\frac{\frac{1}{n} \sum_{i=1}^n\left(Z_{1, i}-\bar{Z}_{1, n}\right) Y_i / \frac{1}{n} \sum_{i=1}^n\left(Z_{1, i}-\bar{Z}_{1, n}\right)^2}{\frac{1}{n} \sum_{i=1}^n\left(Z_{1, i}-\bar{Z}_{1, n}\right) X_{1, i} / \frac{1}{n} \sum_{i=1}^n\left(Z_{1, i}-\bar{Z}_{1, n}\right)^2} \\ & =\frac{\text { slope of } Y \text { on } Z}{\text { slope of } X \text { on } Z}. \end{aligned}$$
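A quick numerical check of this ratio-of-slopes interpretation (an illustrative simulated design with true slope 2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)
z1 = rng.normal(size=n)
x1 = 0.8 * z1 + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + u

zd = z1 - z1.mean()
slope_yz = (zd @ y) / (zd @ zd)          # OLS slope of Y on Z_1
slope_xz = (zd @ x1) / (zd @ zd)         # OLS slope of X_1 on Z_1
print(round(slope_yz / slope_xz, 3))     # ~ 2.0

# The same number from the full IV formula:
Z = np.column_stack([np.ones(n), z1])
X = np.column_stack([np.ones(n), x1])
print(np.linalg.solve(Z.T @ X, Z.T @ y).round(3))  # ~ (1.0, 2.0)
```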

Matrix Notation

This estimator may be expressed more compactly using matrix notation. Define

$$\begin{aligned} & \mathbb{Z}=\left(Z_1, \ldots, Z_n\right)^{\prime} \\ & \mathbb{X}=\left(X_1, \ldots, X_n\right)^{\prime} \\ & \mathbb{Y}=\left(Y_1, \ldots, Y_n\right)^{\prime} \end{aligned}$$

In this notation, we have

$$\hat{\beta}=\left(\mathbb{Z}^{\prime} \mathbb{X}\right)^{-1}\left(\mathbb{Z}^{\prime} \mathbb{Y}\right).$$
