Interpretation Under Heterogeneity

Recall that in the model

$$Y = X^{\prime} \beta + U$$

the effect of a change in $X$ (say, from $X = x$ to $X = x^{\prime}$) is the same for everybody.

In practice, the effect of a change in $X$ on $Y$ can differ across people. To capture this, we allow $\beta$ to be random. When $\beta$ is random, we may absorb $U$ into the intercept and simply write

$$Y = X^{\prime} \beta .$$

Notation

  1. With a random sample where variables are indexed by $i$, we would write $Y_i = X_i^{\prime} \beta_i$, which makes explicit that every individual has their own effect $\beta_i$.

  2. Assume $k = 1$ and write $D$ in place of $X_1$, which is assumed to take values in $\{0,1\}$, i.e. $D$ is binary. Then

    $$Y = \beta_0 + \beta_1 D .$$

  3. We interpret $\beta_0$ as $Y(0)$ and $\beta_1$ as $Y(1) - Y(0)$, where $Y(1)$ and $Y(0)$ are potential (or counterfactual) outcomes. With this notation, the equation becomes the potential outcome model:

    $$Y = D\, Y(1) + (1-D)\, Y(0) \quad \text{or} \quad Y = \begin{cases} Y(1) & \text{if } D = 1 \\ Y(0) & \text{if } D = 0 . \end{cases}$$

    1. $Y(0)$ denotes the value of the outcome that would have been observed if (possibly counter to fact) $D$ were 0;

    2. $Y(1)$ denotes the value of the outcome that would have been observed if (possibly counter to fact) $D$ were 1.

  4. The variable $D$ is typically called the treatment.

  5. $Y(1) - Y(0)$ is called the treatment effect. The quantity $E[Y(1) - Y(0)]$ is usually referred to as the average treatment effect (ATE): the average effect over the whole population.

  6. The quantity $E[Y(1) - Y(0) \mid D = 1]$ is called the average treatment effect on the treated (ATET): the average effect among those who actually receive the treatment.

Note that the treatment effect $Y(1) - Y(0)$ is defined for a single individual, but we can never observe both outcomes, $Y(0)$ and $Y(1)$, for the same individual: only the one corresponding to the realized value of $D$ is observed.
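As a small illustration of the notation above, the following sketch builds a hypothetical five-person population in which both potential outcomes are known (something that is never available in real data) and computes the observed outcome, the ATE, and the ATET. All numbers and names are made up for illustration.

```python
# Hypothetical population: each tuple is (Y(0), Y(1), D).
people = [
    (1.0, 3.0, 1),
    (2.0, 2.5, 0),
    (0.0, 4.0, 1),
    (3.0, 3.0, 0),
    (1.0, 2.0, 1),
]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


# Observed outcome: Y = D * Y(1) + (1 - D) * Y(0).
observed = [d * y1 + (1 - d) * y0 for y0, y1, d in people]

# Individual treatment effects Y(1) - Y(0); not observable in practice.
effects = [y1 - y0 for y0, y1, d in people]

ate = mean(effects)                                       # E[Y(1) - Y(0)]
atet = mean(y1 - y0 for y0, y1, d in people if d == 1)    # E[Y(1) - Y(0) | D = 1]

print("observed Y:", observed)
print("ATE:", ate, "ATET:", atet)
```

In this made-up population the treated happen to have larger effects than the untreated, so the ATET exceeds the ATE.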

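Four Types of People in treatment

Here $D(z)$ denotes the treatment an individual would take if the instrument $Z$ (an offer of, or encouragement toward, treatment, introduced below) were set to $z$.

  1. Always Taker: $D(1) = 1, \quad D(0) = 1$

  2. Complier: $D(1) = 1, \quad D(0) = 0$

  3. Never Taker: $D(1) = 0, \quad D(0) = 0$

  4. Defier: $D(1) = 0, \quad D(0) = 1$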
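The four types can be written as a simple lookup on the pair of potential treatments. This is only a sketch; the function name `compliance_type` is a hypothetical helper, not part of any library.

```python
def compliance_type(d0: int, d1: int) -> str:
    """Classify an individual by potential treatments D(0) and D(1)."""
    types = {
        (1, 1): "always taker",  # treated regardless of the instrument
        (0, 1): "complier",      # treated only when Z = 1
        (0, 0): "never taker",   # untreated regardless of the instrument
        (1, 0): "defier",        # treated only when Z = 0
    }
    return types[(d0, d1)]


for d0, d1 in [(1, 1), (0, 1), (0, 0), (1, 0)]:
    print(f"D(0)={d0}, D(1)={d1}: {compliance_type(d0, d1)}")
```

Since only one of $D(0)$ and $D(1)$ is ever observed for a given person, this classification cannot be applied to individuals in real data.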

Random Assignment

If $D$ were randomly assigned (e.g., by the flip of a coin), then

$$(Y(0), Y(1)) \perp D .$$

In this case, under mild assumptions, the slope coefficient from an OLS regression of $Y$ on a constant and $D$ is a consistent estimate of the average treatment effect.

Note that if $D$ is randomly assigned, the treated group is representative of the whole population, so ATE = ATET.

If $D$ is binary and we write $Y = \beta_0 + \beta_1 D + U$, then

$$\beta_1 = \frac{\operatorname{Cov}(Y, D)}{\operatorname{Var}(D)} = E[Y \mid D=1] - E[Y \mid D=0] = E[Y(1) \mid D=1] - E[Y(0) \mid D=0] .$$

Since $(Y(0), Y(1)) \perp D$, it follows that $\beta_1 = E[Y(1)] - E[Y(0)]$, which equals both the ATE and the ATET.

The estimator of $\beta_1$ is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n Y_i D_i}{\sum_{i=1}^n D_i} - \frac{\sum_{i=1}^n Y_i (1 - D_i)}{\sum_{i=1}^n (1 - D_i)} = \bar{Y}_{D=1} - \bar{Y}_{D=0} .$$

Therefore $\operatorname{plim}_{n \rightarrow \infty} \hat{\beta}_1 = \beta_1$, which is the ATE (and the ATET).
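To make the random-assignment result concrete, here is a minimal simulation sketch using only the Python standard library. The data-generating process (standard normal $Y(0)$, a heterogeneous effect with mean 1, a fair coin for $D$) is an assumption chosen purely for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50_000

y_obs, d_obs, effects = [], [], []
for _ in range(n):
    y0 = rng.gauss(0.0, 1.0)             # potential outcome without treatment
    effect = 1.0 + rng.gauss(0.0, 1.0)   # heterogeneous effect with mean 1 (assumed)
    y1 = y0 + effect
    d = 1 if rng.random() < 0.5 else 0   # coin flip, independent of (y0, y1)
    y_obs.append(d * y1 + (1 - d) * y0)
    d_obs.append(d)
    effects.append(effect)


def mean(xs):
    return sum(xs) / len(xs)


# Difference in group means.
treated = [y for y, d in zip(y_obs, d_obs) if d == 1]
control = [y for y, d in zip(y_obs, d_obs) if d == 0]
diff_in_means = mean(treated) - mean(control)

# OLS slope Cov(Y, D) / Var(D) from sample moments.
y_bar, d_bar = mean(y_obs), mean(d_obs)
cov_yd = mean([(y - y_bar) * (d - d_bar) for y, d in zip(y_obs, d_obs)])
var_d = mean([(d - d_bar) ** 2 for d in d_obs])
ols_slope = cov_yd / var_d

sample_ate = mean(effects)
print(diff_in_means, ols_slope, sample_ate)
```

The OLS slope and the difference in group means coincide up to floating-point error, and both should land close to the average effect of 1 built into the simulation.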

However, if $D$ is not randomly assigned, $\beta_1 =$ ATET + bias, since in this case

$$\begin{aligned} E[Y \mid D=1] - E[Y \mid D=0] = {} & \underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=1]}_{\text{average treatment effect on the treated}} \\ & + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}} . \end{aligned}$$

This can also be rearranged as $\beta_1 =$ ATE + bias, with a different bias term.
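The decomposition can be checked by hand on a tiny hypothetical population in which both potential outcomes are listed (again, not something real data provide). The numbers below are invented so that the treated have systematically lower $Y(0)$.

```python
# Hypothetical population: each tuple is (Y(0), Y(1), D).
people = [
    (1.0, 3.0, 1),
    (2.0, 2.5, 0),
    (0.0, 4.0, 1),
    (3.0, 3.0, 0),
    (1.0, 2.0, 1),
]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


treated = [(y0, y1) for y0, y1, d in people if d == 1]
control = [(y0, y1) for y0, y1, d in people if d == 0]

# What the data reveal: E[Y | D=1] - E[Y | D=0].
naive = mean(y1 for y0, y1 in treated) - mean(y0 for y0, y1 in control)

# Pieces that require the unobserved counterfactual Y(0) of the treated.
atet = mean(y1 - y0 for y0, y1 in treated)
selection_bias = mean(y0 for y0, y1 in treated) - mean(y0 for y0, y1 in control)

print("naive difference:", naive)
print("ATET + selection bias:", atet + selection_bias)
```

Here the naive comparison (0.5) understates the ATET (7/3) because the selection bias is negative.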

Selection

In general, we expect $D$ to depend on $(Y(1), Y(0))$. In that case, OLS does not yield a consistent estimate of the average treatment effect.

Note also that when the treatment $D$ is not randomly assigned, the treated are a selected group, so in general ATE $\neq$ ATET.

To proceed, we assume, as usual, that there is an instrument $Z$. Let $Z \in \{0,1\}$.

Consider the slope coefficient from a TSLS/IV regression of $Y$ on $D$ with the binary $Z$ as an instrument:

$$\frac{\operatorname{Cov}[Y, Z]}{\operatorname{Cov}[D, Z]} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}$$

Expressing in Terms of Expected Values:

  • The numerator, $E[Y \mid Z=1] - E[Y \mid Z=0]$, is the difference in the expected value of $Y$ when $Z$ changes from 0 to 1.

  • The denominator, $E[D \mid Z=1] - E[D \mid Z=0]$, is the difference in the expected value of $D$ (take-up of the treatment) when $Z$ changes from 0 to 1.

Take-Up Ratio: For binary $Z$, the first-stage coefficient $\frac{\operatorname{Cov}[D, Z]}{\operatorname{Var}[Z]} = E[D \mid Z=1] - E[D \mid Z=0]$ can be interpreted as the take-up ratio. It is the change in the share of people who take the treatment when it is offered (i.e., when $Z$ changes from 0 to 1), and so measures how effective the instrument is at inducing changes in the treatment variable.

Proof:

Standard Regression Formula: In a simple linear regression, ignoring endogeneity, the coefficient $\beta_1$ is given by:

$$\beta_1 = \frac{\operatorname{Cov}(Y, D)}{\operatorname{Var}(D)}$$

IV Regression Setup: In the presence of endogeneity (where $D$ is correlated with the error term), this formula does not yield a consistent estimate. This is where IV regression with an instrument $Z$ comes in.

Assumptions:

  1. Relevance: $Z$ is correlated with $D$ (i.e., $\operatorname{Cov}(D, Z) \neq 0$).

  2. Exogeneity: $Z$ is uncorrelated with the error term in the $Y$ regression.

Two-Stage Least Squares (TSLS) Process:

  1. First Stage: Regress $D$ on $Z$ and get the fitted values $\hat{D}$:

    $$\begin{aligned} & D = \pi_0 + \pi_1 Z + \epsilon \\ & \hat{D} = \pi_0 + \pi_1 Z \end{aligned}$$

  2. Second Stage: Regress $Y$ on $\hat{D}$:

    $$Y = \alpha_0 + \beta_1 \hat{D} + u$$

Now, let's derive the IV estimand $\beta_1$:

  1. Covariance of $Y$ and $\hat{D}$: The covariance of $Y$ with $\hat{D}$ is given by:

    $$\operatorname{Cov}(Y, \hat{D}) = \operatorname{Cov}\left(Y, \pi_0 + \pi_1 Z\right)$$

    Since $\pi_0$ is a constant, it drops out of the covariance, leaving:

    $$\operatorname{Cov}(Y, \hat{D}) = \pi_1 \operatorname{Cov}(Y, Z)$$

  2. Variance of $\hat{D}$: Similarly, the variance of $\hat{D}$ is:

    $$\operatorname{Var}(\hat{D}) = \operatorname{Var}\left(\pi_0 + \pi_1 Z\right)$$

    Again, $\pi_0$ being constant, it drops out, leaving:

    $$\operatorname{Var}(\hat{D}) = \pi_1^2 \operatorname{Var}(Z)$$

  3. Substituting in the Second Stage: In the second stage, $\beta_1$ is the coefficient on $\hat{D}$ in the regression of $Y$ on $\hat{D}$:

    $$\beta_1 = \frac{\operatorname{Cov}(Y, \hat{D})}{\operatorname{Var}(\hat{D})}$$

    Substituting our expressions for $\operatorname{Cov}(Y, \hat{D})$ and $\operatorname{Var}(\hat{D})$:

    $$\begin{aligned} & \beta_1 = \frac{\pi_1 \operatorname{Cov}(Y, Z)}{\pi_1^2 \operatorname{Var}(Z)} \\ & \beta_1 = \frac{\operatorname{Cov}(Y, Z)}{\pi_1 \operatorname{Var}(Z)} \end{aligned}$$

  4. Relation to $\operatorname{Cov}(D, Z)$: From the first stage, we know that $\pi_1 = \frac{\operatorname{Cov}(D, Z)}{\operatorname{Var}(Z)}$. Substituting $\pi_1$:

    $$\beta_1 = \frac{\operatorname{Cov}(Y, Z)}{\frac{\operatorname{Cov}(D, Z)}{\operatorname{Var}(Z)} \cdot \operatorname{Var}(Z)}$$

    Simplifying, we get:

    $$\beta_1 = \frac{\operatorname{Cov}(Y, Z)}{\operatorname{Cov}(D, Z)}$$
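The two-stage derivation can be verified numerically. The sketch below runs both stages by hand on a small hypothetical sample of $(Y, D, Z)$ triples and compares the second-stage slope with the direct ratio $\operatorname{Cov}(Y, Z) / \operatorname{Cov}(D, Z)$. The data and helper names are invented for illustration.

```python
# Hypothetical sample of (Y, D, Z) triples.
data = [
    (5.0, 1, 1), (3.0, 1, 1), (2.0, 0, 1), (4.0, 1, 1),
    (2.0, 0, 0), (3.0, 1, 0), (1.0, 0, 0), (2.0, 0, 0),
]
y = [row[0] for row in data]
d = [row[1] for row in data]
z = [row[2] for row in data]


def mean(xs):
    return sum(xs) / len(xs)


def cov(a, b):
    """Sample covariance (divisor n); cov(a, a) is the sample variance."""
    a_bar, b_bar = mean(a), mean(b)
    return mean([(ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)])


# First stage: regress D on Z and form the fitted values.
pi1 = cov(d, z) / cov(z, z)
pi0 = mean(d) - pi1 * mean(z)
d_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress Y on the fitted values.
beta_tsls = cov(y, d_hat) / cov(d_hat, d_hat)

# Direct IV formula from the derivation.
beta_iv = cov(y, z) / cov(d, z)

print(pi0, pi1, beta_tsls, beta_iv)
```

For this sample the first stage gives $\pi_1 = 0.5$, and both routes give a slope of 3.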

The Wald estimator is the sample analogue of this TSLS estimand $\beta_1$ when $Z$ is binary. It can be written as:

$$\hat{\beta}_1 = \frac{\hat{E}[Y \mid Z=1] - \hat{E}[Y \mid Z=0]}{\hat{E}[D \mid Z=1] - \hat{E}[D \mid Z=0]}$$

where $\hat{E}[\cdot \mid Z=z]$ denotes the sample mean within the group with $Z = z$.
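A minimal sketch of the Wald estimator as a ratio of differences in group means follows. The function name `wald_estimator` and the eight-row sample are hypothetical; the last lines check that the result agrees with the covariance ratio on the same data.

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def wald_estimator(data):
    """data: iterable of (Y, D, Z) with binary Z; returns the Wald ratio."""
    z1 = [(y, d) for y, d, z in data if z == 1]
    z0 = [(y, d) for y, d, z in data if z == 0]
    reduced_form = mean(y for y, d in z1) - mean(y for y, d in z0)
    first_stage = mean(d for y, d in z1) - mean(d for y, d in z0)
    return reduced_form / first_stage


# Hypothetical sample of (Y, D, Z) triples.
sample = [
    (5.0, 1, 1), (3.0, 1, 1), (2.0, 0, 1), (4.0, 1, 1),
    (2.0, 0, 0), (3.0, 1, 0), (1.0, 0, 0), (2.0, 0, 0),
]
wald = wald_estimator(sample)

# Cross-check against Cov(Y, Z) / Cov(D, Z) computed from sample moments.
y_bar = mean(y for y, d, z in sample)
d_bar = mean(d for y, d, z in sample)
z_bar = mean(z for y, d, z in sample)
cov_yz = mean((y - y_bar) * (z - z_bar) for y, d, z in sample)
cov_dz = mean((d - d_bar) * (z - z_bar) for y, d, z in sample)

print(wald, cov_yz / cov_dz)
```

The reduced form here is 1.5 and the first stage is 0.5, so the Wald ratio is 3, matching the covariance ratio.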

Potential Treatments

Now, we want to express the quantity

$$\beta_1 = \frac{\operatorname{Cov}[Y, Z]}{\operatorname{Cov}[D, Z]} = \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}$$

in terms of the treatment effect $Y(1) - Y(0)$.

To this end, it is useful to introduce the following equation for $D$:

$$\begin{aligned} D & = Z D(1) + (1-Z) D(0) \\ & = D(0) + (D(1) - D(0)) Z \\ & = \pi_0 + \pi_1 Z \end{aligned}$$

where $\pi_0 = D(0)$, $\pi_1 = D(1) - D(0)$, and $D(1)$ and $D(0)$ are potential (or counterfactual) treatments.

We impose the following versions of instrument exogeneity and instrument relevance, respectively:

$$(Y(1), Y(0), D(1), D(0)) \perp Z$$

$$P\{D(1) \neq D(0)\} = P\left\{\pi_1 \neq 0\right\} > 0$$

Note that the first condition, full independence, is stronger than the usual exogeneity requirement that $Z$ be uncorrelated with the error term.

We further assume the following Monotonicity Condition:

$$P\{D(1) \geq D(0)\} = P\left\{\pi_1 \geq 0\right\} = 1$$

This monotonicity condition eliminates the defiers. If monotonicity fails, some individuals have $D(1) = 0 < D(0) = 1$: they take the treatment when $Z = 0$ but not when $Z = 1$, which is exactly the definition of a defier.

TSLS Estimator to Form Y(1) - Y(0)

Consider the numerator of the Wald ratio (the reduced form), $E[Y \mid Z=1] - E[Y \mid Z=0]$. Since

$$\left\{\begin{array}{l} Y = D Y(1) + (1-D) Y(0) \\ D = Z D(1) + (1-Z) D(0) \end{array}\right.$$

we get

$$\begin{aligned} & E[Y \mid Z=1] - E[Y \mid Z=0] \\ & = E[D Y(1) + (1-D) Y(0) \mid Z=1] - E[D Y(1) + (1-D) Y(0) \mid Z=0] \\ & = E[D(1) Y(1) + (1-D(1)) Y(0) \mid Z=1] - E[D(0) Y(1) + (1-D(0)) Y(0) \mid Z=0] . \end{aligned}$$

Since the instrument $Z$ is independent of the potential outcomes and potential treatments, the conditioning on $Z$ can be dropped:

$$\begin{aligned} & = E[D(1) Y(1) + (1-D(1)) Y(0)] - E[D(0) Y(1) + (1-D(0)) Y(0)] \\ & = E\{[D(1) - D(0)] Y(1) - [D(1) - D(0)] Y(0)\} \\ & = E\{(D(1) - D(0))(Y(1) - Y(0))\} . \end{aligned}$$

$D(1) - D(0)$ is always 0 for always-takers and never-takers, and the monotonicity condition rules out the defiers, so only the event $D(1) - D(0) = 1$ contributes:

$$\begin{aligned} & = E\left[(D(1) - D(0))(Y(1) - Y(0)) \mid D(1) - D(0) = 1\right] P(D(1) - D(0) = 1) \\ & = E\left[Y(1) - Y(0) \mid D(1) > D(0)\right] P(D(1) > D(0)) , \end{aligned}$$

which involves only the compliers.

LATE

The TSLS/IV estimand equals

$$\frac{\operatorname{Cov}[Y, Z]}{\operatorname{Cov}[D, Z]} = E[\underbrace{Y(1) - Y(0)}_{\mathrm{TE}} \mid \underbrace{D(1) > D(0)}_{\text{local}}] \equiv \mathrm{LATE}$$

This is called the local average treatment effect (LATE): the average treatment effect for the subpopulation of people for whom a change in the value of the instrument switches them from non-treated to treated, the so-called compliers.

To see this, note that for binary $Z$ we have $\operatorname{Cov}[Y, Z] = \operatorname{Var}[Z] \left(E[Y \mid Z=1] - E[Y \mid Z=0]\right)$, and likewise for $D$, so $\operatorname{Var}[Z]$ cancels in the ratio. The numerator is

$$E[Y \mid Z=1] - E[Y \mid Z=0] = E\left[Y(1) - Y(0) \mid D(1) > D(0)\right] P(D(1) > D(0))$$

and the denominator is

$$E[D \mid Z=1] - E[D \mid Z=0] = P\left(D(1) > D(0)\right) .$$

The latter holds because $D = Z D(1) + (1-Z) D(0)$, so

$$\begin{aligned} E[D \mid Z=1] - E[D \mid Z=0] = {} & E[D(1) \mid Z=1] - E[D(0) \mid Z=0] \\ = {} & E[D(1) - D(0)] \\ = {} & 1 \cdot P(D(1) - D(0) = 1) + (-1) \cdot P(D(1) - D(0) = -1) \\ & + 0 \cdot P(D(1) - D(0) = 0) \\ = {} & P(D(1) - D(0) = 1) , \end{aligned}$$

where the last equality holds because monotonicity rules out the defiers, so $P(D(1) - D(0) = -1) = 0$.
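The LATE result can be checked with exact population arithmetic rather than a simulation. The sketch below assumes a hypothetical population of always-takers, compliers, and never-takers (no defiers) with made-up shares and mean potential outcomes, and an instrument that is independent of type.

```python
# Hypothetical compliance types: population share, potential treatments
# D(0) and D(1), and mean potential outcomes E[Y(0)], E[Y(1)] within type.
types = {
    "always taker": {"share": 0.2, "d0": 1, "d1": 1, "y0": 2.0, "y1": 5.0},
    "complier":     {"share": 0.5, "d0": 0, "d1": 1, "y0": 1.0, "y1": 3.0},
    "never taker":  {"share": 0.3, "d0": 0, "d1": 0, "y0": 0.0, "y1": 1.0},
}


def mean_given_z(z, what):
    """E[Y | Z=z] or E[D | Z=z]; uses independence of Z and type."""
    total = 0.0
    for t in types.values():
        d = t["d1"] if z == 1 else t["d0"]
        if what == "D":
            total += t["share"] * d
        else:
            total += t["share"] * (d * t["y1"] + (1 - d) * t["y0"])
    return total


reduced_form = mean_given_z(1, "Y") - mean_given_z(0, "Y")
first_stage = mean_given_z(1, "D") - mean_given_z(0, "D")
wald = reduced_form / first_stage

late = types["complier"]["y1"] - types["complier"]["y0"]
ate = sum(t["share"] * (t["y1"] - t["y0"]) for t in types.values())

print("first stage:", first_stage)   # share of compliers
print("Wald ratio:", wald, "LATE:", late, "ATE:", ate)
```

The first stage recovers the complier share (0.5) and the Wald ratio recovers the complier effect (2), which differs from the ATE (1.9) in this made-up population.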

Monotonicity

As shown above, monotonicity rules out the defiers.

Monotonicity means that, while the instrument may have no effect on some people, all those who are affected are affected in the same direction. Without monotonicity, we would have

$$\begin{aligned} E[Y \mid Z=1] - E[Y \mid Z=0] = {} & E[Y(1) - Y(0) \mid D(1) > D(0)] \, P\{D(1) > D(0)\} \\ & - E[Y(1) - Y(0) \mid D(1) < D(0)] \, P\{D(1) < D(0)\} . \end{aligned}$$

Treatment effects may be positive for everyone (i.e., $Y(1) - Y(0) > 0$) and yet the reduced form may be zero, because effects on compliers are canceled out by effects on defiers, i.e., those individuals whom the instrument pushes out of treatment ($D(1) = 0$ and $D(0) = 1$).

This does not come up in a constant-effect model where $\beta = Y(1) - Y(0)$ is constant, since in that case

$$\begin{aligned} E[Y \mid Z=1] - E[Y \mid Z=0] & = \beta \{P\{D(1) > D(0)\} - P\{D(1) < D(0)\}\} \\ & = \beta E[D(1) - D(0)] , \end{aligned}$$

and so a zero reduced-form effect means either the first stage is zero or $\beta = 0$.
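A small numeric sketch of the cancellation: with hypothetical shares and effects chosen for illustration, every individual has a positive treatment effect, yet the reduced form is zero once defiers are present. The helper `reduced_form_and_first_stage` is an illustrative name, and it evaluates $E[(D(1) - D(0))(Y(1) - Y(0))]$ and $E[D(1) - D(0)]$ directly.

```python
def reduced_form_and_first_stage(strata):
    """strata: list of (share, d0, d1, effect) tuples, one per type.

    Returns E[(D(1) - D(0)) * (Y(1) - Y(0))] and E[D(1) - D(0)].
    """
    reduced_form = sum(s * (d1 - d0) * eff for s, d0, d1, eff in strata)
    first_stage = sum(s * (d1 - d0) for s, d0, d1, eff in strata)
    return reduced_form, first_stage


# Heterogeneous effects, all positive (hypothetical numbers).
heterogeneous = [
    (0.4, 0, 1, 1.0),  # compliers, effect 1
    (0.2, 1, 0, 2.0),  # defiers, effect 2
    (0.4, 0, 0, 1.5),  # never-takers, effect 1.5
]
rf_het, fs_het = reduced_form_and_first_stage(heterogeneous)

# Constant effect beta = 1.5 for everyone, same type shares.
beta = 1.5
constant = [(s, d0, d1, beta) for s, d0, d1, _ in heterogeneous]
rf_const, fs_const = reduced_form_and_first_stage(constant)

print("heterogeneous:", rf_het, fs_het)
print("constant:", rf_const, fs_const, beta * fs_const)
```

With heterogeneous effects the reduced form is zero although the first stage is 0.2; with a constant effect the reduced form equals $\beta$ times the first stage, so it vanishes only if one of the two does.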

Monotonicity with one-sided Compliance

Randomized trial with non-compliance: the treatment assignment acts as an "offer of treatment" $Z$ (the instrument), while the actual treatment $D$ records whether the subject actually took the treatment.

Assume no one in the control group has access to the treatment: $D(0) = 0$ while $D(1) \in \{0,1\}$. In this case, monotonicity holds automatically: $D(1) \geq D(0)$.

Since $D(1)$ is a choice, a comparison between those actually treated ($D=1$) and those not treated ($D=0$) is misleading. Two alternatives are frequently used.

Intention to Treat Effect: a comparison between those who were offered treatment ($Z=1$) and the control group ($Z=0$).

IV: use $Z$ as an instrumental variable for $D$, which yields LATE. Since $D(0) = 0$, the treated are exactly the compliers who were offered treatment, so by independence of $Z$, LATE equals the effect of treatment on the treated (ATT). In this case, LATE = ATT.

LATE: $E[Y(1) - Y(0) \mid D(1) > D(0)]$

ATT: $E[Y(1) - Y(0) \mid D = 1]$
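The one-sided case can be illustrated with exact arithmetic on a hypothetical trial containing only compliers and never-takers. Shares, outcomes, and the offer probability `p_z` are assumptions made up for this sketch.

```python
# Hypothetical trial with one-sided non-compliance: D(0) = 0 for everyone.
# Each type: population share, D(1), and mean potential outcomes.
types = [
    {"name": "complier",    "share": 0.6, "d1": 1, "y0": 1.0, "y1": 3.0},
    {"name": "never taker", "share": 0.4, "d1": 0, "y0": 2.0, "y1": 2.5},
]
p_z = 0.5  # probability of being offered treatment (assumed)

# Enumerate (type, z) cells with their weight, realized D, and observed Y.
cells = []
for t in types:
    for z, pz in ((1, p_z), (0, 1 - p_z)):
        d = t["d1"] if z == 1 else 0
        y = d * t["y1"] + (1 - d) * t["y0"]
        cells.append({"w": t["share"] * pz, "z": z, "d": d, "y": y,
                      "effect": t["y1"] - t["y0"]})


def cond_mean(key, cond):
    """Weighted mean of cell[key] over the cells satisfying cond."""
    sel = [c for c in cells if cond(c)]
    return sum(c["w"] * c[key] for c in sel) / sum(c["w"] for c in sel)


itt = cond_mean("y", lambda c: c["z"] == 1) - cond_mean("y", lambda c: c["z"] == 0)
first_stage = cond_mean("d", lambda c: c["z"] == 1) - cond_mean("d", lambda c: c["z"] == 0)
late = itt / first_stage
att = cond_mean("effect", lambda c: c["d"] == 1)
as_treated = cond_mean("y", lambda c: c["d"] == 1) - cond_mean("y", lambda c: c["d"] == 0)

print("ITT:", itt, "LATE:", late, "ATT:", att, "as-treated:", as_treated)
```

In this example the ITT is 1.2, LATE and ATT both equal 2, and the as-treated comparison gives roughly 1.43, which matches neither.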
