the effect of a change in X (say, from X=x to X=x′ ) is the same for everybody.
However, the effect of a change in X on Y can be different for different people. To capture this, we allow β to be random. When β is random, we may absorb U into the intercept and simply write
Y=X′β.
Notation
With a random sample where variables are indexed by i, we would write Yi=Xi′βi, which makes it explicit that every individual has a unique effect βi.
Assume k=1 and write D in place of X1, which is assumed to take values in {0,1}, i.e. D is binary. Then,
Y=β0+β1D.
We interpret β0 as Y(0) and β1 as Y(1)−Y(0), where Y(1) and Y(0) are potential or counterfactual outcomes. Using this notation, we may rewrite the equation as the potential outcome model:
$$Y = D\,Y(1) + (1-D)\,Y(0) \quad \text{or} \quad Y = \begin{cases} Y(1) & \text{if } D=1 \\ Y(0) & \text{if } D=0. \end{cases}$$
Y(0) denotes the value of the outcome that would have been observed if (possibly counter-to-fact) D were 0 ;
Y(1) denotes the value of the outcome that would have been observed if (possibly counter-to-fact) D were 1.
The variable D is typically called the treatment.
Y(1)−Y(0) is called the treatment effect. The quantity E[Y(1)−Y(0)] is usually referred to as the average treatment effect (ATE): the treatment effect averaged over the whole population.
The quantity E[Y(1)−Y(0)∣D=1] is called the average treatment effect on the treated (ATET): the treatment effect averaged over those who are actually treated.
Note that:
The treatment effect is defined for a single individual, but we can never observe both potential outcomes, Y(0) and Y(1), for the same individual at the same time.
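To make the difference between the ATE and the ATET concrete, here is a minimal simulation sketch. The data-generating process (normal potential outcomes and a selection rule under which people with larger gains are more likely to be treated) is an assumption made purely for illustration and is not part of the notes.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 100_000

effects, treated_effects, observed = [], [], []
for _ in range(n):
    y0 = rng.gauss(0.0, 1.0)       # Y(0): outcome without treatment
    effect = rng.gauss(2.0, 1.0)   # Y(1) - Y(0): individual treatment effect
    y1 = y0 + effect               # Y(1): outcome with treatment
    # Assumed selection rule: larger gains make treatment more likely.
    d = 1 if effect + rng.gauss(0.0, 1.0) > 2.0 else 0
    y = d * y1 + (1 - d) * y0      # only one potential outcome is ever observed
    observed.append((d, y))
    effects.append(effect)
    if d == 1:
        treated_effects.append(effect)

ate = sum(effects) / n                               # E[Y(1) - Y(0)]
atet = sum(treated_effects) / len(treated_effects)   # E[Y(1) - Y(0) | D = 1]
print(f"ATE  = {ate:.2f}")
print(f"ATET = {atet:.2f}")
```

Because treatment is self-selected in this sketch, the ATET comes out larger than the ATE. With real data neither average can be computed this way, since only the pairs in `observed` are available.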
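Four Types of People in Treatment
Here D(1) and D(0) denote the treatment an individual would take if an instrument Z (e.g., an offer of treatment) were set to 1 or 0, respectively.
Always Taker: D(1)=1, D(0)=1
Complier: D(1)=1, D(0)=0
Never Taker: D(1)=0, D(0)=0
Defier: D(1)=0, D(0)=1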
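The four types are defined by a pair of potential treatments, but only one of the two is ever observed for a given person. The sketch below enumerates which types are consistent with each observable pair (Z, D), using the relation D = Z·D(1) + (1−Z)·D(0). The names `TYPES`, `observed_treatment` and `compatible` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# Keys are (D(1), D(0)).
TYPES = {
    (1, 1): "always taker",
    (1, 0): "complier",
    (0, 0): "never taker",
    (0, 1): "defier",
}

def observed_treatment(z: int, d1: int, d0: int) -> int:
    """Observed treatment D = Z*D(1) + (1-Z)*D(0)."""
    return z * d1 + (1 - z) * d0

# For each observable pair (Z, D), collect the types consistent with it.
compatible = {}
for z, (d1, d0) in product((0, 1), TYPES):
    d = observed_treatment(z, d1, d0)
    compatible.setdefault((z, d), set()).add(TYPES[(d1, d0)])

for (z, d), names in sorted(compatible.items()):
    print(f"Z={z}, D={d}: {sorted(names)}")
```

Every observable cell contains two types, so an individual's type cannot be read off the data without further assumptions.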
Random Assignment
If D were randomly assigned (e.g., by the flip of a coin), then
(Y(0),Y(1))⊥D
In this case, under mild assumptions, the slope coefficient from OLS regression of Y on a constant and D yields a consistent estimate of the average treatment effect.
Note that if D is randomly assigned, or more generally if the treated group is a random draw from the population (so that the treated group is representative of the whole population), then ATE = ATET.
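A small simulation can illustrate both claims under random assignment. The distributions below are assumptions chosen only for illustration; `mean` and `cov` are hypothetical helpers, and the OLS slope of Y on a constant and D is computed as Cov(Y,D)/Var(D).

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
n = 100_000
y, d, effects = [], [], []
for _ in range(n):
    y0 = rng.gauss(0.0, 1.0)               # Y(0)
    effect = rng.gauss(2.0, 1.0)           # heterogeneous effect Y(1) - Y(0)
    di = 1 if rng.random() < 0.5 else 0    # D assigned by a coin flip
    y.append(y0 + effect * di)             # observed outcome
    d.append(di)
    effects.append(effect)

ols_slope = cov(y, d) / cov(d, d)
diff_means = (mean([yi for yi, di in zip(y, d) if di == 1])
              - mean([yi for yi, di in zip(y, d) if di == 0]))
ate = mean(effects)
atet = mean([e for e, di in zip(effects, d) if di == 1])

print(f"OLS slope           = {ols_slope:.3f}")
print(f"difference in means = {diff_means:.3f}")
print(f"ATE = {ate:.3f}, ATET = {atet:.3f}")
```

With a binary regressor the OLS slope coincides with the difference in group means, and under the coin-flip assignment both are close to the ATE, which in turn is close to the ATET.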
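With a binary instrument Z, the IV estimand can be written as the ratio
$$\frac{\operatorname{Cov}(Y,Z)}{\operatorname{Cov}(D,Z)} = \frac{E[Y \mid Z=1]-E[Y \mid Z=0]}{E[D \mid Z=1]-E[D \mid Z=0]}.$$
The numerator, E[Y∣Z=1]−E[Y∣Z=0], represents the difference in the expected value of Y when Z changes from 0 to 1.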
The denominator, E[D∣Z=1]−E[D∣Z=0], represents the difference in the expected value of D (take-up of the treatment) when Z changes from 0 to 1.
Take-Up Ratio: For a binary instrument, $\frac{\operatorname{Cov}[D,Z]}{\operatorname{Var}[Z]} = E[D \mid Z=1]-E[D \mid Z=0]$ can be interpreted as the take-up ratio. It indicates the share of people who are induced to take the treatment when it is offered (i.e., when Z changes from 0 to 1). It is a measure of how effective the instrument is at inducing changes in the treatment variable.
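As a worked example, the ratio of the two differences can be computed on a tiny data set. The eight observations below are made up for illustration, and `wald`, `mean` and `cov` are hypothetical helper names.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def wald(y, d, z):
    """Return (ratio, numerator, denominator) for a binary instrument z."""
    y_z1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y_z0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d_z1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d_z0 = mean([di for di, zi in zip(d, z) if zi == 0])
    numerator = y_z1 - y_z0        # E[Y|Z=1] - E[Y|Z=0]
    denominator = d_z1 - d_z0      # E[D|Z=1] - E[D|Z=0], the take-up ratio
    return numerator / denominator, numerator, denominator

# Made-up toy data: instrument, treatment, outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 2, 1, 4, 2, 5, 4, 5]

ratio, numerator, denominator = wald(y, d, z)
print(numerator, denominator, ratio)   # 2.0 0.5 4.0
print(cov(d, z) / cov(z, z))           # 0.5, the same take-up ratio
print(cov(y, z) / cov(d, z))           # 4.0, the same IV ratio
```

In this toy sample the offer raises take-up from 25% to 75%, so the take-up ratio is 0.5 and the outcome difference of 2 is scaled up to 4.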
Proof:
Standard Regression Formula: In a simple linear regression, without considering endogeneity, the coefficient β1 is given by:
$$\beta_1 = \frac{\operatorname{Cov}(Y,D)}{\operatorname{Var}(D)}$$
IV Regression Setup: However, in the presence of endogeneity (where D is correlated with the error term), this formula doesn't yield a consistent estimate. Here's where IV regression using an instrument Z comes into play.
Assumptions:
Relevance: Z is correlated with D (i.e., Cov(D,Z)≠0).
Exogeneity: Z is not correlated with the error term in the Y regression.
Two-Stage Least Squares (TSLS) Process:
First Stage: Regress D on Z and get the fitted values D^ :
$$D = \pi_0 + \pi_1 Z + \epsilon, \qquad \hat{D} = \pi_0 + \pi_1 Z$$
Second Stage: Regress Y on D^ :
Y=α0+β1D^+u
Now, let's derive the IV estimate of β1 :
Covariance of Y and D^ : The covariance of Y with D^ is given by:
Cov(Y,D^)=Cov(Y,π0+π1Z)
Since π0 is a constant, it drops out in the covariance, leaving:
Cov(Y,D^)=π1Cov(Y,Z)
Variance of D^ : Similarly, the variance of $\hat{D}$ is:
Var(D^)=Var(π0+π1Z)
Again, π0 being constant, it drops out, leaving:
$$\operatorname{Var}(\hat{D}) = \pi_1^2 \operatorname{Var}(Z)$$
Substituting in the Second Stage: In the second stage, β1 is estimated as the coefficient of D^ in the regression of Y on D^ :
$$\beta_1 = \frac{\operatorname{Cov}(Y,\hat{D})}{\operatorname{Var}(\hat{D})}$$
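Substituting our expressions for Cov(Y,D^) and Var(D^), and using the first-stage slope π1 = Cov(D,Z)/Var(Z):
$$\beta_1 = \frac{\pi_1 \operatorname{Cov}(Y,Z)}{\pi_1^2 \operatorname{Var}(Z)} = \frac{\operatorname{Cov}(Y,Z)}{\pi_1 \operatorname{Var}(Z)} = \frac{\operatorname{Cov}(Y,Z)}{\operatorname{Cov}(D,Z)}$$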
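The two-stage procedure can be checked numerically. The sketch below assumes a constant effect β1 = 2 and an unobserved confounder U that drives both D and Y; this data-generating process and the helper names are illustrative assumptions. The fitted values use the estimated first-stage coefficients.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(2)
n = 100_000
y, d, z = [], [], []
for _ in range(n):
    zi = 1 if rng.random() < 0.5 else 0    # instrument, independent of U
    u = rng.gauss(0.0, 1.0)                # unobserved confounder
    v = rng.gauss(0.0, 1.0)
    di = 1 if zi + u + v > 0.5 else 0      # D depends on Z and on U (endogenous)
    y.append(1.0 + 2.0 * di + u)           # assumed true beta1 = 2
    d.append(di)
    z.append(zi)

ols = cov(y, d) / cov(d, d)                # inconsistent: D is correlated with U

# First stage: regress D on Z and form fitted values.
pi1 = cov(d, z) / cov(z, z)
pi0 = mean(d) - pi1 * mean(z)
d_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress Y on the fitted values.
tsls = cov(y, d_hat) / cov(d_hat, d_hat)
iv_ratio = cov(y, z) / cov(d, z)

print(f"OLS  = {ols:.3f}")
print(f"TSLS = {tsls:.3f}")
print(f"Cov(Y,Z)/Cov(D,Z) = {iv_ratio:.3f}")
```

The second-stage slope and the covariance ratio agree up to floating-point error, while the OLS slope is pushed away from 2 by the confounder.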
Our goal is to express this IV estimand in terms of the treatment effect Y(1)−Y(0).
Towards our goal, it is useful to also introduce the following equation for D:
D=ZD(1)+(1−Z)D(0)=D(0)+(D(1)−D(0))Z=π0+π1Z
where π0=D(0),π1=D(1)−D(0), and D(1) and D(0) are potential or counterfactual treatments.
We impose the following versions of instrument exogeneity and instrument relevance, respectively:
(Y(1),Y(0),D(1),D(0))⊥Z
Note that this assumption is actually stronger than the exogeneity assumption above, which only requires Z to be uncorrelated with the error term.
P{D(1)≠D(0)}=P{π1≠0}>0
We further assume the following Monotonicity Condition:
P{D(1)≥D(0)}=P{π1≥0}=1
This monotonicity condition eliminates the defiers. If monotonicity fails, some individuals have D(1)=0 (under Z=1) but D(0)=1 (under Z=0), i.e. D(1)<D(0); these individuals are the defiers.
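Under these assumptions, the IV estimand equals
$$\frac{\operatorname{Cov}[Y,Z]}{\operatorname{Cov}[D,Z]} = E[Y(1)-Y(0) \mid D(1)>D(0)].$$
This is called the local average treatment effect (LATE). It is the average treatment effect for the subpopulation of people for whom a change in the value of the instrument switches them from being non-treated to treated: the so-called compliers.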
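To see this, note that for a binary instrument
$$\frac{\operatorname{Cov}[Y,Z]}{\operatorname{Var}[Z]} = E[Y \mid Z=1]-E[Y \mid Z=0] = E[Y(1)-Y(0) \mid D(1)>D(0)]\,P\{D(1)-D(0)=1\},$$
while $\frac{\operatorname{Cov}[D,Z]}{\operatorname{Var}[Z]} = E[D \mid Z=1]-E[D \mid Z=0] = P\{D(1)-D(0)=1\}$. Dividing the first expression by the second gives the local average treatment effect.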
As before, we have shown that monotonicity will rule out the defiers.
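The following simulation sketch illustrates that, with heterogeneous effects and no defiers, the IV ratio recovers the average effect among compliers and not the ATE. The type shares, the type-specific mean effects and the normal noise are all assumptions made for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(3)
n = 200_000

# Assumed population without defiers: (type, D(1), D(0), mean effect, share).
population = [
    ("always taker", 1, 1, 1.0, 0.2),
    ("complier",     1, 0, 3.0, 0.5),
    ("never taker",  0, 0, 6.0, 0.3),
]
draws = rng.choices(population, weights=[p[4] for p in population], k=n)

y, d, z, all_effects, complier_effects = [], [], [], [], []
for name, d1, d0, mean_effect, _ in draws:
    zi = 1 if rng.random() < 0.5 else 0     # randomly assigned instrument
    di = zi * d1 + (1 - zi) * d0            # D = Z*D(1) + (1-Z)*D(0)
    y0 = rng.gauss(0.0, 1.0)                # Y(0)
    effect = rng.gauss(mean_effect, 1.0)    # Y(1) - Y(0)
    y.append(y0 + effect * di)
    d.append(di)
    z.append(zi)
    all_effects.append(effect)
    if name == "complier":
        complier_effects.append(effect)

reduced_form = (mean([yi for yi, zi in zip(y, z) if zi == 1])
                - mean([yi for yi, zi in zip(y, z) if zi == 0]))
first_stage = (mean([di for di, zi in zip(d, z) if zi == 1])
               - mean([di for di, zi in zip(d, z) if zi == 0]))
wald = reduced_form / first_stage
late = mean(complier_effects)   # not computable from real data
ate = mean(all_effects)         # not computable from real data

print(f"first stage = {first_stage:.3f}")
print(f"IV ratio = {wald:.3f}, complier effect = {late:.3f}, ATE = {ate:.3f}")
```

The first stage is close to the complier share, and the IV ratio tracks the complier effect even though the ATE is noticeably different in this assumed population.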
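Monotonicity: while the instrument may have no effect on some people, all those who are affected are affected in the same way. Without monotonicity, we would have
$$E[Y \mid Z=1]-E[Y \mid Z=0] = E[Y(1)-Y(0) \mid D(1)>D(0)]\,P\{D(1)>D(0)\} - E[Y(1)-Y(0) \mid D(1)<D(0)]\,P\{D(1)<D(0)\}.$$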
Treatment effects may be positive for everyone (i.e., Y(1)−Y(0)>0) yet the reduced form is zero because effects on compliers are canceled out by effects on defiers, i.e., those individuals whom the instrument pushes out of treatment (D(1)=0 and D(0)=1).
This doesn't come up in a constant effect model where β=Y(1)−Y(0) is constant, as in such a case
$$E[Y \mid Z=1]-E[Y \mid Z=0] = \beta\,\bigl(E[D \mid Z=1]-E[D \mid Z=0]\bigr),$$
and so a zero reduced-form effect means either the first stage is zero or β=0.
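The cancellation can be shown with exact arithmetic on a small table of types. The shares and effects below are made-up numbers, and `reduced_form` and `first_stage` are hypothetical helpers that use E[Y∣Z=1]−E[Y∣Z=0] = E[(Y(1)−Y(0))(D(1)−D(0))], which holds when Z is independent of the potential outcomes and potential treatments.

```python
# Assumed population: type -> (D(1), D(0), share, effect). All effects are positive.
population = {
    "always taker": (1, 1, 0.2, 1.0),
    "complier":     (1, 0, 0.4, 1.0),
    "never taker":  (0, 0, 0.2, 1.0),
    "defier":       (0, 1, 0.2, 2.0),
}

def reduced_form(pop):
    """E[Y|Z=1] - E[Y|Z=0] = E[(Y(1)-Y(0)) * (D(1)-D(0))]."""
    return sum(share * effect * (d1 - d0) for d1, d0, share, effect in pop.values())

def first_stage(pop):
    """E[D|Z=1] - E[D|Z=0] = E[D(1) - D(0)]."""
    return sum(share * (d1 - d0) for d1, d0, share, effect in pop.values())

print(reduced_form(population))   # complier gains are offset by defiers
print(first_stage(population))    # yet the instrument still moves take-up

# Same shares, but a constant effect beta = 1 for everyone.
constant = {name: (d1, d0, share, 1.0)
            for name, (d1, d0, share, _) in population.items()}
print(reduced_form(constant), 1.0 * first_stage(constant))
```

With heterogeneous effects and defiers the reduced form is zero although every individual effect is positive, whereas in the constant-effect version it equals β times the first stage.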
Monotonicity with one-sided Compliance
Consider a randomized trial with non-compliance: the treatment assignment Z (the instrument) is an "offer of treatment", while the actual treatment D records whether the subject actually received the treatment.
Assume no one in the control group has access to the treatment: D(0)=0, while D(1)∈{0,1}. In this case, monotonicity holds automatically: D(1)≥D(0).
Since D(1) is a choice, a comparison between those actually treated (D=1) and the control (D=0) group is misleading. Two alternatives are frequently used.
Intention to Treat Effect: a comparison between those who were offered treatment (Z=1) and the control (Z=0) group.
Instrumental Variables: use Z as an instrument for D, which yields the LATE. Since D(0)=0, the treated are exactly the compliers who were offered treatment, so in this case LATE = ATT, the effect of treatment on the treated (the ATET defined earlier).
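A small exact example can illustrate the three comparisons under one-sided compliance. The four hypothetical individuals below are made up; each is counted once under Z=0 and once under Z=1, which mimics random assignment of the offer.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical population: (Y(0), Y(1), D(1)); D(0) = 0 for everyone.
people = [(3, 6, 1), (4, 5, 1), (0, 2, 0), (1, 1, 0)]

rows = []  # (z, d, observed y, individual effect)
for y0, y1, d1 in people:
    for z in (0, 1):
        d = z * d1                      # D = Z*D(1) + (1-Z)*D(0) with D(0) = 0
        y = d * y1 + (1 - d) * y0       # observed outcome
        rows.append((z, d, y, y1 - y0))

itt = (mean([y for z, d, y, e in rows if z == 1])
       - mean([y for z, d, y, e in rows if z == 0]))
take_up = (mean([d for z, d, y, e in rows if z == 1])
           - mean([d for z, d, y, e in rows if z == 0]))
late = itt / take_up
att = mean([e for z, d, y, e in rows if d == 1])      # uses unobservable effects
naive = (mean([y for z, d, y, e in rows if d == 1])
         - mean([y for z, d, y, e in rows if d == 0]))

print(itt, take_up, late, att, naive)   # 1.0 0.5 2.0 2.0 4.0
```

Here the intention-to-treat effect is the LATE scaled down by the take-up ratio, the LATE coincides with the ATT, and the naive treated-versus-untreated comparison is misleading because those who take up the offer have higher Y(0) in this made-up population.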