the effect of a change in X (say, from X=x to X=x′ ) is the same for everybody.
However, the effect of a change in X on Y can be different for different people. To capture this: allow for β to be random. When β is random, we may absorb U into the intercept and simply write
Y=X′β.
Notation
With a random sample where variables are indexed by i, we would write Yi=Xi′βi, which makes it explicit that every individual has a unique effect βi.
Assume k=1 and write D in place of X1, which is assumed to take values in {0,1}, i.e. D is binary. Then,
Y=β0+β1D.
We interpret β0 as Y(0) and β1 as Y(1)−Y(0), where Y(1) and Y(0) are potential or counterfactual outcomes. Using this notation, we may rewrite the equation as the potential outcome model:
Y = D·Y(1) + (1−D)·Y(0), or equivalently: Y = Y(1) if D=1, and Y = Y(0) if D=0.
Y(0) denotes the value of the outcome that would have been observed if (possibly counter-to-fact) D were 0 ;
Y(1) denotes the value of the outcome that would have been observed if (possibly counter-to-fact) D were 1.
The variable D is typically called the treatment.
Y(1)−Y(0) is called the treatment effect. The quantity E[Y(1)−Y(0)] is usually referred to as the average treatment effect (ATE). This denotes the "treatment effect on the overall population".
The quantity E[Y(1)−Y(0)∣D=1] is called the average treatment effect on the treated (ATET). This denotes the "treatment effect on those who are treated".
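As a minimal sketch of these definitions, the snippet below computes the observed outcome, the ATE, and the ATET on a tiny made-up population in which both potential outcomes are known. The numbers and the function names (`observed_outcome`, `ate`, `atet`) are illustrative assumptions; in real data only one potential outcome per person is observed.

```python
from statistics import mean

# Hypothetical population: each row is (Y(0), Y(1), D).
population = [
    (1.0, 3.0, 1),
    (2.0, 2.5, 1),
    (0.0, 0.5, 0),
    (1.5, 1.5, 0),
]

def observed_outcome(y0, y1, d):
    """Switching equation: Y = D*Y(1) + (1 - D)*Y(0)."""
    return d * y1 + (1 - d) * y0

def ate(pop):
    """Average of Y(1) - Y(0) over the whole population."""
    return mean(y1 - y0 for y0, y1, _ in pop)

def atet(pop):
    """Average of Y(1) - Y(0) over the treated (D = 1) only."""
    return mean(y1 - y0 for y0, y1, d in pop if d == 1)

observed = [observed_outcome(y0, y1, d) for y0, y1, d in population]
ate_value = ate(population)
atet_value = atet(population)
print(observed, ate_value, atet_value)
```

In this made-up population the treated happen to have larger effects than the untreated, so the ATET exceeds the ATE.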
Note that:
Four Types of People in treatment
Random Assignment
Selection
Expressing in Terms of Expected Values:
Proof:
Assumptions:
Two-Stage Least Squares (TSLS) Process:
Simplifying, we get:
Potential Treatments
Now, we want to express this quantity
We impose the following versions of instrument exogeneity and instrument relevance, respectively:
Note that this assumption is actually stronger than exogeneity.
We further assume the following Monotonicity Condition:
TSLS Estimator to Form Y(1) - Y(0)
as we have
We got
Which only focuses on the Compliers.
LATE
The TSLS/IV estimand equals E[Y(1)−Y(0)∣D(1)>D(0)].
This is called the local average treatment effect (LATE). It is the average treatment effect for the subpopulation of people for whom a change in the value of the instrument switches them from being non-treated to treated: the so-called compliers.
Only the compliers appear because monotonicity rules out the defiers.
Monotonicity
As before, we have shown that monotonicity will rule out the defiers.
Monotonicity: while the instrument may have no effect on some people, all those who are affected are affected in the same way. Without monotonicity, we would have
Monotonicity with one-sided Compliance
The treatment effect is defined for a single individual; however, we cannot observe both outcomes, Y(0) and Y(1), for the same individual at the same time.
Always Taker: D(1)=1, D(0)=1
Complier: D(1)=1, D(0)=0
Never Taker: D(1)=0, D(0)=0
Defier: D(1)=0, D(0)=1
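The four types above are just the four possible pairs of potential treatments. A small sketch, with hypothetical helper names `compliance_type` and `satisfies_monotonicity`, makes the mapping explicit and shows which type violates D(1)≥D(0):

```python
# Map each pair of potential treatments (D(0), D(1)) to its type.
TYPES = {
    (1, 1): "always taker",
    (0, 1): "complier",
    (0, 0): "never taker",
    (1, 0): "defier",
}

def compliance_type(d0, d1):
    """Return the type of a person with potential treatments D(0)=d0, D(1)=d1."""
    return TYPES[(d0, d1)]

def satisfies_monotonicity(d0, d1):
    """Monotonicity requires D(1) >= D(0) for every individual."""
    return d1 >= d0

for d0, d1 in TYPES:
    print(d0, d1, compliance_type(d0, d1), satisfies_monotonicity(d0, d1))
```

Only the defier pair fails the check, which is why assuming monotonicity is the same as assuming there are no defiers.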
If D were randomly assigned (e.g., by the flip of a coin), then
(Y(0),Y(1))⊥D
In this case, under mild assumptions, the slope coefficient from OLS regression of Y on a constant and D yields a consistent estimate of the average treatment effect.
Note that if D is randomly assigned, or if the treated group is a random draw from the population (so that the treated group is representative of the whole population), then ATE = ATET.
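A small simulation can illustrate both claims. The data-generating process below is an assumption chosen only for illustration (normal Y(0), heterogeneous effects centred at 1), and the names `diff_in_means` and `atet` are hypothetical. With a binary D, the difference in group means is numerically the same as the OLS slope from regressing Y on a constant and D. For contrast, the sketch also assigns treatment by selection on gains, where ATE and ATET differ.

```python
import random

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

random.seed(1)
n = 20_000
y0 = [random.gauss(0, 1) for _ in range(n)]
effect = [1.0 + random.gauss(0, 0.5) for _ in range(n)]  # Y(1) - Y(0)
y1 = [a + e for a, e in zip(y0, effect)]
ate = mean(effect)

def diff_in_means(d):
    """Mean observed Y among D=1 minus mean observed Y among D=0."""
    y = [di * b + (1 - di) * a for di, a, b in zip(d, y0, y1)]
    treated = mean(yi for yi, di in zip(y, d) if di == 1)
    control = mean(yi for yi, di in zip(y, d) if di == 0)
    return treated - control

def atet(d):
    """Average effect among those with D=1."""
    return mean(e for e, di in zip(effect, d) if di == 1)

# Random assignment: a coin flip, independent of (Y(0), Y(1)).
d_random = [random.randint(0, 1) for _ in range(n)]
# Selection on gains: only people with above-average effects take treatment.
d_selected = [1 if e > 1.0 else 0 for e in effect]

diff_random = diff_in_means(d_random)
atet_random = atet(d_random)
atet_selected = atet(d_selected)
print(ate, diff_random, atet_random, atet_selected)
```

Under the coin flip, the difference in means and the ATET both land close to the ATE; under selection on gains, the ATET is noticeably larger than the ATE.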
The numerator, E[Y∣Z=1]−E[Y∣Z=0], represents the difference in the expected value of Y when Z changes from 0 to 1 .
The denominator, E[D∣Z=1]−E[D∣Z=0], represents the difference in the expected value of D (take-up of the treatment) when Z changes from 0 to 1.
Take-Up Ratio: The first-stage difference E[D∣Z=1]−E[D∣Z=0], which for binary Z equals Cov(D,Z)/Var[Z], can be interpreted as the take-up ratio. It is the share of people who are induced to take the treatment when it is offered (i.e., when Z changes from 0 to 1), and so it measures how effective the instrument is at inducing changes in the treatment variable.
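The numerator and denominator described above can be computed directly from group means. The following is a minimal sketch on a made-up eight-person data set; the function name `wald_estimate` and the data are assumptions for illustration only.

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def wald_estimate(y, d, z):
    """Return (reduced form, first stage, ratio) for a binary instrument z."""
    reduced_form = (mean(yi for yi, zi in zip(y, z) if zi == 1)
                    - mean(yi for yi, zi in zip(y, z) if zi == 0))
    first_stage = (mean(di for di, zi in zip(d, z) if zi == 1)
                   - mean(di for di, zi in zip(d, z) if zi == 0))
    return reduced_form, first_stage, reduced_form / first_stage

# Hypothetical data: instrument, treatment taken, observed outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 1, 0, 0, 1, 0]
y = [3.0, 4.0, 1.0, 4.0, 1.0, 2.0, 3.0, 2.0]

reduced_form, first_stage, ratio = wald_estimate(y, d, z)
print(reduced_form, first_stage, ratio)
```

Here the instrument raises mean Y by 1.0 and take-up by 0.5, so the ratio attributes an effect of 2.0 per unit of induced treatment.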
Standard Regression Formula: In a simple linear regression, without considering endogeneity, the coefficient β1 is given by:
β1 = Cov(Y,D) / Var(D)
IV Regression Setup: However, in the presence of endogeneity (where D is correlated with the error term), this formula doesn't yield a consistent estimate. Here's where IV regression using an instrument Z comes into play.
Relevance: Z is correlated with D (i.e., Cov(D,Z)≠0).
Exogeneity: Z is not correlated with the error term in the Y regression.
First Stage: Regress D on Z and get the fitted values D^ :
D = π0 + π1Z + ϵ
D^ = π0 + π1Z
Second Stage: Regress Y on D^ :
Y=α0+β1D^+u
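The two stages can be run by hand with nothing more than simple OLS. The sketch below uses a simulated data set whose design is an assumption made for illustration: a binary Z, an unobserved confounder `v` that drives both D and Y, and a constant true effect of 2. It compares the two-stage slope with the ratio Cov(Y,Z)/Cov(D,Z) and with naive OLS of Y on D.

```python
import random

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def cov(a, b):
    """Covariance with 1/n normalisation (the factor cancels in ratios)."""
    ma, mb = mean(a), mean(b)
    return mean((x - ma) * (y - mb) for x, y in zip(a, b))

def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    slope = cov(y, x) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(0)
n = 20_000
z = [random.randint(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
d = [1 if zi + vi > 0.5 else 0 for zi, vi in zip(z, v)]
y = [2.0 * di + vi + random.gauss(0, 0.5) for di, vi in zip(d, v)]

# First stage: regress D on Z and form fitted values.
pi0, pi1 = ols(d, z)
d_hat = [pi0 + pi1 * zi for zi in z]
# Second stage: regress Y on the fitted values.
_, beta_2sls = ols(y, d_hat)

_, beta_ols = ols(y, d)              # ignores endogeneity
beta_ratio = cov(y, z) / cov(d, z)   # ratio-of-covariances form
print(beta_2sls, beta_ratio, beta_ols)
```

The two-stage slope and the covariance ratio agree up to floating-point error, while naive OLS is pushed well above 2 because `v` raises both D and Y.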
Now, let's derive the IV estimate of β1 :
Covariance of Y and D^ : The covariance of Y with D^ is given by:
Cov(Y,D^)=Cov(Y,π0+π1Z)
Since π0 is a constant, it drops out in the covariance, leaving:
Cov(Y,D^)=π1Cov(Y,Z)
Variance of D^ : Similarly, the variance of D^ is:
Var(D^)=Var(π0+π1Z)
Again, π0 being constant, it drops out, leaving:
Var(D^) = π1² Var(Z)
Substituting in the Second Stage: In the second stage, β1 is estimated as the coefficient of D^ in the regression of Y on D^ :
β1 = Cov(Y,D^) / Var(D^)
Substituting our expressions for Cov(Y,D^) and Var(D^) :
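Carried through, the substitution can be sketched as follows. The last equality on the second line uses the first-stage slope π1 = Cov(D,Z)/Var(Z), which is the standard OLS formula but is an assumption not spelled out in the text; the final line additionally assumes Z is binary.

```latex
\begin{align*}
\beta_1
  &= \frac{\operatorname{Cov}(Y,\hat{D})}{\operatorname{Var}(\hat{D})}
   = \frac{\pi_1 \operatorname{Cov}(Y,Z)}{\pi_1^{2} \operatorname{Var}(Z)}
   = \frac{\operatorname{Cov}(Y,Z)}{\pi_1 \operatorname{Var}(Z)} \\
  &= \frac{\operatorname{Cov}(Y,Z)}{\operatorname{Cov}(D,Z)}
   \qquad \text{since } \pi_1 = \frac{\operatorname{Cov}(D,Z)}{\operatorname{Var}(Z)} \\
  &= \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}
   \qquad \text{for binary } Z .
\end{align*}
```

The binary-Z step follows because Cov(Y,Z) = (E[Y∣Z=1]−E[Y∣Z=0])·Var(Z), and likewise for D, so Var(Z) cancels.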
The goal is to express the IV estimand in terms of the treatment effect Y(1)−Y(0).
Towards our goal, it is useful to also introduce the following equation for D:
D=ZD(1)+(1−Z)D(0)=D(0)+(D(1)−D(0))Z=π0+π1Z
where π0=D(0), π1=D(1)−D(0), and D(1) and D(0) are potential or counterfactual treatments.
(Y(1),Y(0),D(1),D(0))⊥Z
P{D(1)≠D(0)}=P{π1≠0}>0
P{D(1)≥D(0)}=P{π1≥0}=1
This monotonicity condition eliminates the defiers. If monotonicity fails, some individuals can have D(1)=0 (untreated when Z=1) but D(0)=1 (treated when Z=0), so that D(1)<D(0); these individuals are the defiers.
Treatment effects may be positive for everyone (i.e., Y(1)−Y(0)>0 ) yet the reduced form is zero because effects on compliers are canceled out by effects on defiers, i.e., those individuals for which the instrument pushes them out of treatment (D(1)=0 and D(0)=1).
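The cancellation can be seen in a small exact example. The population below is hypothetical: 20% compliers with effect +2, 40% defiers with effect +1, and 40% never takers with effect +1. Because Z is independent of the potential outcomes and treatments, E[Y∣Z=z] equals the population mean of the outcome each person would have under Z=z, which is what the sketch computes.

```python
from statistics import mean

# Hypothetical population rows: (D(0), D(1), Y(0), Y(1)).
population = (
    [(0, 1, 0.0, 2.0)] * 2    # compliers, effect +2
    + [(1, 0, 0.0, 1.0)] * 4  # defiers, effect +1
    + [(0, 0, 0.0, 1.0)] * 4  # never takers, effect +1
)

def treatment(row, z):
    """Treatment actually taken when the instrument equals z."""
    d0, d1, _, _ = row
    return d1 if z == 1 else d0

def outcome(row, z):
    """Outcome observed when the instrument equals z."""
    _, _, y0, y1 = row
    return y1 if treatment(row, z) == 1 else y0

reduced_form = (mean(outcome(r, 1) for r in population)
                - mean(outcome(r, 0) for r in population))
first_stage = (mean(treatment(r, 1) for r in population)
               - mean(treatment(r, 0) for r in population))
all_effects_positive = all(y1 > y0 for _, _, y0, y1 in population)
print(reduced_form, first_stage, all_effects_positive)
```

Every individual effect is positive, yet the reduced form is zero: the compliers contribute 0.2 × 2 and the defiers subtract 0.4 × 1.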
This cancellation cannot occur in a constant-effect model where β=Y(1)−Y(0) is the same for everyone, since in that case
E[Y∣Z=1]−E[Y∣Z=0] = β(E[D∣Z=1]−E[D∣Z=0]),
and so a zero reduced-form effect means either the first stage is zero or β=0.
Randomized trial with non-compliance: the treatment assignment acts as an "offer of treatment" Z (the instrument), while the actual treatment D records whether the subject actually received the treatment.
Assume no one in the control group has access to the treatment: D(0)=0, while D(1)∈{0,1}. In this case, monotonicity holds automatically: D(1)≥D(0).
Since D(1) is a choice, a comparison between those actually treated (D=1) and the control (D=0) group is misleading. Two alternatives are frequently used.
Intention to Treat Effect: a comparison between those who were offered treatment (Z=1) and the control (Z=0) group.
LATE = ATT: using Z as an instrumental variable for D yields the LATE. Since D(0)=0, there are no always-takers, so everyone who is treated is a complier, and the LATE equals the effect of treatment on the treated (ATT, called ATET above).
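A tiny exact example illustrates the one-sided case. The population is hypothetical: two compliers and two never takers, with D(0)=0 for everyone. Each person appears once with Z=0 and once with Z=1, which is an assumed balanced design that makes Z exactly independent of the potential outcomes.

```python
from statistics import mean

# Hypothetical population rows: (D(1), Y(0), Y(1)); D(0) = 0 for everyone.
population = [
    (1, 1.0, 3.0),   # complier, effect 2
    (1, 0.0, 4.0),   # complier, effect 4
    (0, 5.0, 15.0),  # never taker, effect 10 (never realised)
    (0, 3.0, 13.0),  # never taker, effect 10 (never realised)
]

# Balanced design: rows are (Z, D, Y(0), Y(1)) with D = Z * D(1).
sample = [(z, z * d1, y0, y1) for z in (0, 1) for d1, y0, y1 in population]

def observed(d, y0, y1):
    return y1 if d == 1 else y0

# Intention-to-treat effect: offered (Z=1) versus not offered (Z=0).
itt = (mean(observed(d, y0, y1) for z, d, y0, y1 in sample if z == 1)
       - mean(observed(d, y0, y1) for z, d, y0, y1 in sample if z == 0))
take_up = (mean(d for z, d, _, _ in sample if z == 1)
           - mean(d for z, d, _, _ in sample if z == 0))
late = itt / take_up
att = mean(y1 - y0 for _, d, y0, y1 in sample if d == 1)
ate = mean(y1 - y0 for _, y0, y1 in population)
print(itt, take_up, late, att, ate)
```

The ITT effect is diluted by the never takers, while the IV ratio recovers the average effect among those actually treated; the ATE differs because it also averages over never takers, whose effects are never observed.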