Let (Y,X,U) be a random vector where Y and U take values in R and X takes values in R^{k+1}. Assume further that the first component of X is a constant equal to one. Let β∈R^{k+1} be such that
Y=X′β+U
We assume that our model satisfies the following conditions:
E[XU]=0
E[XX′]<∞
There is no perfect collinearity in X
Denote the marginal distribution of (Y,X) by P, and let (Y1,X1),…,(Yn,Xn) be an i.i.d. sample of random vectors with distribution P.
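As a concrete illustration of this setup, here is a minimal simulation sketch in Python/NumPy; the choices of k, β, and the error distribution are purely illustrative and not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 2                      # sample size; k non-constant regressors (illustrative)
beta = np.array([1.0, 2.0, -0.5])   # illustrative true coefficients, length k+1

# X has a constant first component, as assumed above
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
U = rng.normal(size=n)              # error with E[XU] = 0 by construction
Y = X @ beta + U

# OLS estimate: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)                     # close to beta for large n
```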
The properties we will discuss next are:
Bias
Gauss-Markov Theorem
Consistency
Asymptotic Normality
Bias
Unbiasedness
Proof:
By the OLS formula,

β^ = (∑_{i=1}^n Xi Xi′)^{−1} (∑_{i=1}^n Xi Yi)

Substituting Yi = Xi′β + ui into this formula gives

β^ = β + (∑_{i=1}^n Xi Xi′)^{−1} (∑_{i=1}^n Xi ui)

For unbiasedness we strengthen the first assumption to E[U∣X]=0 (equivalently, E[Y∣X]=X′β). Note that this implies E[XU]=0, since E[XU]=E[E[XU∣X]]=E[X E[U∣X]]=0.

If the observations are i.i.d., then E[ui∣X1,…,Xn]=E[ui∣Xi]; without independence across observations this equality may not hold. Either E[U∣X]=0 together with i.i.d. sampling, or the condition E[ui∣X1,…,Xn]=0 directly, is sufficient to finish the proof.

Since E[ui∣X1,…,Xn]=E[ui∣Xi]=0, taking conditional expectations in the expression above gives E[β^∣X1,…,Xn]=β. By the law of iterated expectations,

E[β^]=E[E[β^∣X1,…,Xn]]=E[β]=β

so β^ is unbiased. This completes the proof of unbiasedness.
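As a sanity check, the following Monte Carlo sketch repeatedly draws samples from a design in which E[U∣X]=0 holds by construction and averages the OLS estimates; the particular β, distributions, and sample sizes are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5000
beta = np.array([1.0, 2.0, -0.5])

estimates = np.empty((reps, beta.size))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    U = rng.normal(size=n)            # E[U | X] = 0, so beta_hat is unbiased
    Y = X @ beta + U
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

# The Monte Carlo average of beta_hat should be very close to beta
print(estimates.mean(axis=0), beta)
```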
Bias from an Omitted Variable

If E[β^]≠β, then β^ is biased, with bias(β^)=E[β^]−β. A common reason for bias is an omitted variable that is related to the included regressors.

If bias(β^)>0: β^ overestimates the parameter on average.
If bias(β^)<0: β^ underestimates the parameter on average.

For example, consider a regression of wage on factors that affect it:

wage = β0 + β1 edu + u

We regress wage on the education level (edu). However, other factors that affect wage, such as ability and motivation, are unlikely to be independent of education. In this example Cov(edu,u)≠0, which leads to a biased estimate of β1.

To characterize the omitted variable bias, we can use the following steps:

Write the long regression: Y = X1′β1 + X2′β2 + e, where E[X1e]=0 and E[X2e]=0.
Write the short regression: Y = X1′γ1 + u, where E[X1u]=0.

By the formula for the population OLS coefficient in the short regression, and using E[X1e]=0, with Γ12 = E[X1X1′]^{−1}E[X1X2′] we have

γ1 = E[X1X1′]^{−1}E[X1Y] = β1 + E[X1X1′]^{−1}E[X1X2′]β2 = β1 + Γ12β2

Note that Γ12 collects the coefficients of the linear projection of X2 on X1. In practice it is hard to estimate because the omitted variables X2 are usually unobserved.

With X1 standing for education and X2 for ability or motivation, real-world experience suggests that:

Higher ability/motivation is likely to lead to higher wages: β2>0 is highly likely.
Higher ability/motivation is likely to go together with more education: Γ12>0 is highly likely.

Therefore the omitted-variable bias Γ12β2 is very likely positive, and the short regression very likely overestimates the effect of education on wage. A numerical illustration follows.
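The decomposition γ1 = β1 + Γ12β2 can be checked numerically. The sketch below uses a made-up wage design in which the omitted scalar "ability" regressor is positively correlated with education and has a positive coefficient, so the short regression overstates β1; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1, beta2 = 0.8, 0.5                 # illustrative true effects of education and "ability"

x1 = rng.normal(size=n)                 # education (demeaned, illustrative)
x2 = 0.6 * x1 + rng.normal(size=n)      # ability, positively related to education
wage = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression of wage on x1 only (with a constant): omits x2
X1 = np.column_stack([np.ones(n), x1])
gamma = np.linalg.solve(X1.T @ X1, X1.T @ wage)

# The projection coefficient of x2 on x1 is 0.6 in this design, so
# gamma_1 should be close to beta1 + 0.6 * beta2 = 1.1 (an upward bias)
print(gamma[1], beta1 + 0.6 * beta2)
```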
Gauss-Markov Theorem
Homoskedasticity and Heteroskedasticity

Suppose E[U∣X]=0 and Var[U∣X]=σ². When Var[U∣X] is constant (and therefore does not depend on X), we say that U is homoskedastic. Otherwise, we say that U is heteroskedastic.

There are two common ways to check for heteroskedasticity, illustrated in the sketch below:

Regress the squared OLS residuals (a stand-in for Var[U∣X]) on X and test whether the slope coefficients are significantly different from zero.
Plot the squared residuals against the regressors or fitted values and check whether there is any trend in the plot.
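Since Var[U∣X] is not observed, both checks are typically run on the squared OLS residuals. The following sketch (a Breusch-Pagan-style auxiliary regression plus a crude binned comparison in place of a plot) uses an illustrative heteroskedastic design.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
U = rng.normal(size=n) * x               # Var[U | X] grows with x: heteroskedastic
Y = X @ np.array([1.0, 2.0]) + U

# OLS fit and residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

# Check 1: auxiliary regression of squared residuals on X.
# A slope clearly different from zero points to heteroskedasticity.
aux = np.linalg.solve(X.T @ X, X.T @ resid**2)
print("slope of squared residuals on x:", aux[1])

# Check 2 (in place of a plot): compare mean squared residuals for low vs high x;
# a large gap indicates a trend in Var[U | X].
low = x < np.median(x)
print((resid[low] ** 2).mean(), (resid[~low] ** 2).mean())
```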
Gauss-Markov Theorem

Gauss-Markov Theorem: under these assumptions (E[U∣X]=0 and Var[U∣X]=σ², with i.i.d. sampling), the OLS estimator is "best" in the sense that it has the "smallest" value of Var[A′Y∣X1,…,Xn] among all estimators of the form

A′Y

for some matrix A=A(X1,…,Xn) satisfying

E[A′Y∣X1,…,Xn]=β

Here Y stacks Y1,…,Yn and X denotes the n×(k+1) matrix stacking X1′,…,Xn′. Note that the comparison is restricted to estimators that are linear in Y and unbiased; if the homoskedasticity assumption fails, a weighted least squares (WLS) estimator, which is also linear in Y, can achieve a smaller variance, so the theorem as stated applies to the homoskedastic case, in which OLS is optimal.

Denoting variance matrices by B, "smallest" is understood in terms of the partial order defined by B ≥ B̃ if B − B̃ is positive semi-definite.

Using this matrix comparison, the best linear unbiased estimator is obtained by finding the matrix A0 satisfying A0′X=I (the (k+1)×(k+1) identity) such that A0′A0 is minimized in the positive semi-definite sense, meaning that for any other matrix A satisfying A′X=I, the difference A′A−A0′A0 is positive semi-definite.

This class of estimators includes the OLS estimator as a special case (set A′=(X′X)^{−1}X′). The property is sometimes expressed by saying that the OLS estimator is the "best linear unbiased estimator" (BLUE) of β under these assumptions.

The Gauss-Markov theorem thus provides a lower bound on the variance matrix of unbiased linear estimators under homoskedasticity: no unbiased linear estimator can have a variance matrix smaller (in the positive semi-definite sense) than σ²(X′X)^{−1}.

Now we do the following proofs:

Proof (a linear estimator A′Y with A′X=I is conditionally unbiased):
The estimator is A′Y for A=A(X1,…,Xn) and we require E[A′Y∣X1,…,Xn]=β. Since Y=Xβ+U, we have A′Y=A′(Xβ+U)=A′Xβ+A′U, so

E[A′Y∣X1,…,Xn]=E[A′Xβ∣X1,…,Xn]+E[A′U∣X1,…,Xn]

Because A=A(X1,…,Xn) depends only on X1,…,Xn and E[U∣X1,…,Xn]=0, we have

E[A′Xβ∣X1,…,Xn]=A′Xβ and E[A′U∣X1,…,Xn]=A′E[U∣X1,…,Xn]=0

So E[A′Y∣X1,…,Xn]=A′Xβ, and combining this with A′X=I finishes the proof that

E[A′Y∣X1,…,Xn]=β

Proof (variance bound): we show that A′A−(X′X)^{−1} is positive semi-definite for any A satisfying A′X=I. Let C=A−X(X′X)^{−1}. Then C′X=A′X−(X′X)^{−1}X′X=I−I=0, so

A′A=(C+X(X′X)^{−1})′(C+X(X′X)^{−1})=C′C+(X′X)^{−1}

because the cross terms vanish (they involve C′X=0). Hence A′A−(X′X)^{−1}=C′C, which is positive semi-definite. Under homoskedasticity Var[A′Y∣X1,…,Xn]=σ²A′A, so no unbiased linear estimator can improve on Var[β^∣X1,…,Xn]=σ²(X′X)^{−1}.
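The variance comparison can also be illustrated numerically: build some other linear estimator A′Y with A′X=I by adding to the OLS weights a component orthogonal to X (an arbitrary, illustrative choice), and verify that A′A−(X′X)^{−1} is positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k1 = 50, 3                                    # k1 = k + 1 columns, including the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
A_ols = X @ XtX_inv                              # OLS weights: A_ols' Y = beta_hat

# Any A = A_ols + C with C'X = 0 still satisfies A'X = I.
# Take C = M W with M the annihilator of X and W an arbitrary matrix.
M = np.eye(n) - X @ XtX_inv @ X.T
W = rng.normal(size=(n, k1))
A = A_ols + M @ W

print(np.allclose(A.T @ X, np.eye(k1)))          # unbiasedness condition A'X = I holds

# Gauss-Markov: A'A - (X'X)^{-1} should be positive semi-definite
diff = A.T @ A - XtX_inv
print(np.linalg.eigvalsh(diff).min() >= -1e-10)  # all eigenvalues numerically >= 0
```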
Consistency
Proof:
By the OLS formula and Yi = Xi′β + ui, we can write

β^ = β + ((1/n)∑_{i=1}^n Xi Xi′)^{−1} ((1/n)∑_{i=1}^n Xi ui)

By the weak law of large numbers, (1/n)∑_{i=1}^n XiXi′ →p E[XX′] and (1/n)∑_{i=1}^n Xi ui →p E[XU] = 0. Since E[XX′] is invertible (no perfect collinearity), the continuous mapping theorem gives

β^ →p β + E[XX′]^{−1}·0 = β

Therefore β^ is consistent for β. This completes the proof of consistency.
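Consistency can be seen by watching β^ settle down as n grows; a minimal sketch with an illustrative β and error distribution:

```python
import numpy as np

rng = np.random.default_rng(5)
beta = np.array([1.0, 2.0, -0.5])

for n in (100, 1_000, 10_000, 100_000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    U = rng.standard_t(df=5, size=n)             # any error with E[XU] = 0 and finite variance
    Y = X @ beta + U
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(n, np.max(np.abs(beta_hat - beta)))    # estimation error shrinks toward 0
```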
Asymptotic Normality
We have already assumed that our model satisfies the three assumptions listed at the start. Now we add a fourth assumption, which ensures the central limit theorem applies:

E[XX′U²] < ∞

As we have calculated before,

√n(β^ − β) = ((1/n)∑_{i=1}^n Xi Xi′)^{−1} ((1/√n)∑_{i=1}^n Xi ui)

By the weak law of large numbers, (1/n)∑_{i=1}^n XiXi′ →p E[XX′], and by the central limit theorem (using E[XU]=0 and the fourth assumption),

(1/√n)∑_{i=1}^n Xi ui →d N(0, E[XX′U²])

Therefore, by Slutsky's theorem, we have

√n(β^ − β) →d N(0, V), where V = E[XX′]^{−1} E[XX′U²] E[XX′]^{−1}

In summary, we have shown that β^ is unbiased when E[U∣X]=0, is BLUE under homoskedasticity, and is consistent and asymptotically normal under the assumptions above.
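Finally, a sketch of the asymptotic normality result: simulate many samples with heteroskedastic errors (so the sandwich form of V matters) and compare the Monte Carlo variance of √n(β^−β) with a plug-in estimate of V = E[XX′]^{−1}E[XX′U²]E[XX′]^{−1}; the design is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 2000
beta = np.array([1.0, 2.0])

draws = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    U = rng.normal(size=n) * (1 + 0.5 * np.abs(x))   # heteroskedastic errors, E[U | X] = 0
    Y = X @ beta + U
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    draws[r] = np.sqrt(n) * (beta_hat - beta)

# Plug-in sandwich variance from one large sample of the same design
m = 200_000
x = rng.normal(size=m)
X = np.column_stack([np.ones(m), x])
U = rng.normal(size=m) * (1 + 0.5 * np.abs(x))
Q = X.T @ X / m                        # estimates E[XX']
S = (X * (U**2)[:, None]).T @ X / m    # estimates E[XX'U^2]
V = np.linalg.inv(Q) @ S @ np.linalg.inv(Q)

print(np.cov(draws, rowvar=False))     # Monte Carlo variance of sqrt(n)(beta_hat - beta)
print(V)                               # should be close to the sandwich matrix V
```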