# Properties of LS

Let $$(Y, X, U)$$ be a random vector where $$Y$$ and $$U$$ take values in $$\mathbf{R}$$ and $$X$$ takes values in $$\mathbf{R}^{k+1}$$. Assume further that the first component of $$X$$ is a constant equal to one. Let $$\beta \in \mathbf{R}^{k+1}$$ be such that

$$
Y=X^{\prime} \beta+U
$$

We assume that our model satisfies the following conditions:

1. $$E\[X U]=0$$
2. $$E\left\[X X^{\prime}\right]<\infty$$
3. There is no perfect collinearity in $$X$$

Denote the marginal distribution of $$(Y,X)$$ by $$P$$, and let $$\left(Y\_1, X\_1\right), \ldots,\left(Y\_n, X\_n\right)$$ be an i.i.d. sample of random vectors with distribution $$P$$.

The properties discussed below are:

* Bias
* Gauss-Markov Theorem
* Consistency
* Asymptotic Normality

## Bias

### Unbiasedness

Under the stronger assumption $$E\[U \mid X]=0$$ (i.e., $$E\[Y \mid X]=X^{\prime} \beta$$), which implies the first assumption, it follows that $$E\[\hat{\beta}]=\beta$$.

**Proof:**

$$E\[U \mid X]=0 \Rightarrow E\[X U]=E\[E\[XU \mid X]] = E\[XE\[U \mid X]] =0$$

The converse does not hold in general: $$E\[XU]=0$$ alone is not enough for unbiasedness.

The OLS estimator is

$$
\hat{\beta}=\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1}\left(\sum\_{i=1}^n X\_i Y\_i\right)
$$
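The summation formula above translates directly into code. Below is a minimal NumPy sketch, assuming a simulated design with a constant plus $$k=2$$ regressors; the true coefficients and the sample size are hypothetical values chosen for illustration. It accumulates $$\sum X\_i X\_i^{\prime}$$ and $$\sum X\_i Y\_i$$ explicitly and compares the result with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, a constant plus k = 2 regressors.
n, k = 500, 2
beta = np.array([1.0, 2.0, -0.5])           # true (k+1)-vector, chosen for illustration
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
U = rng.normal(size=n)                      # drawn independently of X, so E[U | X] = 0
Y = X @ beta + U

def ols_sum_form(X, Y):
    """beta_hat = (sum_i X_i X_i')^{-1} (sum_i X_i Y_i)."""
    Sxx = np.zeros((X.shape[1], X.shape[1]))
    Sxy = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, Y):
        Sxx += np.outer(x_i, x_i)           # X_i X_i'
        Sxy += x_i * y_i                    # X_i Y_i
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(Sxx, Sxy)

beta_hat = ols_sum_form(X, Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```

Solving the normal equations with `np.linalg.solve` rather than computing the inverse is a numerical-stability choice. Mathematically, it gives the same $$\hat{\beta}$$ as the formula.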

Substituting $$Y\_i = X\_i^{\prime}\beta + u\_i$$ into the OLS formula gives:

$$
\begin{aligned} \hat{\beta} & =\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \sum\_{i=1}^n X\_i\left(X\_i^{\prime} \beta+u\_i\right) \\ & =\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \sum\_{i=1}^n X\_i X\_i^{\prime} \beta+\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \sum\_{i=1}^n X\_i u\_i \\ & =\beta+\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \sum\_{i=1}^n X\_i u\_i \end{aligned}
$$

By the law of iterated expectations, $$E\[\hat{\beta}]=E\[E\[\hat{\beta} \mid X\_1, \ldots, X\_n]]$$, so it suffices to compute the conditional expectation:

$$
\mathbb{E}\left\[\hat{\beta} \mid X\_1, \cdots, X\_n\right]=\beta+\left(\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \sum\_{i=1}^n X\_i \mathbb{E}\[u\_i \mid X\_1, \cdots ,X\_n]
$$

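Note that:

* If the observations $$(Y\_i, X\_i)$$ are i.i.d., then $$(u\_i, X\_i)$$ is independent of the other regressors $$X\_j$$, $$j \neq i$$, so $$\mathbb{E}\left\[u\_i \mid X\_1, \cdots, X\_n\right] = \mathbb{E}\[u\_i \mid X\_i]$$. Without independence across observations, this equality may fail.
* Either the assumption $$E\[U \mid X]=0$$ combined with i.i.d. sampling, or the condition $$\mathbb{E}\left\[u\_i \mid X\_1, \cdots, X\_n\right]=0$$ imposed directly, is sufficient to complete the proof.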

Since $$\mathbb{E}\left\[u\_i \mid X\_1, \cdots, X\_n\right] = \mathbb{E}\left\[u\_i \mid X\_i\right]=0$$, it follows that $$\mathbb{E}\left\[\hat{\beta} \mid X\_1, \cdots, X\_n\right]=\beta$$.

Therefore,

$$
E\[\hat{\beta}]=E\[E\[\hat{\beta} \mid X]] = E\[\beta] = \beta
$$

This completes the proof of unbiasedness.
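Unbiasedness is a finite-sample property, so a Monte Carlo check can use a small $$n$$. The sketch below assumes a hypothetical design with an exponential regressor and $$t\_5$$ errors drawn independently of $$X$$, so that $$E\[U \mid X]=0$$ holds. Averaging $$\hat{\beta}$$ over many replications with only $$n = 20$$ observations each should come close to $$\beta$$.

```python
import numpy as np

rng = np.random.default_rng(1)

beta = np.array([1.0, 2.0])   # hypothetical intercept and slope
n, reps = 20, 20_000          # small n on purpose: unbiasedness does not need large samples

estimates = np.empty((reps, beta.size))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.exponential(size=n)])  # skewed regressor
    U = rng.standard_t(df=5, size=n)    # non-normal errors, independent of X
    Y = X @ beta + U
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)           # OLS in each replication

mean_beta_hat = estimates.mean(axis=0)
print(mean_beta_hat)  # should be close to beta
```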

### Bias from Omitted Variables

If $$\mathbb{E}\[\hat{\beta}] \neq \beta$$, then $$\hat{\beta}$$ is biased, with $$\operatorname{bias}(\hat{\beta}) = \mathbb{E}\[\hat{\beta}] - \beta$$. A common source of bias is an omitted variable: a variable that affects $$Y$$ and is correlated with the included regressors.

* If $$\operatorname{bias}(\hat{\beta}) > 0$$, $$\hat{\beta}$$ overestimates $$\beta$$ on average.
* If $$\operatorname{bias}(\hat{\beta}) < 0$$, $$\hat{\beta}$$ underestimates $$\beta$$ on average.

For example, consider a regression of wages on education:

$$
\text { wage }=\beta\_0+\beta\_1 \text { edu}+u
$$

Here wage is regressed on education level (edu) only. Other factors that affect wages, such as ability and motivation, are likely correlated with education and end up in $$u$$. Then $$\operatorname{Cov}(\text{edu}, u) \neq 0$$, which violates the first assumption and biases the estimate of $$\beta\_1$$.

To derive the omitted variable bias, compare a long regression with a short one:

1. Long regression: $$Y=X\_1^{\prime} \beta\_1+X\_2^{\prime} \beta\_2+e$$, where $$\mathbb{E}\left\[X\_1 e\right]=0, \mathbb{E}\left\[X\_2 e\right]=0$$
2. Short regression: $$Y=X\_1^{\prime} \gamma\_1+u$$, where $$\mathbb{E}\left\[X\_1 u\right]=0$$
3. By the population OLS (projection) formula:

$$
\begin{aligned} \gamma\_1& =\mathbb{E}\left\[X\_1X\_1^{\prime}\right]^{-1} \mathbb{E}\left\[X\_1Y\right] \\ & =\mathbb{E}\left\[X\_1 X\_1^{\prime}\right]^{-1} \mathbb{E}\left\[X\_1\left(X\_1^{\prime} \beta\_1+X\_2^{\prime} \beta\_2+e\right)\right] \\ & =\beta\_1+\left\[\mathbb{E}\left\[X\_1 X\_1^{\prime}\right]^{-1} \mathbb{E}\left\[X\_1 X\_2^{\prime}\right] \beta\_2\right]+\left\[\mathbb{E}\left\[X\_1 X\_1^{\prime}\right]^{-1} \mathbb{E}\left\[X\_1 e\right]\right] \end{aligned}
$$

Since $$\mathbb{E}\left\[X\_1 e\right]=0$$, writing $$\Gamma\_{12}=\mathbb{E}\left\[X\_1 X\_1^{\prime}\right]^{-1} \mathbb{E}\left\[X\_1 X\_2^{\prime}\right]$$ gives:

$$
\gamma\_1 =\beta\_1+\Gamma\_{12} \beta\_2
$$
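The same algebra holds exactly for sample OLS coefficients. If $$\hat{\gamma}\_1$$ comes from the short regression, $$\hat{\beta}\_1, \hat{\beta}\_2$$ from the long regression, and $$\hat{\Gamma}\_{12}$$ from regressing $$X\_2$$ on $$X\_1$$, then $$\hat{\gamma}\_1=\hat{\beta}\_1+\hat{\Gamma}\_{12}\hat{\beta}\_2$$. This follows because the long-regression residuals are orthogonal to $$X\_1$$. The NumPy sketch below uses a hypothetical design in which $$X\_1$$ holds a constant and one regressor and $$X\_2$$ is a single omitted regressor correlated with it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Hypothetical design: X1 = (1, x1), X2 = x2 correlated with x1.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = x2[:, None]
Y = X1 @ np.array([0.5, 1.0]) + X2 @ np.array([2.0]) + rng.normal(size=n)

def ols(A, b):
    # Least-squares coefficients of b on the columns of A.
    return np.linalg.lstsq(A, b, rcond=None)[0]

long_coef = ols(np.hstack([X1, X2]), Y)
beta1_hat, beta2_hat = long_coef[:2], long_coef[2:]
gamma1_hat = ols(X1, Y)          # short regression
Gamma12_hat = ols(X1, X2)        # projection of X2 on X1, shape (2, 1)

print(gamma1_hat, beta1_hat + Gamma12_hat @ beta2_hat)  # identical up to rounding
```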

Note that:

* $$\Gamma\_{12}$$ is the coefficient matrix from the projection of $$X\_2$$ on $$X\_1$$: $$X\_2^{\prime} = X\_1^{\prime} \Gamma\_{12}+v^{\prime}$$ with $$\mathbb{E}\left\[X\_1 v^{\prime}\right]=0$$. In practice, it is hard to estimate because the omitted variables are usually unobserved.
* In the wage example, let $$X\_1$$ contain education and let $$X\_2$$ be ability or motivation. Based on real-world intuition:
  * Higher ability or motivation plausibly leads to higher wages, so $$\beta\_2 > 0$$ is likely.
  * Higher ability or motivation plausibly leads to more education, so the education component of $$\Gamma\_{12}$$ is likely positive.
* The omitted variable bias $$\Gamma\_{12} \beta\_2$$ is therefore likely positive, so the short regression probably overestimates the effect of education on wages.
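The wage story can be simulated to see the sign of the bias. In this sketch, every number in the data-generating process is hypothetical. Unobserved ability raises both education and wages, so the education component of $$\Gamma\_{12}$$ and $$\beta\_2$$ are both positive. The short regression of wage on education alone should then overstate the true education coefficient (0.8) by roughly $$\Gamma\_{12}\beta\_2$$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical data-generating process (illustrative numbers only).
ability = rng.normal(size=n)
edu = 12 + 1.5 * ability + rng.normal(size=n)               # ability raises education
wage = 5 + 0.8 * edu + 2.0 * ability + rng.normal(size=n)   # true slope 0.8, beta_2 = 2.0

X_short = np.column_stack([np.ones(n), edu])
gamma_hat = np.linalg.lstsq(X_short, wage, rcond=None)[0]   # short regression

# Projection of ability on (1, edu); population slope is 1.5 / (1.5**2 + 1), about 0.46.
Gamma12_hat = np.linalg.lstsq(X_short, ability, rcond=None)[0]

print("short-regression slope:", gamma_hat[1])
print("true slope + predicted bias:", 0.8 + Gamma12_hat[1] * 2.0)
```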

## Gauss-Markov Theorem

### Homoskedasticity and Heteroskedasticity

Suppose $$E\[U \mid X]=0$$ and write $$\operatorname{Var}\[U \mid X]=\sigma^2(X)$$.

* When $$\operatorname{Var}\[U \mid X]$$ is constant, i.e., $$\sigma^2(X)=\sigma^2$$ does not depend on $$X$$, we say that $$U$$ is homoskedastic.
* Otherwise, we say that $$U$$ is heteroskedastic.

Because $$\operatorname{Var}\[U \mid X]$$ is not observed, the two common checks work with the squared OLS residuals $$\hat{u}\_i^2$$:

1. Regress $$\hat{u}\_i^2$$ on $$X\_i$$ and test whether the slope coefficients are jointly different from 0. This is the idea behind the Breusch-Pagan test.
2. Plot $$\hat{u}\_i^2$$ (or $$\hat{u}\_i$$) against $$X\_i$$ or the fitted values and look for a systematic pattern in the spread.
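The regression-based check can be computed in a few lines. This sketch assumes the studentized (Koenker) form of the Breusch-Pagan statistic: $$nR^2$$ from regressing $$\hat{u}\_i^2$$ on $$X\_i$$, which is approximately $$\chi^2\_k$$ under homoskedasticity. The helper `lm_heteroskedasticity_stat` is a hypothetical name, and the two simulated designs are illustrative.

```python
import numpy as np

def lm_heteroskedasticity_stat(X, Y):
    """n * R^2 from regressing squared OLS residuals on X (X includes the constant)."""
    n = X.shape[0]
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    u2 = (Y - X @ beta_hat) ** 2                          # squared residuals stand in for Var[U|X]
    fitted = X @ np.linalg.lstsq(X, u2, rcond=None)[0]    # auxiliary regression
    r2 = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(4)
n = 2_000
z = rng.uniform(0, 3, size=n)
X = np.column_stack([np.ones(n), z])

Y_homo = X @ np.array([1.0, 2.0]) + rng.normal(size=n)         # Var[U|X] = 1
Y_hetero = X @ np.array([1.0, 2.0]) + z * rng.normal(size=n)   # Var[U|X] = z^2

stat_homo = lm_heteroskedasticity_stat(X, Y_homo)
stat_hetero = lm_heteroskedasticity_stat(X, Y_hetero)
print(stat_homo, stat_hetero)  # compare with the chi^2(1) 5% critical value, about 3.84
```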

### Gauss-Markov Theorem

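**Gauss-Markov Theorem**: suppose the observations are i.i.d., $$E\[U \mid X]=0$$, and $$U$$ is homoskedastic with $$\operatorname{Var}\[U \mid X]=\sigma^2$$. Let $$\mathbb{Y}=(Y\_1,\ldots,Y\_n)^{\prime}$$ and let $$\mathbb{X}$$ be the $$n \times (k+1)$$ matrix whose $$i$$-th row is $$X\_i^{\prime}$$. Then **the OLS estimator is "best" in the sense that it has the "smallest" value of** $$\operatorname{Var}\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]$$ among all estimators of the form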

$$
\mathbb{A}^{\prime} \mathbb{Y}
$$

for some $$n \times (k+1)$$ matrix $$\mathbb{A}=\mathbb{A}\left(X\_1, \ldots, X\_n\right)$$ satisfying

$$
E\left\[\mathbb{A}^{\prime} \mathbb{Y}\mid X\_1, \ldots, X\_n\right]=\beta
$$

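Note that $$\mathbb{A}^{\prime} \mathbb{Y}$$ is linear in $$\mathbb{Y}$$, so the theorem only compares OLS with other linear unbiased estimators, and it relies on homoskedasticity. Under heteroskedasticity, OLS need not be best: weighted least squares (WLS), with weights inversely proportional to $$\operatorname{Var}\[U \mid X]$$, can have a smaller variance.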
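Heteroskedasticity changes the comparison, and this can be seen by computing exact conditional variances. For a linear estimator, $$\operatorname{Var}\[\mathbb{A}^{\prime}\mathbb{Y} \mid X\_1,\ldots,X\_n]=\mathbb{A}^{\prime}\Sigma\mathbb{A}$$, where $$\Sigma=\operatorname{diag}(\sigma^2(X\_1),\ldots,\sigma^2(X\_n))$$. This sketch compares OLS with WLS using weights $$1/\sigma^2(X\_i)$$. Both estimators satisfy $$\mathbb{A}^{\prime}\mathbb{X}=\mathbb{I}$$, so both are linear and conditionally unbiased. The sketch assumes the skedastic function $$\sigma^2(x)=x^2$$ is known, which is rarely true in practice. In that case, WLS coincides with GLS, and $$V\_{OLS}-V\_{WLS}$$ should be positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
z = rng.uniform(1, 4, size=n)
X = np.column_stack([np.ones(n), z])
sigma2 = z ** 2                       # assumed known skedastic function: Var[U|X] = z^2
Sigma = np.diag(sigma2)

# Linear estimators A'Y; each A satisfies A'X = I, so each is conditionally unbiased.
A_ols = X @ np.linalg.inv(X.T @ X)              # A' = (X'X)^{-1} X'
W = np.diag(1 / sigma2)
A_wls = W @ X @ np.linalg.inv(X.T @ W @ X)      # A' = (X'WX)^{-1} X'W

V_ols = A_ols.T @ Sigma @ A_ols       # Var[A'Y | X] = A' Sigma A
V_wls = A_wls.T @ Sigma @ A_wls

eigs = np.linalg.eigvalsh(V_ols - V_wls)
print(eigs)  # all eigenvalues should be >= 0 (up to rounding)
```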

Here "smallest" refers to the partial order on variance matrices: $$B \geq \tilde{B}$$ if $$B-\tilde{B}$$ is positive semi-definite.

Under homoskedasticity, $$\operatorname{Var}\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]=\sigma^2 \mathbb{A}^{\prime} \mathbb{A}$$. Applying this matrix ordering, the best linear unbiased estimator is obtained by finding the matrix $$\mathbb{A}\_0$$ satisfying $$\mathbb{A}\_0^{\prime} \mathbb{X}=\mathbb{I}\_{k+1}$$ such that $$\mathbb{A}\_0^{\prime} \mathbb{A}\_0$$ is minimized in the positive semi-definite sense. In other words, for any other matrix $$\mathbb{A}$$ satisfying $$\mathbb{A}^{\prime} \mathbb{X}=\mathbb{I}\_{k+1}$$, the difference $$\mathbb{A}^{\prime} \mathbb{A}-\mathbb{A}\_0^{\prime} \mathbb{A}\_0$$ is positive semi-definite.

This class of estimators includes the OLS estimator as a special case (set $$\mathbb{A}^{\prime}=\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \mathbb{X}^{\prime}$$). The result is often summarized by saying that **the OLS estimator is the "best linear unbiased estimator" (BLUE)** **of** $$\beta$$ **under these assumptions.**

The Gauss-Markov theorem therefore provides a lower bound on the conditional variance matrix of linear unbiased estimators under homoskedasticity. No linear unbiased estimator can have a variance matrix smaller (in the positive semi-definite sense) than $$\sigma^2\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}$$, which is the conditional variance of OLS.

The proof has two steps:

* **Step 1: an estimator** $$\mathbb{A}^{\prime} \mathbb{Y}$$ **with** $$\mathbb{A}=\mathbb{A}\left(X\_1, \ldots, X\_n\right)$$ **satisfies** $$E\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]=\beta$$ **whenever** $$\mathbb{A}^{\prime} \mathbb{X}=\mathbb{I}$$**.**

**Proof:**

Since $$\mathbb{Y}=\mathbb{X} \beta+\mathbb{U}$$ with $$\mathbb{U}=(U\_1,\ldots,U\_n)^{\prime}$$, we have $$\mathbb{A}^{\prime} \mathbb{Y}=\mathbb{A}^{\prime}(\mathbb{X} \beta+\mathbb{U})=\mathbb{A}^{\prime} \mathbb{X} \beta+\mathbb{A}^{\prime} \mathbb{U}$$.

Taking conditional expectations, $$E\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]=E\left\[\mathbb{A}^{\prime} \mathbb{X} \beta+\mathbb{A}^{\prime} \mathbb{U} \mid X\_1, \ldots, X\_n\right]$$.

Because $$\mathbb{A}$$ and $$\mathbb{X}$$ are functions of $$X\_1, \ldots, X\_n$$, and $$E\left\[U\_i \mid X\_1, \ldots, X\_n\right]=E\left\[U\_i \mid X\_i\right]=0$$ under i.i.d. sampling, we have:

$$E\left\[\mathbb{A}^{\prime} \mathbb{X} \beta\mid X\_1, \ldots, X\_n\right]=\mathbb{A}^{\prime} \mathbb{X} \beta$$ and $$E\left\[\mathbb{A}^{\prime} \mathbb{U} \mid X\_1, \ldots, X\_n\right]=\mathbb{A}^{\prime} E\left\[\mathbb{U} \mid X\_1, \ldots, X\_n\right]=0$$

Hence $$E\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]=\mathbb{A}^{\prime} \mathbb{X} \beta$$, and combining this with $$\mathbb{A}^{\prime} \mathbb{X}=\mathbb{I}$$ gives

$$
E\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right]=\beta
$$
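Step 1 can be checked by simulation. The following sketch holds the regressors fixed (conditioning on $$X\_1,\ldots,X\_n$$) and redraws the errors many times. It compares two linear estimators. The first is WLS with arbitrary positive weights, $$\mathbb{A}^{\prime}=(\mathbb{X}^{\prime}\mathbb{W}\mathbb{X})^{-1}\mathbb{X}^{\prime}\mathbb{W}$$, which satisfies $$\mathbb{A}^{\prime}\mathbb{X}=\mathbb{I}$$. The second is ridge regression, $$\mathbb{A}^{\prime}=(\mathbb{X}^{\prime}\mathbb{X}+\lambda\mathbb{I})^{-1}\mathbb{X}^{\prime}$$ with a hypothetical $$\lambda=10$$. Ridge is linear but violates $$\mathbb{A}^{\prime}\mathbb{X}=\mathbb{I}$$, so it should be biased. The weights and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 50, 20_000
beta = np.array([1.0, -2.0])                              # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])     # held fixed: we condition on X_1..X_n

w = rng.uniform(0.5, 2.0, size=n)                         # arbitrary positive weights
WX = w[:, None] * X
A_wls_T = np.linalg.solve(X.T @ WX, WX.T)                  # (X'WX)^{-1} X'W
A_ridge_T = np.linalg.solve(X.T @ X + 10.0 * np.eye(2), X.T)  # (X'X + 10 I)^{-1} X'

U = rng.normal(size=(reps, n))                            # fresh errors each replication, E[U|X] = 0
Y = X @ beta + U                                          # shape (reps, n): one sample per row
mean_wls = (Y @ A_wls_T.T).mean(axis=0)                   # average of A'Y over replications
mean_ridge = (Y @ A_ridge_T.T).mean(axis=0)

print(A_wls_T @ X)    # identity matrix, so conditionally unbiased
print(A_ridge_T @ X)  # not the identity, so biased toward zero
print(mean_wls, mean_ridge)
```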

* **Step 2: show that** $$\mathbb{A}^{\prime} \mathbb{A}-\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}$$ **is positive semi-definite for any** $$\mathbb{A}$$ **satisfying** $$\mathbb{A}^{\prime} \mathbb{X}=\mathbb{I}$$**.**

**Proof:**

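OLS corresponds to $$\mathbb{A}\_0=\mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}$$, for which $$\mathbb{A}\_0^{\prime} \mathbb{A}\_0=\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}$$. We therefore need to show that $$\mathbb{A}^{\prime} \mathbb{A}-\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \geq 0$$.

Set $$\mathbb{C}=\mathbb{A}-\mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}$$. Note that $$\mathbb{X}^{\prime} \mathbb{C}=\mathbb{X}^{\prime}\mathbb{A}-\mathbb{I}=\mathbf{0}$$, since $$\mathbb{X}^{\prime}\mathbb{A}=(\mathbb{A}^{\prime}\mathbb{X})^{\prime}=\mathbb{I}$$. We calculate that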

$$
\begin{aligned} \mathbb{A}^{\prime} \mathbb{A}-\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} & =\left(\mathbb{C}+\mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}\right)^{\prime}\left(\mathbb{C}+\mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}\right)-\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \\ & =\mathbb{C}^{\prime} \mathbb{C}+\mathbb{C}^{\prime} \mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}+\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \mathbb{X}^{\prime} \mathbb{C} \\ & \quad +\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \mathbb{X}^{\prime} \mathbb{X}\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}-\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1} \\ & =\mathbb{C}^{\prime} \mathbb{C} \\ & \geq 0 \end{aligned}
$$

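The final inequality holds because $$\mathbb{C}^{\prime} \mathbb{C}$$ is positive semi-definite: for any vector $$a$$, $$a^{\prime} \mathbb{C}^{\prime} \mathbb{C} a=\lVert \mathbb{C} a \rVert^2 \geq 0$$. Multiplying by $$\sigma^2$$ gives $$\operatorname{Var}\left\[\mathbb{A}^{\prime} \mathbb{Y} \mid X\_1, \ldots, X\_n\right] \geq \sigma^2\left(\mathbb{X}^{\prime} \mathbb{X}\right)^{-1}=\operatorname{Var}\left\[\hat{\beta} \mid X\_1, \ldots, X\_n\right]$$.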
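The decomposition in Step 2 can also be verified numerically. This sketch uses a hypothetical competitor to OLS, WLS with arbitrary positive weights; any $$\mathbb{A}$$ with $$\mathbb{A}^{\prime}\mathbb{X}=\mathbb{I}$$ would work. The code forms $$\mathbb{C}=\mathbb{A}-\mathbb{X}(\mathbb{X}^{\prime}\mathbb{X})^{-1}$$ and confirms three facts: $$\mathbb{X}^{\prime}\mathbb{C}=0$$, $$\mathbb{A}^{\prime}\mathbb{A}-(\mathbb{X}^{\prime}\mathbb{X})^{-1}=\mathbb{C}^{\prime}\mathbb{C}$$, and the eigenvalues of this difference are nonnegative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.exponential(size=n)])  # k + 1 = 3 columns

w = rng.uniform(0.2, 5.0, size=n)               # arbitrary positive weights (hypothetical)
WX = w[:, None] * X
A = WX @ np.linalg.inv(X.T @ WX)                # A' = (X'WX)^{-1} X'W, so A'X = I
XtX_inv = np.linalg.inv(X.T @ X)

C = A - X @ XtX_inv                             # C = A - X (X'X)^{-1}
diff = A.T @ A - XtX_inv                        # A'A - (X'X)^{-1}

print(np.abs(X.T @ C).max())      # close to 0: X'C = 0
print(np.linalg.eigvalsh(diff))   # all >= 0: the difference is positive semi-definite
```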

## Consistency

Under our three main assumptions, $$\hat{\beta} \stackrel{P}{\rightarrow} \beta$$ as $$n \rightarrow \infty$$.

**Proof:**

The OLS estimator can be written as

$$
\hat{\beta}=\left(\frac{1}{n} \sum\_{1 \leq i \leq n} X\_i X\_i^{\prime}\right)^{-1}\left(\frac{1}{n} \sum\_{1 \leq i \leq n} X\_i Y\_i\right)
$$

Substituting $$Y\_i = X\_i^{\prime}\beta + u\_i$$ into the OLS formula gives:

$$
\begin{aligned} \hat{\beta} & =\left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n}\sum\_{i=1}^n X\_i\left(X\_i^{\prime} \beta+u\_i\right) \\ & =\left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime} \beta+\left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n}\sum\_{i=1}^n X\_i u\_i \\ & =\beta+\left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \left(\frac{1}{n}\sum\_{i=1}^n X\_i u\_i\right) \end{aligned}
$$

Denote $$B\_n = \left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \left(\frac{1}{n}\sum\_{i=1}^n X\_i u\_i\right)$$. Since the observations are i.i.d., the law of large numbers (LLN) gives:

* $$\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}$$ $$\stackrel{P}{\rightarrow}$$ $$\mathbb{E}\left\[X\_i X\_i^{\prime}\right]$$, which is finite by the second assumption and invertible by the third. By the continuous mapping theorem, $$\left(\frac{1}{n}\sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \stackrel{P}{\rightarrow} \mathbb{E}\left\[X\_i X\_i^{\prime}\right]^{-1}$$.
* $$\frac{1}{n} \sum\_{i=1}^n X\_i u\_i$$ $$\stackrel{P}{\rightarrow}$$ $$\mathbb{E}\left\[X\_i u\_i\right]=0$$ by the first assumption.

Therefore, by the continuous mapping theorem:

$$
\begin{aligned} p \lim \_{n \rightarrow \infty} \hat{\beta} & =\beta+p\lim \_{n \rightarrow \infty} B\_n \\ & =\beta+\mathbb{E}\left\[X\_i X\_i^{\prime}\right]^{-1} \mathbb{E}\left\[X\_i u\_i\right] \\ & =\beta+0=\beta \end{aligned}
$$

This completes the proof of consistency.
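The following simulation sketch illustrates consistency using a hypothetical design. In this design, only the first assumption holds, not the stronger $$E\[U \mid X]=0$$. With $$Z \sim N(0,1)$$, $$X=(1,Z)^{\prime}$$ and $$U = Z^2 - 1 + \varepsilon$$, we have $$E\[XU]=0$$, but $$E\[U \mid X]=Z^2-1 \neq 0$$. The estimation error should still shrink as $$n$$ grows.

```python
import numpy as np

rng = np.random.default_rng(8)
beta = np.array([1.0, 2.0])   # hypothetical true coefficients

def ols_error(n):
    """Largest absolute coefficient error of OLS in one sample of size n."""
    Z = rng.normal(size=n)
    X = np.column_stack([np.ones(n), Z])
    U = Z**2 - 1 + rng.normal(size=n)    # E[XU] = 0, but E[U | X] = Z^2 - 1 != 0
    Y = X @ beta + U
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    return np.abs(beta_hat - beta).max()

errors = {n: ols_error(n) for n in (100, 10_000, 1_000_000)}
for n, err in errors.items():
    print(n, err)
```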

## Asymptotic Normality

We maintain the three assumptions from the previous analysis:

1. $$E\[X U]=0$$
2. $$E\left\[X X^{\prime}\right]<\infty$$
3. There is no perfect collinearity in $$X$$

Now we add a fourth assumption:

4. $$\operatorname{Var}\[X U]=E\left\[X X^{\prime} U^2\right]<\infty$$ (the equality holds because $$E\[XU]=0$$)

Then, as $$n \rightarrow \infty$$, we have

$$
\sqrt{n}(\hat{\beta}-\beta) \stackrel{d}{\rightarrow} N(0, \mathbb{V}) \text { where } \mathbb{V}=\left(E\left\[X X^{\prime}\right]\right)^{-1} E\left\[X X^{\prime} U^2\right]\left(E\left\[X X^{\prime}\right]\right)^{-1}
$$
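In practice, $$\mathbb{V}$$ is estimated by replacing each expectation with a sample average and $$U\_i$$ with the OLS residual $$\hat{U}\_i$$. This plug-in "sandwich" form is often labelled HC0. The sketch below uses a hypothetical design: $$X=(1,Z)^{\prime}$$ with $$Z\sim N(0,1)$$ and $$U=Z\varepsilon$$. For this design, $$E\[XX^{\prime}]=\mathbb{I}$$ and $$E\[XX^{\prime}U^2]=\operatorname{diag}(1,3)$$, so $$\mathbb{V}=\operatorname{diag}(1,3)$$. The homoskedasticity-based formula $$\sigma^2 (E\[XX^{\prime}])^{-1}$$ would instead give $$\operatorname{diag}(1,1)$$, understating the slope's variance.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
beta = np.array([1.0, 2.0])                 # hypothetical true coefficients
Z = rng.normal(size=n)
X = np.column_stack([np.ones(n), Z])
U = Z * rng.normal(size=n)                  # heteroskedastic: Var[U | X] = Z^2
Y = X @ beta + U

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
U_hat = Y - X @ beta_hat                    # OLS residuals

Q_hat = X.T @ X / n                                  # estimates E[XX']
Omega_hat = (X * U_hat[:, None] ** 2).T @ X / n      # estimates E[XX'U^2]
Q_inv = np.linalg.inv(Q_hat)
V_hat = Q_inv @ Omega_hat @ Q_inv                    # sandwich estimate of V

V_homo = np.mean(U_hat ** 2) * Q_inv                 # valid only under homoskedasticity
print(np.round(V_hat, 2))
print(np.round(V_homo, 2))
```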

As computed above,

$$
\begin{aligned} \hat{\beta} & =\left(\frac{1}{n} \sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n} \sum\_{i=1}^n X\_i Y\_i \\ & =\left(\frac{1}{n} \sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n} \sum\_{i=1}^n X\_i\left(X\_i^{\prime} \beta+U\_i\right) \\ & =\beta+\left(\frac{1}{n} \sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n} \sum\_{i=1}^n X\_i U\_i \end{aligned}
$$

Therefore, we have

$$
\hat{\beta}-\beta=\left(\frac{1}{n} \sum\_{i=1}^n X\_i X\_i^{\prime}\right)^{-1} \frac{1}{n} \sum\_{i=1}^n X\_i U\_i
$$

We have shown that:
