Best Predictor
Given a random vector X, we want to forecast Y. Let g(X) be a predictor of Y. The prediction error is defined as $Y - g(X)$; it is itself a random variable and can take both positive and negative values. To measure how well g(X) predicts, we define the mean squared error (MSE) of the predictor g(X) as $E[(Y - g(X))^2]$. The CEF $m(x) = E(Y \mid X = x)$ is the best predictor in the sense that it attains the smallest mean squared prediction error: if $E(Y^2) < \infty$, then for any predictor g(X),
$$E[(Y - g(X))^2] \ge E[(Y - m(X))^2].$$
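Before proving this, a quick simulation illustrates the claim. The sketch below is a minimal example under an assumed model $Y = X^2 + \varepsilon$ with standard normal X and noise (so the CEF is $m(x) = x^2$); the competing predictor $g(x) = 1 + x$ and the sample size are arbitrary illustration choices, not part of the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed model for illustration: Y = X^2 + eps, so the CEF is m(x) = x^2.
X = rng.normal(size=n)
Y = X**2 + rng.normal(size=n)

m = X**2        # the CEF evaluated at X
g = 1.0 + X     # an arbitrary competing predictor

mse_m = np.mean((Y - m)**2)   # close to 1, the noise variance
mse_g = np.mean((Y - g)**2)   # strictly larger under this model (about 4)
print(f"MSE of m(X): {mse_m:.3f}")
print(f"MSE of g(X): {mse_g:.3f}")
```

Any other choice of g gives the same qualitative picture: its sample MSE does not fall below that of the CEF, up to simulation noise.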
Proof:
$$
\begin{aligned}
E[(Y - g(X))^2] &= E[(Y - m(X) + m(X) - g(X))^2] \\
&= E[(Y - m(X))^2] + E[(m(X) - g(X))^2] + 2\,E[(Y - m(X))(m(X) - g(X))] \\
&\ge E[(Y - m(X))^2],
\end{aligned}
$$
since, by the law of iterated expectations,
$$E[(Y - m(X))(m(X) - g(X))] = E\big[\,E[(Y - m(X))(m(X) - g(X)) \mid X]\,\big].$$
Conditional on X, $m(X) - g(X)$ is no longer random and can be taken outside the inner expectation, and $E[Y - m(X) \mid X] = 0$ by the definition of m(X), so
$$E\big[\,E[(Y - m(X))(m(X) - g(X)) \mid X]\,\big] = E\big[(m(X) - g(X))\,E[Y - m(X) \mid X]\big] = 0.$$
The inequality holds with equality exactly when $E[(m(X) - g(X))^2] = 0$, that is, when g(X) = m(X) (almost surely); therefore m(X) achieves the smallest mean squared prediction error.
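The two facts used in the proof, that the cross term vanishes and that the MSE therefore splits into two nonnegative pieces, can also be checked numerically. The sketch below reuses the same assumed model as the earlier example ($Y = X^2 + \varepsilon$, $m(x) = x^2$, $g(x) = 1 + x$); those choices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Same assumed model as the earlier sketch: Y = X^2 + eps, m(x) = x^2, g(x) = 1 + x.
X = rng.normal(size=n)
Y = X**2 + rng.normal(size=n)
m, g = X**2, 1.0 + X

# Cross term from the proof, E[(Y - m(X))(m(X) - g(X))]: should be ~ 0,
# because E[Y - m(X) | X] = 0 by the definition of the CEF.
cross = np.mean((Y - m) * (m - g))
print(f"cross term: {cross:.4f}")

# With the cross term negligible, the MSE of g decomposes as
# E[(Y - g(X))^2] ~ E[(Y - m(X))^2] + E[(m(X) - g(X))^2] >= E[(Y - m(X))^2].
print(f"E[(Y-g)^2]             : {np.mean((Y - g)**2):.3f}")
print(f"E[(Y-m)^2] + E[(m-g)^2]: {np.mean((Y - m)**2) + np.mean((m - g)**2):.3f}")
```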