Consider the regression model \[
Y = f(X) + \epsilon,
\] where \(\operatorname{E}(\epsilon) = 0\).
1.1 Optimal regression function
Show that the choice \[
f_{\text{opt}}(X) = \operatorname{E}(Y | X)
\] minimizes the mean squared prediction error \[
\operatorname{E}[Y - f(X)]^2,
\] where the expectation averages over the variation in both \(X\) and \(Y\). (Hint: condition on \(X\).)
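As a numerical sanity check, here is a minimal simulation sketch. The particular choices below (\(f(x) = \sin x\), \(X \sim \text{Uniform}(0, 3)\), \(\epsilon \sim N(0, 0.5^2)\), and the competing predictor \(g\)) are illustrative assumptions, not part of the exercise; the point is only that predicting with \(\operatorname{E}(Y \mid X) = f(X)\) yields a smaller mean squared prediction error than an arbitrary alternative.

```python
# Sketch: compare the MSE of the optimal predictor E(Y | X) = f(X)
# against an arbitrary competing predictor g (illustrative choices only).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(0, 3, n)
f = np.sin                          # true regression function, so E(Y | X) = sin(X)
Y = f(X) + rng.normal(0, 0.5, n)    # Y = f(X) + eps, with E(eps) = 0

g = lambda x: np.sin(x) + 0.3 * x   # some other predictor

mse_opt = np.mean((Y - f(X)) ** 2)  # close to Var(eps) = 0.25, the irreducible error
mse_alt = np.mean((Y - g(X)) ** 2)  # strictly larger on average
print(mse_opt, mse_alt)
```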
1.2 Bias-variance trade-off
Given an estimate \(\hat f\) of \(f\), show that the expected test error at a point \(x_0\) can be decomposed as \[
\operatorname{E}[y_0 - \hat f(x_0)]^2 = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}},
\] where the expectation averages over the variability in \(y_0\) and \(\hat f\).
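A Monte Carlo sketch of this identity is given below. The setup is assumed for illustration only: the true \(f(x) = x^2\), \(\epsilon \sim N(0, 1)\), training sets of size \(n = 50\), a simple linear fit for \(\hat f\), and the test point \(x_0 = 0.9\). Repeated training draws estimate \(\operatorname{Var}(\hat f(x_0))\) and \(\operatorname{Bias}(\hat f(x_0))\), and their sum with \(\operatorname{Var}(\epsilon)\) should match the average test error.

```python
# Sketch: Monte Carlo check of
#   E[(y0 - f_hat(x0))^2] = Var(f_hat(x0)) + Bias(f_hat(x0))^2 + Var(eps)
# for an (intentionally biased) linear fit to data from f(x) = x^2.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x ** 2
sigma, n, x0, reps = 1.0, 50, 0.9, 20_000

preds, test_errs = np.empty(reps), np.empty(reps)
for r in range(reps):
    X = rng.uniform(-1, 1, n)
    y = f(X) + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(X, y, 1)          # simple linear fit: f_hat(x) = b0 + b1 * x
    preds[r] = b0 + b1 * x0               # f_hat(x0) for this training set
    y0 = f(x0) + rng.normal(0, sigma)     # a fresh test observation at x0
    test_errs[r] = (y0 - preds[r]) ** 2

var_hat = preds.var()                     # Var(f_hat(x0)) over training sets
bias_sq = (preds.mean() - f(x0)) ** 2     # squared bias at x0
print(test_errs.mean(), var_hat + bias_sq + sigma ** 2)  # the two should agree
```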
1.3 \(R^2\) and correlation
For multiple linear regression, show that \(R^2\) is equal to the square of the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is, \[
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2.
\]