Consider the regression model \[
Y = f(X) + \epsilon,
\] where \(\operatorname{E}(\epsilon) = 0\).
1.1 Optimal regression function
Show that the choice \[
f_{\text{opt}}(X) = \operatorname{E}(Y | X)
\] minimizes the mean squared prediction error \[
\operatorname{E}[Y - f(X)]^2,
\] where the expectation averages over the variation in both \(X\) and \(Y\). (Hint: condition on \(X\).)
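As a numerical sanity check, here is a minimal simulation sketch. The particular choices below (\(f(x) = \sin x\), \(X \sim \text{Uniform}(0, 3)\), \(\epsilon \sim N(0, 0.5^2)\), and the competing predictor \(g\)) are illustrative assumptions, not part of the exercise; the point is only that predicting with \(\operatorname{E}(Y \mid X) = f(X)\) yields a smaller mean squared prediction error than an arbitrary alternative.

```python
# Sketch: compare the MSE of the optimal predictor E(Y | X) = f(X)
# against an arbitrary competing predictor g (illustrative choices only).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(0, 3, n)
f = np.sin                          # true regression function, so E(Y | X) = sin(X)
Y = f(X) + rng.normal(0, 0.5, n)    # Y = f(X) + eps, with E(eps) = 0

g = lambda x: np.sin(x) + 0.3 * x   # some other predictor

mse_opt = np.mean((Y - f(X)) ** 2)  # close to Var(eps) = 0.25, the irreducible error
mse_alt = np.mean((Y - g(X)) ** 2)  # strictly larger on average
print(mse_opt, mse_alt)
```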
1.2 Bias-variance trade-off
Given an estimate \(\hat f\) of \(f\), show that the expected test error at a point \(x_0\) can be decomposed as \[
\operatorname{E}[y_0 - \hat f(x_0)]^2 = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}},
\] where the expectation averages over the variability in \(y_0\) and \(\hat f\).
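A Monte Carlo sketch of this identity is given below. The setup is assumed for illustration only: the true \(f(x) = x^2\), \(\epsilon \sim N(0, 1)\), training sets of size \(n = 50\), a simple linear fit for \(\hat f\), and the test point \(x_0 = 0.9\). Repeated training draws estimate \(\operatorname{Var}(\hat f(x_0))\) and \(\operatorname{Bias}(\hat f(x_0))\), and their sum with \(\operatorname{Var}(\epsilon)\) should match the average test error.

```python
# Sketch: Monte Carlo check of
#   E[(y0 - f_hat(x0))^2] = Var(f_hat(x0)) + Bias(f_hat(x0))^2 + Var(eps)
# for an (intentionally biased) linear fit to data from f(x) = x^2.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: x ** 2
sigma, n, x0, reps = 1.0, 50, 0.9, 20_000

preds, test_errs = np.empty(reps), np.empty(reps)
for r in range(reps):
    X = rng.uniform(-1, 1, n)
    y = f(X) + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(X, y, 1)          # simple linear fit: f_hat(x) = b0 + b1 * x
    preds[r] = b0 + b1 * x0               # f_hat(x0) for this training set
    y0 = f(x0) + rng.normal(0, sigma)     # a fresh test observation at x0
    test_errs[r] = (y0 - preds[r]) ** 2

var_hat = preds.var()                     # Var(f_hat(x0)) over training sets
bias_sq = (preds.mean() - f(x0)) ** 2     # squared bias at x0
print(test_errs.mean(), var_hat + bias_sq + sigma ** 2)  # the two should agree
```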
1.3 \(R^2\) and correlation
For multiple linear regression, show that \(R^2\) is equal to the square of the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is, \[
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2.
\]