Filling gaps in lecture notes (10pts)
Consider the regression model $Y = f(X) + \epsilon$, where $\operatorname{E}(\epsilon) = 0$.
Optimal regression function
Show that the choice $f_{\text{opt}}(X) = \operatorname{E}(Y \mid X)$ minimizes the mean squared prediction error $\operatorname{E}[Y - f(X)]^2$, where the expectation averages over variations in both $X$ and $Y$. (Hint: condition on $X$.)
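As a numerical sanity check (not a proof), the claim can be illustrated by simulation. The sketch below is built on assumptions that are not part of the exercise: a discrete predictor with three levels, a hypothetical regression function f(x) = x^2, and standard normal noise. It compares the mean squared prediction error of the empirical conditional mean with that of an arbitrarily perturbed predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (an assumption for illustration only):
# X takes values in {0, 1, 2}, f(x) = x**2, and the noise has mean zero.
n = 200_000
x = rng.integers(0, 3, size=n)
y = x.astype(float) ** 2 + rng.normal(0.0, 1.0, size=n)

# Empirical conditional means E(Y | X = k), one per level of X.
cond_mean = np.array([y[x == k].mean() for k in range(3)])


def mse(pred_by_level):
    """Mean squared prediction error of a predictor given by its value at each level of X."""
    return np.mean((y - pred_by_level[x]) ** 2)


mse_opt = mse(cond_mean)
# A predictor that deviates from the conditional mean at every level of X.
mse_shifted = mse(cond_mean + np.array([0.3, -0.2, 0.1]))
print(mse_opt, mse_shifted)
```

In this setup the perturbed predictor always has the larger error, and the error of the conditional mean should land close to the noise variance of 1. The exercise still asks for the general argument, which the hint (conditioning on X) leads to.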
Bias-variance trade-off
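Given an estimate $\hat{f}$ of $f$, show that the test error at a point $x_0$ can be decomposed as $\operatorname{E}[y_0 - \hat{f}(x_0)]^2 = \operatorname{Var}(\hat{f}(x_0)) + [\operatorname{Bias}(\hat{f}(x_0))]^2 + \operatorname{Var}(\epsilon)$, where the expectation averages over the variability in $y_0$ and $\hat{f}$.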
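The decomposition can be checked by Monte Carlo before proving it. The following is a minimal sketch under assumptions chosen only for illustration: a hypothetical true function f = sin, a test point x0 = 1, noise standard deviation 0.5, and a deliberately misspecified straight-line fit as the estimate. It draws many independent training sets, refits the line each time, and compares the simulated test error at x0 with the sum of variance, squared bias, and noise variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup (assumptions for illustration only).
f = np.sin
x0, sigma = 1.0, 0.5
n_train, n_rep = 30, 20_000

# Each row is one independent training set.
X = rng.uniform(0.0, 3.0, size=(n_rep, n_train))
Y = f(X) + rng.normal(0.0, sigma, size=X.shape)

# Closed-form simple linear regression fit, one per training set.
x_bar = X.mean(axis=1, keepdims=True)
y_bar = Y.mean(axis=1, keepdims=True)
slope = ((X - x_bar) * (Y - y_bar)).sum(axis=1) / ((X - x_bar) ** 2).sum(axis=1)
intercept = y_bar[:, 0] - slope * x_bar[:, 0]
f_hat_x0 = intercept + slope * x0

# Fresh test responses at x0, independent of the training sets.
y0 = f(x0) + rng.normal(0.0, sigma, size=n_rep)

test_error = np.mean((y0 - f_hat_x0) ** 2)
variance = f_hat_x0.var()
bias_sq = (f_hat_x0.mean() - f(x0)) ** 2
decomposed = variance + bias_sq + sigma ** 2
print(test_error, decomposed)
```

The two printed numbers should agree up to Monte Carlo error. Because a straight line cannot follow the sine curve, the squared bias term is visibly nonzero here.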
ISL Exercise 2.4.3 (10pts)
ISL Exercise 2.4.4 (10pts)
ISL Exercise 2.4.10 (30pts)
You can read in the Boston data set directly from the URL https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv. Documentation of the Boston data set is here.
import pandas as pd
import io
import requests
# Download the raw CSV and parse it, using the first column as the row index
url = "https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv"
s = requests.get(url).content
Boston = pd.read_csv(io.StringIO(s.decode('utf-8')), index_col = 0)
# Display the data frame
Boston
        crim    zn  indus  chas    nox  ...  rad  tax  ptratio  lstat  medv
1    0.00632  18.0   2.31     0  0.538  ...    1  296     15.3   4.98  24.0
2    0.02731   0.0   7.07     0  0.469  ...    2  242     17.8   9.14  21.6
3    0.02729   0.0   7.07     0  0.469  ...    2  242     17.8   4.03  34.7
4    0.03237   0.0   2.18     0  0.458  ...    3  222     18.7   2.94  33.4
5    0.06905   0.0   2.18     0  0.458  ...    3  222     18.7   5.33  36.2
..       ...   ...    ...   ...    ...  ...  ...  ...      ...    ...   ...
502  0.06263   0.0  11.93     0  0.573  ...    1  273     21.0   9.67  22.4
503  0.04527   0.0  11.93     0  0.573  ...    1  273     21.0   9.08  20.6
504  0.06076   0.0  11.93     0  0.573  ...    1  273     21.0   5.64  23.9
505  0.10959   0.0  11.93     0  0.573  ...    1  273     21.0   6.48  22.0
506  0.04741   0.0  11.93     0  0.573  ...    1  273     21.0   7.88  11.9

[506 rows x 13 columns]
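As a possible shortcut, `pd.read_csv` also accepts a URL, a file path, or a file-like object directly, so the download step can be folded into one call. The sketch below wraps that in a hypothetical helper, `load_boston` (a name introduced here, not part of the assignment), and exercises it offline on a three-row, six-column excerpt typed in from the output above. The excerpt is a truncated stand-in with an assumed layout, not the real file.

```python
import io

import pandas as pd

BOSTON_URL = "https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv"


def load_boston(source=BOSTON_URL):
    """Read the Boston CSV from a URL, path, or file-like object.

    Hypothetical helper: the first column is used as the row index.
    """
    return pd.read_csv(source, index_col=0)


# Offline stand-in (an assumption): three rows and six of the thirteen columns,
# with a leading unnamed index column.
sample_csv = io.StringIO(
    ",crim,zn,indus,chas,nox,medv\n"
    "1,0.00632,18.0,2.31,0,0.538,24.0\n"
    "2,0.02731,0.0,7.07,0,0.469,21.6\n"
    "3,0.02729,0.0,7.07,0,0.469,34.7\n"
)
sample = load_boston(sample_csv)
print(sample.shape)
print(sample.dtypes)
```

Calling `load_boston()` with no argument would fetch the full data set from the course URL, which requires network access.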
library(tidyverse)
Boston <- read_csv("https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/data/Boston.csv", col_select = -1) %>%
print(width = Inf)
# A tibble: 506 × 13
      crim    zn indus  chas   nox    rm   age   dis   rad   tax ptratio lstat
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl>
 1 0.00632    18  2.31     0 0.538  6.58  65.2  4.09     1   296    15.3  4.98
 2  0.0273     0  7.07     0 0.469  6.42  78.9  4.97     2   242    17.8  9.14
 3  0.0273     0  7.07     0 0.469  7.18  61.1  4.97     2   242    17.8  4.03
 4  0.0324     0  2.18     0 0.458  7.00  45.8  6.06     3   222    18.7  2.94
 5  0.0690     0  2.18     0 0.458  7.15  54.2  6.06     3   222    18.7  5.33
 6  0.0298     0  2.18     0 0.458  6.43  58.7  6.06     3   222    18.7  5.21
 7  0.0883  12.5  7.87     0 0.524  6.01  66.6  5.56     5   311    15.2  12.4
 8   0.145  12.5  7.87     0 0.524  6.17  96.1  5.95     5   311    15.2  19.2
 9   0.211  12.5  7.87     0 0.524  5.63   100  6.08     5   311    15.2  29.9
10   0.170  12.5  7.87     0 0.524  6.00  85.9  6.59     5   311    15.2  17.1
    medv
   <dbl>
 1    24
 2  21.6
 3  34.7
 4  33.4
 5  36.2
 6  28.7
 7  22.9
 8  27.1
 9  16.5
10  18.9
# … with 496 more rows
ISL Exercise 3.7.3 (12pts)
ISL Exercise 3.7.15 (20pts)
Bonus question (20pts)
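For multiple linear regression, show that $R^2$ is equal to the squared correlation between the response vector $\mathbf{y} = (y_1, \ldots, y_n)^T$ and the fitted values $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_n)^T$. That is, $R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2$.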
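Before attempting the proof, the identity can be checked numerically. The sketch below uses simulated data with hypothetical coefficients (assumptions for illustration only). It fits least squares with an intercept via `np.linalg.lstsq`, then compares 1 - RSS/TSS with the squared sample correlation between the response and the fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data with hypothetical coefficients (assumptions for illustration).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.5])
y = 1.0 + X @ beta + rng.normal(size=n)

# Least squares fit with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1.0 - rss / tss
cor_squared = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r_squared, cor_squared)
```

The two values should match to floating-point precision. The intercept matters for this check: without it, the residuals need not sum to zero and the two quantities can differ.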