Resampling methods involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model.
For example, resampling can be used to estimate the variability of a linear regression fit. Cross-validation and the bootstrap are two of the most commonly used resampling methods.
The validation set approach involves randomly dividing the available set of observations into two parts, a training set and a validation set (or hold-out set).
Cross-validation can be used in two ways. If the goal is to determine how well a given statistical learning procedure can be expected to perform on independent data, then the actual estimate of the test MSE is of interest. If the goal is instead to choose the appropriate level of flexibility, we are interested only in the location of the minimum point in the estimated test MSE curve.
The bootstrap can be used, among other things, to estimate the standard errors of the coefficients from a linear regression fit.
As a running example, suppose we wish to invest a fixed sum of money in two financial assets that yield returns of \(X\) and \(Y\), putting a fraction \(\alpha\) into \(X\) and the remaining \(1-\alpha\) into \(Y\). The value of \(\alpha\) that minimizes the variance of \(\alpha X+(1-\alpha)Y\) is
\[ \alpha = \frac{\sigma_Y^2-\sigma_{XY}}{\sigma_X^2+\sigma_Y^2-2\sigma_{XY}} \]
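This expression follows from a short minimization. As a sketch, the variance of the portfolio return is
\[ \mathrm{Var}(\alpha X+(1-\alpha)Y)=\alpha^2\sigma_X^2+(1-\alpha)^2\sigma_Y^2+2\alpha(1-\alpha)\sigma_{XY}, \]
and setting the derivative with respect to \(\alpha\) to zero,
\[ 2\alpha\sigma_X^2-2(1-\alpha)\sigma_Y^2+2(1-2\alpha)\sigma_{XY}=0, \]
gives the formula above after solving for \(\alpha\).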
Since \(\sigma_X^2\), \(\sigma_Y^2\), and \(\sigma_{XY}\) are unknown in practice, we estimate them from the data and plug the estimates in:
\[ \hat\alpha = \frac{\hat\sigma_Y^2-\hat\sigma_{XY}}{\hat\sigma_X^2+\hat\sigma_Y^2-2\hat\sigma_{XY}} \]
The bootstrap quantifies the variability of \(\hat\alpha\): given \(B\) bootstrap estimates \(\hat\alpha^{*1},\dots,\hat\alpha^{*B}\), the bootstrap standard error is
\[ SE_B(\hat\alpha)=\sqrt{\frac{1}{B-1}\sum\limits_{r=1}^B\left(\hat\alpha^{*r}-\frac{1}{B}\sum\limits_{r'=1}^B\hat\alpha^{*r'}\right)^2} \]
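As a minimal sketch of this formula (assuming a data frame dat with return columns X and Y, in the spirit of the Portfolio data used below), one could resample rows with replacement B times and take the standard deviation of the resulting estimates:
B = 1000
alpha.star = rep(0, B)
for (r in 1:B){
  idx = sample(nrow(dat), nrow(dat), replace = TRUE)  # resample rows of the hypothetical dat with replacement
  X = dat$X[idx]
  Y = dat$Y[idx]
  alpha.star[r] = (var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y))
}
sd(alpha.star)  # sd() uses the 1/(B-1) divisor, matching SE_B above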
library(ISLR)
set.seed(1)
train = sample(392, 196)  # randomly split the 392 observations: 196 for training
attach(Auto)
## linear regression
lm.fit = lm(mpg ~ horsepower, data = Auto, subset = train)
mean((mpg - predict(lm.fit, Auto))[-train]^2)
## [1] 26.14142
## polynomial regression
lm.fit2 = lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
mean((mpg-predict(lm.fit2, Auto))[-train]^2)
## [1] 19.82259
## cubic regression
lm.fit3 = lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
mean((mpg-predict(lm.fit3, Auto))[-train]^2)
## [1] 19.78252
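The quadratic fit cuts the estimated test MSE substantially relative to the linear fit, while the cubic fit provides essentially no further improvement.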
library(boot)
glm.fit = glm(mpg ~ horsepower, data = Auto)
cv.err = cv.glm(Auto, glm.fit)
cv.err$delta
## [1] 24.23151 24.23114
The delta component of the cv.glm() output is a vector of length two. The first element is the raw cross-validation estimate of prediction error; the second is an adjusted estimate designed to compensate for the bias introduced by not using leave-one-out cross-validation. Here the two are nearly identical, since LOOCV was used.
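For least squares fits, LOOCV also has an exact closed form, \(CV_{(n)}=\frac{1}{n}\sum_{i=1}^n\left(\frac{y_i-\hat y_i}{1-h_i}\right)^2\), where \(h_i\) is the leverage of observation \(i\). A quick sketch of the check against cv.glm(), reusing the glm.fit object above:
h = hatvalues(glm.fit)  # leverage values from the fit on the full data set
mean(((Auto$mpg - fitted(glm.fit))/(1 - h))^2)  # should reproduce cv.err$delta[1]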
## LOOCV for polynomial fits of degree 1 to 5
cv.error = rep(0, 5)
for (i in 1:5){
  glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error[i] = cv.glm(Auto, glm.fit)$delta[1]
}
cv.error
## [1] 24.23151 19.24821 19.33498 19.42443 19.03321
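Again there is a sharp drop in the estimated test MSE between the linear and quadratic fits, but no clear improvement from higher-degree polynomials.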
## 10-fold CV for the same sequence of polynomial fits
set.seed(17)
cv.error.10 = rep(0, 10)
for (i in 1:10){
  glm.fit = glm(mpg ~ poly(horsepower, i), data = Auto)
  cv.error.10[i] = cv.glm(Auto, glm.fit, K = 10)$delta[1]
}
cv.error.10
## [1] 24.20520 19.18924 19.30662 19.33799 18.87911 19.02103 18.89609
## [8] 19.71201 18.95140 19.50196
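These fits are much faster to compute than with LOOCV (cv.glm() does not make use of the least-squares shortcut shown above, so LOOCV refits the model n times), and the results again show little evidence that polynomials beyond the quadratic help.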
Performing a bootstrap analysis in R entails two steps: first, write a function that computes the statistic of interest; second, pass it to the boot() function, which is part of the boot library, to perform the bootstrap by repeatedly sampling observations from the data set with replacement. For the portfolio example:
alpha.fn = function(data, index){
  X = data$X[index]
  Y = data$Y[index]
  return((var(Y)-cov(X,Y))/(var(X)+var(Y)-2*cov(X,Y)))
}
alpha.fn(Portfolio, 1:100)
## [1] 0.5758321
set.seed(1)
alpha.fn(Portfolio, sample(100, 100, replace = TRUE))
## [1] 0.5963833
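Implementing the bootstrap by hand would amount to repeating this command many times and recording the results (as in the sketch above); the boot() function automates this: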
boot(Portfolio, alpha.fn, R = 1000)
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Portfolio, statistic = alpha.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.5758321 -7.315422e-05 0.08861826
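So, using the original data, \(\hat\alpha = 0.5758\), and the bootstrap estimate of \(SE(\hat\alpha)\) is 0.0886.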
The bootstrap approach can also be used to assess the variability of the coefficient estimates and predictions from a statistical learning method; here, the coefficients of the linear regression of mpg on horsepower.
boot.fn = function(data, index)
  return(coef(lm(mpg ~ horsepower, data = data, subset = index)))
boot.fn(Auto, 1:392)
## (Intercept) horsepower
## 39.9358610 -0.1578447
set.seed(1)
boot.fn(Auto, sample(392, 392, replace = TRUE))
## (Intercept) horsepower
## 38.7387134 -0.1481952
boot(Auto, boot.fn, 1000)
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Auto, statistic = boot.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 39.9358610 0.0296667441 0.860440524
## t2* -0.1578447 -0.0003113047 0.007411218
summary(lm(mpg~horsepower, data = Auto))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.9358610 0.717498656 55.65984 1.220362e-187
## horsepower -0.1578447 0.006445501 -24.48914 7.031989e-81
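The bootstrap standard errors (0.86 for the intercept, 0.0074 for the slope) differ noticeably from the formula-based ones reported by summary() (0.72 and 0.0064). This does not indicate a problem with the bootstrap: the standard formulas rely on an estimated noise variance and on the assumption that the linear model is correct, which is dubious given the nonlinearity in this data, whereas the bootstrap makes no such assumptions. The bootstrap estimates are therefore likely the more accurate ones here.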
boot.fn = function(data, index)
  coef(lm(mpg ~ horsepower + I(horsepower^2), data = data, subset = index))
set.seed(1)
boot(Auto, boot.fn, 1000)
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = Auto, statistic = boot.fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 56.900099702 6.098115e-03 2.0944855842
## t2* -0.466189630 -1.777108e-04 0.0334123802
## t3* 0.001230536 1.324315e-06 0.0001208339
summary(lm(mpg~horsepower+I(horsepower^2), data = Auto))$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.900099702 1.8004268063 31.60367 1.740911e-109
## horsepower -0.466189630 0.0311246171 -14.97816 2.289429e-40
## I(horsepower^2) 0.001230536 0.0001220759 10.08009 2.196340e-21
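Because the quadratic model provides a much better fit to the data, the bootstrap standard errors and the formula-based ones now agree much more closely.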