Behind the Scenes of R lm() function: OLS in Matrix Form
The following exercises aim to compare simple linear regression results computed in matrix form with the built in R function lm().
Solution file can be obtained here.
Task 1
Use matrix based OLS approach (do not use R) to fit a simple regression model for the following data:
x | y |
---|---|
2.5 | -8 |
4.5 | 16 |
5 | 40 |
8.2 | 115 |
9.3 | 122 |
a) OLS estimation of \(\beta_0\) and \(\beta_1\).
\(Y = X\hat{\beta}\)
\(Y = \begin{bmatrix} -8 \\ 16 \\ 40 \\ 115 \\ 122 \end{bmatrix}\)
\(X = \begin{bmatrix} 1 & 2.5 \\ 1 & 4.5 \\ 1 & 5 \\ 1 & 8.2 \\ 1 & 9.3 \end{bmatrix}\)
\(X'X = \begin{bmatrix} 5 & 29.5 \\ 29.5 & 205.23 \end{bmatrix}\)
\((X'X)^{-1} = \begin{bmatrix} 1.316 & -0.189 \\ -0.189 & 0.032 \end{bmatrix}\)
\(\hat{\beta} = (X'X)^{-1}X'Y = \begin{bmatrix} -65.636 \\ 20.786 \end{bmatrix}\)
\(\hat{\beta_0} = -65.636\), \(\hat{\beta_1} = 20.786\)
b) SSE
\(SSE = Y'Y - \hat{\beta}X'Y = 30029 - 29716.81 = 312.19\)
c) MSE
\(MSE = SSE/(n-p) = SSE/(5-2) = 312.19/3 = 104.063\)
d) R-squared and adjusted R-squared. Is your model a good fit? Why or why not?
\(SST = \sum_{i=1}^{5} (y_i - \bar{y})^2 = 13784\)
\(R^2 = 1 - (SSE/SST) = 1 - (312.19/13784) = 0.977\)
\(R^2adj = 1 - ((1 - R^2)(n-1)/(n - p)) = 1 - ((1 - 0.977)*4/3) = 0.969\)
The model is a good fit because it explains about 97% of the variation in y.
e) \(\hat{y}\), given \(x = 8.2\). Compare with the observed response in the original data. Why are these two not equal?
\(\hat{y} = -65.636 + 20.786*8.2 = 104.8092\)
The observed response in the original data was 115. These two are not equal because the model does not fit the data perfectly.
Task 2
Compare your results in 1. (a)-(e) with R function lm().
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## 1 2 3 4 5
## 5.672 -11.900 1.707 10.193 -5.672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -65.636 11.715 -5.603 0.01123 *
## x 20.786 1.829 11.368 0.00146 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.21 on 3 degrees of freedom
## Multiple R-squared: 0.9773, Adjusted R-squared: 0.9697
## F-statistic: 129.2 on 1 and 3 DF, p-value: 0.001461
## [1] 312.7489
## [1] 104.2496
## 1
## 104.8072
\(\beta_0 = -65.636\), \(\beta_1 = 20.786\)
\(SSE = 312.7489\), \(MSE = 104.2496\)
\(R^2 = 0.9773\), \(R^2adj = 0.9697\).
Given \(x = 8.2\), \(\hat{y} = 104.8072\).
My results are very similar to R function lm() results and the differences are most likely due to rounding errors.
Task 3
Use R matrix operations (Not lm()) and repeat 1. (a) and (b). Compare your results with lm() function results.
a) OLS estimation of \(\beta_0\) and \(\beta_1\).
X <- matrix(c(rep(1, 5), 2.5, 4.5, 5, 8.2, 9.3), nrow = 5)
Y <- matrix(c(-8, 16, 40, 115, 122), nrow = 5)
beta <- solve(t(X) %*% X) %*% t(X) %*% Y
beta
## [,1]
## [1,] -65.63598
## [2,] 20.78576
\(\beta_0 = -65.636\), \(\beta_1 = 20.786\)
b) SSE
## [,1]
## [1,] 312.7489
\(SSE = 312.7489\).
The results are identical to lm() function results.