# Behind the Scenes of R *lm()* function: OLS in Matrix Form

The following exercises aim to compare simple linear regression results computed in matrix form with the built in R function *lm()*.

Solution file can be obtained here.

## Task 1

Use matrix based OLS approach (do not use R) to fit a simple regression model for the following data:

x | y |
---|---|

2.5 | -8 |

4.5 | 16 |

5 | 40 |

8.2 | 115 |

9.3 | 122 |

**a) OLS estimation of \(\beta_0\) and \(\beta_1\).**

\(Y = X\hat{\beta}\)

\(Y = \begin{bmatrix} -8 \\ 16 \\ 40 \\ 115 \\ 122 \end{bmatrix}\)

\(X = \begin{bmatrix} 1 & 2.5 \\ 1 & 4.5 \\ 1 & 5 \\ 1 & 8.2 \\ 1 & 9.3 \end{bmatrix}\)

\(X'X = \begin{bmatrix} 5 & 29.5 \\ 29.5 & 205.23 \end{bmatrix}\)

\((X'X)^{-1} = \begin{bmatrix} 1.316 & -0.189 \\ -0.189 & 0.032 \end{bmatrix}\)

\(\hat{\beta} = (X'X)^{-1}X'Y = \begin{bmatrix} -65.636 \\ 20.786 \end{bmatrix}\)

\(\hat{\beta_0} = -65.636\), \(\hat{\beta_1} = 20.786\)

**b) SSE**

\(SSE = Y'Y - \hat{\beta}X'Y = 30029 - 29716.81 = 312.19\)

**c) MSE**

\(MSE = SSE/(n-p) = SSE/(5-2) = 312.19/3 = 104.063\)

**d) R-squared and adjusted R-squared. Is your model a good fit? Why or why not?**

\(SST = \sum_{i=1}^{5} (y_i - \bar{y})^2 = 13784\)

\(R^2 = 1 - (SSE/SST) = 1 - (312.19/13784) = 0.977\)

\(R^2adj = 1 - ((1 - R^2)(n-1)/(n - p)) = 1 - ((1 - 0.977)*4/3) = 0.969\)

The model is a good fit because it explains about 97% of the variation in y.

**e) \(\hat{y}\), given \(x = 8.2\). Compare with the observed response in the original data. Why are these two not equal?**

\(\hat{y} = -65.636 + 20.786*8.2 = 104.8092\)

The observed response in the original data was 115. These two are not equal because the model does not fit the data perfectly.

## Task 2

Compare your results in 1. (a)-(e) with R function *lm()*.

```
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## 1 2 3 4 5
## 5.672 -11.900 1.707 10.193 -5.672
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -65.636 11.715 -5.603 0.01123 *
## x 20.786 1.829 11.368 0.00146 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.21 on 3 degrees of freedom
## Multiple R-squared: 0.9773, Adjusted R-squared: 0.9697
## F-statistic: 129.2 on 1 and 3 DF, p-value: 0.001461
```

`## [1] 312.7489`

`## [1] 104.2496`

```
## 1
## 104.8072
```

\(\beta_0 = -65.636\), \(\beta_1 = 20.786\)

\(SSE = 312.7489\), \(MSE = 104.2496\)

\(R^2 = 0.9773\), \(R^2adj = 0.9697\).

Given \(x = 8.2\), \(\hat{y} = 104.8072\).

My results are very similar to R function *lm()* results and the differences are most likely due to rounding errors.

## Task 3

Use R matrix operations (Not *lm()*) and repeat 1. (a) and (b). Compare your results with *lm()* function results.

**a) OLS estimation of \(\beta_0\) and \(\beta_1\).**

```
X <- matrix(c(rep(1, 5), 2.5, 4.5, 5, 8.2, 9.3), nrow = 5)
Y <- matrix(c(-8, 16, 40, 115, 122), nrow = 5)
beta <- solve(t(X) %*% X) %*% t(X) %*% Y
beta
```

```
## [,1]
## [1,] -65.63598
## [2,] 20.78576
```

\(\beta_0 = -65.636\), \(\beta_1 = 20.786\)

**b) SSE**

```
## [,1]
## [1,] 312.7489
```

\(SSE = 312.7489\).

The results are identical to *lm()* function results.