
# We Helped With This R Studio Economics Homework: Have A Similar One?

Category | Economics |
---|---|
Subject | R, R Studio |
Difficulty | Graduate |
Status | Solved |
More Info | Economics Rstudio Homework |

## Assignment Code

```
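# Lasso regression on the ISLR Hitters data (baseball salaries)
# Install ISLR only if it is missing, rather than on every run
if (!requireNamespace("ISLR", quietly = TRUE)) install.packages("ISLR")
library(ISLR)
library(glmnet)

# Inspect the data and count the missing salaries
names(Hitters)
dim(Hitters)
sum(is.na(Hitters$Salary))

# Drop rows with missing values and confirm that none remain
Hitters <- na.omit(Hitters)
dim(Hitters)
sum(is.na(Hitters))

# Design matrix (intercept column dropped) and response
x <- model.matrix(Salary ~ ., Hitters)[, -1]
y <- Hitters$Salary

# Grid of lambda values from 10^10 down to 10^-2
grid <- 10^seq(10, -2, length = 100)

# Split the observations in half: training and test
set.seed(1)
train <- sample(1:nrow(x), nrow(x) / 2)
test <- (-train)
y.test <- y[test]

# The Lasso (alpha = 1), fitted on the training half
lasso.mod <- glmnet(x[train, ], y[train], alpha = 1, lambda = grid)
plot(lasso.mod)

# Choose lambda by cross-validation on the training half
set.seed(1)
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1)
plot(cv.out)
bestlam <- cv.out$lambda.min
bestlam

# Test MSE at the selected lambda
lasso.pred <- predict(lasso.mod, s = bestlam, newx = x[test, ])
mean((lasso.pred - y.test)^2)

# Refit on the full data and list the nonzero coefficients
out <- glmnet(x, y, alpha = 1, lambda = grid)
lasso.coef <- predict(out, type = "coefficients", s = bestlam)[1:20, ]
lasso.coef
lasso.coef[lasso.coef != 0]

# OLS on full model
# OLS on Best Subset Model derived from Training data set with 8 inputs
library(leaps)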
```

## Assignment Description

ECO 6380 Prof. Tom Fomby

Predictive Analytics for Economists Spring 2019

**EXERCISE 4**

**Purpose:** To learn how to use R to “validate” Principal Component (PC) regressions.

First we will use 10-fold cross-validation to determine the optimal number of Principal Components to use in the PC regression, then apply that regression to the test data set and collect the test MSE, test RMSE, and test MAE. Then we will do the same for the full OLS Boston Housing regression. The program to use for this exercise is EX4.R, and the data are the Boston Housing data. In the PC model we will construct PCs from the standardized versions of all of the input variables **except for the indicator variable CHAS**, because PC analysis usually **uses only numeric inputs, not categorical ones**. Essentially, all you have to do in this exercise is run the R program and cut and paste output. This exercise is due **Tuesday, February 19** on CANVAS.
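The PC construction described above can be sketched in base R. This is a minimal illustration, not the course's EX4.R: it assumes that `MASS::Boston` matches the Boston Housing file used in class (with `medv` as the response and `chas` as the CHAS indicator), and the object names `inputs`, `pc`, `importance`, and `pct_first4` are made up for the example.

```r
library(MASS)  # assumption: MASS::Boston stands in for the course's Boston Housing data

# Numeric inputs only: drop the response (medv) and the indicator variable (chas)
inputs <- Boston[, setdiff(names(Boston), c("medv", "chas"))]

# Principal components of the standardized (centered and scaled) inputs
pc <- prcomp(inputs, center = TRUE, scale. = TRUE)

# Importance table: standard deviation, proportion of variance,
# and cumulative proportion for each component
importance <- summary(pc)$importance
print(round(importance, 4))

# Percentage of the total variation explained by the first four components
pct_first4 <- 100 * importance["Cumulative Proportion", 4]
pct_first4

# Scree plot of the component variances (the squared standard deviations)
screeplot(pc, npcs = ncol(inputs), type = "lines", main = "Scree plot")
```

Base R's `screeplot` draws the variances, that is, the squares of the "Standard deviation" row of the importance table. The plot produced by EX4.R may be built from a different row, so check it against the program's own output.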

a) Based upon the **entire** Boston Housing data set, report the
Importance of Principal Components table. (Be sure you take the time to
understand the contents of this table.)

b) The percentage of the total variation explained by the **first four components** is ______________%.

c) Report the Scree plot based on the **entire** Boston Housing data set. Which row of the Importance of Principal Components table is being used to form the plot? _________.

d) Using the **entire** Boston Housing data set, report the 10-fold Cross-Validation graph of RMSEP versus Number of Components. What is the meaning of this graph? At what number of components is the majority of the reduction in RMSEP attained?

e) Report the **TEST** 10-fold Cross-Validation graph of RMSEP versus the Number of Components. Is there much difference between this RMSEP graph and the one we obtained using the entire Boston Housing data set? Explain your answer.

f) Report the following **TEST** data set numbers for PC4:

C4MSE = _______________.

C4RMSE = ______________.

C4MAE = _______________.

g) Report the following **TEST** data set numbers for PC5:

C5MSE = _______________.

C5RMSE = ______________.

C5MAE = _______________.

h) Report the following **TEST** data set numbers for full OLS:

OLS MSE = _______________.

OLS RMSE = ______________.

OLS MAE = _______________.

i) What is your conclusion about the relative merits of the PC regressions and full OLS based on the results of the Validation Data Set experience? Do the results make sense? Explain your answer.
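
The validation steps in parts d) through i) can be sketched in base R as follows. This is a hedged illustration, not the course's EX4.R. It makes several assumptions: `MASS::Boston` stands in for the class data, the 50/50 train/test split and the seed are hypothetical, the PC regression uses only the PC scores (CHAS is left out entirely), and full OLS uses every input including CHAS. The helpers `pcr_predict`, `cv_rmsep`, and `test_metrics` are invented for this example.

```r
library(MASS)  # assumption: MASS::Boston stands in for the course's Boston Housing data

set.seed(1)                          # hypothetical seed
n <- nrow(Boston)
train <- sample(n, floor(n / 2))     # hypothetical 50/50 split
test <- setdiff(seq_len(n), train)
pc_vars <- setdiff(names(Boston), c("medv", "chas"))

# Fit a PC regression with k components on fit_rows and predict medv for new_rows.
# The standardization and the loadings are estimated from fit_rows only.
pcr_predict <- function(k, fit_rows, new_rows) {
  pc <- prcomp(Boston[fit_rows, pc_vars], center = TRUE, scale. = TRUE)
  fit_df <- data.frame(medv = Boston$medv[fit_rows], pc$x[, 1:k, drop = FALSE])
  fit <- lm(medv ~ ., data = fit_df)
  new_scores <- predict(pc, newdata = Boston[new_rows, pc_vars])[, 1:k, drop = FALSE]
  predict(fit, newdata = data.frame(new_scores))
}

# 10-fold cross-validated RMSEP for 1..max_k components, using only the given rows
cv_rmsep <- function(rows, max_k = length(pc_vars), n_folds = 10) {
  folds <- sample(rep(seq_len(n_folds), length.out = length(rows)))
  sapply(seq_len(max_k), function(k) {
    sq_err <- unlist(lapply(seq_len(n_folds), function(f) {
      hold <- rows[folds == f]
      keep <- rows[folds != f]
      (Boston$medv[hold] - pcr_predict(k, keep, hold))^2
    }))
    sqrt(mean(sq_err))
  })
}

rmsep_all <- cv_rmsep(seq_len(n))    # entire data set
rmsep_train <- cv_rmsep(train)       # training rows only

plot(seq_along(rmsep_all), rmsep_all, type = "b",
     ylim = range(c(rmsep_all, rmsep_train)),
     xlab = "Number of components", ylab = "CV RMSEP")
lines(seq_along(rmsep_train), rmsep_train, type = "b", lty = 2)

# Test-set error measures: MSE, RMSE, and MAE
test_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(MSE = mean(err^2), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

y_test <- Boston$medv[test]
ols_fit <- lm(medv ~ ., data = Boston[train, ])   # full OLS on the training rows

results <- rbind(
  PC4 = test_metrics(y_test, pcr_predict(4, train, test)),
  PC5 = test_metrics(y_test, pcr_predict(5, train, test)),
  OLS = test_metrics(y_test, predict(ols_fit, newdata = Boston[test, ]))
)
round(results, 3)
```

Because the split, the seed, and the fold assignment are assumptions, the numbers from this sketch will generally differ from those produced by EX4.R; it only shows how the three error measures are computed and compared across PC4, PC5, and full OLS.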