We Helped With This RStudio Economics Homework

| Field | Value |
|---|---|
| Category | Economics |
| Subject | R, RStudio |
| Difficulty | Graduate |
| Status | Solved |
| More Info | Economics RStudio Homework |
Assignment Code
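# Lasso regression on the ISLR Hitters data
# install.packages(c("ISLR", "glmnet", "leaps"))  # run once if the packages are not installed
library(ISLR)
names(Hitters)
dim(Hitters)                     # 322 players, 20 variables
sum(is.na(Hitters$Salary))       # 59 players have no recorded Salary
Hitters=na.omit(Hitters)         # drop rows with missing values
dim(Hitters)                     # 263 rows remain
sum(is.na(Hitters))              # confirm no missing values remain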
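library(glmnet)
grid=10^seq(10,-2,length=100)                 # lambda grid from 1e10 down to 1e-2
x=model.matrix(Salary~.,Hitters)[,-1]         # predictor matrix (factors dummy-coded), intercept column dropped
y=Hitters$Salary
set.seed(1)
train=sample(1:nrow(x), nrow(x)/2)            # random half of the rows for training
test=(-train)                                 # negative index selects all remaining rows
y.test=y[test]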
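# The Lasso (alpha=1)
lasso.mod=glmnet(x[train,],y[train],alpha=1,lambda=grid)
plot(lasso.mod)                               # coefficient paths across the lambda grid
set.seed(1)
cv.out=cv.glmnet(x[train,],y[train],alpha=1)  # 10-fold CV (the default) on the training half
plot(cv.out)
bestlam=cv.out$lambda.min                     # lambda with the smallest CV error
bestlam
lasso.pred=predict(lasso.mod,s=bestlam,newx=x[test,])
mean((lasso.pred-y.test)^2)                   # test MSE
out=glmnet(x,y,alpha=1,lambda=grid)           # refit on the full data set
lasso.coef=predict(out,type="coefficients",s=bestlam)[1:20,]  # intercept + 19 predictors
lasso.coef
lasso.coef[lasso.coef!=0]                     # variables retained by the lasso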
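# OLS on full model (fit on the training half, evaluated on the test half)
lm.fit=lm(Salary~.,data=Hitters,subset=train)
lm.pred=predict(lm.fit,Hitters[test,])
mean((lm.pred-y.test)^2)                      # test MSE
# OLS on Best Subset Model derived from Training data set with 8 inputs
library(leaps)
regfit.best=regsubsets(Salary~.,data=Hitters[train,],nvmax=19)
coef8=coef(regfit.best,8)                     # OLS coefficients of the best 8-input model
test.mat=model.matrix(Salary~.,data=Hitters[test,])
best8.pred=test.mat[,names(coef8)]%*%coef8
mean((best8.pred-y.test)^2)                   # test MSE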
Assignment Description
ECO 6380 Prof. Tom Fomby
Predictive Analytics for Economists Spring 2019
EXERCISE 4
Purpose: To learn how to use R to “validate” Principal Component (PC) regressions.
First we will use 10-fold cross-validation to determine the optimal number of Principal Components to use in the PC regression, then apply that model to the test data set and record the test MSE, test RMSE, and test MAE. Then we will do the same for the full OLS Boston Housing regression. The program for this exercise is EX4.R, and the data are the Boston Housing data. In the PC model we construct the PCs from the standardized versions of all the input variables except the indicator variable CHAS, because PC analysis usually uses only numeric inputs, not categorical ones. Essentially, all you have to do in this exercise is run the R program and copy and paste its output. This exercise is due Tuesday, February 19, on CANVAS.
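The procedure described above (choose the number of components by 10-fold cross-validation on the training data, then score the chosen PC regression and full OLS on the test data) can be sketched in base R as follows. This is not EX4.R: it assumes MASS::Boston stands in for the course's Boston Housing file, with medv as the response, and the 50/50 split, the seed, and the helper names pcr_predict(), cv_rmsep() and test_metrics() are hypothetical. The pls package's pcr() with validation = "CV" is a common packaged alternative; the explicit loop here just makes each step visible.

```r
# Sketch: validating a principal-components regression (PCR) in base R.
# ASSUMPTIONS (hypothetical, not taken from EX4.R): MASS::Boston stands in for
# the course data, medv is the response, and a 50/50 train/test split is used.
library(MASS)

set.seed(1)
inputs <- setdiff(names(Boston), c("chas", "medv"))  # numeric inputs only, CHAS excluded
X <- as.matrix(Boston[, inputs])
y <- Boston$medv
train <- sample(nrow(X), floor(nrow(X) / 2))

# Fit PCR with k components on the given rows and predict new rows.
# Centering and scaling use the fitting rows only.
pcr_predict <- function(Xtr, ytr, Xnew, k) {
  pc <- prcomp(Xtr, center = TRUE, scale. = TRUE)
  scores <- pc$x[, 1:k, drop = FALSE]
  fit <- lm(ytr ~ scores)
  new_scores <- predict(pc, newdata = Xnew)[, 1:k, drop = FALSE]
  drop(cbind(1, new_scores) %*% coef(fit))
}

# 10-fold CV on the training rows: RMSEP for k = 1, ..., p components.
cv_rmsep <- function(Xtr, ytr, folds = 10) {
  p <- ncol(Xtr)
  fold_id <- sample(rep(1:folds, length.out = nrow(Xtr)))
  sse <- numeric(p)
  for (f in 1:folds) {
    held <- fold_id == f
    for (k in 1:p) {
      pred <- pcr_predict(Xtr[!held, ], ytr[!held], Xtr[held, , drop = FALSE], k)
      sse[k] <- sse[k] + sum((ytr[held] - pred)^2)
    }
  }
  sqrt(sse / nrow(Xtr))
}

rmsep <- cv_rmsep(X[train, ], y[train])
plot(seq_along(rmsep), rmsep, type = "b",
     xlab = "Number of components", ylab = "CV RMSEP")
k_best <- which.min(rmsep)

# Test-set accuracy measures
test_metrics <- function(actual, pred) {
  err <- actual - pred
  c(MSE = mean(err^2), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
}

pcr_pred <- pcr_predict(X[train, ], y[train], X[-train, ], k_best)
pcr_test <- test_metrics(y[-train], pcr_pred)

# Full OLS benchmark: all inputs (including chas) on the same split
ols_fit <- lm(medv ~ ., data = Boston[train, ])
ols_test <- test_metrics(y[-train], predict(ols_fit, Boston[-train, ]))
rbind(PCR = pcr_test, OLS = ols_test)
```

Standardizing inside each fold with the fitting rows' means and standard deviations keeps the held-out rows from influencing the components they are scored on.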
a) Based upon the entire Boston Housing data set, report the Importance of Principal Components table. (Be sure you take the time to understand the contents of this table.)
b) The percentage of the total variation explained by the first four components is ______________%.
c) Report the scree plot based on the entire Boston Housing data set. Which row of the Importance of Principal Components table is being used to form the plot? _________.
d) Using the entire Boston Housing data set, report the 10-fold Cross-Validation graph of RMSEP versus Number of Components. What is the meaning of this graph? At what number of components is the majority of the reduction in RMSEP attained?
e) Report the TEST 10-fold Cross-Validation graph of RMSEP versus the Number of Components. Is there much difference between this RMSEP graph and the one we obtained using the entire Boston Housing data set? Explain your answer.
f) Report the following TEST data set numbers for PC4:
C4MSE = _______________.
C4RMSE = ______________.
C4MAE = _______________.
g) Report the following TEST data set numbers for PC5:
C5MSE = _______________.
C5RMSE = ______________.
C5MAE = _______________.
h) Report the following TEST data set numbers for full OLS:
OLS MSE = _______________.
OLS RMSE = ______________.
OLS MAE = _______________.
i) What is your conclusion about the relative merits of the PC regressions and full OLS, based on the results for the validation (test) data set? Do the results make sense? Explain your answer.
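For items a) to c), the rows of the Importance of Principal Components table can be rebuilt from the component standard deviations that prcomp() returns. A minimal sketch, again assuming MASS::Boston as a stand-in with chas and medv excluded: because the inputs are standardized, the component variances sum to the number of inputs, and each Proportion of Variance entry is one component's variance divided by that total. Base R's screeplot() draws the variances pc$sdev^2; the plot below uses the proportions instead, which differ only by that constant factor and so have the same shape.

```r
# Sketch: rebuilding the "Importance of components" table from prcomp().
# ASSUMPTION: MASS::Boston stands in for the course data; chas and medv excluded.
library(MASS)

inputs <- setdiff(names(Boston), c("chas", "medv"))
pc <- prcomp(Boston[, inputs], center = TRUE, scale. = TRUE)

pc_var   <- pc$sdev^2               # variance of each component
prop_var <- pc_var / sum(pc_var)    # "Proportion of Variance" row
cum_prop <- cumsum(prop_var)        # "Cumulative Proportion" row

imp <- summary(pc)$importance       # the table printed by summary(pc)
print(imp)

# Percentage of total variation explained by the first four components
100 * cum_prop[4]

# Scree plot: proportion of variance against component number
plot(prop_var, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance")
```

summary() rounds the two proportion rows to five decimals, so values computed by hand agree with the printed table only to that precision.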