Assignment Description
Stock and Watson’s Introduction to Econometrics, 3rd Updated Edition
Documentation for Birthweight_Smoking
The data file Birthweight_Smoking is from the 1989 linked National Natality-Mortality Detail Files, which contain a census of infant births and deaths. The data in bw_smoking.data are for births in Pennsylvania in 1989.
These data were provided by Professors Douglas Almond, Kenneth Chay, and David Lee and are a subset of the data used in their paper “The Costs of Low Birth Weight,” Quarterly Journal of Economics, August 2005, 120(3): 1031-1083.
The file contains 3,000 observations on the variables described below.
| No. | Variable | Description |
|---|---|---|
| | Birthweight and Smoking | |
| 1 | birthweight | birth weight of infant (in grams) |
| 2 | smoker | indicator = 1 if the mother smoked during pregnancy, 0 otherwise |
| | Mother’s Attributes | |
| 3 | age | age (in years) |
| 4 | educ | years of educational attainment (more than 16 years coded as 17) |
| 5 | unmarried | indicator = 1 if mother is unmarried |
| | This Pregnancy | |
| 6 | alcohol | indicator = 1 if mother drank alcohol during pregnancy |
| 7 | drinks | number of drinks per week |
| 8 | tripre1 | indicator = 1 if 1st prenatal care visit in 1st trimester |
| 9 | tripre2 | indicator = 1 if 1st prenatal care visit in 2nd trimester |
| 10 | tripre3 | indicator = 1 if 1st prenatal care visit in 3rd trimester |
| 11 | tripre0 | indicator = 1 if no prenatal visits |
| 12 | nprevist | total number of prenatal visits |
Assignment Description
Stony Brook University Spring 2018
Alejandro Melo Ponce
ASSIGNMENT 4
Due: April 25, 2018, in class
Instructions: Show all your work to get full points. Please also cut and paste at the end of your submission the R code you have used for the last two problems.
1. Multiple Regression Model
The following table reports the results of three regressions of hourly earnings (AHE) on a number of regressors, using a survey of full-time workers in the U.S. in 1998. The highest educational achievement for each worker was either a high school diploma or a bachelor’s degree. Workers’ ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived.
| Regressor | (1) | (2) | (3) |
|---|---|---|---|
| College ($X_1$) | 5.46 (0.21) | 5.48 (0.21) | 5.44 (0.21) |
| Female ($X_2$) | -2.64 (0.20) | -2.62 (0.20) | -2.62 (0.20) |
| Age ($X_3$) | | 0.29 (0.04) | 0.29 (0.04) |
| Northeast ($X_4$) | | | 0.69 (0.30) |
| Midwest ($X_5$) | | | 0.60 (0.28) |
| South ($X_6$) | | | -0.27 (0.26) |
| Intercept | 12.69 (0.14) | 4.40 (1.05) | 3.75 (1.06) |
| Summary Statistics | | | |
| SER | 6.27 | 6.22 | 6.21 |
| $R^2$ | 0.176 | 0.190 | 0.194 |
| $\bar{R}^2$ | | | |
| $n$ | 4000 | 4000 | 4000 |

Standard errors are in parentheses.
• AHE = average hourly earnings (in 1998 dollars)
• College = binary variable (1 if college, 0 if high school)
• Female = binary variable (1 if female, 0 if male)
• Age = age (in years)
• Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
• Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
• South = binary variable (1 if Region = South, 0 otherwise)
• West = binary variable (1 if Region = West, 0 otherwise)
(a) Compute $\bar{R}^2$ for every regression.
(b) Consider regression (1). Do workers with a college degree earn more, on average, than workers with only a high school degree? How much more?
(c) Consider regression (1). Do men earn more than women on average? How much more?
(d) Consider regression (2). Is Age an important determinant of earnings? Explain.
(e) Consider regression (2). John is a 31-year-old male without a college degree, while Bob is a 24-year-old male with a college degree. Predict John’s and Bob’s earnings.
(f) Consider regression (3). Do there appear to be important regional differences? Why or why not?
(g) Consider regression (3). Why was West excluded? What would happen if it were included?
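The adjustment asked for in part (a) can be sketched in a few lines. This is a minimal illustration, assuming the usual textbook definition $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$, where $k$ counts the regressors excluding the intercept. The function name `adjusted_r2` is illustrative, and the inputs are the regression (1) entries from the table.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept not counted)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Regression (1): R^2 = 0.176, n = 4000, k = 2 regressors (College, Female).
r2_bar_1 = adjusted_r2(0.176, 4000, 2)
print(round(r2_bar_1, 4))
```

With n = 4000 the degrees-of-freedom correction is tiny, so the adjusted value sits only slightly below the unadjusted one.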
2. Data were collected from a random sample of 220 home sales from a community in 2013. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as “poor”. An estimated regression (standard errors in parentheses) yields
$$\widehat{Price} = \underset{(23.9)}{119.2} + \underset{(2.61)}{0.485}\,BDR + \underset{(8.94)}{23.4}\,Bath + \underset{(0.011)}{0.156}\,Hsize + \underset{(0.00048)}{0.002}\,Lsize + \underset{(0.311)}{0.090}\,Age - \underset{(10.5)}{48.8}\,Poor, \quad \bar{R}^2 = 0.72, \quad SER = 41.5.$$
(a) Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house? Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?
(b) What is the loss in value if a homeowner lets his house run down so that its condition becomes “poor”?
(c) Compute the $R^2$ for the regression.
(d) Is the coefficient on BDR statistically significantly different from zero?
(e) Typically five-bedroom houses sell for much more than two-bedroom houses. Is this consistent with your answer to (d) and with the regression more generally?
(f) A homeowner purchases 2000 square feet from an adjacent lot. Construct a 99% confidence interval for the change in the value of her house.
(g) Lot size is measured in square feet. Do you think that another scale might be more appropriate? Why or why not?
(h) The F statistic for omitting BDR and Age from the regression is F = 0.08. Are the coefficients on BDR and Age statistically different from zero at the 10% level?
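Parts (d) and (f) both rest on a coefficient and its standard error. The sketch below shows the arithmetic under one assumption: with 220 observations, the large-sample normal critical value is used rather than a t critical value. The helper names `t_statistic` and `ci_for_change` are illustrative; the inputs are the BDR and Lsize entries of the estimated regression.

```python
from statistics import NormalDist


def t_statistic(coef: float, se: float, null_value: float = 0.0) -> float:
    """t statistic for H0: coefficient = null_value."""
    return (coef - null_value) / se


def ci_for_change(coef: float, se: float, delta_x: float, level: float = 0.95):
    """Confidence interval for coef * delta_x, using a normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    effect = coef * delta_x
    half_width = z * se * abs(delta_x)
    return effect - half_width, effect + half_width


# BDR: coefficient 0.485, standard error 2.61.
t_bdr = t_statistic(0.485, 2.61)

# Lsize: coefficient 0.002, standard error 0.00048, change of 2000 square feet.
low, high = ci_for_change(0.002, 0.00048, 2000, level=0.99)
print(round(t_bdr, 3), round(low, 2), round(high, 2))
```

Because Price is measured in $1000, the interval endpoints are in thousands of dollars.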
3. In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168.
(a) In the model
$$GPA = \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + \beta_4\,leisure + u,$$
does it make sense to hold sleep, work, and leisure fixed while changing study?
(b) Explain why this model violates the no perfect multicollinearity assumption.
(c) How could you reformulate the model so that its parameters have a useful interpretation and it satisfies the no perfect multicollinearity assumption?
4. Consider the multiple regression model containing three independent variables, under the four assumptions discussed in class:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u.$$
You are interested in estimating the sum of the parameters on $x_1$ and $x_2$; call this $\theta_1 = \beta_1 + \beta_2$.
(a) Show that $\hat{\theta}_1 = \hat{\beta}_1 + \hat{\beta}_2$ is an unbiased estimator of $\theta_1$.
(b) Find $\mathrm{Var}(\hat{\theta}_1)$ in terms of $\mathrm{Var}(\hat{\beta}_1)$, $\mathrm{Var}(\hat{\beta}_2)$, and $\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)$. Hint: use the formula for the variance of the sum of two random variables.
(c) The expression for $\mathrm{Var}(\hat{\theta}_1)$ that you found in the previous item is the theoretical variance of $\hat{\theta}_1$. How would you estimate the standard error, i.e., $\mathrm{SE}(\hat{\theta}_1)$?
5. Consider the multiple regression model with three independent variables, under the four OLS assumptions.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u.$$
You would like to test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$.
(a) You can use the following procedure. Let $\hat{\beta}_1$ and $\hat{\beta}_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$. Find $\mathrm{Var}(\hat{\beta}_1 - 3\hat{\beta}_2)$ in terms of the variances of $\hat{\beta}_1$ and $\hat{\beta}_2$ and the covariance between them. What is the standard error of $\hat{\beta}_1 - 3\hat{\beta}_2$? Now, using the standard error that you found, write the t-statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$.
(b) Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat{\theta}_1 = \hat{\beta}_1 - 3\hat{\beta}_2$. Write a regression equation involving $\beta_0$, $\theta_1$, $\beta_2$, and $\beta_3$ that allows you to directly obtain $\hat{\theta}_1$ and its standard error.
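Problems 4 and 5 both need the standard error of a linear combination of two estimators. The sketch below uses the general identity $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$. All numeric inputs are hypothetical placeholders, not estimates from any data set in this assignment, and the function names are illustrative.

```python
import math


def se_linear_combination(a: float, b: float,
                          var1: float, var2: float, cov12: float) -> float:
    """Standard error of a*b1_hat + b*b2_hat from estimated variances and covariance."""
    variance = a * a * var1 + b * b * var2 + 2.0 * a * b * cov12
    return math.sqrt(variance)


def t_statistic(estimate: float, hypothesized: float, se: float) -> float:
    """t statistic for H0: combination = hypothesized value."""
    return (estimate - hypothesized) / se


# Hypothetical estimates and (co)variances, for illustration only.
b1_hat, b2_hat = 2.5, 0.4
var1, var2, cov12 = 0.09, 0.01, -0.006

# Combination b1 - 3*b2, tested against the hypothesized value 1.
se = se_linear_combination(1.0, -3.0, var1, var2, cov12)
t = t_statistic(b1_hat - 3.0 * b2_hat, 1.0, se)
print(round(se, 4), round(t, 4))
```

Setting `a = 1` and `b = 1` gives the sum case from problem 4; the covariance term is what a regression on the original variables does not report directly, which motivates the reparametrized regression in part (b).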
6. Consider the homoskedasticity-only F statistic
$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)},$$
and show that it can also be written as
$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}.$$
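A quick numerical check can make the equivalence concrete before proving it. The sketch below assumes both regressions share the same dependent variable, so they have the same total sum of squares and $R^2 = 1 - SSR/TSS$ in each. The sums of squares, sample size, and restriction count are hypothetical values chosen only for illustration.

```python
def f_from_r2(r2_u: float, r2_r: float, n: int, k_u: int, q: int) -> float:
    """Homoskedasticity-only F statistic written in terms of R^2."""
    return ((r2_u - r2_r) / q) / ((1.0 - r2_u) / (n - k_u - 1))


def f_from_ssr(ssr_u: float, ssr_r: float, n: int, k_u: int, q: int) -> float:
    """The same statistic written in terms of sums of squared residuals."""
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u - 1))


# Hypothetical values: common TSS, unrestricted and restricted SSR.
tss, ssr_u, ssr_r = 1000.0, 600.0, 650.0
n, k_u, q = 100, 4, 2

r2_u = 1.0 - ssr_u / tss
r2_r = 1.0 - ssr_r / tss

f1 = f_from_r2(r2_u, r2_r, n, k_u, q)
f2 = f_from_ssr(ssr_u, ssr_r, n, k_u, q)
print(round(f1, 4), round(f2, 4))
```

The two values agree because dividing the numerator and the denominator of the SSR form by the common TSS turns it into the $R^2$ form.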
7. Empirical exercise. Multiple Linear Regression.
For this exercise, use the Birthweight_Smoking data set posted on BlackBoard, which contains data for a random sample of babies born in Pennsylvania in 1989. The data include the baby’s birth weight together with various characteristics of the mother, including whether she smoked during the pregnancy. You can find a detailed description of the dataset in the file Birthweight_Smoking_Description. Use this dataset to answer the following questions.
(a) Regress Birthweight on Smoker. What is the estimated effect of smoking on birth weight?
(b) Regress Birthweight on Smoker, Alcohol, and Nprevist.
i. Explain why the exclusion of Alcohol and Nprevist could lead to omitted variable bias in the regression estimated in (a).
ii. Is the estimated effect of smoking on birth weight substantially different from the regression that excludes Alcohol and Nprevist? Does the regression in (a) seem to suffer from omitted variable bias?
iii. Jane smoked during her pregnancy, did not drink alcohol, and had 8 prenatal care visits. Use the regression to predict the birth weight of Jane’s child.
iv. Compute $R^2$ and $\bar{R}^2$. Why are they so similar?
(c) An alternative way to control for prenatal visits is to use the binary variables Tripre0 through Tripre3. Regress Birthweight on Smoker, Alcohol, Tripre0, Tripre2, and Tripre3.
i. Why is Tripre1 excluded from the regression? What would happen if you included it in the regression?
ii. The estimated coefficient on Tripre0 is large and negative. What does this coefficient measure? Interpret its value.
iii. Interpret the value of the estimated coefficients on Tripre2 and Tripre3.
iv. Does the regression in (c) explain a larger fraction of the variance in birth weight than the regression in (b)?
8. Empirical exercise. Use the data in HPRICE1 for this exercise.
(a) Estimate the model
$$price = \beta_0 + \beta_1\,lotsize + \beta_2\,sqrft + \beta_3\,bdrms + \beta_4\,colonial + u,$$
where lotsize is size of lot in square feet, sqrft is size of house in square feet, bdrms is number of bedrooms and colonial is a dummy variable that equals 1 if the home is colonial style. Report the results of your estimations in the usual form, including the standard error of the regression.
(b) Obtain a predicted price when we plug in lotsize = 10000, sqrft = 2300, bdrms = 4, and colonial = 1; round this price to the nearest dollar.
(c) Run a regression that allows you to put a 95% confidence interval around the predicted value in (b).
(d) Suppose that house A has characteristics as in (b). Now consider another house, B, which has lotsize = 10000, sqrft = 2400, bdrms = 5, and colonial = 0. Let $\hat{P}_A$ and $\hat{P}_B$ be the predicted prices of A and B, respectively. Let $\widehat{\Delta P} = \hat{P}_A - \hat{P}_B$ denote the expected change in price. Compute a 95% confidence interval for this change in price. Do this by transforming the regression, similarly to what you did in items 4–5, to get the standard error.
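One common way to set up the regression in part (c), offered here as an assumption about the intended approach rather than the required method, is to re-center each regressor at the values used in (b):

$$price = \theta_0 + \beta_1(lotsize - 10000) + \beta_2(sqrft - 2300) + \beta_3(bdrms - 4) + \beta_4(colonial - 1) + u,$$

$$\theta_0 = \beta_0 + 10000\,\beta_1 + 2300\,\beta_2 + 4\,\beta_3 + \beta_4.$$

In this form the OLS intercept estimate $\hat{\theta}_0$ equals the predicted price from (b), and its reported standard error is the standard error of that prediction of the expected price, so the interval is $\hat{\theta}_0 \pm c \cdot \mathrm{SE}(\hat{\theta}_0)$ with $c$ the appropriate 95% critical value.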