# We Helped With This R Language Economics Assignment: Have A Similar One?


## Short Assignment Requirements

Here is a problem set on econometrics with 8 problems in total. Could you help solve them using statistical formulas and RStudio code?

## Assignment Description

Stock and Watson’s Introduction to Econometrics, 3rd Updated Edition

Documentation for Birthweight_Smoking

The datafile Birthweight_Smoking is from the 1989 linked National Natality-Mortality Detail files, which contains a census of infant births and deaths.  The data in bw_smoking.data are for births in Pennsylvania in 1989.

These data were provided by Professors Douglas Almond, Kenneth Chay, and David Lee and are a subset of the data used in their paper “The Costs of Low Birth Weight,” Quarterly Journal of Economics, August 2005, 120(3): 1031-1083.

The file contains 3,000 observations on the variables described below:

| # | Variable | Description |
|---|---|---|
| | **Birthweight and Smoking** | |
| 1 | birthweight | birth weight of infant (in grams) |
| 2 | smoker | indicator equal to one if the mother smoked during pregnancy and zero otherwise |
| | **Mother’s Attributes** | |
| 3 | age | age |
| 4 | educ | years of educational attainment (more than 16 years coded as 17) |
| 5 | unmarried | indicator = 1 if mother is unmarried |
| | **This Pregnancy** | |
| 6 | alcohol | indicator = 1 if mother drank alcohol during pregnancy |
| 7 | drinks | number of drinks per week |
| 8 | tripre1 | indicator = 1 if 1st prenatal care visit in 1st trimester |
| 9 | tripre2 | indicator = 1 if 1st prenatal care visit in 2nd trimester |
| 10 | tripre3 | indicator = 1 if 1st prenatal care visit in 3rd trimester |
| 11 | tripre0 | indicator = 1 if no prenatal visits |
| 12 | nprevist | total number of prenatal visits |


Stony Brook University, Spring 2018

Alejandro Melo Ponce

ASSIGNMENT 4

Due: April 25, 2018, in class

Instructions: Show all your work to get full points. Please also paste the R code you used for the last two problems at the end of your submission.

1. Multiple Regression Model

The following table reports the results of three regressions of average hourly earnings (AHE) on a number of regressors using a survey of full-time workers in the U.S. in 1998. The highest educational achievement for each worker was either a high school diploma or a bachelor’s degree. Workers’ ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived.

| Regressor | (1) | (2) | (3) |
|---|---|---|---|
| College (X1) | 5.46 (0.21) | 5.48 (0.21) | 5.44 (0.21) |
| Female (X2) | -2.64 (0.20) | -2.62 (0.20) | -2.62 (0.20) |
| Age (X3) | | 0.29 (0.04) | 0.29 (0.04) |
| Northeast (X4) | | | 0.69 (0.30) |
| Midwest (X5) | | | 0.60 (0.28) |
| South (X6) | | | -0.27 (0.26) |
| Intercept | 12.69 (0.14) | 4.40 (1.05) | 3.75 (1.06) |
| *Summary statistics* | | | |
| SER | 6.27 | 6.22 | 6.21 |
| $\bar{R}^2$ | 0.176 | 0.190 | 0.194 |
| n | 4000 | 4000 | 4000 |

Standard errors are given in parentheses.

AHE = average hourly earnings (in 1998 dollars)

College = binary variable (1 if college, 0 if high school)

Female = binary variable (1 if female, 0 if male)

Age = age (in years)

Northeast = binary variable (1 if Region = Northeast, 0 otherwise)

Midwest = binary variable (1 if Region = Midwest, 0 otherwise)

South = binary variable (1 if Region = South, 0 otherwise)

West = binary variable (1 if Region = West, 0 otherwise)

(a) Compute the $R^2$ for each regression.

(b) Consider regression (1). Do workers with a college degree earn more, on average, than workers with only a high school diploma? How much more?

(c) Consider regression (1). Do men earn more than women, on average? How much more?

(d) Consider regression (2). Is Age an important determinant of earnings? Explain.

(e) Consider regression (2). John is a 31-year-old male without a college degree, while Bob is a 24-year-old male with a college degree. Predict John’s and Bob’s earnings.

(f) Consider regression (3). Do there appear to be important regional differences? Why or why not?

(g) Consider regression (3). Why was West excluded? What would happen if it were included?
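A minimal Python sketch of the arithmetic behind parts (a) and (e). It assumes the 0.176, 0.190, and 0.194 row of the table is the adjusted $\bar{R}^2$, so $R^2$ is recovered by inverting $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$, and it uses the coefficients of regression (2). The assignment itself asks for R; Python is used here only to make the arithmetic explicit, and the function names are illustrative.

```python
def r2_from_adjusted(adj_r2: float, n: int, k: int) -> float:
    """Invert adj_R2 = 1 - (n - 1) / (n - k - 1) * (1 - R2) to recover R2."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)


n = 4000
# (adjusted R^2, number of regressors k) for regressions (1), (2), (3)
models = {"(1)": (0.176, 2), "(2)": (0.190, 3), "(3)": (0.194, 6)}
r2 = {name: r2_from_adjusted(adj, n, k) for name, (adj, k) in models.items()}


def ahe_reg2(college: int, female: int, age: float) -> float:
    """Predicted AHE from regression (2): 4.40 + 5.48 College - 2.62 Female + 0.29 Age."""
    return 4.40 + 5.48 * college - 2.62 * female + 0.29 * age


john = ahe_reg2(college=0, female=0, age=31)  # 31-year-old man, no college degree
bob = ahe_reg2(college=1, female=0, age=24)   # 24-year-old man, college degree

for name, value in r2.items():
    print(f"R^2 {name}: {value:.4f}")
print(f"John: ${john:.2f}/hour, Bob: ${bob:.2f}/hour")
```

This gives $R^2 \approx 0.176$, $0.191$, and $0.195$, and predicted earnings of about \$13.39 per hour for John and \$16.84 per hour for Bob. Bob’s age lies outside the 25 to 34 range of the sample, so his prediction extrapolates beyond the data.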

2. Data were collected from a random sample of 220 home sales from a community in 2013. Let Price denote the selling price (in \$1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as “poor”. An estimated regression yields

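$$
\begin{aligned}
\widehat{Price} = {}& \underset{(23.9)}{119.2} + \underset{(2.61)}{0.485}\,BDR + \underset{(8.94)}{23.4}\,Bath + \underset{(0.011)}{0.156}\,Hsize\\
&+ \underset{(0.00048)}{0.002}\,Lsize + \underset{(0.311)}{0.090}\,Age - \underset{(10.5)}{48.8}\,Poor,\\
\bar{R}^2 = {}& 0.72, \quad SER = 41.5,
\end{aligned}
$$

with standard errors in parentheses beneath each coefficient.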

(a)    Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house? Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?

(b)    What is the loss in value if a homeowner lets his house run down so that its condition becomes “poor”?

(c) Compute the $R^2$ for the regression.

(d)    Is the coefficient on BDR statistically significantly different from zero?

(e)    Typically five-bedroom houses sell for much more than two-bedroom houses. Is this consistent with your answer to (a) and with the regression more generally?

(f)     A homeowner purchases 2000 square feet from an adjacent lot. Construct a 99% confidence interval for the change in the value of her house.

(g)    Lot size is measured in square feet. Do you think that another scale might be more appropriate? Why or why not?

(h)    Theon BDRF -statistic for omittingand Age statistically different from zero at the 10% level?BDR and Age from the regression is F D 0:08. Are the coefficients

3. In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168.

(a)    In the model

$$GPA = \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + \beta_4\,leisure + u,$$

does it make sense to hold sleep, work, and leisure fixed while changing study?

(b)   Explain why this model violates the no perfect multicollinearity assumption.

(c)    How could you reformulate the model so that its parameters have a useful interpretation and it satisfies the no perfect multicollinearity assumption?
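One common way to see the problem, and to fix it, is to substitute the adding-up constraint into the model. The sketch below drops leisure; that choice is arbitrary, and omitting any one of the four categories works the same way.

$$
\begin{aligned}
leisure &= 168 - study - sleep - work,\\
GPA &= \beta_0 + \beta_1\,study + \beta_2\,sleep + \beta_3\,work + \beta_4(168 - study - sleep - work) + u\\
&= (\beta_0 + 168\beta_4) + (\beta_1 - \beta_4)\,study + (\beta_2 - \beta_4)\,sleep + (\beta_3 - \beta_4)\,work + u.
\end{aligned}
$$

Because $study + sleep + work + leisure = 168$ is an exact multiple of the constant regressor, the four-activity version cannot be estimated. In the reformulated model, each slope measures the effect of moving one hour from leisure into that activity, holding the other two fixed.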

4. Consider the multiple regression model containing three independent variables, under the four assumptions discussed in class:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u.$$

You are interested in estimating the sum of the parameters on $x_1$ and $x_2$; call this $\theta_1 = \beta_1 + \beta_2$.

(a) Show that $\hat\theta_1 = \hat\beta_1 + \hat\beta_2$ is an unbiased estimator of $\theta_1$.

(b) Find $\mathrm{Var}(\hat\theta_1)$ in terms of $\mathrm{Var}(\hat\beta_1)$, $\mathrm{Var}(\hat\beta_2)$, and $\mathrm{Cov}(\hat\beta_1, \hat\beta_2)$. Hint: use the formula for the variance of the sum of two random variables.

(c) The expression for $\mathrm{Var}(\hat\theta_1)$ that you found in the previous item is the theoretical variance of $\hat\theta_1$. How would you estimate its standard error, i.e. $SE(\hat\theta_1)$?
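A short derivation sketch, using only linearity of expectation, the unbiasedness of OLS under the stated assumptions, and the variance of a sum. Here $s_{12}$ is notation introduced for the estimated covariance of $\hat\beta_1$ and $\hat\beta_2$, an off-diagonal element of the estimated variance-covariance matrix of the coefficients (in R, `vcov()` applied to a fitted `lm` object returns this matrix).

$$
\begin{aligned}
E(\hat\theta_1) &= E(\hat\beta_1) + E(\hat\beta_2) = \beta_1 + \beta_2 = \theta_1,\\
\mathrm{Var}(\hat\theta_1) &= \mathrm{Var}(\hat\beta_1) + \mathrm{Var}(\hat\beta_2) + 2\,\mathrm{Cov}(\hat\beta_1, \hat\beta_2),\\
SE(\hat\theta_1) &= \sqrt{SE(\hat\beta_1)^2 + SE(\hat\beta_2)^2 + 2\,s_{12}}.
\end{aligned}
$$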

5. Consider the multiple regression model with three independent variables, under the four OLS assumptions.

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u.$$

You would like to test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$.

(a) You can use the following procedure. Let $\hat\beta_1$ and $\hat\beta_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$.

Find $\mathrm{Var}(\hat\beta_1 - 3\hat\beta_2)$ in terms of the variances of $\hat\beta_1$ and $\hat\beta_2$ and the covariance between them. What is the standard error of $\hat\beta_1 - 3\hat\beta_2$? Now, using the standard error that you found, write the t-statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$.

(b) Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat\theta_1 = \hat\beta_1 - 3\hat\beta_2$. Write a regression equation involving $\beta_0$, $\theta_1$, $\beta_2$, and $\beta_3$ that allows you to directly obtain $\hat\theta_1$ and its standard error.
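The variance rule for a weighted sum, with weights 1 and -3, gives part (a); substituting $\beta_1 = \theta_1 + 3\beta_2$ gives a single regression for part (b). This is a sketch of the standard reparametrization, with $s_{12}$ denoting the estimated covariance of $\hat\beta_1$ and $\hat\beta_2$.

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_1 - 3\hat\beta_2) &= \mathrm{Var}(\hat\beta_1) + 9\,\mathrm{Var}(\hat\beta_2) - 6\,\mathrm{Cov}(\hat\beta_1, \hat\beta_2),\\
SE(\hat\beta_1 - 3\hat\beta_2) &= \sqrt{SE(\hat\beta_1)^2 + 9\,SE(\hat\beta_2)^2 - 6\,s_{12}},\\
t &= \frac{\hat\beta_1 - 3\hat\beta_2 - 1}{SE(\hat\beta_1 - 3\hat\beta_2)},\\
y &= \beta_0 + (\theta_1 + 3\beta_2)\,x_1 + \beta_2 x_2 + \beta_3 x_3 + u\\
&= \beta_0 + \theta_1 x_1 + \beta_2(3x_1 + x_2) + \beta_3 x_3 + u.
\end{aligned}
$$

Regressing $y$ on $x_1$, $3x_1 + x_2$, and $x_3$ makes the coefficient on $x_1$ equal to $\hat\theta_1$, and its reported standard error is $SE(\hat\theta_1)$; the test statistic is then $(\hat\theta_1 - 1)/SE(\hat\theta_1)$.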

6. Consider the homoskedasticity-only F-statistic

$$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)},$$

and show that it can also be written as

$$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}.$$
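A derivation sketch: both regressions use the same dependent variable and sample, so they share the total sum of squares $TSS$, and $R^2 = 1 - SSR/TSS$ for each.

$$
\begin{aligned}
R^2_{unrestricted} - R^2_{restricted} &= \frac{SSR_{restricted} - SSR_{unrestricted}}{TSS}, \qquad
1 - R^2_{unrestricted} = \frac{SSR_{unrestricted}}{TSS},\\
F &= \frac{(SSR_{restricted} - SSR_{unrestricted})/(q\,TSS)}{SSR_{unrestricted}/\big(TSS\,(n - k_{unrestricted} - 1)\big)}
= \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}.
\end{aligned}
$$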

7.   Empirical exercise. Multiple Linear Regression.

For this exercise, use the Birthweight_Smoking data set posted on Blackboard, which contains data for a random sample of babies born in Pennsylvania in 1989. The data include the baby’s birth weight together with various characteristics of the mother, including whether she smoked during the pregnancy. You can find a detailed description of the dataset in the file Birthweight_Smoking_Description. Use this dataset to answer the following questions.

(a)    Regress Birthweight on Smoker. What is the estimated effect of smoking on birth weight?

(b)   Regress Birthweight on Smoker, Alcohol, and Nprevist.

i. Explain why the exclusion of Alcohol and Nprevist could lead to omitted variable bias in the regression estimated in (a).

ii. Is the estimated effect of smoking on birth weight substantially different from the regression that excludes Alcohol and Nprevist? Does the regression in (a) seem to suffer from omitted variable bias?

iii. Jane smoked during her pregnancy, did not drink alcohol, and had 8 prenatal care visits. Use the regression to predict the birth weight of Jane’s child.

iv. Compute $R^2$ and $\bar{R}^2$. Why are they so similar?

(c) An alternative way to control for prenatal visits is to use the binary variables Tripre0 through Tripre3. Regress Birthweight on Smoker, Alcohol, Tripre0, Tripre2, and Tripre3.

i.      Why is Tripre1 excluded from the regression? What would happen if you included it in the regression?

ii.    The estimated coefficient on Tripre0 is large and negative. What does this coefficient measure? Interpret its value.

iii. Interpret the values of the estimated coefficients on Tripre2 and Tripre3.

iv. Does the regression in (c) explain a larger fraction of the variance in birth weight than the regression in (b)?
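Part (b)iv turns on the degrees-of-freedom penalty in $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$, which depends only on $n$ and $k$. A small Python sketch with $n = 3000$ births and $k = 3$ regressors (Smoker, Alcohol, Nprevist); the $R^2$ values in the loop are hypothetical placeholders, not estimates from the data.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


n, k = 3000, 3                  # 3,000 births; regressors Smoker, Alcohol, Nprevist
factor = (n - 1) / (n - k - 1)  # multiplies (1 - R^2); equals 2999/2996 here

# Hypothetical R^2 values, NOT estimates from Birthweight_Smoking
gaps = {r2: r2 - adjusted_r2(r2, n, k) for r2 in (0.05, 0.10, 0.20)}

print(f"factor = {factor:.5f}")
for r2, gap in gaps.items():
    print(f"R^2 = {r2:.2f}: R^2 - adjusted R^2 = {gap:.5f}")
```

With $n$ this large relative to $k$, the factor is about 1.001, so $\bar{R}^2$ sits at most about 0.001 below $R^2$ regardless of how well the regression fits.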

8. Empirical exercise. Use the data in HPRICE1 for this exercise.

(a)    Estimate the model

$$price = \beta_0 + \beta_1\,lotsize + \beta_2\,sqrft + \beta_3\,bdrms + \beta_4\,colonial + u,$$

where lotsize is size of lot in square feet, sqrft is size of house in square feet, bdrms is number of bedrooms and colonial is a dummy variable that equals 1 if the home is colonial style. Report the results of your estimations in the usual form, including the standard error of the regression.

(b) Obtain a predicted price when we plug in $lotsize = 10000$, $sqrft = 2300$, $bdrms = 4$, and $colonial = 1$; round this price to the nearest dollar.

(c) Run a regression that allows you to put a 95% confidence interval around the predicted value in (b).

(d) Suppose that house A has the characteristics in (b). Now consider another house, B, which has $lotsize = 10000$, $sqrft = 2400$, $bdrms = 5$, and $colonial = 0$. Let $\hat{P}_A$ and $\hat{P}_B$ be the predicted prices of A and B, respectively, and let $\hat{P} = \hat{P}_A - \hat{P}_B$ denote the expected change in price. Compute a 95% confidence interval for this change in price. Do this by transforming the regression, as you did in items 4–5, to get the standard error.
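Parts (c) and (d) rely on reparametrizing the regression so that the quantity of interest becomes a single coefficient whose standard error the software reports. The Python sketch below checks that idea on synthetic data generated from made-up coefficients (NOT HPRICE1): centering every regressor at house A makes the intercept equal to the predicted price $\hat{P}_A$, and replacing sqrft, bdrms, and colonial with $sqrft + 100\,colonial$, $bdrms + colonial$, and $colonial$ makes the colonial coefficient equal to $\hat{P}_A - \hat{P}_B$. The helper `ols` is a hypothetical minimal normal-equations solver written only for this illustration.

```python
import random


def ols(X, y):
    """Hypothetical helper: OLS coefficients from X'X b = X'y via Gauss-Jordan."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * pv for a, pv in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic houses: (lotsize, sqrft, bdrms, colonial) with a made-up price rule
rng = random.Random(0)
rows, y = [], []
for _ in range(88):
    lot, sq = rng.uniform(2000, 20000), rng.uniform(1000, 3500)
    bd, col = rng.randint(2, 6), rng.randint(0, 1)
    rows.append((lot, sq, bd, col))
    y.append(20 + 0.002 * lot + 0.12 * sq + 10 * bd + 5 * col + rng.gauss(0, 30))

x_A = (10000, 2300, 4, 1)  # house A, part (b)
x_B = (10000, 2400, 5, 0)  # house B, part (d)

# Original regression and the two predictions
b = ols([[1, *r] for r in rows], y)
pred_A = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_A))
pred_B = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_B))

# Part (c): center each regressor at house A; the intercept becomes pred_A
b_c = ols([[1, *(xi - ai for xi, ai in zip(r, x_A))] for r in rows], y)

# Part (d): theta = pred_A - pred_B = -100*b_sqrft - b_bdrms + b_colonial.
# Substituting b_colonial = theta + 100*b_sqrft + b_bdrms gives regressors
# lotsize, sqrft + 100*colonial, bdrms + colonial, colonial (coefficient = theta).
b_t = ols([[1, lot, sq + 100 * col, bd + col, col]
           for lot, sq, bd, col in rows], y)

print(f"pred_A = {pred_A:.6f}, centered intercept = {b_c[0]:.6f}")
print(f"pred_A - pred_B = {pred_A - pred_B:.6f}, colonial coef = {b_t[4]:.6f}")
```

In R, the analogous regressions can be fit with `lm()`, using `I()` to build transformed regressors such as `I(sqrft - 2300)`; `summary()` then reports the relevant coefficient with its standard error, and `predict(fit, newdata, interval = "confidence")` gives the part (c) interval directly.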

Customer Feedback

"Thanks for explanations after the assignment was already completed... Emily is such a nice tutor! "

Order #13073
