# We Helped With This R Programming Homework: Have A Similar One?


## Short Assignment Requirements

Please see the attached Lab Assignment. There are 3 problems in the assignment. I couldn't attach the SpeedTrap and Ozone R data, so I can send them later.

## Assignment Description

Lab Assignment

Problem 1. The file SpeedTrap.RData is an R data set that contains a data frame called SpeedTrap. This data frame consists of 184 observations (rows) and 7 variables (columns).

Each row corresponds to a town in the Chicago area. The variables are as follows:

| Variable Name | Description |
|---|---|
| Res.Stop | The number of traffic stops made by police in 2014 where the motorist was from the same town as where the stop was made (resident stops) |
| Res.Ticket | The number of resident traffic stops where a ticket was issued |
| Out.Stop | The number of traffic stops made by police in 2014 where the motorist was not from the same town as where the stop was made (outsider stops) |
| Out.Ticket | The number of outsider traffic stops where a ticket was issued |
| Pop | Population of the town (in thousands), 2010 census |
| PPSQMI | Number of persons per square mile in the town, 2010 census |
| PPHU | Number of persons per housing unit in the town, 2010 census |
| PCI | Per capita income of the town (in thousands of dollars), 2010 census |

In each community we want to compare the rate of ticketing outsiders who are stopped for a traffic violation to the rate of ticketing residents who are stopped. To do this we will use the odds ratio which is defined as follows:

$$\theta = \frac{\pi_{\text{out}} / (1 - \pi_{\text{out}})}{\pi_{\text{res}} / (1 - \pi_{\text{res}})} = \frac{\text{odds of outsider being ticketed}}{\text{odds of resident being ticketed}},$$

where $\pi_{\text{out}}$ and $\pi_{\text{res}}$ are the probabilities of being ticketed for outsiders and residents, respectively.

The odds ratio is used to compare probabilities between two populations. In statistical modeling it is often preferred to the straight difference $\pi_{\text{out}} - \pi_{\text{res}}$. An odds ratio of 1.0 implies that the two probabilities are equal. An odds ratio greater than 1.0 implies that $\pi_{\text{out}}$ is greater than $\pi_{\text{res}}$. The odds ratio is estimated from the counts of successes and failures in each community by replacing $\pi_{\text{out}}$ and $\pi_{\text{res}}$ with sample estimates.

(a)     Begin by calculating the estimated odds ratio for each community. Append this variable to the data frame (call it OddsRatio). The first three values should match the following:

>  SpeedTrap[1:3,"OddsRatio"]

[1] 1.146857 1.201661 1.264754
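As a minimal sketch of the calculation in part (a), the sample odds ratio can be built from the four count columns. The counts below are made-up toy values for three hypothetical towns, not the SpeedTrap data, so the results will not match the values shown above.

```r
# Toy counts for three hypothetical towns (NOT the SpeedTrap data)
toy <- data.frame(
  Res.Stop   = c(200, 150, 400),
  Res.Ticket = c(80, 60, 100),
  Out.Stop   = c(300, 500, 250),
  Out.Ticket = c(150, 250, 50)
)

# Sample proportions of stops that ended in a ticket, by group
p_out <- toy$Out.Ticket / toy$Out.Stop
p_res <- toy$Res.Ticket / toy$Res.Stop

# Estimated odds ratio: odds(outsider ticketed) / odds(resident ticketed)
toy$OddsRatio <- (p_out / (1 - p_out)) / (p_res / (1 - p_res))

toy[1:3, "OddsRatio"]
```

With the real data frame loaded, the same three assignment lines would apply with `SpeedTrap` in place of `toy`.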

(b)   Fit a regression model using OddsRatio as the outcome variable and Pop, PPSQMI, PPHU, and PCI as predictor variables. Using diagnostic plots, describe how well the regression conforms to the assumptions of the normal, linear regression model. [3 pts]
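One way to approach part (b) is sketched below with `lm()` and the standard `plot()` method for fitted linear models. The data frame `toy` is simulated under assumed ranges purely so the code runs on its own; it is a stand-in for SpeedTrap, and nothing about its diagnostics carries over to the real data.

```r
# Simulated stand-in for SpeedTrap (assumed ranges, illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),       # thousands
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)       # thousands of dollars
)
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

# Linear model with the four untransformed predictors
fit <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = toy)
summary(fit)

# Default diagnostic panels: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```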

(c)   Identify those communities for which the leverage exceeds three times the average value. Re-run the regression with these communities removed from the data set. Describe how their removal affects the fitted model. [3 pts]
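The leverage screen in part (c) can be sketched with `hatvalues()`. The average leverage equals the number of coefficients divided by the number of observations, so the cutoff is three times that. The data are simulated, with one deliberately extreme hypothetical town planted in row 1 so that something gets flagged.

```r
# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$Pop[1] <- 2700  # planted: one very large hypothetical town
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

fit <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = toy)

# Leverage of each observation and the 3 x average cutoff
h      <- hatvalues(fit)
cutoff <- 3 * mean(h)          # mean(h) equals p / n
high   <- which(h > cutoff)
high

# Refit without the high-leverage rows (guarding against an empty set)
keep     <- setdiff(seq_len(n), high)
fit_trim <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = toy[keep, ])

# Side-by-side coefficients to see what the removal changed
cbind(all = coef(fit), trimmed = coef(fit_trim))
```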

(d)   Re-run the regression in (b), replacing each of the predictors by its logarithm. How does this change affect the presence of observations with high leverage? [2 pts]
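A rough sketch of the comparison in part (d): fit the model with raw and with log-transformed predictors, then apply the same 3 × average leverage rule to each. The helper `count_high()` is a hypothetical name introduced here, and the data are simulated with one planted extreme town.

```r
# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$Pop[1] <- 2700  # planted: one very large hypothetical town
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

fit_raw <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = toy)
fit_log <- lm(OddsRatio ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
              data = toy)

# Hypothetical helper: number of points above 3 x average leverage
count_high <- function(fit) {
  h <- hatvalues(fit)
  sum(h > 3 * mean(h))
}

c(raw = count_high(fit_raw), log = count_high(fit_log))

# Largest single leverage under each parameterization
c(raw = max(hatvalues(fit_raw)), log = max(hatvalues(fit_log)))
```

Taking logs compresses the right tail of a skewed predictor such as population, which is why the largest leverage tends to shrink in this toy setup.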

(e)   Using log-transformed predictors, find a Box-Cox transformation of the outcome variable that maximizes the likelihood. Re-fit the model with the transformed outcome variable. Does it better conform to the assumptions of the normal, linear regression model than the model that you fit originally? In what respects are the diagnostics still troublesome? [3 pts]
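For part (e), one possible route is `boxcox()` from the MASS package, which profiles the log-likelihood over a grid of λ values. The sketch below uses simulated data; the grid spacing and the helper name `bc_transform()` are choices made here, not part of the assignment.

```r
library(MASS)  # provides boxcox()

# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

# y = TRUE stores the response in the fit so boxcox() can use it directly
fit_log <- lm(OddsRatio ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
              data = toy, y = TRUE)

# Profile log-likelihood over a lambda grid; pick the maximizer
bc <- boxcox(fit_log, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Hypothetical helper applying the Box-Cox transform (log when lambda is 0)
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

toy$OR_bc <- bc_transform(toy$OddsRatio, lambda_hat)
fit_bc <- lm(OR_bc ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
             data = toy)

# Same four diagnostic panels, now for the transformed outcome
op <- par(mfrow = c(2, 2))
plot(fit_bc)
par(op)
```

In practice a nearby convenient value (such as 0 or 0.5) is often used in place of the exact grid maximizer when it lies inside the likelihood-based interval that `boxcox()` plots by default.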

(f)      Produce and interpret a set of partial regression plots for the model that you fit in (e). Do the predictor variables appear to be treated appropriately in the model? [3 pts]
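A partial regression (added-variable) plot for one predictor can be built by hand from two auxiliary regressions, as sketched below for `log(PCI)`. The data are simulated, and `log(OddsRatio)` stands in for the transformed outcome (that is, λ = 0 is assumed here for illustration only).

```r
# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

full <- lm(log(OddsRatio) ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
           data = toy)

# Residuals of the outcome on all predictors except log(PCI)
e_y <- resid(lm(log(OddsRatio) ~ log(Pop) + log(PPSQMI) + log(PPHU),
                data = toy))
# Residuals of log(PCI) on the same remaining predictors
e_x <- resid(lm(log(PCI) ~ log(Pop) + log(PPSQMI) + log(PPHU),
                data = toy))

# The added-variable plot: curvature or isolated points here suggest
# the predictor is not entering the model appropriately
plot(e_x, e_y,
     xlab = "log(PCI) | others", ylab = "log(OddsRatio) | others")
av <- lm(e_y ~ e_x)
abline(av)

# The slope of this line equals the full-model coefficient on log(PCI)
c(av_slope = unname(coef(av)[2]), full_coef = unname(coef(full)["log(PCI)"]))
```

Repeating the two auxiliary fits for each predictor gives the full set of plots. If the car package is installed, its `avPlots()` function draws them all at once.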

(g)   Assuming that all necessary assumptions are met with the model that you fit in (e):

i.            Test the null hypothesis that the coefficients on log(PPHU) and log(PPSQMI) are both zero. [2 pts]

ii.            Give a 95% confidence interval for the coefficient on log(PCI). [2 pts]

iii.            Give a 95% prediction interval for the estimated odds ratio in a community that has a population of 25,000; 4000 persons per square mile; 2.8 persons per housing unit, and a per capita income of \$26,000. [2 pts]
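The three tasks in part (g) map onto `anova()`, `confint()`, and `predict()`. In this sketch the data are simulated and `log(OddsRatio)` stands in for the transformed outcome (an assumption). Because `Pop` and `PCI` are recorded in thousands, the new community is entered as `Pop = 25` and `PCI = 26`.

```r
# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))

fit_full <- lm(log(OddsRatio) ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
               data = toy)

# (i) Partial F-test: drop log(PPHU) and log(PPSQMI), compare nested fits
fit_red <- lm(log(OddsRatio) ~ log(Pop) + log(PCI), data = toy)
f_tab <- anova(fit_red, fit_full)
f_tab

# (ii) 95% confidence interval for the coefficient on log(PCI)
confint(fit_full, "log(PCI)", level = 0.95)

# (iii) 95% prediction interval for a new community
new_town <- data.frame(Pop = 25, PPSQMI = 4000, PPHU = 2.8, PCI = 26)
pi_trans <- predict(fit_full, newdata = new_town,
                    interval = "prediction", level = 0.95)
pi_trans        # on the transformed (log) scale
exp(pi_trans)   # back-transformed to the odds-ratio scale
```

If the chosen Box-Cox λ is not 0, the back-transformation in the last line would be the inverse of that transform instead of `exp()`.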

(h)   Conduct an outlier analysis on residuals from the regression in (e). Use a family-wide Type I error probability of α = .01. Which communities should be considered for removal from the regression? [3 pts]
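One common way to run the outlier analysis in part (h) is to compare externally studentized residuals with a Bonferroni-adjusted t critical value, which keeps the family-wide Type I error at α. The sketch below assumes that approach; the data are simulated, with one outlier planted in row 7 so the rule has something to find.

```r
# Simulated stand-in for SpeedTrap (illustration only)
set.seed(1)
n <- 60
toy <- data.frame(
  Pop    = runif(n, 1, 80),
  PPSQMI = runif(n, 500, 9000),
  PPHU   = runif(n, 2, 3.5),
  PCI    = runif(n, 15, 60)
)
toy$OddsRatio <- exp(0.2 + 0.1 * log(toy$Pop) + rnorm(n, sd = 0.3))
toy$OddsRatio[7] <- toy$OddsRatio[7] * exp(4)  # planted outlier

fit <- lm(log(OddsRatio) ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
          data = toy)

# Externally studentized (deleted) residuals
t_i <- rstudent(fit)

# Bonferroni: split alpha over n two-sided tests
alpha <- 0.01
p     <- length(coef(fit))
crit  <- qt(1 - alpha / (2 * n), df = n - p - 1)

flagged <- which(abs(t_i) > crit)
flagged
```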

Problem 2. The file Ozone.RData contains a vector named ozone which has length n = 111. This vector was obtained from a regression of air quality measurements (ozone) taken on 111 consecutive days in New York City in 1973. Each entry of ozone is either -1 if the residual is negative or +1 if the residual is positive. We are interested in testing the null hypothesis that the residuals are not serially correlated versus the alternative hypothesis that the residuals are serially correlated. Using the Runs Test, report a p-value and state your conclusion at the .05 test level. [3 pts]
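A minimal sketch of the runs test using the large-sample normal approximation is given below. The function name `runs_test()` is hypothetical, the example vector is a short made-up sign sequence (not the ozone data), and a course may instead expect a continuity correction or the exact distribution.

```r
# Runs test (normal approximation) for a vector of -1 / +1 signs
runs_test <- function(s) {
  n_pos <- sum(s > 0)
  n_neg <- sum(s < 0)
  n     <- n_pos + n_neg

  # A new run starts whenever the sign changes
  runs <- 1 + sum(diff(s) != 0)

  # Mean and variance of the number of runs under no serial correlation
  mu     <- 2 * n_pos * n_neg / n + 1
  sigma2 <- 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n^2 * (n - 1))

  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

# Made-up sign sequence with 4 runs
s <- c(1, 1, 1, -1, -1, -1, 1, 1, -1, -1)
res <- runs_test(s)
res
```

With the real file available, `load("Ozone.RData")` followed by `runs_test(ozone)` would apply the same calculation. Fewer runs than expected points toward positive serial correlation, and more runs than expected toward negative.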

Problem 3. For this problem you will use the prostate data that is available in the faraway package. The outcome variable is lcavol, all other variables are predictors. We want to determine if a regression model behaves differently for younger (under age 65) subjects than for older (age 65 and over) subjects.

(a)     To do this, introduce a new variable called Young to the data set as a factor that distinguishes younger from older men. Introduce it in a way that separate intercepts and slopes are applied to the two groups of men. Show a summary of your regression. Note: we will accept the validity of all regression assumptions in this exercise. [2 pts]

(b)   Using the model in (a) conduct an F-test to see if you reject the null hypothesis that coefficients associated with Young are all equal to zero. Explain in practical terms what your results mean. [2 pts]
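The structure of Problem 3 can be sketched with an interaction between the `Young` factor and the predictors, followed by a nested-model F-test. To stay runnable without the faraway package, the block uses a simulated stand-in with only three of the predictors; the values are invented and the real analysis would use every predictor in `prostate`.

```r
# Simulated stand-in for the prostate data (invented values, 3 predictors only)
set.seed(3)
n <- 97
toy <- data.frame(
  age     = sample(41:79, n, replace = TRUE),
  lweight = rnorm(n, mean = 3.6, sd = 0.4),
  lpsa    = rnorm(n, mean = 2.5, sd = 1.1)
)
toy$lcavol <- 0.5 * toy$lpsa + rnorm(n, sd = 0.8)

# Factor separating men under 65 from men 65 and over
toy$Young <- factor(ifelse(toy$age < 65, "yes", "no"))

# (a) Young * (...) gives each group its own intercept and its own slopes
fit_full <- lm(lcavol ~ Young * (lweight + age + lpsa), data = toy)
summary(fit_full)

# (b) Reduced model with no Young terms; the F-test asks whether
# all Young-related coefficients are zero
fit_red <- lm(lcavol ~ lweight + age + lpsa, data = toy)
f_tab <- anova(fit_red, fit_full)
f_tab
```

With faraway installed, `data(prostate, package = "faraway")` loads the real data frame, and the same pattern applies with the remaining predictors added inside the parentheses.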

