Logistic Regression in R: Exercises and Solutions
In order to solve the tasks you need:
- R Studio
- Data Files
Exercise 1
A study was conducted whereby the type of anesthetic (A or B), nausea after the surgery (Yes or No), the amount of pain medication taken during the recovery period, and age for a random sample of 72 patients undergoing reconstructive knee surgery.
The data is in the file anesthesia
.
Part 1a
Use R to create a two-way table with the type of anesthetic defining the rows and nausea after the surgery as the columns and also produce the output for a chi-square test for independence.
Is there an association between these two categorical variables at a 5% level of significance?
Answer 1a
library(readxl)
anesthesia <- read_excel("anesthesia.xlsx")
tbl <- table(anesthesia$anesthetic, anesthesia$nausea)
tbl
##
## No Yes
## A 13 26
## B 23 10
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tbl
## X-squared = 8.0559, df = 1, p-value = 0.004535
Since p-value = 0.004535 < 0.05, we reject the null hypothesis that the type of anesthetic and nausea after the surgery are independent. There is an association between these two categorical variables at a 5% level of significance.
Part 1b
Obtain the output from R (including the Wald tests for coefficients - so use “summary” function) for the logistic regression model with nausea as the dependent variable and the type of anesthetic as the predictor variable.
Answer 1b
anesthesia$nausea <- factor(anesthesia$nausea)
anesthesia$anesthetic <- factor(anesthesia$anesthetic)
mod <- glm(nausea ~ anesthetic, family = binomial, data = anesthesia)
summary(mod)
##
## Call:
## glm(formula = nausea ~ anesthetic, family = binomial, data = anesthesia)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4823 -0.8497 0.0254 0.9005 1.5453
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.6931 0.3397 2.041 0.04129 *
## anestheticB -1.5261 0.5088 -2.999 0.00271 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 99.813 on 71 degrees of freedom
## Residual deviance: 90.133 on 70 degrees of freedom
## AIC: 94.133
##
## Number of Fisher Scoring iterations: 4
Part 1c
What is the outcome of the hypothesis test that the coefficient of anesthetic is “zero” vs “not zero” at a 5% level of significance? (use the Wald test from the R output from the logistic regression you performed)
Answer 1c
At a 5% level of significance we reject the null hypothesis that the coefficient of anesthetic is “zero”, because p-value = 0.00271 < 0.05.
Part 1d
Convert the estimated coefficient of anesthetic to an odds ratio and interpret it in the context of the problem.
Answer 1d
## anestheticB
## 0.2173913
The odds of having nausea after the surgery if anesthetic B is used are 0.217 that of the odds of having nausea if anesthetic A is used.
Part 1e
Install the package “mosaic” (if you don’t have it installed already), then load it. Use the oddsRatio function to compute the odds ratio for having nausea for anesthetic A vs B. You may have to refer back to Week 8 for details on odds ratios and the oddsRatio function in R.
Part 1f
When logistic regression coefficients are negative, the interpretation sometimes has more impact when we switch the perspective and use the reciprocal of the exponentiated coefficient. Find the odds ratio for having nausea for anesthetic B compared to anesthetic A (i.e. take the reciprocal of the odds ratio you computed in part 1d).
Interpret this odds ratio in the context of the problem.
Answer 1f
## anestheticB
## 4.6
For anesthetic A, the odds of having nausea are 4.6 times larger than the odds for having nausea when anesthetic B is used.
Part 1g
Compute the predicted probability of a reconstructive knee surgery patient having nausea after surgery when anesthetic A was used.
Answer 1g
## 1
## 0.6666667
Part 1h
Compute a 95% confidence interval for the predicted probability of a reconstructive knee surgery patient having nausea after surgery when anesthetic A was used.
Answer 1h
pred <- predict(mod, newdata, type = "link", se.fit = TRUE)
invlink <- mod[["family"]][["linkinv"]]
z <- qnorm(1 - (1-0.95)/2)
low.ci <- pred$fit - z * pred$se.fit
up.ci <- pred$fit + z * pred$se.fit
prediction <- invlink(pred$fit)
low.ci.response <- invlink(low.ci)
up.ci.response <- invlink(up.ci)
data.frame(prediction = prediction, low.ci = low.ci.response, up.ci = up.ci.response)
## prediction low.ci up.ci
## 1 0.6666667 0.5068447 0.7955831
Exercise 2
Continue using the anesthesia
data set to do the following.
Part 2a
Obtain the output from R (including the Wald tests for coefficients - so use “summary” function) for the logistic regression model with nausea as the dependent variable and the amount of pain medication taken as the predictor variable.
At \(\alpha = 0.05\), is there a statistically significant relationship between nausea and the amount of pain medication taken?
Answer 2a
##
## Call:
## glm(formula = nausea ~ painmed, family = binomial, data = anesthesia)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8555 -0.6167 -0.1072 0.8206 1.7894
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.062742 0.764501 -4.006 6.17e-05 ***
## painmed 0.037487 0.008833 4.244 2.20e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 99.813 on 71 degrees of freedom
## Residual deviance: 68.049 on 70 degrees of freedom
## AIC: 72.049
##
## Number of Fisher Scoring iterations: 5
There is a statistically significant relationship between nausea and the amount of pain medication taken as p-value (for the coefficient of painmed
) = 2.20e-05 < 0.05.
Part 2b
Convert the estimated coefficient of painmed to an odds ratio and interpret it in the context of the problem.
Answer 2b
## painmed
## 1.038199
The odds for having nausea multiply by 1.038 for every 1-unit increase in the amount of pain medication taken during the recovery period.
Part 2c
Compute the predicted probabilities of a reconstructive knee surgery patient having nausea in the recovery time after surgery for when 50 units of pain medication are used and also for when 100 units of pain medication are used.
Comment on these two probabilities.
Answer 2c
## 1 2
## 0.2335485 0.6650716
The probability of a patient having nausea when 50 units of pain medication are used is 0.2335, while the probability of a patient having nausea when 100 units of pain medication are used is 0.6651.
It means that the odds of having nausea when 50 units of pain medication are used equals 0.3047, while the odds of having nausea when 100 units of pain medication are used equals 1.986. It shows that the odds for having nausea multiply by 1.038199 for every 1-unit increase in the amount of pain medication taken.