# Logistic Regression in R: Exercises and Solutions

In order to solve the tasks you need:

## Exercise 1

A study was conducted whereby the type of anesthetic (A or B), nausea after the surgery (Yes or No), the amount of pain medication taken during the recovery period, and age for a random sample of 72 patients undergoing reconstructive knee surgery.

The data is in the file anesthesia.

### Part 1a

Use R to create a two-way table with the type of anesthetic defining the rows and nausea after the surgery as the columns and also produce the output for a chi-square test for independence.

Is there an association between these two categorical variables at a 5% level of significance?

### Answer 1a

library(readxl)
anesthesia <- read_excel("anesthesia.xlsx")
tbl <- table(anesthesia$anesthetic, anesthesia$nausea)
tbl
##
##     No Yes
##   A 13  26
##   B 23  10
chisq.test(tbl)
##
##  Pearson's Chi-squared test with Yates' continuity correction
##
## data:  tbl
## X-squared = 8.0559, df = 1, p-value = 0.004535

Since p-value = 0.004535 < 0.05, we reject the null hypothesis that the type of anesthetic and nausea after the surgery are independent. There is an association between these two categorical variables at a 5% level of significance.

### Part 1b

Obtain the output from R (including the Wald tests for coefficients - so use “summary” function) for the logistic regression model with nausea as the dependent variable and the type of anesthetic as the predictor variable.

### Answer 1b

anesthesia$nausea <- factor(anesthesia$nausea)
anesthesia$anesthetic <- factor(anesthesia$anesthetic)
mod <- glm(nausea ~ anesthetic, family = binomial, data = anesthesia)
summary(mod)
##
## Call:
## glm(formula = nausea ~ anesthetic, family = binomial, data = anesthesia)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4823  -0.8497   0.0254   0.9005   1.5453
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.6931     0.3397   2.041  0.04129 *
## anestheticB  -1.5261     0.5088  -2.999  0.00271 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 99.813  on 71  degrees of freedom
## Residual deviance: 90.133  on 70  degrees of freedom
## AIC: 94.133
##
## Number of Fisher Scoring iterations: 4

### Part 1c

What is the outcome of the hypothesis test that the coefficient of anesthetic is “zero” vs “not zero” at a 5% level of significance? (use the Wald test from the R output from the logistic regression you performed)

### Answer 1c

At a 5% level of significance we reject the null hypothesis that the coefficient of anesthetic is “zero”, because p-value = 0.00271 < 0.05.

### Part 1d

Convert the estimated coefficient of anesthetic to an odds ratio and interpret it in the context of the problem.

### Answer 1d

orB <- exp(coef(mod)[2])
orB
## anestheticB
##   0.2173913

The odds of having nausea after the surgery if anesthetic B is used are 0.217 that of the odds of having nausea if anesthetic A is used.

### Part 1e

Install the package “mosaic” (if you don’t have it installed already), then load it. Use the oddsRatio function to compute the odds ratio for having nausea for anesthetic A vs B. You may have to refer back to Week 8 for details on odds ratios and the oddsRatio function in R.

### Answer 1e

library(mosaic)
oddsRatio(tbl)
## [1] 4.6

### Part 1f

When logistic regression coefficients are negative, the interpretation sometimes has more impact when we switch the perspective and use the reciprocal of the exponentiated coefficient. Find the odds ratio for having nausea for anesthetic B compared to anesthetic A (i.e. take the reciprocal of the odds ratio you computed in part 1d).

Interpret this odds ratio in the context of the problem.

### Answer 1f

1/orB
## anestheticB
##         4.6

For anesthetic A, the odds of having nausea are 4.6 times larger than the odds for having nausea when anesthetic B is used.

### Part 1g

Compute the predicted probability of a reconstructive knee surgery patient having nausea after surgery when anesthetic A was used.

### Answer 1g

newdata <- data.frame(anesthetic = "A")
predict(mod, newdata, type = "response")
##         1
## 0.6666667

### Part 1h

Compute a 95% confidence interval for the predicted probability of a reconstructive knee surgery patient having nausea after surgery when anesthetic A was used.

### Answer 1h

pred <- predict(mod, newdata, type = "link", se.fit = TRUE)
invlink <- mod[["family"]][["linkinv"]]
z <- qnorm(1 - (1-0.95)/2)
low.ci <- pred$fit - z * pred$se.fit
up.ci <- pred$fit + z * pred$se.fit

prediction <- invlink(pred\$fit)
low.ci.response <- invlink(low.ci)
up.ci.response <- invlink(up.ci)

data.frame(prediction = prediction, low.ci = low.ci.response, up.ci = up.ci.response)
##   prediction    low.ci     up.ci
## 1  0.6666667 0.5068447 0.7955831

## Exercise 2

Continue using the anesthesia data set to do the following.

### Part 2a

Obtain the output from R (including the Wald tests for coefficients - so use “summary” function) for the logistic regression model with nausea as the dependent variable and the amount of pain medication taken as the predictor variable.

At $$\alpha = 0.05$$, is there a statistically significant relationship between nausea and the amount of pain medication taken?

### Answer 2a

mod2 <- glm(nausea ~ painmed, family = binomial, data = anesthesia)
summary(mod2)
##
## Call:
## glm(formula = nausea ~ painmed, family = binomial, data = anesthesia)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.8555  -0.6167  -0.1072   0.8206   1.7894
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.062742   0.764501  -4.006 6.17e-05 ***
## painmed      0.037487   0.008833   4.244 2.20e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 99.813  on 71  degrees of freedom
## Residual deviance: 68.049  on 70  degrees of freedom
## AIC: 72.049
##
## Number of Fisher Scoring iterations: 5

There is a statistically significant relationship between nausea and the amount of pain medication taken as p-value (for the coefficient of painmed) = 2.20e-05 < 0.05.

### Part 2b

Convert the estimated coefficient of painmed to an odds ratio and interpret it in the context of the problem.

### Answer 2b

exp(coef(mod2)[2])
##  painmed
## 1.038199

The odds for having nausea multiply by 1.038 for every 1-unit increase in the amount of pain medication taken during the recovery period.

### Part 2c

Compute the predicted probabilities of a reconstructive knee surgery patient having nausea in the recovery time after surgery for when 50 units of pain medication are used and also for when 100 units of pain medication are used.

Comment on these two probabilities.

### Answer 2c

newdata2 <- data.frame(painmed = c(50, 100))
predict(mod2, newdata2, type = "response")
##         1         2
## 0.2335485 0.6650716

The probability of a patient having nausea when 50 units of pain medication are used is 0.2335, while the probability of a patient having nausea when 100 units of pain medication are used is 0.6651.

It means that the odds of having nausea when 50 units of pain medication are used equals 0.3047, while the odds of having nausea when 100 units of pain medication are used equals 1.986. It shows that the odds for having nausea multiply by 1.038199 for every 1-unit increase in the amount of pain medication taken.