
# We Helped With This Statistics with R Programming Assignment: Have A Similar One?

| Category | Programming |
|---|---|
| Subject | R, R Studio |
| Difficulty | Undergraduate |
| Status | Solved |
| More Info | I Need Help In Statistics |

## Assignment Description

Lab Assignment 6

*Statistics 20 - Winter 2017*

*Due March 11, 2017 before 11 pm via upload to CCLE*

### 0. Instructions

• Submit both the .Rmd and .html files to CCLE. Name the files UID-lab06.Rmd and UID-lab06.html, where UID is replaced by your Bruin ID. The HTML file should compile correctly from your .Rmd file. Please organize your answers appropriately. If we are forced to search for an answer in a disorganized HTML file, or if the file contains excessive and unnecessary output, the assignment will receive a 10-point deduction.

### 1. Gambler’s ruin (Huygens’ Result, 30 points)

Two players bet on the outcome of a series of coin tosses. After each toss, if the coin lands heads, player two transfers one penny to player one; otherwise, player one transfers one penny to player two. The game ends when one player has all the pennies (that is, the other player has gone broke).

If player one has $n_1$ pennies, player two has $n_2$ pennies, and the probability of getting a head is $p$, we are interested in two things:

i. the probability that player one will go broke, and

ii. the expected number of tosses needed to determine the winner.

**1a.**

For this part, your task is to write a function gamblersRuin with three formal arguments:

i. n1: the number of pennies player one has at the beginning of the game,

ii. n2: the number of pennies player two has at the beginning of the game,

iii. p: the probability of getting a head on each flip.

The function returns a named numeric vector of length three, whose elements are:

i. len: the number of tosses,

ii. n1: the number of pennies player one has at the end of the game,

iii. n2: the number of pennies player two has at the end of the game.

Please use a repeat loop in your function and use rbinom for each coin toss. An example of the output format is given below.

    set.seed(2017)
    gamblersRuin(n1 = 10, n2 = 10, p = 0.5)

    len  n1  n2
    196   0  20

**1b.**

Please write a for loop to simulate 10000 games using your gamblersRuin function with the arguments n1 = 5, n2 = 10 and p = 0.5. Store the results in a single 10000 by 3 matrix res, where each row is the output of one game, and display the first 10 rows.

**1c.**

Here we want to use the proportion of games with n1 == 0 to estimate the probability that player one will go broke, and the average value of len to estimate the expected number of tosses needed to determine the winner. Please use res from 1b to compute these two values.
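The steps in 1a to 1c can be sketched as follows. This is only one possible approach, not the reference solution: it assumes n1 and n2 are positive whole numbers, and the seed is chosen only to make the sketch reproducible, so it is not claimed to reproduce the sample output shown in 1a.

```r
# One possible gamblersRuin: play a single game with a repeat loop.
# Assumption: n1 and n2 are positive whole numbers and 0 < p < 1.
gamblersRuin <- function(n1, n2, p) {
  len <- 0
  repeat {
    toss <- rbinom(1, size = 1, prob = p)  # 1 = head, 0 = tail
    if (toss == 1) {
      # Head: player two pays player one
      n1 <- n1 + 1
      n2 <- n2 - 1
    } else {
      # Tail: player one pays player two
      n1 <- n1 - 1
      n2 <- n2 + 1
    }
    len <- len + 1
    if (n1 == 0 || n2 == 0) break  # someone is broke, so the game ends
  }
  c(len = len, n1 = n1, n2 = n2)
}

set.seed(2017)  # arbitrary seed, for reproducibility of this sketch only
nGames <- 10000

# Pre-allocate the result matrix, one row per game
res <- matrix(NA_real_, nrow = nGames, ncol = 3,
              dimnames = list(NULL, c("len", "n1", "n2")))
for (i in seq_len(nGames)) {
  res[i, ] <- gamblersRuin(n1 = 5, n2 = 10, p = 0.5)
}
head(res, 10)

# Monte Carlo estimates for 1c
pBroke  <- mean(res[, "n1"] == 0)  # share of games in which player one went broke
meanLen <- mean(res[, "len"])      # average number of tosses per game
c(pBroke = pBroke, meanLen = meanLen)
```

Pre-allocating res and filling it row by row avoids growing the matrix inside the loop, which tends to be slow in R.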

Given a fair coin, Christiaan Huygens gave the following results:

i. the probability that player one will go broke is $\frac{n_2}{n_1 + n_2} = \frac{2}{3}$, and

ii. the expected number of tosses to determine the winner is $n_1 \cdot n_2 = 50$.

Your answers should be close to these values.

**1d.**

Please redo 1b and 1c using the formal arguments n1 = 5, n2 = 10000 and p = 0.49.

Given this situation, the theoretical probability that player one will go broke is

$$\frac{1 - \left(\frac{p}{1-p}\right)^{n_2}}{1 - \left(\frac{p}{1-p}\right)^{n_1+n_2}} \simeq 1.$$

That means that if you play a game with a negative expected value against an opponent (such as a casino) with much more money than you have, you will eventually go broke.
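To see where the value close to 1 comes from, the theoretical probabilities can be evaluated directly. The sketch below uses the standard gambler's-ruin formula for the probability that player one goes broke, with the fair-coin case handled separately. The function name ruinProbability is hypothetical and not part of the assignment.

```r
# Theoretical probability that player one goes broke (hypothetical helper).
# Assumption: 0 < p < 1, where p is the probability that player one wins a toss.
ruinProbability <- function(n1, n2, p) {
  if (isTRUE(all.equal(p, 0.5))) {
    # Fair coin: Huygens' result
    return(n2 / (n1 + n2))
  }
  r <- p / (1 - p)
  (1 - r^n2) / (1 - r^(n1 + n2))
}

ruinProbability(n1 = 5, n2 = 10, p = 0.5)      # fair game from 1b and 1c
ruinProbability(n1 = 5, n2 = 10000, p = 0.49)  # unfavourable game from 1d
```

With p = 0.49 the ratio p / (1 - p) is below 1, so raising it to the power 10000 gives a number that is negligible in double precision, and the result is numerically indistinguishable from 1.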

### 2. Speed up! (25 points)

In this question, we want to compare the performance of a **for loop**, the **apply** family, **vectorization** and the **built-in function** for calculating the column sums of a data frame or a matrix.

**2a.**

i. Complete the function colSumsFor.

    colSumsFor <- function(x) {
      v <-
    }

ii. Complete the function colSumsFor2.

    colSumsFor2 <- function(x) {
      x <- as.matrix(x)
      v <- vector(length = ncol(x))
      for (...){
        ...
      }
      v
    }

iii. Complete the function colSumsApply.

    colSumsApply <- function(x) {
      apply(as.matrix(x), ..., ...)
    }

iv. Complete the function colSumsSapply.

    colSumsSapply <- function(x) {
      sapply(as.data.frame(x), ...)
    }

v. Complete the function colSumsLapply.

    colSumsLapply <- function(x) {
      do.call(cbind, lapply(as.data.frame(x), ...))
    }

vi. Define the function colSumsCrossprod as given.

    colSumsCrossprod <- function(x) {
      crossprod(rep(1, nrow(x)), as.matrix(x))[1, ]
    }

Test all your functions on the following objects.

    mat <- as.matrix(iris[, 1:4])
    df <- iris[, 1:4]

All your functions should return the same output as the following.

    colSums(mat)

    Sepal.Length  Sepal.Width Petal.Length  Petal.Width
           876.5        458.6        563.7        179.9
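As an illustration of how such functions can be checked against colSums, here is a minimal sketch. The names sumsByFor, sumsByApply and sumsBySapply are hypothetical, and the bodies show only one possible way to write a loop-based, an apply-based and an sapply-based column sum. They are not claimed to be the intended completions of the skeletons above.

```r
mat <- as.matrix(iris[, 1:4])
df  <- iris[, 1:4]

# Loop over the columns and sum each one (hypothetical helper)
sumsByFor <- function(x) {
  x <- as.matrix(x)
  v <- numeric(ncol(x))
  for (j in seq_len(ncol(x))) {
    v[j] <- sum(x[, j])
  }
  names(v) <- colnames(x)  # keep the column names, as colSums does
  v
}

# apply over margin 2 (columns) of the matrix (hypothetical helper)
sumsByApply <- function(x) apply(as.matrix(x), 2, sum)

# sapply over the columns of a data frame (hypothetical helper)
sumsBySapply <- function(x) sapply(as.data.frame(x), sum)

# Compare each variant with the built-in colSums on both inputs
all.equal(sumsByFor(mat), colSums(mat))
all.equal(sumsByApply(df), colSums(df))
all.equal(sumsBySapply(df), colSums(df))
```

Using all.equal rather than identical tolerates tiny floating-point differences that can arise from summing in a different order.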

**2b.**

Run the following code to time the performance of your functions on a data frame.

    df <- as.data.frame(matrix(rnorm(5000), 50, 100))
    colSumsDf <- rbind(
      colSumsFor = system.time(replicate(n = 10000, colSumsFor(df))),
      colSumsFor2 = system.time(replicate(n = 10000, colSumsFor2(df))),
      colSumsApply = system.time(replicate(n = 10000, colSumsApply(df))),
      colSumsSapply = system.time(replicate(n = 10000, colSumsSapply(df))),
      colSumsLapply = system.time(replicate(n = 10000, colSumsLapply(df))),
      colSumsCrossprod = system.time(replicate(n = 10000, colSumsCrossprod(df))),
      colSums = system.time(replicate(n = 10000, colSums(df)))
    )

    colSumsDf

Show your results. Which function is the fastest one? Which function is the slowest one?

**2c.**

Run the following code to time the performance of your functions on a numeric matrix. Show your results.

    mat <- matrix(rnorm(5000), 50, 100)
    colSumsMat <- rbind(
      colSumsFor = system.time(replicate(n = 10000, colSumsFor(mat))),
      colSumsFor2 = system.time(replicate(n = 10000, colSumsFor2(mat))),
      colSumsApply = system.time(replicate(n = 10000, colSumsApply(mat))),
      colSumsSapply = system.time(replicate(n = 10000, colSumsSapply(mat))),
      colSumsLapply = system.time(replicate(n = 10000, colSumsLapply(mat))),
      colSumsCrossprod = system.time(replicate(n = 10000, colSumsCrossprod(mat))),
      colSums = system.time(replicate(n = 10000, colSums(mat)))
    )

    colSumsMat

Show your results. Which function is the fastest one? Which function is the slowest one? Is the order the same as in 2b?
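The timing pattern used in 2b and 2c can be tried on a smaller scale first. The sketch below times only the built-in colSums and a plain apply call on the same data, stored once as a data frame and once as a matrix. The repetition count of 1000 and the seed are arbitrary choices made to keep the sketch quick, not values from the assignment.

```r
set.seed(1)                          # arbitrary seed
mat <- matrix(rnorm(5000), 50, 100)  # same shape as in the assignment
df  <- as.data.frame(mat)            # the same numbers stored as a data frame
reps <- 1000                         # fewer repetitions than the assignment's 10000

# Each system.time() call returns a proc_time object; rbind stacks them
# into a matrix with one row per timed expression.
timings <- rbind(
  colSums_df  = system.time(replicate(n = reps, colSums(df))),
  colSums_mat = system.time(replicate(n = reps, colSums(mat))),
  apply_df    = system.time(replicate(n = reps, apply(as.matrix(df), 2, sum))),
  apply_mat   = system.time(replicate(n = reps, apply(mat, 2, sum)))
)
timings[, "elapsed"]
```

Timings depend on the machine and the R version, so no expected output is shown. Comparing the _df and _mat rows gives a rough idea of how much of the cost comes from handling a data frame rather than a matrix.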

### 3. Linear model (25 points)

Please use the 50th to 150th observations in the iris dataset to answer the following questions.

**3a.**

Fit the linear model

$$\text{Sepal.Length} = \text{Slope} \times \text{Petal.Length} + \text{Intercept}.$$

Store your result in an object fit. Print summary(fit).

Reading from the summary, what are the *Intercept* and the *Slope*? What is the Multiple R-squared?
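A minimal sketch of the fit, assuming the rows are selected by position with iris[50:150, ] (the name dat is a hypothetical helper variable):

```r
# Rows 50 to 150 of iris (101 observations); `dat` is a hypothetical name
dat <- iris[50:150, ]

# Simple linear regression of Sepal.Length on Petal.Length
fit <- lm(Sepal.Length ~ Petal.Length, data = dat)
summary(fit)

# The same quantities can also be pulled out programmatically
s <- summary(fit)
s$coefficients[, "Estimate"]  # intercept and slope
s$r.squared                   # Multiple R-squared
```

In the printed summary, the intercept and slope appear in the Estimate column of the coefficient table, in the rows labelled (Intercept) and Petal.Length.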

**3b.**

Extract the regression coefficients and store them in an object coefs. Draw a scatter plot of the two variables, and add a straight line showing the best linear fit.
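One way this could look, refitting the model so the sketch is self-contained (dat is again a hypothetical helper name, and the colour and axis labels are arbitrary choices):

```r
dat <- iris[50:150, ]
fit <- lm(Sepal.Length ~ Petal.Length, data = dat)

# Named vector holding the intercept and the slope
coefs <- coef(fit)
coefs

# Scatter plot of the two variables with the fitted line on top
plot(dat$Petal.Length, dat$Sepal.Length,
     xlab = "Petal.Length", ylab = "Sepal.Length")
abline(a = coefs[1], b = coefs[2], col = "red")  # a = intercept, b = slope
```

Calling abline(fit) would draw the same line directly from the model object; passing coefs explicitly just makes the use of the extracted coefficients visible.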

**3c.**

Generate the four diagnostic plots. By eyeballing the plots, choose the number of points to be labelled as potential outliers or high-leverage points. Set id.n to the number of points you want to label (set it to 0 if none).
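A sketch of how the four default diagnostic plots of an lm object can be drawn in a single figure. The value id.n = 3 is only a placeholder, since the appropriate number depends on what the plots show.

```r
dat <- iris[50:150, ]  # `dat` is a hypothetical helper name
fit <- lm(Sepal.Length ~ Petal.Length, data = dat)

op <- par(mfrow = c(2, 2))  # 2 x 2 grid so that all four plots fit on one page
plot(fit, id.n = 3)         # id.n = 3 is a placeholder, adjust after inspecting the plots
par(op)                     # restore the previous graphics settings
```

Saving the result of par and restoring it afterwards keeps later plots in the same document from inheriting the 2 x 2 layout.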

**3d.**

Extract the fitted values and the residuals and store them as fitted and residual. Draw a scatter plot for residual against fitted.

**3e.**

Evaluate 1 - var(residual) / var(y), where y is the Sepal.Length of the 50th to 150th observations in the iris dataset.
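Parts 3d and 3e can be sketched together as below. The model is refitted so the block is self-contained, and dat and r2 are hypothetical helper names. For a least-squares fit with an intercept, the residuals have mean zero, so this variance ratio should agree with the Multiple R-squared reported by summary(fit).

```r
dat <- iris[50:150, ]
fit <- lm(Sepal.Length ~ Petal.Length, data = dat)

# Extract fitted values and residuals under the names the assignment asks for
fitted   <- fitted(fit)
residual <- resid(fit)

# Residuals against fitted values, with a reference line at zero
plot(fitted, residual, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Share of the variance of y explained by the model
y  <- dat$Sepal.Length
r2 <- 1 - var(residual) / var(y)
c(fromVariances = r2, fromSummary = summary(fit)$r.squared)
```

The two values printed on the last line should match up to floating-point error, which links 3e back to the Multiple R-squared read from the summary in 3a.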