
# We Helped With This R Language Programming Homework: Have A Similar One?

Category | Programming |
---|---|
Subject | R, R Studio |
Difficulty | Undergraduate |
Status | Solved |
More Info | I Need Help With Statistics |

## Assignment Description

**Problem 1. (Subsampling)** This is a variant of the bootstrap where the sampling is without replacement. We look at this variant in the context of building a confidence interval for the mean $\theta$ of a distribution $F$ on the real line. Consider an iid sample $X_1, \dots, X_n$ from that distribution. The process is based on a choice of two positive integers $r$ and $B$, where $r/n$ is typically small and $B$ is large. Let $1 - \alpha$ denote the desired confidence level.

1) For $b = 1, \dots, B$:

a) Draw without replacement $r$ observations from the sample, obtaining $X_1^b, \dots, X_r^b$.

b) Compute the corresponding sample mean $\bar{X}_b$ and sample standard deviation $S_b$, and form the t-ratio $T_b = (\bar{X}_b - \bar{X}) / (S_b / \sqrt{r})$.

2) Let $t_\gamma$ denote the $\gamma$-quantile of $\{T_1, \dots, T_B\}$ and return the interval

$$\left[\, \bar{X} - t_{1-\alpha/2}\, S/\sqrt{r},\;\; \bar{X} - t_{\alpha/2}\, S/\sqrt{r} \,\right].$$

(Above, $\bar{X}$ and $S$ denote the sample mean and sample standard deviation of $X_1, \dots, X_n$.)

A. Write a function `subsample.mean.CI(x, conf = 0.95, sub = length(x)/10, B = 999)` that takes in the sample in the form of a numerical vector `x`, the desired confidence level `conf`, the subsample size `sub` (corresponding to $r$ above), and the number of bootstrap replicates `B`, and returns the interval as computed above.
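As a rough illustration of part A, the steps of Problem 1 can be sketched in R as follows. This is a minimal sketch, not the graded solution: rounding `sub` down to an integer and requiring at least two observations per subsample are assumptions added here, and the interval is scaled by `S / sqrt(r)` exactly as the statement is written.

```r
# Sketch of the subsampling confidence interval for the mean.
subsample.mean.CI <- function(x, conf = 0.95, sub = length(x) / 10, B = 999) {
  n <- length(x)
  r <- max(2, floor(sub))   # assumption: round down; sd() needs r >= 2
  stopifnot(r <= n)
  alpha <- 1 - conf
  xbar <- mean(x)           # full-sample mean
  s <- sd(x)                # full-sample standard deviation

  # Step 1: t-ratios from B subsamples drawn WITHOUT replacement
  t.ratios <- replicate(B, {
    xb <- sample(x, r, replace = FALSE)
    (mean(xb) - xbar) / (sd(xb) / sqrt(r))
  })

  # Step 2: quantiles t_{1 - alpha/2} and t_{alpha/2} of the t-ratios
  q <- quantile(t.ratios, probs = c(1 - alpha / 2, alpha / 2), names = FALSE)

  # Interval as stated: [xbar - t_{1-alpha/2} S/sqrt(r), xbar - t_{alpha/2} S/sqrt(r)]
  xbar - q * s / sqrt(r)
}

set.seed(1)
x <- rnorm(1000)
ci <- subsample.mean.CI(x)
ci   # lower and upper endpoints
```

The function returns a numeric vector of length two (lower endpoint first), which makes it easy to check coverage and compute the length with `diff(ci)`.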

B. Generate a sample of size $n \in \{1000, 2000, \dots, 10000\}$ from the standard normal distribution. Compute the subsampling confidence intervals corresponding to subsample sizes $r \in \{n/50, n/20, n/10, n/5, n/2\}$. Also, compute the bootstrap confidence interval (use the code from the solution to Homework 3). For each of these six confidence intervals, record whether it covers the population mean (1 if yes, 0 if not) and its length. Repeat the whole thing $M = 1000$ times. First, plot the coverage (averaged over the repeats) as a function of the sample size $n$. Then plot the length in a similar way. (Each of these two plots will have six curves.) Make the plots nice, with different colors identified in a legend for the different confidence intervals. Offer some brief comments.
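The simulation in part B could be organized along these lines. This is a scaled-down sketch under several assumptions: the helper `resample.ci` and the plotting function `draw.curves` are hypothetical names introduced here, a bootstrap-t interval stands in for the Homework 3 code (which is not reproduced on this page), and the grid of sample sizes, `M`, and `B` are reduced so the example runs in a few seconds.

```r
# Hypothetical helper: studentized interval built from resampled t-ratios.
# replace = FALSE gives the subsampling interval of Problem 1 (scaled by
# S / sqrt(size), as in the statement); replace = TRUE with size = n gives
# a bootstrap-t interval, used here as a stand-in for the Homework 3 code.
resample.ci <- function(x, size, replace, conf = 0.95, B = 199) {
  a <- 1 - conf
  xbar <- mean(x)
  t.ratios <- replicate(B, {
    xb <- sample(x, size, replace = replace)
    (mean(xb) - xbar) / (sd(xb) / sqrt(size))
  })
  q <- quantile(t.ratios, c(1 - a / 2, a / 2), names = FALSE)
  xbar - q * sd(x) / sqrt(size)
}

set.seed(1)
n.grid <- c(1000, 2000, 3000)   # the assignment uses 1000, 2000, ..., 10000
fracs <- c(50, 20, 10, 5, 2)    # subsample sizes r = n / fracs
M <- 20                         # the assignment uses M = 1000
methods <- c(paste0("subsample n/", fracs), "bootstrap")
coverage <- len <- matrix(0, length(n.grid), length(methods),
                          dimnames = list(as.character(n.grid), methods))

for (i in seq_along(n.grid)) {
  n <- n.grid[i]
  for (m in seq_len(M)) {
    x <- rnorm(n)   # population mean is 0
    cis <- c(lapply(fracs, function(f) resample.ci(x, n / f, replace = FALSE)),
             list(resample.ci(x, n, replace = TRUE)))
    cover <- sapply(cis, function(ci) ci[1] <= 0 && 0 <= ci[2])
    coverage[i, ] <- coverage[i, ] + cover / M          # running average
    len[i, ] <- len[i, ] + sapply(cis, diff) / M
  }
}

# One curve per interval type, colors identified in a legend.
cols <- seq_along(methods)
draw.curves <- function(mat, ylab, where) {
  matplot(n.grid, mat, type = "b", lty = 1, pch = 1, col = cols,
          xlab = "sample size n", ylab = ylab)
  legend(where, legend = methods, col = cols, lty = 1, pch = 1, cex = 0.8)
}
draw.curves(coverage, "average coverage", "bottomright")
draw.curves(len, "average length", "topright")
```

With the full settings from the assignment, the same loop applies; only `n.grid`, `M`, and `B` change, at the cost of a much longer run time.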

**Problem 2. (Goodness-of-fit testing in multiple samples)** Suppose we have multiple numerical samples. Let $Y_{ij}$ denote the $i$th observation in group $j$, where $i = 1, \dots, n_j$ and $j = 1, \dots, J$. We want to test the null hypothesis that the samples come from the same distribution.

A. Write a function `permSST(y, g, B = 999)` which takes in the observations as a numerical vector `y` of size $n_1 + \cdots + n_J$ and the corresponding group labels as a vector `g` taking values in $\{1, \dots, J\}$, and also the number of repeats `B`, and returns the Monte Carlo permutation p-value for the treatment sum of squares. [You may want to write a function to compute that sum, so that the code is cleaner.]
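One possible shape for part A is sketched below. The helper name `SST` is an assumption introduced here; it computes the usual treatment sum of squares $\sum_j n_j (\bar{Y}_j - \bar{Y})^2$. Counting the observed statistic among the permutations (the `+ 1` terms) is a common convention that keeps the p-value strictly positive, and is also a choice made for this sketch rather than something the statement prescribes.

```r
# Hypothetical helper: treatment sum of squares, sum_j n_j (Ybar_j - Ybar)^2.
SST <- function(y, g) {
  group.means <- tapply(y, g, mean)
  group.sizes <- tapply(y, g, length)
  sum(group.sizes * (group.means - mean(y))^2)
}

# Monte Carlo permutation p-value for the treatment sum of squares.
permSST <- function(y, g, B = 999) {
  observed <- SST(y, g)
  # Under the null the labels are exchangeable, so permute y against fixed g
  permuted <- replicate(B, SST(sample(y), g))
  (sum(permuted >= observed) + 1) / (B + 1)
}

set.seed(1)
g <- rep(1:3, each = 20)
y.null <- rnorm(60)              # all groups share the same distribution
y.alt <- rnorm(60, mean = g)     # group means 1, 2, 3
p.null <- permSST(y.null, g)
p.alt <- permSST(y.alt, g)
c(null = p.null, alternative = p.alt)
```

Large values of the statistic are evidence against the null, which is why the p-value counts permuted statistics at least as large as the observed one.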

B. Consider the situation where we have $J$ groups of the same size $n_1 = \cdots = n_J = m$. The $j$th sample is drawn from the normal distribution with mean $\theta_j$ and variance 1. As an alternative, choose $\theta_j = j\tau$, where $\tau > 0$. (The larger $\tau$ is, the farther the alternative is from the null.) Record the p-value corresponding to the ANOVA F-test (the one that assumes equal variances) and record the p-value returned by your `permSST` function. Do this for $J \in \{2, 5, 10\}$, $m \in \{10, 30, 100\}$, and five carefully chosen values of $\tau$, and repeat each setting $M = 200$ times. In the same plot, for each of the two tests, graph the p-value (averaged over the $M$ repeats) as a function of $\tau$. (Thus you will end up generating $3 \times 3 = 9$ such plots in total.) The range of $\tau$ should be such that the average p-values are clearly seen to change with $\tau$. Offer some brief comments.
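A single one of the nine settings in part B might be simulated as in the sketch below, here with $J = 5$ and $m = 10$. Everything specific is an assumption for illustration: the grid `taus` is a hypothetical choice, `M` and `B` are reduced for speed, the `SST` and `permSST` definitions are one possible implementation repeated here so the block is self-contained, and `oneway.test(..., var.equal = TRUE)` is used for the equal-variance ANOVA F-test.

```r
# Hypothetical helper: treatment sum of squares.
SST <- function(y, g) {
  sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
}

# One possible permSST: Monte Carlo permutation p-value.
permSST <- function(y, g, B = 999) {
  observed <- SST(y, g)
  permuted <- replicate(B, SST(sample(y), g))
  (sum(permuted >= observed) + 1) / (B + 1)
}

set.seed(1)
J <- 5
m <- 10
taus <- c(0, 0.05, 0.1, 0.2, 0.4)   # hypothetical grid; tune per (J, m)
M <- 20                             # the assignment uses M = 200
B <- 199
g <- rep(seq_len(J), each = m)      # group labels 1, ..., J

# Rows: values of tau; columns: average p-value of each test.
avg.p <- t(sapply(taus, function(tau) {
  p <- replicate(M, {
    y <- rnorm(J * m, mean = g * tau)   # theta_j = j * tau, variance 1
    c(Ftest = oneway.test(y ~ factor(g), var.equal = TRUE)$p.value,
      perm = permSST(y, g, B))
  })
  rowMeans(p)
}))

matplot(taus, avg.p, type = "b", lty = 1, pch = 1, col = c("blue", "red"),
        xlab = "tau", ylab = "average p-value", ylim = c(0, 1),
        main = paste0("J = ", J, ", m = ", m))
legend("topright", legend = c("ANOVA F-test", "permutation SST"),
       col = c("blue", "red"), lty = 1, pch = 1)
```

Wrapping the body in two loops over `J` and `m` would produce the nine plots; the range of `taus` generally needs to shrink as `J` or `m` grows, since the tests gain power.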