We Helped With This R Language Programming Homework: Have A Similar One?

| Field | Value |
|---|---|
| Category | Programming |
| Subject | R, RStudio |
| Difficulty | Undergraduate |
| Status | Solved |
Assignment Description
Problem 1. (Subsampling) This is a variant of the bootstrap in which the sampling is done without replacement. We look at this variant in the context of building a confidence interval for the mean $\theta$ of a distribution $F$ on the real line. Consider an iid sample $X_1, \dots, X_n$ from that distribution. The procedure depends on the choice of two positive integers $r$ and $B$, where $r/n$ is typically small and $B$ is large. Let $1 - \alpha$ denote the desired confidence level.
1) For $b = 1, \dots, B$:
a) Draw $r$ observations without replacement from the sample, obtaining $X_1^{(b)}, \dots, X_r^{(b)}$.
b) Compute the corresponding sample mean $\bar X_b$ and sample standard deviation $S_b$, and form the t-ratio $T_b = (\bar X_b - \bar X)/(S_b/\sqrt{r})$.
2) Let $t_\gamma$ denote the $\gamma$-quantile of $\{T_1, \dots, T_B\}$ and return the interval
$$\left[\bar X - t_{1-\alpha/2}\, S/\sqrt{r},\ \bar X - t_{\alpha/2}\, S/\sqrt{r}\right]$$
(Above, $\bar X$ and $S$ denote the sample mean and sample standard deviation of $X_1, \dots, X_n$.)
A. Write a function subsample.mean.CI(x, conf = 0.95, sub = length(x)/10, B = 999) that takes in the sample in the form of a numerical vector x, the desired confidence level conf, the subsample size sub (corresponding to r above), and the number of bootstrap replicates B, and returns the interval as computed above.
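Below is a minimal R sketch of part A that follows steps 1) and 2) literally. It makes two assumptions that the assignment leaves open. First, a non-integer sub is rounded down. Second, $t_\gamma$ is computed with R's default quantile() rule (type 7).

```r
# Subsampling confidence interval for the mean, following steps 1) and 2).
subsample.mean.CI <- function(x, conf = 0.95, sub = length(x) / 10, B = 999) {
  n <- length(x)
  r <- floor(sub)                      # assumption: round a non-integer size down
  stopifnot(n >= 2, r >= 2, r <= n)    # sd() needs at least two values
  alpha <- 1 - conf
  xbar <- mean(x)
  s <- sd(x)
  # Step 1: B subsamples drawn without replacement, one t-ratio per subsample
  tstat <- replicate(B, {
    xb <- sample(x, r, replace = FALSE)
    (mean(xb) - xbar) / (sd(xb) / sqrt(r))
  })
  # Step 2: empirical quantiles t_{1 - alpha/2} and t_{alpha/2}
  q <- quantile(tstat, c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(lower = xbar - q[1] * s / sqrt(r), upper = xbar - q[2] * s / sqrt(r))
}

# Usage: a standard normal sample of size 1000 with subsamples of size 100
set.seed(1)
x <- rnorm(1000)
ci <- subsample.mean.CI(x, sub = 100)
print(ci)
```

The interval is scaled by $S/\sqrt{r}$ exactly as step 2) specifies. The subsample size therefore enters both the t-ratios and the final interval.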
B. For each sample size $n \in \{1000, 2000, \dots, 10000\}$, generate a sample of size $n$ from the standard normal distribution. Compute the subsampling confidence intervals corresponding to the subsample sizes $r \in \{n/50, n/20, n/10, n/5, n/2\}$. Also compute the bootstrap confidence interval (use the code from the solution to Homework 3). For each of these six confidence intervals, record whether it covers the population mean (1 if yes, 0 if not) and its length. Repeat the whole procedure $M = 1000$ times. First, plot the coverage (averaged over the repeats) as a function of the sample size $n$. Then plot the average length in the same way. (Each of these two plots will have six curves.) Make the plots clear, with a legend identifying the color used for each confidence interval. Offer some brief comments.
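The simulation in part B can be sketched in R as follows. The Homework 3 bootstrap code is not available here, so boot.mean.CI is a hypothetical stand-in: a plain percentile bootstrap that should be replaced by the Homework 3 interval. The names one.rep, simulate.coverage and draw.curves are also illustrative. A compact copy of subsample.mean.CI is included so the block runs on its own.

```r
# Compact copy of the subsampling interval (same logic as in part A).
subsample.mean.CI <- function(x, conf = 0.95, sub = length(x) / 10, B = 999) {
  r <- floor(sub); alpha <- 1 - conf; xbar <- mean(x); s <- sd(x)
  tstat <- replicate(B, { xb <- sample(x, r); (mean(xb) - xbar) / (sd(xb) / sqrt(r)) })
  q <- quantile(tstat, c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(xbar - q[1] * s / sqrt(r), xbar - q[2] * s / sqrt(r))
}

# Hypothetical stand-in for the Homework 3 bootstrap CI: percentile bootstrap.
boot.mean.CI <- function(x, conf = 0.95, B = 999) {
  alpha <- 1 - conf
  means <- replicate(B, mean(sample(x, replace = TRUE)))
  quantile(means, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

labels <- c(paste0("subsample r = n/", c(50, 20, 10, 5, 2)), "bootstrap")

# One repeat at sample size n: 0/1 coverage and length of all six intervals.
one.rep <- function(n, theta = 0) {
  x <- rnorm(n, mean = theta)
  cis <- c(lapply(c(50, 20, 10, 5, 2), function(f) subsample.mean.CI(x, sub = n / f)),
           list(boot.mean.CI(x)))
  cbind(cover = sapply(cis, function(ci) as.numeric(ci[1] <= theta && theta <= ci[2])),
        len   = sapply(cis, function(ci) ci[2] - ci[1]))
}

# Average coverage and length over M repeats: rows are n, columns are intervals.
simulate.coverage <- function(ns = seq(1000, 10000, by = 1000), M = 1000) {
  cover <- len <- matrix(NA_real_, length(ns), 6,
                         dimnames = list(as.character(ns), labels))
  for (i in seq_along(ns)) {
    reps <- replicate(M, one.rep(ns[i]))          # 6 x 2 x M array
    cover[i, ] <- rowMeans(reps[, "cover", , drop = FALSE])
    len[i, ]   <- rowMeans(reps[, "len", , drop = FALSE])
  }
  list(cover = cover, len = len)
}

# One colored curve per interval, identified in a legend.
draw.curves <- function(mat, ylab) {
  ns <- as.numeric(rownames(mat))
  matplot(ns, mat, type = "b", lty = 1, pch = 1:6, col = 1:6,
          xlab = "sample size n", ylab = ylab)
  legend("topright", legend = colnames(mat), col = 1:6, pch = 1:6, lty = 1, bty = "n")
}

# Small demo run; the full setting (ten values of n, M = 1000) is far slower.
set.seed(2)
res <- simulate.coverage(ns = c(1000, 2000), M = 5)
draw.curves(res$cover, "average coverage"); abline(h = 0.95, lty = 2)
draw.curves(res$len, "average length")
```

The dashed line at the nominal level 0.95 makes over- and under-coverage easy to read off the first plot.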
Problem 2. (Goodness-of-fit testing in multiple samples) Suppose we have multiple numerical samples. Let $Y_{ij}$ denote the $i$th observation in group $j$, where $i = 1, \dots, n_j$ and $j = 1, \dots, J$. We want to test the null hypothesis that all the samples come from the same distribution.
A. Write a function permSST(y, g, B = 999) that takes in the observations as a numerical vector y of length $n_1 + \cdots + n_J$, the corresponding group labels as a vector g taking values in $\{1, \dots, J\}$, and the number of permutations B, and returns the Monte Carlo permutation p-value for the test based on the treatment sum of squares. [You may want to write a separate function to compute that sum, so that the code is cleaner.]
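One way to write part A in R is sketched below. It assumes the usual definition of the treatment (between-group) sum of squares, $\mathrm{SS}_{\text{treat}} = \sum_{j=1}^J n_j(\bar Y_{\cdot j} - \bar Y_{\cdot\cdot})^2$. It also assumes the common Monte Carlo p-value $(1 + \#\{b : \mathrm{SS}^{(b)} \ge \mathrm{SS}_{\text{obs}}\})/(B + 1)$, which counts the observed statistic among the permutations. The helper name treatment.SS is hypothetical.

```r
# Treatment (between-group) sum of squares: sum over groups of n_j * (group mean - grand mean)^2.
treatment.SS <- function(y, g) {
  n.j <- tapply(y, g, length)       # group sizes n_j
  mean.j <- tapply(y, g, mean)      # group means
  sum(n.j * (mean.j - mean(y))^2)
}

# Monte Carlo permutation p-value for the treatment sum of squares.
permSST <- function(y, g, B = 999) {
  stopifnot(length(y) == length(g))
  observed <- treatment.SS(y, g)
  # Shuffling the labels keeps the group sizes fixed; under the null hypothesis
  # all observations share one distribution, so the labels are exchangeable.
  perm <- replicate(B, treatment.SS(y, sample(g)))
  # Count the observed statistic as one of the B + 1 values, so p > 0.
  (1 + sum(perm >= observed)) / (B + 1)
}

# Usage: three groups of 20, first under the null, then with group means 1, 2, 3.
set.seed(4)
g <- rep(1:3, each = 20)
p.null <- permSST(rnorm(60), g)
y.alt <- rnorm(60, mean = g)
p.alt <- permSST(y.alt, g)
c(null = p.null, alternative = p.alt)
```

As a sanity check, treatment.SS should agree with the first "Sum Sq" entry of anova(lm(y ~ factor(g))).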
B. Consider the situation where we have $J$ groups of the same size $n_1 = \cdots = n_J = m$. The $j$th sample is drawn from the normal distribution with mean $\theta_j$ and variance 1. As the alternative, choose $\theta_j = j\tau$, where $\tau > 0$. (The larger $\tau$ is, the farther the alternative is from the null.) Record the p-value of the ANOVA F-test (the version that assumes equal variances) and the p-value returned by your permSST function. Do this for $J \in \{2, 5, 10\}$, $m \in \{10, 30, 100\}$, and five carefully chosen values of $\tau$, and repeat each setting $M = 200$ times. For each combination of $J$ and $m$, plot the p-value of each of the two tests, averaged over the $M$ repeats, as a function of $\tau$, with both curves in the same plot. (You will thus generate $3 \times 3 = 9$ plots in total.) Choose the range of $\tau$ so that the average p-values clearly change with $\tau$. Offer some brief comments.
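A sketch of the part B experiment follows. The ANOVA F-test p-value comes from oneway.test() with var.equal = TRUE, which is the equal-variance one-way ANOVA in base R's stats package.

The $\tau$ grid is a heuristic assumption. It picks $\tau$ so that the F-test noncentrality $\lambda = m\sum_j(\theta_j - \bar\theta)^2 = m\tau^2 J(J^2-1)/12$ (with $\sigma^2 = 1$) takes the values 1, 3, 6, 10, 20 in every setting. The aim is to keep the average p-values moving across the grid.

The names tau.grid and avg.pvalues are illustrative. The demo uses a much smaller M and B than the assignment asks for.

```r
# Treatment sum of squares and permutation p-value (same as part A).
treatment.SS <- function(y, g) {
  n.j <- tapply(y, g, length)
  mean.j <- tapply(y, g, mean)
  sum(n.j * (mean.j - mean(y))^2)
}
permSST <- function(y, g, B = 999) {
  observed <- treatment.SS(y, g)
  perm <- replicate(B, treatment.SS(y, sample(g)))
  (1 + sum(perm >= observed)) / (B + 1)
}

# Heuristic tau grid: solve m * tau^2 * J * (J^2 - 1) / 12 = lambda for tau.
tau.grid <- function(J, m, lambda = c(1, 3, 6, 10, 20)) {
  sqrt(12 * lambda / (m * J * (J^2 - 1)))
}

# Average p-values of both tests over M repeats, one row per tau.
avg.pvalues <- function(J, m, taus, M = 200, B = 999) {
  g <- rep(seq_len(J), each = m)
  out <- matrix(NA_real_, length(taus), 2,
                dimnames = list(NULL, c("ANOVA F-test", "permutation (SST)")))
  for (k in seq_along(taus)) {
    p <- replicate(M, {
      y <- rnorm(J * m, mean = g * taus[k])        # theta_j = j * tau, variance 1
      c(oneway.test(y ~ factor(g), var.equal = TRUE)$p.value, permSST(y, g, B))
    })
    out[k, ] <- rowMeans(matrix(p, nrow = 2))
  }
  out
}

# Demo of the 3 x 3 grid of plots with small M and B (the assignment uses M = 200).
set.seed(3)
op <- par(mfrow = c(3, 3))
for (J in c(2, 5, 10)) {
  for (m in c(10, 30, 100)) {
    taus <- tau.grid(J, m)
    res <- avg.pvalues(J, m, taus, M = 5, B = 99)
    matplot(taus, res, type = "b", lty = 1, pch = 1:2, col = 1:2, ylim = c(0, 1),
            xlab = expression(tau), ylab = "average p-value",
            main = paste0("J = ", J, ", m = ", m))
    legend("topright", legend = colnames(res), col = 1:2, pch = 1:2, lty = 1, bty = "n")
  }
}
par(op)
```

Tying $\tau$ to the noncentrality keeps the five grid points comparable across settings. A fixed $\tau$ grid would push the larger settings (such as $J = 10$, $m = 100$) to p-values near zero almost immediately.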