# We Helped With This R Studio Programming Assignment: Have A Similar One?


## Assignment Description

ESC_PS 8850:

Quantitative Foundations in Educational Research

Thursday, 4:00PM – 6:45PM

Townsend Hall 101E

Wolfgang Wiedermann

University of Missouri

### A Brief Introduction to R

Ten-year projection for CRAN: 22,452 packages (cf. Fox & Leanage, 2016)

```
> a <- 2
> a
[1] 2
> A
Error: object 'A' not found
> rm(a)
> a
Error: object 'a' not found
> a <- b <- 2
> a
[1] 2
> b
[1] 2
> 4 -> a
> a
[1] 4
> a <- 2; b <- 3
> a; b
[1] 2
[1] 3
> a <- 2
> b <- 3
> a + b
[1] 5
> a - b
[1] -1
> a / b
[1] 0.6666667
> a * b
[1] 6
> exp(a)
[1] 7.389056
> log(7.389056)
[1] 2
> x <- 1:5
> x
[1] 1 2 3 4 5
> x + 3
[1] 4 5 6 7 8
> exp(x)
[1]   2.718282   7.389056  20.085537
[4]  54.598150 148.413159
```

```
## --- Numeric variables --- ##
> x <- c(21.2, 23.1, 19.5, 20.9)
> x
[1] 21.2 23.1 19.5 20.9
> class(x)
[1] "numeric"

## ---- Character variables ---- ##
> y <- c("m", "w", "m", "m")
> y
[1] "m" "w" "m" "m"
> class(y)
[1] "character"

## ------------ Factors --------- ##
> y <- c(0,1,0,0)
> class(y)
[1] "numeric"
> yy <- factor(y)
> class(yy)
[1] "factor"
> yy
[1] 0 1 0 0
Levels: 0 1
> levels(yy) <- c("m", "w")
> yy
[1] m w m m
Levels: m w
> is.numeric(yy)
[1] FALSE

## ------------------------------ ##
> y <- c(0,1,0,0)
> class(y)
[1] "numeric"
> levels(y) <- c("m", "w")
> y
[1] 0 1 0 0
attr(,"levels")
[1] "m" "w"
> is.numeric(y)
[1] TRUE
> levels(y)
[1] "m" "w"
```

```
### -------- Return first lines of dataset:
> head(mydata)
  id age sex A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
1  1  24   0  3  3  1  0  2  0  0  0  1   0
2  2  32   1  3  2  3  3  0  0  0  1  0   0
3  3  27   1  2  3  2  2  0  0  0  0  0   0
4  4  30   1  4  5  2  0  1  2  0  0  0   0
5  5  32   1  3  2  3  0  0  0  0  1  1   0

### -------- View entire dataset:
> View(mydata)

### -------- Install and load package:
> install.packages("foreign")
> library(foreign)
> mydata <- read.spss(file = "C:/datafile.sav", to.data.frame = TRUE)
Warning message:
In read.spss(file = "C:/datafile.sav", to.data.frame = TRUE) :
  C:/stgallen.sav: Unrecognized record type 7, subtype 18 encountered in system file

### -------- Variables within a dataset:
> head(mydata)
  id age sex A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
1  1  24   0  3  3  1  0  2  0  0  0  1   0
2  2  32   1  3  2  3  3  0  0  0  1  0   0
3  3  27   1  2  3  2  2  0  0  0  0  0   0
4  4  30   1  4  5  2  0  1  2  0  0  0   0
5  5  32   1  3  2  3  0  0  0  0  1  1   0

## ------------ return first six observations of "age":
> head(mydata$age)
[1] 24 32 27 30 32 23

## ------------ Compute AUDIT sum score:
> mydata$audit <- rowSums(mydata[, 4:13])
> head(mydata)
  id age sex A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 audit
1  1  24   0  3  3  1  0  2  0  0  0  1   0    10
2  2  32   1  3  2  3  3  0  0  0  1  0   0    12
3  3  27   1  2  3  2  2  0  0  0  0  0   0     9
4  4  30   1  4  5  2  0  1  2  0  0  0   0    14
5  5  32   1  3  2  3  0  0  0  0  1  1   0    10
6  6  23   0  2  5  3  0  0  0  0  0  0   0    10
```

```
> attach(mydata)

## ------ mean and median ------ ##
> mean(age, na.rm = TRUE)
[1] 25.52995
> median(age, na.rm = TRUE)
[1] 25

## -- variance, stddev, quartile -- ##
> var(age, na.rm = TRUE)
[1] 20.34285
> sd(age, na.rm = TRUE)
[1] 4.510305
> sqrt(var(age, na.rm = TRUE))
[1] 4.510305
> quantile(age, na.rm = TRUE)
  0%  25%  50%  75% 100%
  16   22   25   28   38

> detach(mydata)

## ---------- Correlation ---------- ##
> cor(age, audit, use = "complete.obs",
+     method = "pearson")
[1] -0.01415857
> cor(age, audit, use = "complete.obs",
+     method = "spearman")
[1] 0.01356717
> cor(age, audit, use = "complete.obs",
+     method = "kendall")
[1] 0.007756665

## ---------- Data Summary ---------- ##
> summary(age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  16.00   22.00   25.00   25.53   28.00   38.00       5

> summary(mydata[, c(2,3,14)])
      age             sex             audit
 Min.   :16.00   Min.   :0.0000   Min.   : 4.00
 1st Qu.:22.00   1st Qu.:0.0000   1st Qu.: 8.25
 Median :25.00   Median :1.0000   Median :10.00
 Mean   :25.53   Mean   :0.5405   Mean   :10.12
 3rd Qu.:28.00   3rd Qu.:1.0000   3rd Qu.:12.00
 Max.   :38.00   Max.   :1.0000   Max.   :19.00
 NA's   :5
```

## Assignment Description

ESC_PS 8850: Quantitative Foundations in Educational Research

Class 2: Hypothesis Testing and the Pearson Correlation Coefficient

### Empirical Example

Patterson, Reid, & Dishion (1992) proposed a model for the development of antisocial behavior and academic failure. Children at risk for antisocial behavior often develop poor relationships with teachers. As a result, teachers may be reluctant to adequately monitor/supervise these children or to reinforce them for prosocial behavior or academic achievement.

In other words, there should be a positive association between prosocial behavior and academic competence.

To test this hypothesis, we use a sample of n = 12 second grade elementary school children. Prosocial behavior and academic competence are measured on a scale from 0 – 10.

To generalize from the sample to the population of second graders, we use principles of statistical inference and hypothesis testing.

### Hypothesis Testing

Hypothesis: A tentative explanation for a phenomenon.

Statistical Hypothesis: A statement about the value of a population parameter (e.g., mean), about the type of probability distribution of a variable (e.g., normally distributed), or about an effect or relationship.

Null Hypothesis (H0): The observations (values, relationships, effects) are the result of pure chance.

Alternative Hypothesis (HA) or Research Hypothesis: The observations show a real effect that is not due to pure chance.

Hypothesis Testing: We test whether our estimated value for the parameter differs enough from the hypothesized value to reject the null hypothesis in favor of the alternative hypothesis.

Results:

Either reject H0 or fail to reject H0.

Potential Outcome of Hypothesis Testing:

Type I Errors of 5% and Type II Errors of 20% are commonly accepted in the social sciences.

Probabilities of Hypothesis Testing:

1 – β is also known as the statistical power of a test.

General Steps of Hypothesis Testing:

1. Check statistical assumptions (e.g., distributional assumptions).
2. State the Null (H0) and the Alternative Hypothesis (HA).
3. Specify the nominal significance level α (e.g., 0.05, 0.01, 0.001).
4. Specify the statistical test (e.g., t-test, F-test, etc.).
5. Compute the test statistic from the observed data (e.g., t-value).
6. Compute the p-value P(data | H0), i.e., the probability of the observations under the Null Hypothesis.
7. Retain or reject the Null Hypothesis and draw conclusions.

### The Pearson Correlation Coefficient

The Pearson correlation coefficient measures the linear association of two variables (e.g., x and y) and can be described as the standardized cross-product of x and y, i.e.,

ρ_xy = cov(x, y) / (σ_x σ_y)

with cov(x, y) being the covariance

cov(x, y) = (1/n) Σ (x − x̄)(y − ȳ)

and σ_x and σ_y being the standard deviations of x and y:

σ_x² = (1/n) Σ (x − x̄)²,   σ_y² = (1/n) Σ (y − ȳ)²
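As a quick check of these formulas, here is a small sketch using the age values from the earlier transcript; note that the definitions above use 1/n, whereas R's built-in var() and sd() divide by n − 1:

```r
# Population-style variance (1/n, as defined above) vs. R's var() (1/(n - 1))
x <- c(21.2, 23.1, 19.5, 20.9)
n <- length(x)
v_pop <- sum((x - mean(x))^2) / n   # 1/n version from the slide
v_smp <- var(x)                     # R default, divides by n - 1
v_pop
v_smp * (n - 1) / n                 # rescaled: identical to v_pop
```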

### Properties of the Pearson Correlation

The Pearson correlation coefficient is a standardized measure bounded on the interval [−1, 1], i.e., −1 ≤ ρ_xy ≤ 1. Negative values (ρ_xy < 0) indicate a negative, ρ_xy = 0 no, and positive values (ρ_xy > 0) a positive linear association.

ρ_xy = cov(x, y)/(σ_x σ_y) describes the Pearson correlation coefficient in its symmetric form, i.e., ρ_xy = ρ_yx.
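The definition can be verified directly in R; a minimal sketch using the prosocial/academic data that appears later in the empirical example (the 1/n factors cancel in the ratio, so the result matches R's cor()):

```r
x <- c(6, 7, 7, 8, 4, 4, 3, 6, 7, 6, 7, 4)   # prosocial behavior
y <- c(5, 9, 8, 5, 4, 3, 3, 7, 8, 8, 7, 3)   # academic competence
n <- length(x)
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / n   # covariance (1/n)
sd_x   <- sqrt(sum((x - mean(x))^2) / n)           # sd of x (1/n)
sd_y   <- sqrt(sum((y - mean(y))^2) / n)           # sd of y (1/n)
rho    <- cov_xy / (sd_x * sd_y)
rho          # 0.7691181
cor(x, y)    # same value from the built-in function
```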

The anscombe data illustrate that very different bivariate patterns can share the same correlation:

```
> install.packages("datasets")   # "datasets" ships with R
> library(datasets)
> anscombe
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
```

[Scatterplots of y1 vs. x1 and y2 vs. x2 omitted]
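The point of the anscombe data is that all four x–y pairs share (almost exactly) the same Pearson correlation despite radically different scatterplots; a quick check:

```r
# All four Anscombe pairs have essentially the same Pearson correlation
data(anscombe)
r <- sapply(1:4, function(i)
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))
round(r, 3)   # roughly 0.816 for every pair
```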

### Significance Test for the Pearson Correlation

To test whether the observed correlation coefficient significantly differs from zero, we can use the t-test. The test statistic is defined as

t = ρ_xy · √(n − 2) / √(1 − ρ_xy²)

with n − 2 degrees of freedom. The obtained t-value is evaluated against Student's t-distribution.
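The test statistic can be computed by hand and checked against cor.test(); a sketch using the example data:

```r
prosoc <- c(6, 7, 7, 8, 4, 4, 3, 6, 7, 6, 7, 4)
acad   <- c(5, 9, 8, 5, 4, 3, 3, 7, 8, 8, 7, 3)
n <- length(prosoc)
r <- cor(prosoc, acad)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t = rho * sqrt(n-2) / sqrt(1-rho^2)
t_stat                                      # 3.8056, matching cor.test()
p_val <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value
p_val                                       # 0.003454
```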

### Empirical Example (cont'd)

```
prosoc <- c(6,7,7,8,4,4,3,6,7,6,7,4)
acad   <- c(5,9,8,5,4,3,3,7,8,8,7,3)
```

Step 1: Check assumptions (normality and linearity)

```
> shapiro.test(prosoc)

        Shapiro-Wilk normality test

data:  prosoc
W = 0.8876, p-value = 0.1098

> shapiro.test(acad)

        Shapiro-Wilk normality test

data:  acad
W = 0.8805, p-value = 0.08905
```

Step 2: Formulate H0 and HA

H0: Prosocial behavior and academic competence are statistically uncorrelated (ρ = 0).

HA: Prosocial behavior and academic competence are statistically correlated (ρ ≠ 0).

Step 3 and 4: Specify significance level and statistical test

Significance level: α = 0.05

Statistical test: Pearson correlation

Step 5 and 6: Calculate test statistic and p-value

```
> cor.test(prosoc, acad)

        Pearson's product-moment correlation

data:  prosoc and acad
t = 3.8056, df = 10, p-value = 0.003454
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3494733 0.9317479
sample estimates:
      cor
0.7691181
```

Step 7: Conclusion

We reject the Null Hypothesis and conclude that there exists a positive association between prosocial behavior and academic competence.

## Assignment Description

ESC_PS 8850: Quantitative Foundations in Educational Research

Class 3: The Linear Regression Model

### Outline

- Basic Notation of the Linear Regression Model
- Parameter Estimation: Ordinary Least Squares
- Properties of the Linear Regression Model
- Statistical Inference
- Model Fit
- Model Selection

### The Linear Regression Model

#### Bivariate Example

The simplest form of regression deals with one independent variable x (the predictor/explanatory variable) and one dependent variable y (the outcome/response).

It is assumed that the association between x and y can be described by a straight line. Any straight line can be described by two parameters: the intercept (β0) and the slope (β1).

Regression equation: y = β0 + β1·x + ε

β0 = ȳ − β1·x̄ reflects the predicted value of y when x = 0.

β1 = cov(x, y)/σ_x² reflects the change in y when moving from x to x + 1, i.e., β1 = ŷ(x + 1) − ŷ(x).

Predicted values: ŷ = β0 + β1·x
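These two formulas can be checked against lm(); a minimal sketch using the prosoc/acad example data from the previous class:

```r
prosoc <- c(6, 7, 7, 8, 4, 4, 3, 6, 7, 6, 7, 4)
acad   <- c(5, 9, 8, 5, 4, 3, 3, 7, 8, 8, 7, 3)
b1 <- cov(prosoc, acad) / var(prosoc)   # slope: cov(x, y) / variance of x
b0 <- mean(acad) - b1 * mean(prosoc)    # intercept: y-bar minus b1 * x-bar
c(b0, b1)
coef(lm(acad ~ prosoc))                 # identical estimates from lm()
```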

#### Basic Notation

The (general) linear model assumes that a response variable y_i (i = 1, ..., n) emerges from a set of (weighted) explanatory variables x_j (j = 1, ..., k) and an error term:

y_i = β0 + β1·x_i1 + ⋯ + βk·x_ik + ε_i

where β0 + β1·x_i1 + ⋯ + βk·x_ik is the systematic part and ε_i the random part, or in matrix notation:

y = Xβ + ε

y ... an n × 1 response vector,
X ... an n × k matrix of explanatory variables (including a 1 vector for the intercept β0),
β ... a k × 1 vector of regression weights,
ε ... an n × 1 vector of disturbances.

#### Basic Notation (cont'd)

The equation y = Xβ + ε can be written out as:

```
⎡ y_1 ⎤   ⎡ 1  x_11 … x_1j … x_1k ⎤ ⎡ β_0 ⎤   ⎡ ε_1 ⎤
⎢  ⋮  ⎥ = ⎢ ⋮   ⋮        ⋮      ⋮ ⎥ ⎢  ⋮  ⎥ + ⎢  ⋮  ⎥
⎣ y_n ⎦   ⎣ 1  x_n1 … x_nj … x_nk ⎦ ⎣ β_k ⎦   ⎣ ε_n ⎦
```

The model is called general because:

More than one explanatory variable can be used to predict y.

The explanatory variables x_j can be continuous (e.g., age) or categorical (e.g., age groups).

### Systematic Part of the Model

The model is called linear because it can be written in the form y = Xβ + ε, i.e., the model is linear in the parameters, e.g.,

y_i = β0 + β1·x_i1 + β2·x_i1² + ε_i

or

y_i = β0 + β1·log(x_i1) + ε_i

but not necessarily in the variables. Note that, e.g., y_i = β0 + β1·x_i1^β2 + ε_i is not linear in the parameters!

### Parameter Interpretation

Reconsider a multiple linear regression model where y is predicted from two variables, x1 and x2:

y = β0 + β1·x1 + β2·x2 + ε

The effect β1 reflects the change in y when moving from x1 to x1 + 1 while holding x2 constant at any value c, i.e.,

β1 = ŷ(x1 + 1, x2 = c) − ŷ(x1, x2 = c)

(this feature defines the effect as being unconditional on the value at which x2 is held).

Analogous conclusions hold for β2, i.e.,

β2 = ŷ(x2 + 1, x1 = c) − ŷ(x2, x1 = c)
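In an additive model (no interaction term) this difference is the same regardless of the value c at which the other predictor is held; a sketch on hypothetical simulated data:

```r
set.seed(1)  # hypothetical data, for illustration only
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- 2 + 3 * x1 + 1.5 * x2 + rnorm(100)
m  <- lm(y ~ x1 + x2)
# predicted change when x1 increases by 1, with x2 fixed at c = 0 or c = 5:
d0 <- predict(m, data.frame(x1 = 1, x2 = 0)) - predict(m, data.frame(x1 = 0, x2 = 0))
d5 <- predict(m, data.frame(x1 = 1, x2 = 5)) - predict(m, data.frame(x1 = 0, x2 = 5))
c(d0, d5, coef(m)["x1"])   # all three equal the estimated slope for x1
```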

#### Random Part of the Model

The random part (disturbances or errors) is commonly assumed to follow a normal distribution with an expected value of zero and a variance σ², i.e.,

ε_i ~ N(0, σ²).

[Graphical representation omitted; cf. Hatzinger (2011)]

Some properties of the random part:

The mean of the random part is zero: ε̄ = (1/n) Σᵢ₌₁ⁿ ε_i = 0

The explanatory variables and the error term are assumed to be independent: Cov(ε_i, x_j) = 0
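Both properties hold numerically for OLS residuals in R; a small simulated check:

```r
set.seed(2)  # simulated data, for illustration only
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
e <- resid(lm(y ~ x))
mean(e)     # zero up to floating-point error
cov(e, x)   # zero by construction of the OLS fit
```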

### Parameter Estimation

The most well-known model estimation techniques are:

Ordinary Least Squares (OLS): aims at minimizing the sum of the squared distances between observed data points and the values predicted under a certain model.

Maximum Likelihood: aims at finding those parameter values β which are most plausible for the data-generating process.

OLS estimation considers the deviation of y from its expected value under a certain model, i.e.,

ε = y − Xβ

with Xβ being the predicted values ŷ under a certain model. For example:

```
mod <- lm(acad ~ prosoc)   # fitted model for the running example
plot(prosoc, acad, pch = 19, cex = 1.2, xlim = c(0, 10), ylim = c(0, 10))
abline(mod)                                           # add regression line
pred <- fitted(mod)                                   # predicted values
segments(prosoc, acad, prosoc, pred, lty = "dashed")  # add residuals
```

Specifically, OLS estimation uses the sum of the squared deviations, denoted by Q:

Q = ε′ε = Σᵢ₌₁ⁿ ε_i²

We need to find estimators for the regression parameters β which minimize Q.

Step 1: Insert the model equation

Q = ε′ε
  = (y − Xβ)′(y − Xβ)
  = (y′ − β′X′)(y − Xβ)
  = y′y − β′X′y − y′Xβ + β′X′Xβ
  = y′y − 2β′X′y + β′X′Xβ

Note: The terms β′X′y and y′Xβ in line 4 can be rewritten as 2β′X′y because β′X′y = y′Xβ (both are scalars and β′X′y = (y′Xβ)′). Thus, −β′X′y − y′Xβ = −2β′X′y.

Step 2: Find the derivative of Q with respect to β and set it to zero:

∂Q/∂β = −2X′y + 2X′Xβ = 0

Rules for matrix derivatives: ∂(β′X′y)/∂β = X′y and ∂(β′X′Xβ)/∂β = 2X′Xβ.

Step 3: Solve −2X′y + 2X′Xβ = 0 for β:

2X′Xβ = 2X′y

X′Xβ = X′y

Multiplying the last expression by the inverse (X′X)⁻¹ leads to

(X′X)⁻¹X′Xβ = (X′X)⁻¹X′y.

Because (X′X)⁻¹X′X equals the identity matrix I (i.e., a diagonal matrix of 1s), we obtain

β̂ = (X′X)⁻¹X′y

A closer look at β̂ = (X′X)⁻¹X′y:

The cross-product matrix X′X provides information on the magnitude of co-variation among the explanatory variables.

The cross-product matrix X′y provides information on the magnitude of co-variation between the explanatory variables and the response variable.

Note that we need to find the inverse of X′X, i.e., (X′X)⁻¹. In general, matrix inverses are sensitive to the magnitude of association among matrix columns (i.e., the predictors); in regression analysis this is known as the collinearity issue.
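A hypothetical sketch of why this matters: when two predictors are nearly identical, X′X is close to singular and its inverse becomes numerically unstable, which is exactly the collinearity problem:

```r
set.seed(3)  # hypothetical near-collinear predictors
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 1e-6)   # x2 is almost an exact copy of x1
X  <- cbind(1, x1, x2)
k  <- kappa(t(X) %*% X)            # condition number of X'X
k                                  # enormous -> unstable (X'X)^{-1}
```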

For the univariate case (i.e., considering only one predictor), we obtain X′X = Σᵢ xᵢ² and X′y = Σᵢ xᵢyᵢ. For mean-centered variables, the equation thus reduces to:

β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = cov(x, y)/σ_x²

### Empirical Example

Artificial dataset with known parameter values:

```
set.seed(1842)  # to reproduce results
n  <- 500       # sample size
b0 <- 2.5       # true intercept
b1 <- 5.7       # true slope of x1
b2 <- 3.2       # true slope of x2

### ----- Generate artificial data:
x1    <- rnorm(n, mean = 0, sd = 2)
x2    <- rnorm(n, mean = 1, sd = 3.5)
error <- rnorm(n, mean = 0, sd = 1)
y     <- b0 + x1*b1 + x2*b2 + error

### ----- Show dataset:
mydata <- data.frame(y, x1, x2)
head(mydata)
           y          x1         x2
1   2.140325 -1.06642262  2.0639484
2  -5.599110 -0.57981954 -1.6829349
3   6.253962 -1.42709712  3.2378684
4   5.722593 -0.08850081  0.7485646
5 -15.115317 -1.21956682 -2.7025510
6  34.428496  3.87573159  3.0039056
```

Estimate parameters using β̂ = (X′X)⁻¹X′y:

```
X <- cbind(1, x1, x2)                     # design matrix X (incl. the intercept)
beta <- solve(t(X) %*% X) %*% t(X) %*% y  # estimate parameters using OLS
beta
       [,1]
   2.502532
x1 5.667552
x2 3.194397
```

Note: solve() is used to compute the inverse of a matrix, t() generates a transposed matrix, and %*% is used for matrix multiplication.

Using the built-in function lm(...):

```
model <- lm(y ~ x1 + x2, data = mydata)
coef(model)
(Intercept)          x1          x2
   2.502532    5.667552    3.194397
```

For the simulated data, we get the following regression equation:

y_i = 2.503 + 5.668·x_i1 + 3.194·x_i2 + ε̂_i

The predicted values ŷ_i are obtained through:

ŷ_i = 2.503 + 5.668·x_i1 + 3.194·x_i2

The estimated (raw) residuals are obtained through ε̂_i = y_i − ŷ_i.

```
y.hat <- fitted(model)  # predicted values
y.res <- resid(model)   # raw residuals
head(cbind(y, y.hat, y.res))
           y      y.hat      y.res
1   2.140325   3.051597 -0.9112712
2  -5.599110  -6.159587  0.5604775
3   6.253962   4.757421  1.4965403
4   5.722593   4.392161  1.3304320
5 -15.115317 -13.042446 -2.0728711
6  34.428496  34.064108  0.3643884
```

### Statistical Inference

Confidence Interval:

Under the assumption that the error term follows a normal distribution with zero mean and standard deviation σ_ε (i.e., ε ~ N(0, σ_ε)), the sample estimates for the intercept (β̂0) and the slope (β̂1) also follow normal distributions:

β̂0 ~ N(β0, σ_{β̂0})   and   β̂1 ~ N(β1, σ_{β̂1})

Replacing σ_ε with its sample estimate gives the standard errors SE(β̂0) and SE(β̂1), and a level (1 − α) confidence interval is given by

[β̂0 − t_crit × SE(β̂0); β̂0 + t_crit × SE(β̂0)]
[β̂1 − t_crit × SE(β̂1); β̂1 + t_crit × SE(β̂1)]

where t_crit is the upper α/2 critical value of Student's t-distribution with df = n − 2.

Significance test:

To test the null hypothesis H0: β1 = 0, we can use

t = β̂1 / SE(β̂1)   with   SE(β̂1) = σ̂_ε / √(Σ(x − x̄)²)

The p-value for the test statistic can be found from Student's t-distribution with df = n − 2. When the regression assumptions are fulfilled, evaluating H0: β1 = 0 corresponds to testing whether there is a linear relation between x and y.
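Both the t statistic and the confidence interval can be reproduced by hand; a sketch on hypothetical simulated data, compared against summary() and confint():

```r
set.seed(4)  # hypothetical data, for illustration only
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
m <- lm(y ~ x)
b1  <- coef(m)[2]
se1 <- sqrt(sum(resid(m)^2) / (n - 2)) / sqrt(sum((x - mean(x))^2))
t1  <- b1 / se1
c(manual = as.numeric(t1),
  lm = summary(m)$coefficients[2, "t value"])   # identical
ci <- b1 + c(-1, 1) * qt(0.975, df = n - 2) * se1
ci                                              # matches confint(m)[2, ]
```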

#### Statistical inference using summary() and confint():

```
model <- lm(y ~ x1 + x2, data = mydata)
summary(model)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.5341 -0.6187  0.0296  0.6949  3.3571

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.50253    0.04622   54.14   <2e-16 ***
x1           5.66755    0.02167  261.53   <2e-16 ***
x2           3.19440    0.01246  256.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[...output omitted...]

confint(model, level = 0.95)
               2.5 %   97.5 %
(Intercept) 2.411718 2.593346
x1          5.624974 5.710130
x2          3.169919 3.218874
```

### Model Fit

Coefficient of Determination (R²):

The R² value is based on partitioning the total variation of the observations y_i (without considering predictors) into (1) the variation of the fitted regression values ŷ_i around the mean ȳ and (2) the variation of the observed values y_i around the fitted regression values ŷ_i:

(y_i − ȳ) = (ŷ_i − ȳ) + (y_i − ŷ_i)

The same relation holds for the sums of squared deviations:

Σ(y_i − ȳ)² = Σ(ŷ_i − ȳ)² + Σ(y_i − ŷ_i)²

The first term is the total sum of squares (TSS), the second the sum of squares of the fitted regression around the mean (FSS), and the third the sum of squares of the observed values around the fitted regression (RSS).

The coefficient of determination is defined as:

R² = FSS/TSS = 1 − RSS/TSS

with 0 ≤ R² ≤ 1, where larger values indicate a better model fit.

Alternatively, the coefficient of determination may also be interpreted as the squared correlation between observed (y) and predicted (ŷ) values of the response variable:

R² = cor(y, ŷ)²

Coefficient of Determination in R:

```
model.h <- lm(y ~ x1 + x2)
summary(model.h)

[... Output omitted ...]

Residual standard error: 1.001 on 497 degrees of freedom
Multiple R-squared:  0.9965,    Adjusted R-squared:  0.9965
F-statistic: 7.127e+04 on 2 and 497 DF,  p-value: < 2.2e-16

summary(model.h)$r.squared
[1] 0.9965256

### ------ R² using sums of squares:
TSS <- sum((y - mean(y))^2)
FSS <- sum((fitted(model.h) - mean(y))^2)
FSS/TSS
[1] 0.9965256

### ------- R² as the correlation between observed and fitted values:
cor(y, fitted(model.h))^2
[1] 0.9965256
```

### Model Selection

In practical applications we need to find an appropriate model.

Consider the model M_h, the hierarchically higher model (i.e., the more complex model), with p_h parameters. For example:

y_i = β0 + β1·x_i1 + β2·x_i2 + ε_i.

Let M_r be a restricted model (i.e., a hierarchically lower model) with p_r parameters (where p_r < p_h). For example:

y_i = β0 + β1·x_i1 + ε_i.

We use the sum of the squared deviations of each observation y from its estimated expected value ŷ. The residual sums of squares (RSS) are defined as

Model M_h: RSS_h = Σ(y − ŷ)² = Σ[y_i − (β0 + β1·x_i1 + β2·x_i2)]²
Model M_r: RSS_r = Σ(y − ŷ)² = Σ[y_i − (β0 + β1·x_i1)]²

In general, one assumes RSS_h ≤ RSS_r because the more parameters are in the model, the better the model fit (and thus the smaller the deviations around the fitted regression line).

When RSS_h is close to RSS_r, the additional parameters of the higher model do not help to better explain the variability of the observed data.

Statistical Test:

F = [(RSS_r − RSS_h) / (df_r − df_h)] ÷ [RSS_h / df_h]

with df_r − df_h and df_h degrees of freedom (df = #observations − #parameters). The null hypothesis of the F-test states that the reduced model (M_r) is adequate, i.e., that the additional parameters are not needed.

Performing the F-test in R:

```
### -------- Estimate both models:
model.h <- lm(y ~ x1 + x2)
model.r <- lm(y ~ x1)

### -------- Perform F-test:
anova(model.r, model.h)
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2
  Res.Df   RSS Df Sum of Sq     F    Pr(>F)
1    498 66408
2    497   498  1     65910 65745 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Conclusion: The additional parameter is needed to explain the observed data.

Model Selection Strategies:

Simultaneous Regression: All IVs are entered simultaneously, i.e., there is no theoretical basis for logically ordering the variables.

Hierarchical Regression: IVs are entered cumulatively according to a pre-specified theoretical hierarchy, with a focus on the change in R². For example, when regressing y on x one obtains R²_{y|x} = 0.20. When regressing y on x and z one obtains R²_{y|xz} = 0.27. Thus, after controlling for x, z accounts for 7% of the variance in y (i.e., R²_{y|xz} − R²_{y|x} = 0.07).

Stepwise Regression: IVs are automatically selected based on their relative contribution (see the R function step(...)).

Forward Selection: At each stage the IV with the largest change in R² is selected.

Backward Selection: In the first step, all IVs are used, and the IV with the smallest contribution is dropped.
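A minimal simulated sketch of the hierarchical-regression idea (hypothetical data; the R² change is the variance accounted for by z after controlling for x):

```r
set.seed(5)  # hypothetical data, for illustration only
x <- rnorm(200); z <- rnorm(200)
y <- x + z + rnorm(200)
r2_x  <- summary(lm(y ~ x))$r.squared       # R^2 with x only
r2_xz <- summary(lm(y ~ x + z))$r.squared   # R^2 with x and z
r2_xz - r2_x   # change in R^2 attributable to z, controlling for x
```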

## Assignment Code

``````
# -------------------------------------------------
# ------------- Analysis of Variances
# -------------------------------------------------

# get means

aggregate(reading ~ group, data = anova_dat, mean)
#1     1    37.2
#2     2    45.2
#3     3    45.8

# ANOVA

m <- aov(reading ~ group, data = anova_dat)
#Terms:
#                   group Residuals
#Sum of Squares  184.9000  324.0333  >>--->> CAUTION: Wrong degrees of freedom
#Deg. of Freedom        1        13
#Residual standard error: 4.992559
#Estimated effects may be unbalanced

anova_dat$group.f <- factor(anova_dat$group)

m <- aov(reading ~ group.f, data = anova_dat)
summary(m)

# planned comparison

comp1 <- c(-1, 0.5, 0.5)  # specifying intended comparison
comp2 <- c(0, -1, 1)
weight <- cbind(comp1, comp2) # weight matrix
contrasts(anova_dat$group.f) <- weight

m2 <- aov(reading ~ group.f, data = anova_dat)
summary(m2, split = list(group.f = list("Ave. Treat versus Control" = 1, "Par versus Non-Par" = 2)) )
#                                     Df Sum Sq Mean Sq F value  Pr(>F)
#group.f                               2  230.5   115.3   4.968 0.02679 *
#  group.f: Ave. Treat versus Control  1  229.6   229.6   9.898 0.00844 **
#  group.f: Par versus Non-Par         1    0.9     0.9   0.039 0.84716
#Residuals                            12  278.4    23.2
#new alpha: .05 / 3 = 0.01666667

t.test(reading ~ group.f, data = subset(anova_dat, group.f != "3") )  # group 1 vs 2
#t.test(reading ~ group.f, data = subset(anova_dat, group.f == "1" | group.f == "2") )
t.test(reading ~ group.f, data = subset(anova_dat, group.f != "2") )  # group 1 vs 3
t.test(reading ~ group.f, data = subset(anova_dat, group.f != "1") )  # group 2 vs 3

# p-values of the pairwise group comparisons above:
#  1     2
#2 0.046 -
#3 0.046 0.847

aggregate(reading ~ group.f, data = anova_dat, mean)
#1       1    37.2
#2       2    45.2
#3       3    45.8
aggregate(reading ~ group.f, data = anova_dat, sd)
#1       1 4.658326
#2       2 4.969909
#3       3 4.816638
aggregate(reading ~ group.f, data = anova_dat, summary)
#1       1         33.0            34.0           35.0         37.2            40.0         44.0
#2       2         38.0            43.0           46.0         45.2            48.0         51.0
#3       3         42.0            43.0           44.0         45.8            46.0         54.0

# --- GLM Approach

anova_dat$nonpar <- ifelse(anova_dat$group.f == "2", 1, 0)
anova_dat$par <- ifelse(anova_dat$group.f == "3", 1, 0)

m.lm <- lm(reading ~ nonpar + par, data = anova_dat)
summary(m.lm)
#Call:
#lm(formula = reading ~ nonpar + par, data = anova_dat)
#Residuals:
#   Min     1Q Median     3Q    Max
#  -7.2   -3.0   -1.8    2.8    8.2
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)   37.200      2.154  17.270 7.68e-10 ***
#nonpar         8.000      3.046   2.626   0.0221 *    >>--->> nonpar vs control
#par            8.600      3.046   2.823   0.0154 *    >>--->> par vs control
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Residual standard error: 4.817 on 12 degrees of freedom
#Multiple R-squared:  0.453,     Adjusted R-squared:  0.3618
#F-statistic: 4.968 on 2 and 12 DF,  p-value: 0.02679

m0 <- lm(reading ~ 1, data = anova_dat)
#Call:
#lm(formula = reading ~ 1, data = anova_dat)
#Coefficients:
#(Intercept)
#      42.73  >>--->> grand mean
anova(m0, m.lm)
#Analysis of Variance Table
#Model 1: reading ~ 1
#Model 2: reading ~ nonpar + par
#  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
#1     14 508.93
#2     12 278.40  2    230.53 4.9684 0.02679 *  >>--->> summary(aov(...))

``````

## Assignment Code

``````
# ------------------------------------------
# ------------- Introduction into R
# ------------------------------------------

# --- generate variable in R

# numeric variable "age"

age <- c(3, 5, 2.5)
age
#[1] 3.0 5.0 2.5
age + 2
#[1] 5.0 7.0 4.5

# center variable (i.e. subtract mean of "age")
age.c <- age - mean(age)

mean(age)
#[1] 3.5
mean(age.c)
#[1] 0
age
#[1] 3.0 5.0 2.5
age.c
#[1] -0.5  1.5 -1.0

# categorical variable "gender"

gender <- c("female", "male", "female")
gender.2 <- c(1, 0, 1)

class(age)
#[1] "numeric"
class(gender.2)
#[1] "numeric"

gender.2 <- factor(gender.2)
levels(gender.2) <- c("male", "female")

# ---- datasets in R

dat <- data.frame(age, age.c, gender.2)
dat
#  age age.c gender.2
#1 3.0  -0.5   female
#2 5.0   1.5     male
#3 2.5  -1.0   female

dat[2 , ]
#  age age.c gender.2
#2   5   1.5     male
dat[2 , 3 ]
#[1] male
#Levels: male female
dat[ , 2 ]
#[1] -0.5  1.5 -1.0

# access variables in dataset

mean(mydata[ , 2], na.rm = TRUE)
#[1] 25.52995
mean(mydata$age, na.rm = TRUE)
#[1] 25.52995

mydata$audit <- rowSums( mydata[ , 4:13 ] )

# remove variable "id"
mydata$id <- NULL

``````

## Assignment Code

``````
# -------------------------------------------------
# ------------- Hypothesis Testing and Correlation
# -------------------------------------------------

prosoc <- c(6, 7, 7, 8, 4, 4, 3, 6, 7, 6, 7, 4)
acad   <- c(5, 9, 8, 5, 4, 3, 3, 7, 8, 8, 7, 3)

# --- descriptive statistics
mean(prosoc)
#[1] 5.75
median(prosoc)
#[1] 6
sd(prosoc)
#[1] 1.602555
quantile(prosoc)
#  0%  25%  50%  75% 100%
#   3    4    6    7    8
summary(prosoc)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   3.00    4.00    6.00    5.75    7.00    8.00

# --- Pearson correlation

cor(prosoc, acad)
#[1] 0.7691181
cor(cbind(prosoc, acad))  # correlation matrix; first row:
#prosoc 1.0000000 0.7691181

# --- visual check of linearity

plot(prosoc, acad, pch = 21, bg = "grey", cex = 2,
     xlab = "Prosocial Behavior",
     main = "My first scatterplot",
     xlim = c(0, 10), ylim = c(0, 10))

# --- significance test for correlation

# normality of variables:

par(mfrow = c(1,2))

hist(prosoc, col = "grey", xlim = c(0, 10),
     xlab = "Prosocial Behavior")

hist(acad, col = "grey", xlim = c(0, 10),
     xlab = "Academic Competence")

# run t-test:

cor.test(prosoc, acad)
#        Pearson's product-moment correlation
#t = 3.8056, df = 10, p-value = 0.003454
#alternative hypothesis: true correlation is not equal to 0
#95 percent confidence interval:
# 0.3494733 0.9317479
#sample estimates:
#      cor
#0.7691181

``````

## Assignment Code

``````
# -------------------------------------------------
# ------------- Regression Diagnostics
# -------------------------------------------------

m <- lm(swls ~ fs, data = mydata)

# --- Normality assumption

r.swls <- resid(m)

hist(r.swls, breaks = 20, col = "red", xlab = "Residuals", main = "swls ~ fs")
boxplot(r.swls, col = "blue", xlab = "Residuals")

qqnorm(r.swls)  # QQ plot

# significance tests

install.packages("moments")  # only use once
library(moments)

shapiro.test(r.swls)

# --- Homoskedasticity

r.swls <- resid(m)

plot(mydata$fs, r.swls, pch = 21, bg = "grey",
     xlab = "Psych. Well-Being", ylab = "Residuals",
     ylim = c(-3, 3))

abline(h = 0, lty = "dashed", lwd = 1.5)

# significance test

install.packages("lmtest")  # run only once
library(lmtest)

bptest(m)
#        studentized Breusch-Pagan test
#data:  m
#BP = 8.1597, df = 1, p-value = 0.004283  -> violated assumption

# log transformation

summary(mydata$swls)
#  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#  1.800   4.400   5.200   5.007   5.800   7.000   # no values <= 0

m.log <- lm(log(swls) ~ fs, data = mydata)
bptest(m.log)
#        studentized Breusch-Pagan test
#data:  m.log
#BP = 24.696, df = 1, p-value = 6.713e-07 -> still violated assumption

# Box-Cox transformation

install.packages("MASS")  # MASS usually ships with R; run only if missing
library(MASS)

bc.out <- boxcox(m, lambda = seq(-3, 6, by = .0001), plotit = TRUE)
bc.out$x[which.max(bc.out$y)]  # gives lambda = 1.7857

mydata$swls.bc <- (mydata$swls^1.7857 - 1) / 1.7857
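
# The transformation above is one member of the Box-Cox family; as a
# hedged sketch, bc() is an illustrative helper (not part of MASS):
bc <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
# so swls.bc is bc(swls, 1.7857), and lambda = 0 recovers the log
# transformation as a special case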

par(mfrow = c(1, 2))
hist(mydata$swls, breaks = 10, col = "grey")
hist(mydata$swls.bc, breaks = 10, col = "grey")

m.bc <- lm(swls.bc ~ fs, data = mydata)
bptest(m.bc)
#        studentized Breusch-Pagan test
#data:  m.bc
#BP = 0.3652, df = 1, p-value = 0.5456

# post-hoc correction of standard errors

install.packages("lmtest")   # skip if already installed above
install.packages("sandwich") # run only once

library(lmtest)   # coeftest
library(sandwich) # vcovHC

coeftest(m, vcov = vcovHC(m, type = "HC3") )
#t test of coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.531472   0.233585  2.2753  0.02327 *
#fs          0.816283   0.040966 19.9258  < 2e-16 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
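
# For comparison, classical vs. HC3-robust standard errors side by side
# (hedged sketch; uses sandwich's vcovHC loaded above):
cbind(classical = summary(m)$coefficients[, "Std. Error"],
      robust    = sqrt(diag(vcovHC(m, type = "HC3"))))
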

# --- Outliers and Influential Data Points

# hat-values

h <- hatvalues(m)
outliers <- which(h > 3 * mean(h))  # flag high-leverage points

mydata_reduced <- mydata[-outliers, ]  # drop them (assumes at least one point was flagged)
nrow(mydata_reduced)

m.nooutlier <- lm(swls ~ fs, data = mydata_reduced)
coef(m)
#(Intercept)          fs
#  0.5314718   0.8162830
coef(m.nooutlier)  # no major difference --> go with original model
#(Intercept)          fs
#  0.1732704   0.8786697

# Cook's distances

cd <- cooks.distance(m)
outliers2 <- which(cd > 4/(nrow(mydata) - 1 - 1))  # rule of thumb: Cook's D > 4/(n - k - 1); here n = 559, k = 1

mydata_reduced2 <- mydata[ -outliers2 , ] # delete outliers

m.outlier2 <- lm(swls ~ fs, data = mydata_reduced2)
coef(m)
#(Intercept)          fs
#  0.5314718   0.8162830
coef(m.outlier2) # no major difference --> go with original model
#(Intercept)          fs
#  0.4176248   0.8455873

``````

## Assignment Code

``````
# -------------------------------------------------
# ------------- Linear Regression Model
# -------------------------------------------------

# --- scatterplot

plot(swls ~ age, data = mydata, pch = 21, bg = "grey", cex = 1.5, xlim = c(0, 80))

#plot(mydata$age, mydata$swls, pch = 21, bg = "grey", cex = 1.5)

# --- simple linear regression model

m1 <- lm(swls ~ age, data = mydata)
m1  # print the fitted model
#Call:
#lm(formula = swls ~ age, data = mydata)
#Coefficients:
#(Intercept)          age
#    4.95865      0.00163

# --- multiple linear regression model

m2 <- lm(swls ~ age + fs, data = mydata)
m2  # print the fitted model
#Call:
#lm(formula = swls ~ age + fs, data = mydata)
#Coefficients:
#(Intercept)          age           fs
#   0.577138    -0.001753     0.817466

# --- model fit

summary(m2)$r.squared
#[1] 0.444049

pred <- predict(m2)
cor(pred, mydata$swls)^2
#[1] 0.444049
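
# Sanity check that R-squared equals the squared correlation between
# observed and fitted values, shown on a built-in toy dataset (hedged
# illustration, not the course data):
m.toy <- lm(dist ~ speed, data = cars)
all.equal(summary(m.toy)$r.squared, cor(fitted(m.toy), cars$dist)^2)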

# --- model selection

anova(m1, m2)
#Model 1: swls ~ age
#Model 2: swls ~ age + fs
#  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
#1    557 640.51
#2    556 356.19  1    284.31 443.8 < 2.2e-16 ***


# backward selection

m0 <- lm(swls ~ sex + age + fs, data = mydata)
m1 <- lm(swls ~ sex + fs, data = mydata)
m2 <- lm(swls ~ age + fs, data = mydata)
m3 <- lm(swls ~ fs, data = mydata)
m4 <- lm(swls ~ 1, data = mydata)
anova(m1, m0)  # is age needed?
anova(m2, m0)  # is sex needed?
anova(m3, m1)  # is sex needed in new baseline model?
anova(m3, m2)  # is age needed in new baseline model?

anova(m4, m3)  # is fs needed?
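
# drop1() runs the same partial F-tests for every term of the full
# model in a single call (hedged shortcut for the comparisons above):
drop1(m0, test = "F")
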
# hierarchical

m0 <- lm(swls ~ age + sex, data = mydata)   # baseline adjusting demographics
m1 <- lm(swls ~ age + sex + fs, data = mydata) # adding predictor of interest
anova(m0, m1)

# stepwise

m <- step( lm(swls ~ sex + age + fs, data = mydata), direction = "both"  )     # backward and forward selection
m <- step( lm(swls ~ sex + age + fs, data = mydata), direction = "backward"  ) # backward selection only
m <- step( lm(swls ~ 1, data = mydata), scope = ~ sex + age + fs, direction = "forward" )  # forward selection only (must start from the null model, or nothing can be added)
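
# step() compares models by AIC; the same criterion can be inspected
# directly with AIC() (hedged sketch):
AIC(lm(swls ~ fs, data = mydata),
    lm(swls ~ sex + age + fs, data = mydata))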

``````

## Assignment Image

[Screenshot of a Word document answering Question 3: a multivariate logistic regression predicting rent2 from floor, year, heating, and public, fit with glm(rent2 ~ floor + year + heating + public, family = binomial(link = "logit"), data = mydata), alongside a table comparing the univariate and multivariate ORs with 95% CIs. Conclusion: the estimates differ slightly but are very similar; floor, year, and heating are statistically significant, while public is not.]

## Assignment Description

ESC_PS 8850: Quantitative Foundations in Educational Research:

Midterm Exam

### Part I: Basic Concepts (10 points)

Match the following 10 items with the best options. Each option can be used more than once and multiple options may be correct. Fill the letters in the parentheses.

(        ) 1. HC3 correction

(        ) 2. R-squared in linear regression

(        ) 3. robust Breusch-Pagan test

(        ) 4. Type II Error

(        ) 5. Box-Cox transformation

(        ) 6. Dummy Coding

(        ) 7. D’Agostino test

(        ) 8. P(data | H0)

(        ) 9. Bonferroni correction

(        ) 10. Slope Coefficient

Options:

(A)    Adjusting Type I Error rate for multiple comparisons

(B)    Testing for homoscedasticity

(C)    Correcting standard errors for non-normality

(D)    Correcting standard errors for heteroscedasticity

(E)    The probability of rejecting the H0 given that the effect does not exist in the population

(F)     Testing if a variable is asymmetrically distributed

(G)    Using k – 1 binary variables to represent k groups

(H)    Correcting for highly influential data points in a linear regression model

(I)       the p-value, i.e., the probability that the alternative hypothesis is correct given the observed data

(J)      The probability of retaining the H0 given that the effect exists in the population

(K)    Stabilizing the error variance in linear regression models

(L)    The squared correlation between observed and predicted response values

(M)  the p-value, i.e., the probability of observed data under the null hypothesis

(N)    Used to evaluate the model fit

(O)    Expected change in the dependent variable for a unit increase of the independent variable

### Part II: Applications in R (20 points)

The dataset yourname.csv contains the variables prosocial behavior (prosoc; standardized score), treatment status (treat; 1 = advanced teacher classroom management training, 2 = basic teacher classroom management training, and 3 = business as usual), and grade (grade; 1 = 1st grade, 2 = 2nd grade, 3 = 3rd grade) obtained from 543 students. When answering the questions, show the R code and the corresponding output.

1.      Convert the variables grade and treat into factors and compute means and standard deviations of prosoc for all grade levels and treatment groups. (2 points)

2.      Run a one-way ANOVA for prosoc and treat. Is there a significant difference of academic competence across treatment groups? Further, evaluate the ANOVA assumption of normality of the outcome variable. Is the assumption fulfilled? (4 points)

3.      Compare the average treatment effect of “advanced teacher classroom management training” and “basic teacher classroom management training” with the observed effect of the control group (“business as usual”) using planned comparisons. What are your conclusions? (2 points)

4.      Repeat the analyses of Question 2 using ordinary linear regression models using Dummy and Effect Coding. Interpret the regression coefficients for the treatment effects for both coding schemes. (4 points).

5.      Perform all possible pairwise contrasts for the model in Question 2 using Tukey’s HSD method and Bonferroni’s correction approach. What are your conclusions? (4 points)

6.      Extend the model in Question 4 by including the variable grade using Dummy Coding for both variables. It is hypothesized that grade has an additional effect on academic competence. Can you confirm this statement based on your analysis? (2 points)

7.      Repeat your analysis in Question 6 using Effect Coding. Does a different coding scheme change your conclusions? (2 points).
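
As a hedged starting point for Question 1, a minimal sketch assuming the file yourname.csv with the columns described above (variable names follow the assignment; the factor labels are illustrative):

``````
mydata <- read.csv("yourname.csv")

# Convert the grouping variables to factors with readable labels
mydata$treat <- factor(mydata$treat, levels = 1:3,
                       labels = c("advanced", "basic", "business as usual"))
mydata$grade <- factor(mydata$grade, levels = 1:3,
                       labels = c("1st", "2nd", "3rd"))

# Means and standard deviations of prosoc per group
tapply(mydata$prosoc, mydata$treat, mean)
tapply(mydata$prosoc, mydata$treat, sd)
tapply(mydata$prosoc, mydata$grade, mean)
tapply(mydata$prosoc, mydata$grade, sd)
``````

The same factors then feed directly into the ANOVA and regression models asked for in Questions 2 through 7.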

Is it free to get my assignment evaluated?

Yes. There are no hidden fees. You pay for the solution only, and all explanations about how to run it are included in the price. It takes up to 24 hours to get a quote from an expert. In some cases we can help faster if an expert is available, but you should order in advance to be safe. You can place a new order here.

How much does it cost?

The cost depends on many factors: how far away the deadline is, how hard and large the task is, whether it is code only or also includes a report, etc. The rough estimates below are for orientation only (in USD):

| Type of task | Typical price (USD) |
| --- | --- |
| Regular homework | $20 – $150 |
| Advanced homework | $100 – $300 |
| Group project or report | $200 – $500 |
| Mid-term or final project | $200 – $800 |
| Live exam help | $100 – $300 |
| Full thesis | $1000 – $3000 |

How do I pay?

Credit card or PayPal. You don't need to create or have a PayPal account to pay by credit card. PayPal offers "buyer's protection" in case of any issues.

Why do I need to pay in advance?

We have no way to request money after we send you the solution. PayPal works as a middleman, which protects you in case of any disputes, so you should feel safe paying using PayPal.

Do you do essays?

No, unless it is a data analysis essay or report. Essays are very personal, and it is easy to see when they were written by someone else; this is not the case with math and programming.

Why are there no discounts?

Because we don't want to lie. In services like this, no discount is ever genuine: the price is set knowing that a discount will be offered. For example, if we wanted to charge $100, we could say the price is $200 and then, because you are special, offer a 50% discount. That is how scam websites operate. We set honest prices instead, so there is no need for fake discounts.

Do you do live tutoring?

No, it is simply not how we operate. A great programmer is rarely also a great speaker, which is why we encourage our experts to write down explanations instead of holding a live call. That is often enough to get you started: analyzing and running the solutions is a big part of learning.

What happens if I am not satisfied with the solution?

Another expert will review the task, and if your claim is reasonable, we refund the payment and often block the freelancer from our platform. Because we hold our experts to this standard, the ones who work with us reliably deliver high-quality solutions on time.


Customer Feedback

"Thanks for the explanations after the assignment was already completed... Emily is such a nice tutor!"

Order #13073
