We Helped With This R Language Programming Assignment: Have A Similar One?

Assignment Description

 

Homework #3 Scatterplots and Correlation

 

A survey of 24 faculty members at WMU was conducted, and each professor was asked their age and annual income. The results of this study are listed below:

 

 

Person | Age (X) | Income (Y) (x $1000)
1      | 40      | 32
2      | 31      | 24
3      | 50      | 47
4      | 53      | 50
5      | 36      | 30
6      | 55      | 55
7      | 37      | 33
8      | 45      | 41
9      | 60      | 63
10     | 41      | 34
11     | 46      | 43
12     | 38      | 35
13     | 32      | 28
14     | 56      | 57
15     | 51      | 50
16     | 37      | 30
17     | 54      | 52
18     | 42      | 35
19     | 47      | 41
20     | 33      | 26
21     | 39      | 34
22     | 52      | 49
23     | 57      | 60
24     | 55      | 51

 

a.       This data is in the FacSalaries.csv file in the Data section of the eLearning page for GEOG 5670. Copy it to your USB drive in a folder called “Correlations.” In RStudio, navigate to this folder and make it the working directory. Load the file into a dataframe called Salaries using the read.csv() function. (A code sketch covering parts a through d follows the questions in part d.)

b.       The first step in any analysis of correlations is to generate a scatterplot. Install and load the ggplot2 package and use qplot() to generate a simple scatterplot. Specify the Age data for the x axis and the income data for the y axis. Set the geom= parameter equal to “point”. Specify appropriate labels for the x and y axes using the xlab= and ylab= arguments, and use main= to give the plot an appropriate title. In RStudio, click on the Plots tab and the Export button to save the plot to the clipboard, then paste it into a Word document.

c.        Compute the covariance between Age and Income using the cov() function. Copy the output from this and paste it below the scatterplot in the Word document. Answer the following questions in the Word document:

1.       What does the sign (+ or -) indicate?

2.       If the salaries were expressed as dollars instead of the current $ x 1000, how do you think the value of the covariance would change?


d.       Now use the cor.test() function to determine the correlation between the two variables. You can specify a formula as: ~ Age + Income, data = Salaries, and be sure to set the method to “pearson”. Copy the output from this and paste it into the Word document beneath the covariance. Rerun this command with the method set to “spearman” and then to “kendall”. Copy the output from these to your Word document. Answer the following questions:

1.       Compared to the covariance, does correlation have more or less utility in determining the strength of the linear relationship between Age and Income? Briefly explain why.

2.       If we had an additional ordinal variable that indicates the rank of each professor (Instructor, Assistant Professor, Associate Prof, Professor), could we use Pearson’s r to measure the strength of the relationship between this rank variable and Income?

3.       How do the values of the three methods (Pearson, Spearman, and Kendall) compare?
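A minimal end-to-end sketch for parts (a) through (d). The drive letter and the column names Age and Income are assumptions; check names(Salaries) after the import and adjust if your file differs.

  # (a) Point R at the Correlations folder (path is an assumption) and import the data
  setwd("E:/Correlations")
  Salaries <- read.csv("FacSalaries.csv")
  head(Salaries)

  # (b) Scatterplot with qplot() from ggplot2
  install.packages("ggplot2")   # only needed once
  library(ggplot2)
  qplot(x = Age, y = Income, data = Salaries, geom = "point",
        xlab = "Age (years)", ylab = "Income ($ x 1000)",
        main = "Age vs. Income for 24 WMU Faculty")

  # (c) Covariance between Age and Income
  cov(Salaries$Age, Salaries$Income)

  # (d) Correlation with a significance test, using the formula interface
  cor.test(~ Age + Income, data = Salaries, method = "pearson")
  cor.test(~ Age + Income, data = Salaries, method = "spearman")  # may warn about ties; that is expected
  cor.test(~ Age + Income, data = Salaries, method = "kendall")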

 

 

 

 

1.       This question uses agricultural data from China provided by Dr. Veeck. Copy the ChinaAgCorr.csv file from the Data section of the GEOG 5670 eLearning page and paste it into your Correlations folder. Import this into a data frame called ChinaAg. (A code sketch for questions 1 through 3 follows question 3.)

The variables in this dataset are:
  MECHIDX  - an index of the amount of mechanization in agriculture
  AGCHEM   - the amount of agricultural chemicals used in each district
  ECOCROPS - the area of ecologically benign crops (such as nuts) under cultivation
  DIVERSIT - an index of the biodiversity of the district
  TOTFRMS  - the total number of farms
  AGAREA   - the total area under cultivation
  ARABLE   - the total area of arable land
  IRRIG    - the total area of irrigated land

2.       A quick way to generate a matrix of scatterplots for all combinations of these eight variables is the pairs() function. Run this on the ChinaAg dataframe and copy the resulting graph to your Word document. Using the panels below the diagonal of this graph, indicate which combinations are positively related, negatively related, or weakly related, and list these below the graph in your Word document.

3.       This time, we’ll use the cor() function on the whole dataframe to get a correlation matrix. You can make a cleaner output for this matrix if you round the values of the correlation coefficients to 3 decimal places using the round() function, so nest the cor() function inside the round() function to accomplish this. Copy the output matrix and paste it in your Word document.
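A sketch for questions 1 through 3, assuming ChinaAgCorr.csv sits in the same Correlations working directory:

  # (1) Import the China agricultural data
  ChinaAg <- read.csv("ChinaAgCorr.csv")
  str(ChinaAg)

  # (2) Scatterplot matrix of all eight variables
  pairs(ChinaAg)

  # (3) Correlation matrix rounded to 3 decimal places
  round(cor(ChinaAg), 3)
  # If the file contains missing values, add use = "complete.obs":
  # round(cor(ChinaAg, use = "complete.obs"), 3)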

Print out your Word document and turn it in for credit on the homework assignment.


GEOG5670HW2Answers.doc

 

GEOG 5670 Spatial Analysis                                                 Name:  ______________________

 

Homework #3

Scatterplots and Correlation

 

 

Paste scatterplot of Age vs. Income data:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Paste covariance output here:

 

 

 

1.      What does the sign (+ or -) indicate?

 

 

2.      If the salaries were expressed as dollars instead of the current $ x 1000, how do you think the value of the covariance would change?

 

 

 

Paste Pearson’s r output here:

 

 

 

 

 

 

 

3.      What is the value for Pearson’s r?  Briefly state what this means.

 

 

4.      Does this value indicate a statistically significant correlation?  What evidence can you cite from the output?

 

 

 

5.      As compared to the covariance value computed earlier, does correlation have more or less utility in determining the strength of the linear relationship between Age and Income?  Briefly explain why.

 

 

 

 

 

 

 

6. Paste ChinaAg scatterplot matrix here:

 

 

 

 

 

 

 

 

 

 

 

 

 

Pairs that are + correlated:

Pairs that are - correlated:

Pairs with weak correlation:

 

 

 

 

 

 

 

 

 

 

7. Paste the correlation matrix here:

Assignment Description

 

CORRELATION

 


Preview

Bivariate Random Variables

Correlation Analysis

Pearson’s r

Spatial Autocorrelation

Regression Analysis

Linear Regression

Goodness of Fit

 

Read Sections 5.7 through 5.7.5 and Chapter 15 of your book

Homework #5 is on the course web page—try to finish by March 12


 

 

 

 

 

             

 


Multivariate Techniques

• Involve two or more variables
• Simple (bivariate) correlation analysis is an investigation of the strength of association between two variables
• Simple regression analysis is a study of the nature of the relationship: estimating the value of one variable given another


Covariance of Two Random Variables

$\mathrm{Cov}(X,Y) = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$

• C(X,Y) is the covariance
• Values of 0 indicate no relationship
• If increases in X result in increases in Y, the covariance is positive
• If increases in X result in decreases in Y, the covariance is negative
• The problem with covariance is that it is in the units of X and Y, so values are difficult to interpret
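To make the units problem concrete, a small sketch (the numbers are made up) showing that rescaling one variable rescales the covariance but leaves the correlation unchanged:

  # Toy vectors to illustrate the units problem
  x <- c(1, 2, 3, 4, 5)
  y <- c(2, 4, 5, 4, 6)

  cov(x, y)           # covariance in the original units
  cov(x, y * 1000)    # rescaling y by 1000 multiplies the covariance by 1000
  cor(x, y)           # the correlation is unaffected
  cor(x, y * 1000)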

 

 

 

 

 


Sample Covariance

The best point estimate for C(X,Y) is

$S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$

• S_XY is positive if the two variables have a positive relationship and it is negative if they are negatively related
• Sample covariance has the same disadvantage as C(X,Y): it is highly influenced by the units in which the two variables are measured


 

 

• If two random variables are jointly normally distributed:
  - The marginal distributions of both X and Y are univariate normal
  - Any conditional distribution of X or Y is also univariate normal
• There are five parameters to the bivariate normal density function: $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, $\rho_{XY}$, where

$\rho_{XY} = \frac{C(X,Y)}{\sigma_X \sigma_Y}$


Pearson’s Product Moment Correlation Coefficient

• Population parameters $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, $\rho_{XY}$ are almost never known
• Must estimate $\rho_{XY}$ from sample data


Sample Correlation Coefficient

• Substitutes appropriate point estimators for C(X,Y), $\sigma_X$, and $\sigma_Y$ into the expression for $\rho_{XY}$:

$r_{XY} = \frac{S_{XY}}{s_X s_Y}$

• In simplified form:

$r = \frac{\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sqrt{\sum X_i^2 - \left(\sum X_i\right)^2/n}\;\sqrt{\sum Y_i^2 - \left(\sum Y_i\right)^2/n}}$

• Pearson’s r is the best point estimate of $\rho_{XY}$
• Where all points plot on a positively sloped line, r = 1
• Where all points plot on a negatively sloped line, r = -1
• If r is near 0, the scatter of points is nearly circular
• A scatter of points can have a strong nonlinear association but still have r near 0


 

 

 

 

 

 

 

          

 


Scatter Diagrams

• Each observation pair (Xi, Yi) represents one dot

Positive Linear Association

[Figure: example scatterplots showing positive linear association]

• r = 1 if all points lie on a positively sloped line
• r = 0.88 in the second example


 

 

 

 

 

 

 

Negative Linear Association

[Figure: example scatterplots showing negative linear association]

• r = -1 if all points lie on a negatively sloped line


Real Estate

• We want to see if there is a relationship between height of hair and income from real estate

Hair Height (in) | Income ($ x 1000)
4.5  | 100
6.0  | 130
5.5  | 160
7.0  | 180
7.5  | 190
8.0  | 200
10.0 | 220
9.0  | 240
10.5 | 280
12.0 | 300


Calculation of Pearson’s r

Xi   | Yi   | Xi^2   | Yi^2   | XiYi
4.5  | 100  | 20.25  | 10000  | 450
6.0  | 130  | 36.00  | 16900  | 780
5.5  | 160  | 30.25  | 25600  | 880
7.0  | 180  | 49.00  | 32400  | 1260
7.5  | 190  | 56.25  | 36100  | 1425
8.0  | 200  | 64.00  | 40000  | 1600
10.0 | 220  | 100.00 | 48400  | 2200
9.0  | 240  | 81.00  | 57600  | 2160
10.5 | 280  | 110.25 | 78400  | 2940
12.0 | 300  | 144.00 | 90000  | 3600
Sum: 80 | 2000 | 691  | 435400 | 17295


$r = \frac{17295 - (80)(2000)/10}{\sqrt{691 - 80^2/10}\;\sqrt{435400 - 2000^2/10}} = 0.96$

There is a strong positive correlation between hair height and real estate sales: higher hair = more real estate sales.


Covariance and Correlation in R

• Both in base package: cov() or cor()

  cov(x, y = NULL, use = c("everything", "all.obs", "complete.obs"), method = c("pearson", "kendall", "spearman"))

  cor(x, y = NULL, use = c("everything", "all.obs", "complete.obs"), method = c("pearson", "kendall", "spearman"))

• cor.test()

  cor.test(x, y, alternative = c("two.sided", "less", "greater"), method = c("pearson", "kendall", "spearman"), exact = NULL, conf.level = 0.95, continuity = FALSE, ...)

• cor() can do all correlations in a dataframe while cor.test() only does specified pairs of variables
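To reproduce the Real Estate calculation with these functions, a short sketch; the vectors are typed from the table above, and the names anticipate the TexasHair example shown later:

  # Hair height (in) and real estate income ($ x 1000) from the table above
  HairHgt <- c(4.5, 6, 5.5, 7, 7.5, 8, 10, 9, 10.5, 12)
  Income  <- c(100, 130, 160, 180, 190, 200, 220, 240, 280, 300)

  cov(HairHgt, Income)    # sample covariance
  cor(HairHgt, Income)    # Pearson's r, approximately 0.96
  cor.test(HairHgt, Income, method = "pearson")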


 

 

 

 

 

             

 


Significance Testing

• Assume the two random variables are bivariate normally distributed
• $H_0: \rho = 0$; $H_A: \rho \neq 0$, or $\rho > 0$, or $\rho < 0$
• The sampling distribution of r is t-distributed with n - 2 degrees of freedom and an estimated standard error of:

$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$

• The test statistic is:

$t = \frac{r}{s_r} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$


Correlation Matrices

• A summary of the correlation coefficients between all pairs of variables in a set
• If for the Texas example we had data on:
  - Hair size
  - Real Estate Income
  - Percentage of gold chrome on car
  - Tons of makeup applied annually
  - Numbers of packs of More cigarettes smoked per day
  - Monthly bill for Home Shopping Network
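As a check on the formula, a short sketch that reproduces the t statistic reported for the Real Estate example below (r = 0.9637914, n = 10):

  # t = r * sqrt(n - 2) / sqrt(1 - r^2)
  r <- 0.9637914
  n <- 10
  r * sqrt(n - 2) / sqrt(1 - r^2)    # about 10.22, matching the cor.test() output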


 

 

 

 

 

 

Correlation Matrix for Texas

              Hair   Real Estate   Gold Chrome   Makeup   More Cigs   HSN Bill
Hair           1.0      .96           .83          .91      -.53        .92
Real Estate    .96      1.0           .98          .80       .23        .91
Gold Chrome    .83      .98           1.0          .62      -.71        .94
Makeup         .91      .80           .62          1.0       .07        .21
More Cigs     -.53      .23          -.71          .07       1.0       -.77
HSN Bill       .92      .91           .94          .21      -.77        1.0

• You can also generate a matrix of scatterplots using the pairs() command in the R graphics package


Texas Example

> cor.test(~HairHgt + Income, data=TexasHair, method="pearson")

Pearson's product-moment correlation

data: HairHgt and Income
t = 10.223, df = 8, p-value = 7.198e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8499218 0.9916539
sample estimates:
      cor
0.9637914

[Figure: t distribution showing rejection regions beyond t = -2.306 and t = 2.306 and the observed t = 10.2]


General Procedure for Correlations Using R

To compute basic correlation coefficients there are three main functions that can be used: cor(), cor.test() and rcorr().
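A practical note (an addition, not on the slide): cor() and cor.test() are part of base R, but rcorr() comes from the Hmisc package, so it must be installed and loaded before use:

  # rcorr() lives in the Hmisc package; install once, then load each session
  install.packages("Hmisc")
  library(Hmisc)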

 


 

 

 

 

 

 

 

 

 

Pearson Correlation Output

 

             Exam    Anxiety     Revise
Exam     1.0000000 -0.4409934  0.3967207
Anxiety -0.4409934  1.0000000 -0.7092493
Revise   0.3967207 -0.7092493  1.0000000


Correlations using R

Pearson correlations:

  cor(examData, use = "complete.obs", method = "pearson")

  rcorr(examData, type = "pearson")

  cor.test(examData$Exam, examData$Anxiety, method = "pearson")

If we predicted a negative correlation:

  cor.test(examData$Exam, examData$Anxiety, alternative = "less", method = "pearson")

 

 

 

 

 

 

 

 


 

Reporting the Results

Exam performance was significantly correlated with exam anxiety, r = -.44, and time spent revising, r = .40; the time spent revising was also correlated with exam anxiety, r = -.71 (all ps < .001).


 

 

 

 

 

 

 

 

 

 

 

             

 


Things to Know about the Correlation

It varies between -1 and +1

  0 = no relationship

It is an effect size

  ±.1 = small effect

  ±.3 = medium effect

  ±.5 = large effect

Coefficient of determination, r2

  By squaring the value of r you get the proportion of variance in one variable shared by the other.
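For instance, squaring the exam anxiety correlation reported in the output earlier gives the shared variance (the same 19.4% reappears in the partial correlation diagrams later):

  # Shared variance between exam performance and exam anxiety
  (-0.4409934)^2   # about 0.194, i.e. roughly 19.4% of variance shared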


Correlation and Causality

The third-variable problem:

  In any correlation, causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results.

Direction of causality:

  Correlation coefficients say nothing about which variable causes the other to change.


Non-parametric Correlation

Spearman’s rho

  Pearson’s correlation on the ranked data

Kendall’s tau

  Better than Spearman’s for small samples

World’s Biggest Liar competition

  68 contestants

  Measures

Where they were placed in the competition (first, second, third, etc.)

Creativity questionnaire (maximum score 60)
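A small sketch illustrating the first point above: Spearman's rho is just Pearson's r computed on the ranked data. The vectors are toy values, since the liar dataset itself is not reproduced on these slides.

  # Toy data standing in for the World's Biggest Liar measures
  position   <- c(1, 2, 3, 4, 5, 6)          # placing in the competition
  creativity <- c(55, 40, 50, 30, 42, 25)    # creativity questionnaire score

  cor(position, creativity, method = "spearman")   # Spearman's rho
  cor(rank(position), rank(creativity))            # Pearson's r on the ranks gives the same value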


Spearman’s Rho

 

cor(liarData$Position, liarData$Creativity, method = "spearman")

The output of this command will be:

[1] -0.3732184

To get the significance value use rcorr() (NB: first convert the dataframe to a matrix):

liarMatrix <- as.matrix(liarData[, c("Position", "Creativity")])
rcorr(liarMatrix)

Or:

cor.test(liarData$Position, liarData$Creativity, alternative = "less", method = "spearman")


 

 

 

 

 

 

 

             


Spearman's Rho

Output

Spearman's rank correlation rho

data: liarData$Position and liarData$Creativity
S = 71948.4, p-value = 0.0008602
alternative hypothesis: true rho is less than 0
sample estimates:

rho

-0.3732184


Kendall’s Tau (Non-parametric)

To carry out Kendall’s correlation on the World’s Biggest Liar data simply follow the same steps as for Pearson and Spearman correlations but use method = “kendall”:

cor(liarData$Position, liarData$Creativity, method = "kendall")

cor.test(liarData$Position, liarData$Creativity, alternative = "less", method = "kendall")


 

 

 

 

 

 

 

 

 

 

             

 


Kendall’s Tau (Non-parametric)

 

The output is much the same as for Spearman’s correlation.

Kendall's rank correlation tau

data: liarData$Position and liarData$Creativity
z = -3.2252, p-value = 0.0006294

alternative hypothesis: true tau is less than 0

sample estimates:

tau

-0.3002413


Bootstrapping Correlations

If we stick with our World’s Biggest Liar data and want to bootstrap Kendall’s tau, then our function will be:

bootTau<-function(liarData,i) cor(liarData$Position[i], liarData$Creativity[i], use = "complete.obs", method = "kendall")

 

To bootstrap a Pearson or Spearman correlation you do it in exactly the same way except that you specify method = “pearson” or method = “spearman” when you define the function.


Bootstrapping Correlations Output

 

To create the bootstrap object, we execute:

library(boot)

boot_kendall <- boot(liarData, bootTau, 2000)
boot_kendall

To get the 95% confidence interval for the boot_kendall object:

boot.ci(boot_kendall)



 

 

 

 

 

 

 

 

 

             

 


Bootstrapping Correlations Output

The output below shows the contents of

boot_kendall:

 

ORDINARY NONPARAMETRIC BOOTSTRAP

 

Call:

boot(data = liarData, statistic = bootTau, R = 2000)

 

Bootstrap Statistics :

       original        bias    std. error
t1*  -0.3002413  0.001058191    0.097663


Bootstrapping Correlations Output

    The output below shows the contents of the boot.ci()

function:

 

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS

Based on 2000 bootstrap replicates

 

CALL :

boot.ci(boot.out = boot_kendall)

 

Intervals :

Level      Normal               Basic

95% (-0.4927, -0.1099 ) (-0.4956, -0.1126 )

 

Level     Percentile             BCa

95% (-0.4879, -0.1049 ) (-0.4777, -0.0941 )


 

 

 

 

 

             


Partial and Semi-partial Correlations

• Partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on them both.
• Semi-partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on only one of the others.

[Figure: Venn diagrams for Exam Performance, Exam Anxiety, and Revision Time showing (1) the variance in Exam Performance accounted for by Exam Anxiety (19.4%), (2) the variance accounted for by Revision Time (15.7%), (3) the variance accounted for by both, and the unique variance accounted for by Revision Time and by Exam Anxiety]


 

 

 

 

 

             

 

Doing Partial Correlation using R

The general form of pcor() is:

pcor(c("var1", "var2", "control1", "control2" etc.), var(dataframe))

We can then see the partial correlation and the value of

R2 in the console by executing:

pc

pc^2
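A sketch of how the pc object might be created for the exam data used earlier. The pcor() function is assumed here to be the one in the ggm package, and examData with columns Exam, Anxiety and Revise is taken from the earlier output, so treat those names as assumptions about the setup.

  # Assumes the ggm package and a data frame examData with Exam, Anxiety, Revise columns
  library(ggm)

  examData2 <- examData[, c("Exam", "Anxiety", "Revise")]

  # Partial correlation between Exam and Anxiety, controlling for Revise
  pc <- pcor(c("Exam", "Anxiety", "Revise"), var(examData2))
  pc                      # the partial correlation
  pc^2                    # proportion of variance shared once Revise is controlled for
  pcor.test(pc, 1, 103)   # 1 control variable, sample size 103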

 

[Figure: Venn diagrams contrasting partial correlation and semi-partial correlation]

 

 

 

 

 

 


 


Doing Partial Correlation using R

The general form of pcor.test() is:

pcor.test(pcor object, number of control variables, sample size)

Basically, you enter an object that you have created with pcor() (or you can put the pcor() command directly into the function):

pcor.test(pc, 1, 103)


Partial Correlation Output

> pc

[1] -0.2466658

 

> pc^2

[1] 0.06084403

> pcor.test(pc, 1, 103)

$tval

[1] -2.545307

 

$df

[1] 100

 

$pvalue

[1] 0.01244581


 

 

 

 

 

 

 

 
