Pearson's Correlation Coefficient
This
is one in a series of tutorials using examples from WINKS SDA.
Definition: Measures
the strength of the linear relationship between two variables.
Assumptions: Both
variables (often called X and Y) are interval/ratio and approximately
normally distributed, and their joint distribution is bivariate normal.
Characteristics:
Pearson's Correlation Coefficient is usually signified by r (the
corresponding population parameter is signified by the Greek letter rho),
and can take on values from -1.0 to 1.0, where -1.0 is a perfect
negative (inverse) correlation, 0.0 is no linear correlation, and 1.0 is a
perfect positive correlation.
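As a rough illustration of these characteristics, the following minimal sketch computes r directly from its textbook definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data values are made up for this illustration; they are not part of WINKS.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation computed directly from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, not taken from any WINKS database
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # points on an increasing line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # points on a decreasing line
print(r_pos, r_neg)
```

Points that fall exactly on an increasing line give r at the upper end of the range, and points on a decreasing line give r at the lower end.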
Related statistics:
R2 (called the coefficient of determination or r squared) can be
interpreted as the proportion of the variance in Y that is explained by its
linear relationship with X.
Tests: The
statistical significance of r is tested using a t-test. The
hypotheses for this test are:
H0: rho = 0
Ha: rho <> 0
A low p-value for this test
(less than 0.05, for example) means that there is evidence to reject the null
hypothesis in favor of the alternative hypothesis; that is, there is a
statistically significant linear relationship between the two variables.
Note: This test is
equivalent to the test of no slope in the simple linear regression
procedure.
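For reference, the t statistic for this test is conventionally computed from the sample correlation r and the number of paired observations, and compared with a t distribution on n - 2 degrees of freedom. The symbol n for the number of pairs is notation introduced here, not taken from the tutorial:

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{d.f.} = n - 2
```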
Location in
WINKS:
Pearson's correlation coefficient is found in the following locations:
1. Regression and
Correlation - The Correlation procedure produces both Pearson and Spearman
Correlation coefficients. The t-test for statistical significance of r is
calculated. R2 is also reported.
2. Regression and
Correlation - The Simple linear regression reports the Pearson correlation
coefficient and the t-test. R2 is also reported.
3. Regression and
Correlation - The Correlation Matrix procedure produces a matrix of
correlations for a number of pairs of variables at a time, and includes the
p-value for the test of significance of r.
Graphs: An important
part of interpreting r is to observe a scatterplot of the data. Scatterplots
are available from the Graphs option, as a part of Simple Linear Regression
and in the Graphical Correlation Matrix option in Regression and
Correlation.
Example: Use the
Correlation procedure to calculate r for the two variables HP
(horsepower) and WEIGHT in the WINKS "CAR" database. The results from WINKS (in part) are:
Variables used: HP and WEIGHT
Number of cases used: 38
Pearson's r (Correlations Coefficient) = 0.9172
R-Square = 0.8413
Test of hypothesis to determine significance of relationship:
H(null): Slope = 0 or H(null): r = 0
(Pearson's) t = 13.81425 with 36 d.f. p < 0.001
(A low p-value implies that the slope does not = 0.)
Spearman's Rank Correlation Coefficient = 0.9071
(Spearman's) t = 12.93361 with 36 d.f. p < 0.001
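The reported R-Square and Pearson t value can be checked by hand from r and the number of cases. The sketch below assumes the standard relation t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the variable names are illustrative, and the rounded r from the output is used, so the result may differ slightly in the later decimal places.

```python
import math

r = 0.9172  # Pearson's r as reported (rounded) in the WINKS output
n = 38      # number of cases used

r_squared = r ** 2          # coefficient of determination
df = n - 2                  # degrees of freedom for the t-test
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(f"R-Square = {r_squared:.4f}")
print(f"t = {t:.2f} with {df} d.f.")
```

The p-value itself requires the t distribution and is not computed here; WINKS reports it directly.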
A scatterplot of this data shows the positive correlation -- cars with
higher horsepower tend to weigh more:

An example of writing up these results:
Narrative: "An evaluation was made of the linear relationship
between horsepower and vehicle weight using Pearson's correlation."
Results:
"An analysis using Pearson's correlation coefficient indicates a
statistically significant linear relationship between horsepower and vehicle
weight, r(36) = 0.92, p < 0.001. For these data, the mean (SD) for horsepower
is 101.7 (26.4) and for weight 2.86 (0.71)."
Warning: There is a
temptation to infer cause and effect when observing a correlation. However,
the ability to assign causality depends on the creation of an experiment
specifically designed to provide this kind of inference.
Related topics:
Spearman's Correlation Coefficient is the non-parametric counterpart to r.
See also simple linear regression, multiple regression, and polynomial
regression.
Exercise - Correlation
At the beginning of an
introductory engineering course, 10 students were given a pre-test to
determine their initial mathematical ability. The following table lists each
student's pre-test score and final grade in the class:
Student Number | Pre-Test | Course Grade
1 | 45 | 92
2 | 23 | 86
3 | 50 | 97
4 | 46 | 95
5 | 33 | 87
6 | 21 | 76
7 | 13 | 72
8 | 30 | 84
9 | 34 | 85
10 | 50 | 98
1. Calculate Pearson's
Correlation Coefficient (r) on this data.
r =
2. What statistical test is
used to determine if this value of r is statistically significant?
3. Is the correlation seen
in this data statistically significant? Why?
4. Display a scatterplot of
the data. Does the data appear linearly correlated? Do there seem to be any
outlier values?
5. Suppose an 11th student
were added to the data, with a pre-test score of 40 and a Course Grade of
70. How would this affect r?