Friedman's Test
(Non-Parametric Repeated Measures Comparisons)
Definition: A non-parametric (distribution-free) test used to compare
observations repeated on the same subjects. This is also called a
non-parametric randomized block analysis of variance.
Assumptions: Unlike the parametric repeated measures ANOVA or paired
t-test, this non-parametric test makes no assumptions about the distribution
of the data (e.g., normality).
Characteristics: This test is an alternative to the repeated measures
ANOVA when the assumption of normality or equality of variance is not met.
Like many non-parametric tests, it uses the ranks of the data (assigned within
each subject) rather than their raw values to calculate the statistic. Because
it uses only ranks and makes no distribution assumption, it is generally not as
powerful as the ANOVA when the ANOVA assumptions are met. If there are only
two measures, the test is equivalent to the sign test. (See the Zar
reference for more information.)
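The ranking step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name midranks and the reaction times are hypothetical, and tied values are given the average of the ranks they span, which is the usual convention but is not spelled out in the text above.

```python
def midranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


# Hypothetical reaction times for ONE subject under four treatments.
one_subject = [0.61, 0.55, 0.42, 0.61]
print(midranks(one_subject))  # [3.5, 2.0, 1.0, 3.5]
```

Each subject is ranked separately in this way, and the ranks for each repeated measure are then summed over subjects.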
Test: The hypotheses for the comparison across repeated measures are:
Ho: The distributions are the same across repeated measures.
Ha: The distributions across repeated measures are different.
Notice that the hypotheses make no assumptions about the distribution of the
populations. These hypotheses could also be expressed as comparing mean ranks
across measures.
The test statistic for Friedman's test is approximately Chi-square distributed
with a-1 degrees of freedom, where a is the number of repeated measures. When
the p-value for this test is small (usually <0.05), you have evidence to reject
the null hypothesis.
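As a rough sketch of how the statistic is obtained, the Python function below applies the standard textbook form of Friedman's Chi-square to the rank sums. The function name friedman_chi_square and the rank sums in the usage lines are hypothetical. No correction for tied ranks is applied, so a program that corrects for ties may report a somewhat larger value.

```python
def friedman_chi_square(rank_sums, n_subjects):
    """Return (chi_square, degrees_of_freedom) from the rank sums of a measures.

    Uses the uncorrected formula
        chi_square = 12 / (n * a * (a + 1)) * sum(R_j ** 2) - 3 * n * (a + 1)
    where n is the number of subjects and R_j is the rank sum of measure j.
    """
    a = len(rank_sums)
    # Each subject contributes ranks 1..a, so the rank sums must total n*a*(a+1)/2.
    expected_total = n_subjects * a * (a + 1) / 2
    if abs(sum(rank_sums) - expected_total) > 1e-9:
        raise ValueError("rank sums do not match the number of subjects and measures")
    sum_of_squares = sum(r * r for r in rank_sums)
    chi_square = 12.0 / (n_subjects * a * (a + 1)) * sum_of_squares - 3.0 * n_subjects * (a + 1)
    return chi_square, a - 1


# Hypothetical rank sums for 3 subjects and 3 repeated measures (no ties).
chi_square, df = friedman_chi_square([3, 7, 8], n_subjects=3)
print(round(chi_square, 3), df)  # 4.667 2
```

The resulting value is compared with a Chi-square distribution on a-1 degrees of freedom, or with exact tables when the number of subjects is small.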
Example: Friedman's non-parametric repeated measures comparisons
Five people were each given four different drugs (in random order), with a
washout period between drugs. Reaction time to a test was measured. The data
are in the file DRUG.DBF. The output is:
Friedman's Test for Repeated Measures
Number of repeated measures= 4 Number of subjects = 5
1 )DRUG1 Rank sum = 13.0 Mean rank = 2.6
2 )DRUG2 Rank sum = 12.0 Mean rank = 2.4
3 )DRUG3 Rank sum = 5.0 Mean rank = 1.0
4 )DRUG4 Rank sum = 20.0 Mean rank = 4.0
Ho:There is no difference in mean ranks for repeated measures.
Ha:A difference exists in the mean ranks for repeated measures.
Friedman's Chi-Square = 14.13 with d.f. = 3 p < 0.001
Kendall's coefficient of concordance = 0.942
When the p-value is low, there is evidence to reject Ho,
and conclude that there is a difference between mean ranks.
Error term used for comparisons = 2.89
                                              Critical q
Tukey Multiple Comp.    Difference     Q        (.05)
----------------------------------------------------------------
Rank(4)-Rank(3) =          15.0      5.196      3.63 *
Rank(4)-Rank(2) =           8.0      2.771      3.63
Rank(4)-Rank(1) =           7.0      (Do not test)
Rank(1)-Rank(3) =           8.0      2.771      3.63
Rank(1)-Rank(2) =           1.0      (Do not test)
Rank(2)-Rank(3) =           7.0      (Do not test)
Homogeneous Populations, repeated measures ranked
Gp 1 refers to DRUG1
Gp 2 refers to DRUG2
Gp 3 refers to DRUG3
Gp 4 refers to DRUG4
Gp  Gp  Gp  Gp
 3   2   1   4
----------
    ----------
This is a graphical representation of the Tukey multiple comparisons test.
At the 0.05 significance level, the ranks of any two groups underscored by
the same line are not significantly different.
This analysis indicates that there is a difference in reaction times across
drugs with p < 0.01. In this case, the multiple comparisons indicate (at the
0.05 significance level) that the reaction time for drug 3 was shorter than
that for drug 4. However, it was not shown to be significantly shorter than the
times for drugs 2 or 1. Note that the p-value for the Chi-square test is
reported even though the sample size is small. In this case, the tabled value
agrees with the Chi-square value. However, the Chi-square p-value becomes less
accurate for small samples, as indicated in the note above.
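Several figures in the output can be checked by hand from the reported rank sums. The Python sketch below does this under stated assumptions: the formulas used (Kendall's coefficient as Chi-square divided by n(a-1), the error term as the square root of n*a*(a+1)/12, and q as the rank-sum difference divided by the error term) are the usual textbook forms for a Tukey-type comparison on rank sums, and are assumed rather than confirmed to be what the program uses. The variable names are illustrative only.

```python
import math

# Values copied from the output above.
n_subjects = 5
rank_sums = {"DRUG1": 13.0, "DRUG2": 12.0, "DRUG3": 5.0, "DRUG4": 20.0}
chi_square = 14.13   # Friedman's Chi-square as reported
critical_q = 3.63    # critical q (.05) as reported
a = len(rank_sums)   # number of repeated measures

# Kendall's coefficient of concordance (assumed form: chi-square / (n * (a - 1))).
kendall_w = chi_square / (n_subjects * (a - 1))

# Error term for comparing rank sums (assumed form: sqrt(n * a * (a + 1) / 12)).
error_term = math.sqrt(n_subjects * a * (a + 1) / 12)

# Step through the pairs from the widest range inward. Once a range is found
# not significant, every pair enclosed by that range is marked "do not test".
ordered = sorted(rank_sums, key=rank_sums.get)  # smallest rank sum first
not_significant_ranges = []
results = {}
for hi in range(a - 1, 0, -1):
    for lo in range(0, hi):
        pair = (ordered[hi], ordered[lo])
        if any(l <= lo and hi <= h for l, h in not_significant_ranges):
            results[pair] = "do not test"
            continue
        q = (rank_sums[ordered[hi]] - rank_sums[ordered[lo]]) / error_term
        if q > critical_q:
            results[pair] = "significant"
        else:
            results[pair] = "not significant"
            not_significant_ranges.append((lo, hi))

print(round(kendall_w, 3), round(error_term, 2))  # 0.942 2.89
for pair, outcome in results.items():
    print(pair[0], "-", pair[1], ":", outcome)
```

Under these assumptions the sketch reproduces the reported coefficient of concordance (0.942), the error term (2.89), and the same pattern of significant, non-significant, and untested comparisons as the table.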
Exercise: Friedman's non-parametric repeated measures comparison
Six welders with different expertise were asked to weld two pipes together
using 5 different welding torches. Torches were used in random order for each
welder. Finished pipes were measured on a variety of quality factors, and rated
from 1 to 10, where 10 represents a perfect weld. The data are:
Welder | Torch1 | Torch2 | Torch3 | Torch4 | Torch5
1      | 3.9    | 4.1    | 4.2    | 4.1    | 3.3
2      | 9.4    | 9.5    | 9.4    | 9.0    | 8.6
3      | 9.7    | 9.3    | 9.3    | 9.2    | 8.4
4      | 8.3    | 8.0    | 7.9    | 8.6    | 7.4
5      | 9.8    | 8.9    | 9.0    | 9.0    | 8.3
6      | 9.9    | 10.0   | 9.7    | 9.6    | 9.1
1. Perform a Friedman's test on these data. From the results, can you select a
torch that you believe is the "best" of the 5 when measured on this test?
2. Perform the test using parametric methods, and compare the results.