Spearman's Rank-Difference Coefficient of Correlation
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************ spearman.doc ************

Background:

Consider the degree of association between the Grade Point Average of an individual student and the family income for this student:

-- A student who comes from a wealthy family may not have to work after school, but can instead use this free time to study and do extra homework problems.

-- A student who comes from a poor family probably does not have this free time, but instead needs to work after school and on weekends to help the family with daily expenses.

-- A student who comes from a wealthy family quite possibly has their own computer and is able to prepare programming homework problems at any convenient time.

-- A student who comes from a poor family probably does not have a computer at home, but instead has to schedule "at computer" time during open laboratory hours, which may limit time-on-task for attempts at creative programming projects.

As such, it is not at all surprising that there has historically been a strong positive correlation between Grade Point Average and family income:

-- Overall, for groups of students, Grade Point Average increases as family income increases.

-- Correspondingly, for groups of students, Grade Point Average decreases as family income decreases.

The key points here are that:

-- Family income does not "cause" the Grade Point Average. Instead, there is merely an association between the two phenomena.

-- This degree of association is for a collective "group" of subjects. Any one individual could have results that differ from group observations.

With the importance of relationships between data in mind, it is often helpful to determine whether there is a positive or negative relationship (i.e., association) between various phenomena. Consider the following scenario:

1. There is a high "negative" correlation between miles of paved roads and infant mortality:

   -- Let X equal miles of paved roads
   -- Let Y equal incidence of infant mortality

2. As X increases Y decreases.
That is to say, as the miles of paved roads in an area increase, the rate of infant mortality decreases. Why? Well, that is a totally different concern. Do not assume that cause and effect is in place here. Sure, a few mothers may get to the hospital faster (and therefore have children who survive birth), but that is hardly the reason for the association. Other factors such as economic wealth, societal issues, and the general infrastructure are also prime concerns.

Spearman's Rank-Difference Coefficient of Correlation is viewed as the nonparametric test for determining whether there is an association between two phenomena measured on (at least) an ordinal scale.

Please recall that the negative (- or decrease) and positive (+ or increase) signs in correlation are only used to suggest direction. The negative sign does not mean "bad" and the positive sign does not mean "good".

     Y
     |*
     | *
     |  *
     |   *         As X increases Y decreases
     |    *
     -----------
     Negative Correlation   X

     Y
     |    *
     |   *
     |  *
     | *           As X increases Y increases
     |*
     -----------
     Positive Correlation   X

Scenario:

This study examines any possible correlation between student performance on a mathematics mastery test and later performance on a final examination in a C++ programming course. The mathematics mastery test was prepared by faculty at Warren County Community College. Because the teacher conducting this analysis knows that this test has not been subjected to any estimates of reliability and validity, she thinks that it is best to view these test scores as ordinal data (data are ordered, but not with the precision of interval data). Accordingly, she will use the nonparametric Spearman's test, instead of Pearson's product moment correlation coefficient test (which requires interval data), to test the association between the two phenomena:

-- Mathematics mastery test score
-- C++ final examination score

Data associated with this investigation are presented in Table 1.
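The phrase "rank-difference" refers to the classic computing formula for the coefficient: each score is replaced by its rank within its own variable, d_i is the difference between student i's two ranks, and n is the number of students. As a worked illustration, using the average ranks that Minitab assigns to the Table 1 scores in spearman.lis (columns c6 and c7), the squared rank differences for the 20 students sum to 848.5:

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}
    = 1 - \frac{6(848.5)}{20(20^2 - 1)}
    = 1 - \frac{5091}{7980}
    \approx .362
```

This shortcut is exact only when no scores are tied. Table 1 contains ties (for example, two students scored 76 and two scored 85 on the mathematics test), so the shortcut gives .362 rather than the .3601 reported by SPSS and the 0.360 that Minitab obtains by correlating the ranks directly; the Pearson correlation of average ranks is the tie-adjusted form of the coefficient.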
Table 1

Mathematics Mastery Test Scores and C++ Final Examination Test Scores at Warren County Community College

====================================================
                              Test Score
                    ================================
Student Number      Mathematics Test      C++ Exam
----------------------------------------------------
      01                  089               082
      02                  084               078
      03                  083               092
      04                  093               085
      05                  076               091
      06                  086               075
      07                  092               091
      08                  090               086
      09                  085               082
      10                  082               087
      11                  085               093
      12                  069               072
      13                  088               069
      14                  073               080
      15                  064               076
      16                  076               084
      17                  081               063
      18                  068               082
      19                  078               073
      20                  095               092
----------------------------------------------------

Ho: Null Hypothesis: There is no association between test scores on a mathematics mastery test and later final examination test scores in a C++ programming course among students at Warren County Community College (p = .05).

Files:

1. spearman.doc
2. spearman.dat
3. spearman.r01
4. spearman.o01
5. spearman.con
6. spearman.lis

Command:

At the Unix prompt (%), key:

%spss -m < spearman.r01 > spearman.o01

************ spearman.dat ************

                   01                  089            082
                   02                  084            078
                   03                  083            092
                   04                  093            085
                   05                  076            091
                   06                  086            075
                   07                  092            091
                   08                  090            086
                   09                  085            082
                   10                  082            087
                   11                  085            093
                   12                  069            072
                   13                  088            069
                   14                  073            080
                   15                  064            076
                   16                  076            084
                   17                  081            063
                   18                  068            082
                   19                  078            073
                   20                  095            092

************ spearman.r01 ************

SET WIDTH = 80
SET LENGTH = NONE
SET CASE = UPLOW
SET HEADER = NO
TITLE = Spearman Rank Correlation Coefficient
COMMENT = This file examines any possible correlation
          between student performance on a mathematics
          mastery test and later performance in a
          final examination in a C++ programming
          course.

          The mathematics mastery test was prepared by
          faculty at Warren County Community College.
          Because the teacher conducting this analysis
          knows that this test has not been subjected to
          any estimates of reliability and validity, she
          thinks that it is best to view these test scores
          as ordinal data (data are ordered, but not with
          the precision of interval data).

          Accordingly, she will use the nonparametric
          Spearman's test, instead of Pearson's product moment
          correlation coefficient test, which requires the use
          of interval data.
DATA LIST FILE = 'spearman.dat' FIXED
  / Stu_Code 20-21
    Math_Tst 40-42
    Cpp_Test 55-57

Variable Labels
    Stu_Code "Student Code"
  / Math_Tst "Score - Mathematics Mastery Test"
  / Cpp_Test "Score - C++ Test"

NONPAR CORR VARIABLES = Math_Tst WITH Cpp_Test
  / PRINT = SPEARMAN

************ spearman.o01 ************

    2  SET WIDTH = 80
    3  SET LENGTH = NONE
    4  SET CASE = UPLOW
    5  SET HEADER = NO
    6  TITLE = Spearman Rank Correlation Coefficient
    7  COMMENT = This file examines any possible correlation
    8            between student performance on a mathematics
    9            mastery test and later performance in a
   10            final examination in a C++ programming
   11            course.
   12
   13            The mathematics mastery test was prepared by
   14            faculty at Warren County Community College.
   15            Because the teacher conducting this analysis
   16            knows that this test has not been subjected to
   17            any estimates of reliability and validity, she
   18            thinks that it is best to view these test scores
   19            as ordinal data (data are ordered, but not with
   20            the precision of interval data).
   21
   22            Accordingly, she will use the nonparametric
   23            Spearman's test, instead of Pearson's product moment
   24            correlation coefficient test, which requires the use
   25            of interval data.
   26  DATA LIST FILE = 'spearman.dat' FIXED
   27    / Stu_Code 20-21
   28      Math_Tst 40-42
   29      Cpp_Test 55-57
   30

This command will read 1 records from spearman.dat

Variable   Rec   Start     End  Format

STU_CODE     1      20      21  F2.0
MATH_TST     1      40      42  F3.0
CPP_TEST     1      55      57  F3.0

   31  Variable Lables
   32      Stu_Code "Student Code"
   33    / Math_Tst "Score - Mathematics Mastery Test"
   34    / Cpp_Test "Score - C++ Test"
   35
   36  NONPAR CORR VARIABLES = Math_Tst WITH Cpp_Test
   37    / PRINT = SPEARMAN

*** Workspace allows for 26214 cases for nonparametric correlation problem ***

 - - -  S P E A R M A N   C O R R E L A T I O N   C O E F F I C I E N T S  - - -

             CPP_TEST

MATH_TST       .3601
              N(  20)
              Sig .119

************ spearman.con ************

Outcome:

Computed r = + .3601 or + .360

Criterion r (alpha = .05, two-tailed, n = 20) = + or - .45 (approximately; tabled critical values of Spearman's Rank Correlation Coefficient differ slightly from one source to another)

Computed r (+ .360) < Criterion r (+ .45)

Therefore, the null hypothesis is retained: there is no statistically significant association between the test score on a mathematics mastery test and the later final examination test score in a C++ programming class at Warren County Community College (p = .05).

The p value is another way to view this outcome:

-- The calculated p value is .119.
-- The declared p value is .05.

The calculated p value exceeds the declared p value and there is, accordingly, no statistically significant association between the two measures: mathematics mastery test scores and final examination scores in a C++ programming course.

Note. As a general "rule of thumb," correlation is often viewed along the following continuum:

+ .00 to + .30 = little or no positive correlation between X and Y
- .00 to - .30 = little or no negative correlation between X and Y
+ .40 to + .70 = mild positive correlation between X and Y
- .40 to - .70 = mild negative correlation between X and Y
+ .80 to + .99 = strong positive correlation between X and Y
- .80 to - .99 = strong negative correlation between X and Y

A correlation coefficient can never fall below - 1.0 or rise above + 1.0.
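As a cross-check on the SPSS outcome, the following minimal Python sketch (not part of the original SPSS/Minitab workflow) ranks the Table 1 scores with tied scores sharing their average rank, takes the Pearson correlation of those ranks, and converts the coefficient to a two-tailed p value with the common t approximation t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. The function names (average_ranks, pearson, spearman_rho, two_tailed_p_even_df) are hypothetical. The closed-form t series it uses works only for an even number of degrees of freedom, which holds here (n - 2 = 18). That SPSS computed its Sig value with this same approximation is an assumption, although the results agree.

```python
import math

# Table 1 scores for students 01-20 (mathematics mastery test, C++ final exam)
MATH = [89, 84, 83, 93, 76, 86, 92, 90, 85, 82,
        85, 69, 88, 73, 64, 76, 81, 68, 78, 95]
CPP = [82, 78, 92, 85, 91, 75, 91, 86, 82, 87,
       93, 72, 69, 80, 76, 84, 63, 82, 73, 92]


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's coefficient computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def two_tailed_p_even_df(t, df):
    """Two-tailed p for Student's t with an EVEN number of degrees of freedom,
    using the finite series P(|T| < t) = sin(theta) * sum_k c_k * cos(theta)^(2k),
    where theta = atan(|t| / sqrt(df)) and c_k = (1*3*...*(2k-1)) / (2*4*...*2k)."""
    if df < 2 or df % 2:
        raise ValueError("this sketch handles only even df >= 2")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return 1.0 - math.sin(theta) * total


n = len(MATH)
rho = spearman_rho(MATH, CPP)
# Assumption: t approximation for the significance of rho (df = n - 2)
t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
p = two_tailed_p_even_df(t, n - 2)
print(f"rho = {rho:.4f}, t = {t:.3f}, df = {n - 2}, two-tailed p = {p:.3f}")
```

Run as-is, the sketch should print rho = 0.3601, t = 1.638, df = 18, two-tailed p = 0.119, in line with the .3601 and Sig .119 shown in spearman.o01. Ranking each variable separately and then applying the ordinary Pearson formula mirrors the Minitab approach in spearman.lis (rank first, then correlation).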
It may also be helpful to consider the sample size when considering the efficacy or the "practical" significance of a correlation design:

1. Correlation studies are very sensitive to n. That is to say, with a one-tailed test a correlation of - .425 is not significant (p = .05) when n = 15, but the same correlation of - .425 is significant when n = 17. Refer to a table of critical values of Spearman's Rank Correlation Coefficient for criterion values.

2. Do not automatically think, however, that increasing n will give you greater validity in developing conclusions. A trivial study, regardless of the magnitude of n, is still a trivial study.

3. Finally, do not allow "cause and effect" to creep into your decisions. Use caution when making inferences about correlation. Correlation does not imply cause and effect.

************ spearman.lis ************

% minitab
MTB > outfile 'spearman.lis'
Collecting Minitab session in file: spearman.lis
MTB > # MINITAB addendum to spearman.dat
MTB > read 'spearman.dat' c1 c2 c3
Entering data from file: spearman.dat
     20 rows read.
MTB > name c1 'Stu_Code' c2 'Math_Tst' c3 'Cpp_Test'
MTB > print c1 c2 c3

 ROW  Stu_Code  Math_Tst  Cpp_Test

   1         1        89        82
   2         2        84        78
   3         3        83        92
   4         4        93        85
   5         5        76        91
   6         6        86        75
   7         7        92        91
   8         8        90        86
   9         9        85        82
  10        10        82        87
  11        11        85        93
  12        12        69        72
  13        13        88        69
  14        14        73        80
  15        15        64        76
  16        16        76        84
  17        17        81        63
  18        18        68        82
Continue? y
  19        19        78        73
  20        20        95        92

MTB > # It is always helpful to plot data, when appropriate,
MTB > # to gain a sense of the association between variables.
MTB >
MTB > plot c3 c2

[Character plot: Cpp_Test (vertical axis, ticks at 70, 80, 90) against Math_Tst (horizontal axis, 66.0 to 96.0), one * per student]

MTB > # And you can see the general way data appear when plotted.
MTB >
MTB > # With the Spearman test, you must first rank data before
MTB > # you can use the correlation command.
MTB >
MTB > rank the data in c2 and put into c6
MTB > rank the data in c3 and put into c7
MTB >
MTB > print c1-c7

 ROW  Stu_Code  Math_Tst  Cpp_Test    C6    C7

   1         1        89        82  16.0  10.0
   2         2        84        78  11.0   7.0
   3         3        83        92  10.0  18.5
   4         4        93        85  19.0  13.0
   5         5        76        91   5.5  16.5
   6         6        86        75  14.0   5.0
   7         7        92        91  18.0  16.5
   8         8        90        86  17.0  14.0
   9         9        85        82  12.5  10.0
  10        10        82        87   9.0  15.0
  11        11        85        93  12.5  20.0
  12        12        69        72   3.0   3.0
  13        13        88        69  15.0   2.0
  14        14        73        80   4.0   8.0
  15        15        64        76   1.0   6.0
  16        16        76        84   5.5  12.0
  17        17        81        63   8.0   1.0
  18        18        68        82   2.0  10.0
Continue? y
  19        19        78        73   7.0   4.0
  20        20        95        92  20.0  18.5

* NOTE * One or more variables are undefined.

MTB > # Be sure to notice that rank was used, not the sort
MTB > # command.
MTB >
MTB > # Now that the data are in rank order, it is a simple
MTB > # task to use the correlation command. Be sure to
MTB > # remember that the data are ranked, so the correlation
MTB > # output will be the nonparametric Spearman and not
MTB > # the parametric Pearson coefficient of correlation.
MTB >
MTB > correlation of the data in c6 and c7

Correlation of C6 and C7 = 0.360

MTB > stop

--------------------------

Disclaimer:

All care was used to prepare the information in this tutorial. Even so, the author does not and cannot guarantee the accuracy of this information. The author disclaims any and all injury that may come about from the use of this tutorial. As always, students and all others should check with their advisor(s) and/or other appropriate professionals for any and all assistance on research design, analysis, selected levels of significance, and interpretation of output file(s).

The author is entitled to exclusive distribution of this tutorial. Readers have permission to print this tutorial for individual use, provided that the copyright statement appears and that there is no redistribution of this tutorial without permission.

Prepared 980316
Revised 980914

end-of-file 'spearman.ssi'