Spearman's Rank-Difference Coefficient of Correlation

© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved


************
spearman.doc
************
Background:  Consider the degree of association between the
             Grade Point Average of an individual student and
             the family income for this student:

             -- A student who comes from a wealthy family
                may not have to work after school, but can
                instead use this free time to study and 
                do extra homework problems.

             -- A student who comes from a poor family 
                probably does not have this free time, but
                instead needs to work after school and on
                weekends to help the family with daily
                expenses.

             -- A student who comes from a wealthy family
                quite possibly has their own computer and
                is able to prepare programming homework problems
                at any convenient time.

             -- A student who comes from a poor family 
                probably does not have a computer at home,
                but instead has to schedule "at computer"
                time during open laboratory hours, which
                may quite possibly limit time-on-task for
                attempts at creative programming projects.

             As such, it is not at all surprising that there
             is historically a strong positive correlation
             between Grade Point Average and family income:

             -- Overall, for groups of students, Grade
                Point Average increases as family income
                increases.

             -- Correspondingly, for groups of students,
                Grade Point Average decreases as family 
                income decreases.

             The key points here are that:

             -- Family income does not "cause" the Grade
                Point Average.  Instead, there is merely
                an association between the two phenomena.

             -- This degree of association is for a 
                collective "group" of subjects.  Any one
                individual could have results that are 
                different from group observations.
                
             With the importance of relationships between 
             data in mind, it is often helpful to determine
             if there is a positive or negative relationship
             (i.e., association) between various phenomena.  

             Consider the following scenario:      

             1.  There is a high "negative" correlation between 
                 miles of paved roads and infant mortality:

                 -- Let X equal miles of paved roads

                 -- Let Y equal incidence of infant mortality

             2.  As X increases, Y decreases.  That is to
                 say, as the miles of paved roads in an area
                 increase, the rate of infant mortality 
                 decreases.

             Why?  Well, that is a totally different concern.  Do 
             not assume that cause and effect is in place here.  
             Sure, a few mothers may get to the hospital faster 
             (and therefore have children who survive birth), but 
             that is hardly the reason for the association.  
             Other factors such as economic wealth, societal 
             issues, and the general infrastructure are also prime 
             concerns. 

             Spearman's Rank-Difference Coefficient of Correlation 
             is a commonly used nonparametric test for determining 
             if there is an association between phenomena.  

             Please recall that the negative (- or decrease) and 
             positive (+ or increase) signs in correlation are 
             only used to suggest direction.  The negative sign 
             does not mean "bad" and the positive sign does not 
             mean "good".


             Y |*
               |  *
               |    *
               |      *        As X increases Y decreases 
               |        *
               -----------     Negative Correlation
                         X



             Y |        *
               |      *
               |    *
               |  *            As X increases Y increases
               |* 
               -----------     Positive Correlation
                         X


Scenario:    This study examines any possible correlation between 
             student performance on a mathematics mastery test and
             later performance in a final examination in a C++ 
             programming course.

             The mathematics mastery test was prepared by faculty 
             at Warren County Community College.  Because the 
             teacher conducting this analysis knows that this test 
             has not been subjected to any estimates of reliability 
             and validity, she thinks that it is best to view these 
             test scores as ordinal data (data are ordered, but not 
             with the precision of interval data).  

             Accordingly, she will use the nonparametric Spearman's 
             test, instead of Pearson's product moment correlation 
             coefficient test, which requires the use of interval 
             data, to test the association between the two 
             phenomena:

             -- Mathematics mastery test score

             -- C++ final examination score

             Data associated with this investigation are presented 
             in Table 1. 


             Table 1

             Mathematics Mastery Test Scores and C++ Final 
             Examination Test Scores at Warren County
             Community College
             ====================================================   

                                           Test Score
                                 ================================
             Student Number      Mathematics Test   C++ Exam  
             ----------------------------------------------------    

                   01                  089            082
                   02                  084            078
                   03                  083            092
                   04                  093            085
                   05                  076            091
                   06                  086            075
                   07                  092            091
                   08                  090            086
                   09                  085            082
                   10                  082            087
                   11                  085            093
                   12                  069            072
                   13                  088            069
                   14                  073            080
                   15                  064            076
                   16                  076            084
                   17                  081            063
                   18                  068            082
                   19                  078            073
                   20                  095            092
             ----------------------------------------------------    


Ho:          Null Hypothesis:  There is no association between 
             test scores on a mathematics mastery test and later
             final examination test scores in a C++ programming 
             course among students at Warren County Community 
             College (p = .05).


Files:       1.  spearman.doc

             2.  spearman.dat

             3.  spearman.r01

             4.  spearman.o01

             5.  spearman.con

             6.  spearman.lis


Command:     At the Unix prompt (%), key:

             %spss -m < spearman.r01 > spearman.o01


************
spearman.dat
************
                   01                  089            082
                   02                  084            078
                   03                  083            092
                   04                  093            085
                   05                  076            091
                   06                  086            075
                   07                  092            091
                   08                  090            086
                   09                  085            082
                   10                  082            087
                   11                  085            093
                   12                  069            072
                   13                  088            069
                   14                  073            080
                   15                  064            076
                   16                  076            084
                   17                  081            063
                   18                  068            082
                   19                  078            073
                   20                  095            092


************
spearman.r01
************
SET WIDTH      = 80
SET LENGTH     = NONE
SET CASE       = UPLOW
SET HEADER     = NO
TITLE          = Spearman Rank Correlation Coefficient 
COMMENT        = This file examines any possible correlation 
                 between student performance on a mathematics
                 mastery test and later performance in a 
                 final examination in a C++ programming 
                 course.

                 The mathematics mastery test was prepared by
                 faculty at Warren County Community College.
                 Because the teacher conducting this analysis
                 knows that this test has not been subjected to
                 any estimates of reliability and validity, she
                 thinks that it is best to view these test scores 
                 as ordinal data (data are ordered, but not with 
                 the precision of interval data).  

                 Accordingly, she will use the nonparametric
                 Spearman's test, instead of Pearson's product moment
                 correlation coefficient test, which requires the use 
                 of interval data.
DATA LIST FILE = 'spearman.dat' FIXED
     / Stu_Code    20-21
       Math_Tst    40-42 
       Cpp_Test    55-57 

Variable Labels
       Stu_Code   "Student Code"
     / Math_Tst   "Score - Mathematics Mastery Test"
     / Cpp_Test   "Score - C++ Test"  

NONPAR CORR VARIABLES = Math_Tst WITH Cpp_Test
     / PRINT = SPEARMAN


************
spearman.o01
************
   2  SET WIDTH      = 80
   3  SET LENGTH     = NONE
   4  SET CASE       = UPLOW
   5  SET HEADER     = NO
   6  TITLE          = Spearman Rank Correlation Coefficient
   7  COMMENT        = This file examines any possible correlation
   8                   between student performance on a mathematics
   9                   mastery test and later performance in a
  10                   final examination in a C++ programming
  11                   course.
  12
  13                   The mathematics mastery test was prepared by
  14                   faculty at Warren County Community College.
  15                   Because the teacher conducting this analysis
  16                   knows that this test has not been subjected to
  17                   any estimates of reliability and validity, she
  18                   thinks that it is best to view these test scores
  19                   as ordinal data (data are ordered, but not with
  20                   the precision of interval data).
  21
  22                   Accordingly, she will use the nonparametric
  23                   Spearman's test, instead of Pearson's product moment
  24                   correlation coefficient test, which requires the use
  25                   of interval data.
  26  DATA LIST FILE = 'spearman.dat' FIXED
  27       / Stu_Code    20-21
  28         Math_Tst    40-42
  29         Cpp_Test    55-57
  30

This command will read 1 records from spearman.dat

Variable   Rec   Start     End         Format

STU_CODE     1      20      21         F2.0
MATH_TST     1      40      42         F3.0
CPP_TEST     1      55      57         F3.0

  31  Variable Lables
  32         Stu_Code   "Student Code"
  33       / Math_Tst   "Score - Mathematics Mastery Test"
  34       / Cpp_Test   "Score - C++ Test"
  35
  36  NONPAR CORR VARIABLES = Math_Tst WITH Cpp_Test
  37       / PRINT = SPEARMAN

 *** Workspace allows for 26214 cases for nonparametric correlation problem ***



- - -  S P E A R M A N   C O R R E L A T I O N   C O E F F I C I E N T S  - - -

             CPP_TEST

MATH_TST        .3601
             N(   20)
             Sig .119


************
spearman.con
************

Outcome:     Computed r = + .3601 or + .360

             Criterion r (alpha = .05, n = 20) = + or - .377

             Computed r (+ .360) < Criterion r (+ .377)

             Therefore, there is no statistically significant 
             association between the test score on a mathematics 
             mastery test and the later final examination test 
             score in a C++ programming class at Warren County 
             Community College (p = .05).

             The p value is another way to view the association
             between the two measures:

             -- The calculated p value is .119. 

             -- The declared p value is .05.

             The calculated p value exceeds the declared p value 
             and there is, accordingly, no statistically significant 
             association between the two measures:  mastery test 
             scores and final examination scores in a C++ 
             programming course.
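
Note.        The calculated p value of .119 can be roughly 
             cross-checked by hand.  The sketch below assumes 
             that the p value comes from the usual t 
             approximation with n - 2 degrees of freedom and 
             that it is two-tailed; the SPSS output does not 
             state either point, so treat both as assumptions.  
             The function names are illustrative, and the area 
             under the t curve is found by simple numerical 
             integration rather than by a statistics library.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(r, n, steps=2000):
    """Approximate two-tailed p value for a correlation r from n pairs.

    Assumes |r| < 1 and an even number of integration steps.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density between 0 and t.
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3
    # The two tails are whatever lies outside the interval (-t, +t).
    return 1 - 2 * area


p_value = two_tailed_p(0.3601, 20)
print(round(p_value, 3))
print("reject Ho" if p_value < 0.05 else "fail to reject Ho")
```

             Under these assumptions the result should land 
             close to the .119 reported by SPSS and, because it 
             exceeds .05, the null hypothesis is not rejected.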

Note.        As a general "rule of thumb," correlation is 
             often viewed along the following continuum:

             + .00 to + .30 = no positive correlation 
                              between X and Y

             - .00 to - .30 = no negative correlation 
                              between X and Y

             + .40 to + .70 = mild positive correlation 
                              between X and Y

             - .40 to - .70 = mild negative correlation 
                              between X and Y

             + .80 to + .99 = strong positive correlation 
                              between X and Y

             - .80 to - .99 = strong negative correlation 
                              between X and Y

             A correlation coefficient can range only from 
             - 1.0 to + 1.0.
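
             The two limits can be illustrated with the 
             textbook rank-difference formula, 
             1 - 6 * (sum of d squared) / (n * (n squared - 1)), 
             where d is the difference between the two ranks 
             of each subject.  This tutorial does not state the 
             formula, so the sketch below is an added 
             illustration; it assumes there are no tied ranks, 
             and the function name is hypothetical.

```python
def rank_difference_rho(ranks_x, ranks_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


ranks = [1, 2, 3, 4, 5]
print(rank_difference_rho(ranks, ranks))        # same order:     1.0
print(rank_difference_rho(ranks, ranks[::-1]))  # reversed order: -1.0
```

             Any other pairing of the five ranks gives a value 
             between these two limits.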

             It may also be helpful to consider the sample 
             size when considering the efficacy or the "practical"
             significance of a correlation design:

             1.  Correlation studies are very sensitive to n.  For
                 example, a correlation of - .425 is not significant
                 (p = .05) when n = 15, but a correlation of 
                 - .425 is significant when n = 17.  

                 Refer to a table on critical values of Spearman's 
                 Rank Correlation Coefficient for criterion values.  

             2.  Do not automatically think, however, that 
                 increasing n will give you greater validity in 
                 developing conclusions.  A trivial study, 
                 regardless of the magnitude of n, is still a 
                 trivial study.

             3.  Finally, do not allow "cause and effect" to 
                 creep into your decisions.  Use caution when
                 making inferences about correlation.
                 Correlation does not imply cause and effect. 
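
             The sensitivity to n described in item 1 can be 
             roughly cross-checked with the t approximation, 
             t = r * sqrt((n - 2) / (1 - r squared)).  This is 
             only a sketch and not a replacement for a table of 
             critical values: it assumes a one-tailed test at 
             alpha = .05 (the tutorial does not say which kind 
             of test its figures refer to), and the two critical 
             t values are copied from a standard Student's t 
             table.  The names used are illustrative.

```python
import math

# One-tailed critical t values at alpha = .05, keyed by degrees of
# freedom (n - 2).  Assumption: copied from a standard t table.
CRITICAL_T = {13: 1.771, 15: 1.753}


def t_statistic(r, n):
    """t approximation for a correlation r based on n pairs."""
    return abs(r) * math.sqrt((n - 2) / (1 - r * r))


def verdict(r, n):
    """Compare the t statistic with the tabled critical value."""
    return "significant" if t_statistic(r, n) > CRITICAL_T[n - 2] else "not significant"


for n in (15, 17):
    print(n, round(t_statistic(-0.425, n), 3), verdict(-0.425, n))
```

             The same coefficient falls short of the critical 
             value at n = 15 and passes it at n = 17, which 
             agrees with the statement in item 1.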


************
spearman.lis
************
% minitab
 
 MTB > outfile 'spearman.lis'
 Collecting Minitab session in file: spearman.lis
 MTB > # MINITAB addendum to spearman.dat
 MTB > read 'spearman.dat' c1 c2 c3
 Entering data from file: spearman.dat
      20 rows read.
 MTB > name c1 'Stu_Code' c2 'Math_Tst' c3 'Cpp_Test'
 MTB > print c1 c2 c3
 
 
  ROW  Stu_Code  Math_Tst  Cpp_Test
 
    1         1        89        82
    2         2        84        78
    3         3        83        92
    4         4        93        85
    5         5        76        91
    6         6        86        75
    7         7        92        91
    8         8        90        86
    9         9        85        82
   10        10        82        87
   11        11        85        93
   12        12        69        72
   13        13        88        69
   14        14        73        80
   15        15        64        76
   16        16        76        84
   17        17        81        63
   18        18        68        82
 Continue? y
   19        19        78        73
   20        20        95        92
 
 MTB > # It is always helpful to plot data, when appropriate,
 MTB > # to gain a sense of the association between variables.
 MTB > 
 MTB > plot c3 c2
 
          -                                       *
          -                        *          *              *    *
        90+
          -                                  *
  Cpp_Test-                                               *    *
          -                        *
          -          *                            *     *
        80+                   *
          -                                     *
          -    *                                   *
          -                           *
          -            *
        70+                                            *
          -
          -
          -                                *
          -
            ------+---------+---------+---------+---------+---------+Math_Tst
               66.0      72.0      78.0      84.0      90.0      96.0
 
 MTB > # And you can see the general way data appear when plotted.
 MTB > 
 MTB > # With the Spearman test, you must first rank data before
 MTB > # you can use the correlation command.
 MTB > 
 MTB > rank the data in c2 and put into c6
 MTB > rank the data in c3 and put into c7
 MTB > 
 MTB > print c1-c7
 
 
  ROW  Stu_Code  Math_Tst  Cpp_Test     C6     C7
 
    1         1        89        82   16.0   10.0
    2         2        84        78   11.0    7.0
    3         3        83        92   10.0   18.5
    4         4        93        85   19.0   13.0
    5         5        76        91    5.5   16.5
    6         6        86        75   14.0    5.0
    7         7        92        91   18.0   16.5
    8         8        90        86   17.0   14.0
    9         9        85        82   12.5   10.0
   10        10        82        87    9.0   15.0
   11        11        85        93   12.5   20.0
   12        12        69        72    3.0    3.0
   13        13        88        69   15.0    2.0
   14        14        73        80    4.0    8.0
   15        15        64        76    1.0    6.0
   16        16        76        84    5.5   12.0
   17        17        81        63    8.0    1.0
   18        18        68        82    2.0   10.0
 Continue? y
   19        19        78        73    7.0    4.0
   20        20        95        92   20.0   18.5
 
 * NOTE  * One or more variables are undefined.
 
 MTB > # Be sure to notice that rank was used, not the sort
 MTB > # command.
 MTB > 
 MTB > # Now that the data are in rank order, it is a simple
 MTB > # task to use the correlation command.  Be sure to 
 MTB > # remember that the data are ranked, so the correlation
 MTB > # output will be the nonparametric Spearman and not 
 MTB > # the parametric Pearson coefficient of correlation.
 MTB > 
 MTB > correlation of the data in c6 and c7
 
 Correlation of C6 and C7 = 0.360
 
 MTB > stop
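
Note.        The rank-then-correlate steps in the Minitab 
             session above can also be sketched in Python.  The 
             scores are those in Table 1; the function names 
             are assumptions made for this sketch.  Tied scores 
             receive the mean of the ranks they occupy, which 
             is how the 5.5, 12.5, 16.5, and 18.5 entries in 
             columns C6 and C7 arise.

```python
import math

# Scores from Table 1, in student-number order.
math_tst = [89, 84, 83, 93, 76, 86, 92, 90, 85, 82,
            85, 69, 88, 73, 64, 76, 81, 68, 78, 95]
cpp_test = [82, 78, 92, 85, 91, 75, 91, 86, 82, 87,
            93, 72, 69, 80, 76, 84, 63, 82, 73, 92]


def average_ranks(values):
    """Rank from smallest (1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


c6 = average_ranks(math_tst)   # corresponds to Minitab column C6
c7 = average_ranks(cpp_test)   # corresponds to Minitab column C7
rho = pearson(c6, c7)
print(round(rho, 4))
```

             The printed value should agree with the .3601 
             reported by SPSS and the 0.360 reported by Minitab.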

--------------------------
Disclaimer:  All care was used to prepare the information in this 
tutorial.  Even so, the author does not and cannot guarantee the 
accuracy of this information.  The author disclaims any and all 
injury that may come about from the use of this tutorial.  As 
always, students and all others should check with their advisor(s) 
and/or other appropriate professionals for any and all assistance 
on research design, analysis, selected levels of significance, and 
interpretation of output file(s).

The author is entitled to exclusive distribution of this tutorial. 
Readers have permission to print this tutorial for individual use, 
provided that the copyright statement appears and that there is no 
redistribution of this tutorial without permission.

Prepared 980316
Revised  980914
end-of-file 'spearman.ssi'

Please send comments or suggestions to Dr. Thomas W. MacFarland
