Twoway Analysis of Variance
© 1998 by Dr. Thomas W. MacFarland -- All Rights Reserved
************
two_anov.doc
************

Background:

Analysis of Variance (ANOVA) is an effective method for
determining whether two or more group means differ merely by
chance, or whether the observed differences reflect true
differences between phenomena.  As useful as it may be to
determine differences among multiple groups on a single
variable:

-- ANOVA is not limited to studies involving one single
   variable.

-- On the contrary, ANOVA can be used to examine differences
   on two or more factors (i.e., independent variables) at
   the same time.

A common use of ANOVA methodology is the Twoway ANOVA
statistical test, which is used to determine differences (and
possible interactions) when variables have two or more
categories.  When Twoway ANOVA is used, it is possible to
answer two questions:

1. Is there a difference because of variables acting
   independently of each other (i.e., main effects)?

2. Is there a difference because of joint effects (i.e.,
   interaction)?

Twoway ANOVA designs can become quite complex, not only to
carry out but also to interpret.  Yet, this highly useful
methodology should not be avoided merely because it is not
"user friendly."  On the contrary, Twoway ANOVA should perhaps
be used more than it is, because it makes greater use of
resources while modeling real-world scenarios.

Twoway ANOVA designs are often presented in a manner similar
to other factorial analyses, such as the Chi-square analysis.
Like the Chi-square analysis, Twoway ANOVA uses a factorial
organization with data placed in cells.  The information
within each cell provides the necessary data for later
analysis.

A graphic representation of a factorial design is presented
below in Figure 1.  When reviewing this representation, be
sure to recall that interval or ratio data are used with a
Twoway ANOVA design, as opposed to the nominal data used with
a Chi-square analysis, which is also organized along the
format of a factorial design.

                             Variable A

                   Category 1          Category 2
                ______________________________________
                |                  |                 |
                |                  |                 |
   Category 1   |  n1, n2, ...     |  n1, n2, ...    |
                |                  |                 |
                |                  |                 |
   Variable B   |------------------|-----------------|
                |                  |                 |
                |                  |                 |
   Category 2   |  n1, n2, ...     |  n1, n2, ...    |
                |                  |                 |
                |__________________|_________________|

Figure 1  Comparative Study of Two Variables (Variable A and
          Variable B), with Two Categories (Category 1 and
          Category 2) for Each Variable

Thus, when using a Twoway ANOVA, be sure to remember that it
is possible to examine three separate hypotheses:

1. whether the population means for the categories of
   Variable A are equal

2. whether the population means for the categories of
   Variable B are equal

3. whether there is interaction between Variable A and
   Variable B

As such, Twoway ANOVA designs are often used to help explain
"real-world" scenarios, where interaction is often found.
This more complex design differs from simpler designs, which
can explain only scenarios that have been simplified for
modeling.  The decision to use a Twoway ANOVA is the decision
to see if complex issues can be understood, and possibly
acted upon.

Scenario:

This study examines whether there are differences in final
examination test scores (the dependent variable) for students
in a university senior-level software engineering course on
two separate factors:

-- The first factor addresses the method of instruction:

   -- The first group of students was taught by traditional
      lecture (Method Code = 1).

   -- The second group of students was taught by Computer
      Based Training (Method Code = 2).

   -- The third group of students was taught by the use of
      instructional videotapes (Method Code = 3).

   -- The fourth group of students was enrolled through
      independent study (Method Code = 4).

-- The second factor addresses each student's possible prior
   graduation from a community college:

   -- Some students in the senior-level course had previously
      attended and graduated from a community college
      (Grd_CC Code = 1).

   -- Other students in the senior-level course did not
      graduate from a community college (Grd_CC Code = 2).
      The two codes are mutually exclusive, so Grd_CC Code = 2
      includes students who may have attended a community
      college but did not complete the full curriculum needed
      to receive an associate's degree.

All students were from a university senior-level software
engineering course and were randomly assigned to one of four
groups: instruction by traditional lecture, instruction by
CBT (Computer Based Training), instruction by the use of
instructional videotapes, and independent study.  The teacher
worked with the registrar's office to obtain information
about prior graduation from a community college.

The teacher was confident that final examination scores
represented interval data (i.e., the data are parametric,
with the difference between "89" and "90" equal to the
difference between "75" and "76").  The teacher also wanted
to learn more about the effects of teaching method, prior
graduation from a community college, and the possible effect
of interaction between these two factors on final examination
scores.  As such, Twoway ANOVA (Analysis of Variance) was
correctly judged to be the appropriate test for this
analysis.  Data from this study are summarized in Table 1.
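
The 4 x 2 layout described in this scenario can be checked
before any analysis is run.  The following is a minimal Python
sketch, not part of the original SPSS or MINITAB files, that
counts the students and averages the scores in each Method by
Grd_CC cell, using the records listed in Table 1.  The names
RECORDS, parse, cell_summary, and is_balanced are illustrative
assumptions, not anything defined by SPSS or MINITAB.

```python
from collections import defaultdict

# (Method, Grd_CC, Score) for students 01-75, copied from Table 1.
RECORDS = """
1 1 89   1 1 81   1 2 73   1 1 84   1 2 70
1 2 56   1 1 70   1 2 81   1 2 78   1 1 69
1 1 89   1 2 88   1 2 45   1 2 83   1 1 95
1 2 77   1 1 69   1 1 80   2 2 93   2 1 86
2 1 89   2 2 95   2 2 89   2 1 88   2 1 98
2 1 89   2 2 94   2 1 95   2 2 95   2 2 98
2 2 87   2 2 85   2 1 98   2 1 93   2 2 87
2 1 95   2 1 93   2 2 93   3 2 95   3 1 96
3 2 83   3 2 89   3 1 88   3 1 87   3 1 94
3 2 97   3 1 95   3 2 93   3 2 85   3 2 95
3 1 92   3 2 82   3 1 86   3 1 87   3 2 89
3 2 97   3 1 100  3 2 93   3 1 96   4 2 84
4 1 85   4 2 73   4 1 92   4 2 57   4 1 63
4 1 69   4 2 73   4 2 91   4 1 65   4 1 74
4 2 71   4 2 68   4 2 62   4 1 56   4 1 85
"""


def parse(text):
    """Split the whitespace-separated numbers into (method, grd_cc, score)."""
    nums = [int(tok) for tok in text.split()]
    return [tuple(nums[i:i + 3]) for i in range(0, len(nums), 3)]


def cell_summary(rows):
    """Return {(method, grd_cc): (n, mean_score)} for every cell."""
    cells = defaultdict(list)
    for method, grd_cc, score in rows:
        cells[(method, grd_cc)].append(score)
    return {key: (len(v), sum(v) / len(v)) for key, v in sorted(cells.items())}


def is_balanced(summary):
    """A design is balanced when every cell holds the same number of cases."""
    return len({n for n, _ in summary.values()}) == 1


rows = parse(RECORDS)
summary = cell_summary(rows)
for (method, grd_cc), (n, mean) in summary.items():
    print(f"Method {method}  Grd_CC {grd_cc}  n = {n:2d}  mean = {mean:6.3f}")
print("balanced design:", is_balanced(summary))
```

With these records the cell sizes range from 8 to 11, so the
sketch reports an unbalanced design.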

                        Table 1

Final Examination Test Scores in a Senior-Level Software
Engineering Course by Instructional Method (Traditional
Lecture, Computer Based Training, Instructional Videotape,
and Independent Study) and by Prior Graduation from a
Community College

========================================================
                 Instructional
                 Method
                 =============
                 1 = Lecture    CC Graduate
                 2 = CBT        ===========
                 3 = Video      1 = Yes
Student Number   4 = IDS        2 = No       Final Score
--------------------------------------------------------
01               1              1            089
02               1              1            081
03               1              2            073
04               1              1            084
05               1              2            070
06               1              2            056
07               1              1            070
08               1              2            081
09               1              2            078
10               1              1            069
11               1              1            089
12               1              2            088
13               1              2            045
14               1              2            083
15               1              1            095
16               1              2            077
17               1              1            069
18               1              1            080
19               2              2            093
20               2              1            086
21               2              1            089
22               2              2            095
23               2              2            089
24               2              1            088
25               2              1            098
26               2              1            089
27               2              2            094
28               2              1            095
29               2              2            095
30               2              2            098
31               2              2            087
32               2              2            085
33               2              1            098
34               2              1            093
35               2              2            087
36               2              1            095
37               2              1            093
38               2              2            093
39               3              2            095
40               3              1            096
41               3              2            083
42               3              2            089
43               3              1            088
44               3              1            087
45               3              1            094
46               3              2            097
47               3              1            095
48               3              2            093
49               3              2            085
50               3              2            095
51               3              1            092
52               3              2            082
53               3              1            086
54               3              1            087
55               3              2            089
56               3              2            097
57               3              1            100
58               3              2            093
59               3              1            096
60               4              2            084
61               4              1            085
62               4              2            073
63               4              1            092
64               4              2            057
65               4              1            063
66               4              1            069
67               4              2            073
68               4              2            091
69               4              1            065
70               4              1            074
71               4              2            071
72               4              2            068
73               4              2            062
74               4              1            056
75               4              1            085
--------------------------------------------------------

Note.  Notice how the N (i.e., number of subjects or group
       members) for each instructional group does not have to
       be equal.

Ho:  Null Hypothesis:  There is no difference in final
     examination test scores of students enrolled in a
     university senior-level software engineering course by
     instructional method (instruction by traditional
     lecture, instruction by Computer Based Training,
     instruction by the use of instructional videotapes, and
     independent study) or by graduation status from a
     community college (either graduated from a community
     college or did not graduate from a community college)
     (p <= .05).

Files:     1.  two_anov.doc
           2.  two_anov.dat
           3.  two_anov.r01
           4.  two_anov.o01
           5.  two_anov.con
           6.  two_anov.lis

Command:   At the Unix prompt (%), key:

           %spss -m < two_anov.r01 > two_anov.o01

************
two_anov.dat
************

                   01             1         1            089
                   02             1         1            081
                   03             1         2            073
                   04             1         1            084
                   05             1         2            070
                   06             1         2            056
                   07             1         1            070
                   08             1         2            081
                   09             1         2            078
                   10             1         1            069
                   11             1         1            089
                   12             1         2            088
                   13             1         2            045
                   14             1         2            083
                   15             1         1            095
                   16             1         2            077
                   17             1         1            069
                   18             1         1            080
                   19             2         2            093
                   20             2         1            086
                   21             2         1            089
                   22             2         2            095
                   23             2         2            089
                   24             2         1            088
                   25             2         1            098
                   26             2         1            089
                   27             2         2            094
                   28             2         1            095
                   29             2         2            095
                   30             2         2            098
                   31             2         2            087
                   32             2         2            085
                   33             2         1            098
                   34             2         1            093
                   35             2         2            087
                   36             2         1            095
                   37             2         1            093
                   38             2         2            093
                   39             3         2            095
                   40             3         1            096
                   41             3         2            083
                   42             3         2            089
                   43             3         1            088
                   44             3         1            087
                   45             3         1            094
                   46             3         2            097
                   47             3         1            095
                   48             3         2            093
                   49             3         2            085
                   50             3         2            095
                   51             3         1            092
                   52             3         2            082
                   53             3         1            086
                   54             3         1            087
                   55             3         2            089
                   56             3         2            097
                   57             3         1            100
                   58             3         2            093
                   59             3         1            096
                   60             4         2            084
                   61             4         1            085
                   62             4         2            073
                   63             4         1            092
                   64             4         2            057
                   65             4         1            063
                   66             4         1            069
                   67             4         2            073
                   68             4         2            091
                   69             4         1            065
                   70             4         1            074
                   71             4         2            071
                   72             4         2            068
                   73             4         2            062
                   74             4         1            056
                   75             4         1            085

************
two_anov.r01
************

SET WIDTH      = 80
SET LENGTH     = NONE
SET CASE       = UPLOW
SET HEADER     = NO
TITLE          = Twoway Analysis of Variance (TWOWAY ANOVA)
COMMENT        = This file examines if there are differences
                 in final examination test scores (the
                 dependent variable) for students in a
                 university senior-level software engineering
                 course on two separate factors:

                 -- The first factor addresses the method
                    of instruction, with:

                    -- the first group of students was taught
                       by traditional lecture (Method Code
                       = 1).

                    -- the second group of students was taught
                       by Computer Based Training (Method Code
                       = 2).

                    -- the third group of students was taught
                       by the use of instructional videotapes
                       (Method Code = 3).

                    -- the fourth group of students was
                       enrolled through independent study
                       (Method Code = 4).

                 -- The second factor addresses each student's
                    possible prior graduation from a community
                    college:

                    -- Some students in the senior-level course
                       had previously attended and graduated
                       from a community college (Grd_CC Code
                       = 1).

                    -- Other students in the senior-level course
                       did not graduate from a community college
                       (Grd_CC Code = 2), which includes students
                       who may have attended a community college
                       but did not complete the full curriculum
                       needed to receive an associate's degree.


                 Students were all from a university senior-
                 level software engineering course who were
                 assigned, through random selection, to
                 placement into one of four groups: instruction
                 by traditional lecture, instruction by CBT
                 (Computer Based Training), instruction by
                 the use of instructional videotapes, and
                 independent study.  The teacher worked with
                 the registrar's office to obtain information
                 about prior graduation from a community
                 college.

                 The teacher was confident that final examination
                 scores represented interval data (i.e., the data
                 are parametric, with the difference between "89"
                 and "90" equal to the difference between "75"
                 and "76").  The teacher also wanted to learn
                 more about the effects of teaching method,
                 prior graduation from a community college, and
                 the possible effect of interaction between these
                 two factors on final examination scores.  As
                 such, Twoway ANOVA (Analysis of Variance) was
                 correctly judged to be the appropriate test for
                 this analysis.
DATA LIST FILE = 'two_anov.dat' FIXED
   / Stu_Code  20-21
     Method    35
     Grd_CC    45
     Score     58-60

Variable Labels
     Stu_Code  "Student Code"
   / Method    "Method: Lecture, CBT, Video, IDS"
   / Grd_CC    "Graduated from Community College: Y or N"
   / Score     "Final Examination Score"

Value Labels
     Method  1 'Lecture: Traditional Lecture'
             2 'CBT: Computer-Based Training'
             3 'Video: Instructional Videotape'
             4 'IDS: Independent Study'

   / Grd_CC  1 'Grd_Yes: Graduated from a CC'
             2 'Grd_No : Did NOT Graduate from a CC'

ANOVA Score BY Method(1,4) Grd_CC (1,2)
   / STATISTICS = ALL
   / FORMAT     = LABELS
COMMENT        = Please note in this analysis how I need to
                 identify which methods (1, 2, 3, and 4)
                 and community college status (1 and 2) to
                 analyze.

************
two_anov.o01
************

   1  SET WIDTH      = 80
   2  SET LENGTH     = NONE
   3  SET CASE       = UPLOW
   4  SET HEADER     = NO
   5  TITLE          = Twoway Analysis of Variance (TWOWAY ANOVA)
   6  COMMENT        = This file examines if there are differences
   7                   in final examination test scores (the
   8                   dependent variable) for students in a
   9                   university senior-level software engineering
  10                   course on two separate factors:
  11
  12                   -- The first factor addresses the method
  13                      of instruction, with:
  14
  15                      -- the first group of students was taught
  16                         by traditional lecture (Method Code
  17                         = 1).
  18
  19                      -- the second group of students was taught
  20                         by Computer Based Training (Method Code
  21                         = 2).
  22
  23                      -- the third group of students was taught
  24                         by the use of instructional videotapes
  25                         (Method Code = 3).
  26
  27                      -- the fourth group of students was
  28                         enrolled through independent study
  29                         (Method Code = 4).
  30
  31                   -- The second factor addresses each student's
  32                      possible prior graduation from a community
  33                      college:
  34
  35                      -- Some students in the senior-level course
  36                         had previously attended and graduated
  37                         from a community college (Grd_CC Code
  38                         = 1).
  39
  40                      -- Other students in the senior-level course
  41                         did not graduate from a community college
  42                         (Grd_CC Code = 2), which includes students
  43                         who may have attended a community college
  44                         but did not complete the full curriculum
  45                         needed to receive an associate's degree.
  46
  47
  48                   Students were all from a university senior-
  49                   level software engineering course who were
  50                   assigned, through random selection, to
  51                   placement into one of four groups: instruction
  52                   by traditional lecture, instruction by CBT
  53                   (Computer Based Training), instruction by
  54                   the use of instructional videotapes, and
  55                   independent study.  The teacher worked with
  56                   the registrar's office to obtain information
  57                   about prior graduation from a community
  58                   college.
  59
  60                   The teacher was confident that final examination
  61                   scores represented interval data (i.e., the data
  62                   are parametric, with the difference between "89"
  63                   and "90" equal to the difference between "75"
  64                   and "76").  The teacher also wanted to learn
  65                   more about the effects of teaching method,
  66                   prior graduation from a community college, and
  67                   the possible effect of interaction between these
  68                   two factors on final examination scores.  As
  69                   such, Twoway ANOVA (Analysis of Variance) was
  70                   correctly judged to be the appropriate test for
  71                   this analysis.
  72  DATA LIST FILE = 'two_anov.dat' FIXED
  73     / Stu_Code  20-21
  74       Method    35
  75       Grd_CC    45
  76       Score     58-60
  77

This command will read 1 records from two_anov.dat

Variable   Rec   Start     End         Format

STU_CODE     1      20      21         F2.0
METHOD       1      35      35         F1.0
GRD_CC       1      45      45         F1.0
SCORE        1      58      60         F3.0

  78  Variable Labels
  79       Stu_Code  "Student Code"
  80     / Method    "Method: Lecture, CBT, Video, IDS"
  81     / Grd_CC    "Graduated from Community College: Y or N"
  82     / Score     "Final Examination Score"
  83
  84  Value Labels
  85       Method  1 'Lecture: Traditional Lecture'
  86               2 'CBT: Computer-Based Training'
  87               3 'Video: Instructional Videotape'
  88               4 'IDS: Independent Study'
  89
  90     / Grd_CC  1 'Grd_Yes: Graduated from a CC'
  91               2 'Grd_No : Did NOT Graduate from a CC'
  92
  93  ANOVA Score BY Method(1,4) Grd_CC (1,2)
  94     / STATISTICS = ALL
  95     / FORMAT     = LABELS
  96  COMMENT        = Please note in this analysis how I need to
  97                   identify which methods (1, 2, 3, and 4)
  98                   and community college status (1 and 2) to
  99                   analyze.
 100

       * * *  A N A L Y S I S   O F   V A R I A N C E  * * *

            SCORE     Final Examination Score
         by METHOD    Method: Lecture, CBT, Video, IDS
            GRD_CC    Graduated from Community College: Y or N

            UNIQUE sums of squares
            All effects entered simultaneously

                             Sum of               Mean             Sig
Source of Variation         Squares     DF      Square        F   of F

Main Effects               5521.491      4    1380.373   18.308   .000
   METHOD                  5379.834      3    1793.278   23.784   .000
   GRD_CC                   160.120      1     160.120    2.124   .150

2-Way Interactions          177.981      3      59.327     .787   .505
   METHOD   GRD_CC          177.981      3      59.327     .787   .505

Explained                  5704.155      7     814.879   10.808   .000

Residual                   5051.632     67      75.397

Total                     10755.787     74     145.348

75 cases were processed.
0 cases (.0 pct) were missing.

************
two_anov.con
************

Outcome:

The SPSS Analysis of Variance output file for a Twoway ANOVA
is rather complex and at first it may appear somewhat
difficult to interpret.
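
One way to get oriented in this output is to see where each F
value comes from.  As a minimal Python sketch (not part of the
original SPSS or MINITAB files; the names MEAN_SQUARES,
MS_RESIDUAL, and f_ratio are illustrative assumptions), each F
in the SPSS table above is the mean square for that source
divided by the residual mean square:

```python
# Mean squares copied from the SPSS ANOVA table above.
MS_RESIDUAL = 75.397
MEAN_SQUARES = {
    "METHOD": 1793.278,
    "GRD_CC": 160.120,
    "METHOD x GRD_CC": 59.327,
}


def f_ratio(ms_effect, ms_residual=MS_RESIDUAL):
    """F is the effect mean square divided by the residual mean square."""
    return ms_effect / ms_residual


F_VALUES = {name: f_ratio(ms) for name, ms in MEAN_SQUARES.items()}
for name, f in F_VALUES.items():
    print(f"{name:<16s} F = {f:7.3f}")
```

The ratios agree, to rounding, with the F column that SPSS
printed (23.784, 2.124, and .787).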
Look at the following edited section of the output file to
see just what you need to examine to determine whether group
means differ on each variable and whether there is any
interaction between the variables:

                                          Sig
Source of Variation                F     of F

Main Effects
   METHOD                     23.784     .000
   GRD_CC                      2.124     .150

2-Way Interactions
   METHOD   GRD_CC              .787     .505

Although you could compare the calculated F statistics to
criterion F statistics, it is usually easier and just as
informative to use the probability (i.e., Significance of F)
values to determine if differences exist.  In this example:

METHOD (Instructional Method)

-- The calculated Method p value is .000 (i.e., less than
   .0005, which prints as .000).

-- The declared Method p value is .05.

The calculated p value is less than the declared p value and
there is, accordingly, a statistically significant difference
in final examination scores based on instructional method.

GRD_CC (Graduated from a Community College)

-- The calculated Grd_CC p value is .150.

-- The declared Grd_CC p value is .05.

The calculated p value exceeds the declared p value and there
is, accordingly, no statistically significant difference in
final examination scores based on prior graduation from a
community college.

METHOD by GRD_CC (2-Way Interaction)

-- The calculated Method by Grd_CC p value is .505.

-- The declared Method by Grd_CC p value is .05.

The calculated p value exceeds the declared p value and there
is, accordingly, no statistically significant interaction
between instructional method and prior graduation from a
community college.

************
two_anov.lis
************

% minitab

MTB > outfile 'two_anov.lis'
Collecting Minitab session in file: two_anov.lis
MTB > # MINITAB Addendum to 'two_anov.dat'
MTB > #
MTB > read 'two_anov.dat' c1 c2 c3 c4
Entering data from file: two_anov.dat
     75 rows read.
MTB > print c1 c2 c3 c4

 ROW    C1    C2    C3    C4

   1     1     1     1    89
   2     2     1     1    81
   3     3     1     2    73
   4     4     1     1    84
   5     5     1     2    70
   6     6     1     2    56
   7     7     1     1    70
   8     8     1     2    81
   9     9     1     2    78
  10    10     1     1    69
  11    11     1     1    89
  12    12     1     2    88
  13    13     1     2    45
  14    14     1     2    83
  15    15     1     1    95
  16    16     1     2    77
  17    17     1     1    69
  18    18     1     1    80
  19    19     2     2    93
  20    20     2     1    86
  21    21     2     1    89
 Continue?
 y
  22    22     2     2    95
  23    23     2     2    89
  24    24     2     1    88
  25    25     2     1    98
  26    26     2     1    89
  27    27     2     2    94
  28    28     2     1    95
  29    29     2     2    95
  30    30     2     2    98
  31    31     2     2    87
  32    32     2     2    85
  33    33     2     1    98
  34    34     2     1    93
  35    35     2     2    87
  36    36     2     1    95
  37    37     2     1    93
  38    38     2     2    93
  39    39     3     2    95
  40    40     3     1    96
  41    41     3     2    83
  42    42     3     2    89
  43    43     3     1    88
  44    44     3     1    87
 Continue? y
  45    45     3     1    94
  46    46     3     2    97
  47    47     3     1    95
  48    48     3     2    93
  49    49     3     2    85
  50    50     3     2    95
  51    51     3     1    92
  52    52     3     2    82
  53    53     3     1    86
  54    54     3     1    87
  55    55     3     2    89
  56    56     3     2    97
  57    57     3     1   100
  58    58     3     2    93
  59    59     3     1    96
  60    60     4     2    84
  61    61     4     1    85
  62    62     4     2    73
  63    63     4     1    92
  64    64     4     2    57
  65    65     4     1    63
  66    66     4     1    69
  67    67     4     2    73
 Continue? y
  68    68     4     2    91
  69    69     4     1    65
  70    70     4     1    74
  71    71     4     2    71
  72    72     4     2    68
  73    73     4     2    62
  74    74     4     1    56
  75    75     4     1    85

MTB > # Before I attempt a TWOWAY ANOVA on this data set,
MTB > # I first need to determine if the design is balanced
MTB > # or unbalanced.
MTB > #
MTB > # Look carefully at the various groups and you will see
MTB > # that the number of students in each teaching method
MTB > # group is not consistent.  Because the numbers are
MTB > # not consistent, this design is unbalanced.  The same
MTB > # issue applies to the number of students with a prior
MTB > # community college associate's degree.
MTB > #
MTB > # I will use MINITAB's help command to determine the
MTB > # proper command for a TWOWAY ANOVA on an unbalanced
MTB > # design.
MTB > help

* You are using MINITAB Statistical Software, Standard Version *

To see:                          Type:
-----------------------------    ---------------------------------
A list of all command topics     HELP COMMANDS
A list of all overview topics    HELP OVERVIEW
Information on a command         HELP commandname [subcommandname]
-----------------------------    ---------------------------------

For example:   HELP COMMANDS
               HELP PLOT
               HELP PLOT TITLE

To leave Minitab, type STOP.

MTB > help commands

To list the Minitab commands for any category below, type HELP
COMMANDS followed by the appropriate number.  For example, to
list available regression commands, type:  HELP COMMANDS 7.

 1 General Information              10 Nonparametrics
 2 Files, Data, and Printing        11 Tables
 3 Editing and Manipulating Data    12 Times Series
 4 Arithmetic                       13 Statistical Process Control
 5 Plotting Data                    14 Exploratory Data Analysis
 6 Basic Statistics                 15 Distributions and Random Data
 7 Regression                       16 Matrices
 8 Analysis of Variance             17 Miscellaneous Features
 9 Multivariate Analysis            18 Macros

                * * * Enhanced Version * * *

19 Professional Graphics
20 Enhanced Statistical Process Control
21 Graphical Options for Control Charts
22 Analysis of Means
23 Design and Analysis of Experiments

MTB > help commands 8

Analysis of Variance

AOVONEWAY.....does one way analysis of variance, with each
              group in separate columns
ONEWAYAOV.....does one way analysis of variance, with the
              response in one column, subscripts in another
TWOWAYAOV.....does balanced two way analysis of variance
ANOVA.........does univariate and multivariate analysis of
              variance with balanced designs
ANCOVA........analyzes orthogonal designs (including latin
              squares and crossover designs) with crossed and
              nested factors and additive covariates
GLM...........does univariate and multivariate analysis of
              variance with balanced and unbalanced designs,
              analysis of covariance, and regression
NESTED........experimental command that analyzes fully nested
              (hierarchical) designs
INDICATOR.....creates indicator or dummy variables

MTB > # And I see that glm is used when the design is unbalanced.
MTB > #
MTB > # FYI ... the two options here are:
MTB > #
MTB > # -- for a balanced design      anova c4 = c2 | c3
MTB > # -- for an unbalanced design   glm c4 = c2 | c3
MTB > #
MTB > # I recommend that you stay away from the standard command of
MTB > # twoway c4 c2 c3 when dealing with a balanced design.
MTB > #
MTB > # If you use this command, you will need to do manual
MTB > # calculations to obtain F values.
MTB > #
MTB > # I will now use glm on this unbalanced design.
MTB > #
MTB > glm c4 = c2 | c3

Factor     Levels Values
C2              4      1     2     3     4
C3              2      1     2

Analysis of Variance for C4

Source     DF     Seq SS     Adj SS     Adj MS       F      P
C2          3    5372.33    5379.83    1793.28   23.78  0.000
C3          1     153.84     160.12     160.12    2.12  0.150
C2*C3       3     177.98     177.98      59.33    0.79  0.505
Error      67    5051.63    5051.63      75.40
Total      74   10755.79

Unusual Observations for C4

Obs.       C4       Fit  Stdev.Fit  Residual   St.Resid
 13    45.000    72.333      2.894   -27.333     -3.34R
 63    92.000    73.625      3.070    18.375      2.26R
 68    91.000    72.375      3.070    18.625      2.29R
 74    56.000    73.625      3.070   -17.625     -2.17R
 Continue? y

R denotes an obs. with a large st. resid.

MTB > # And you will notice that the significance of F values (or
MTB > # the p values in MINITAB's printout) are the same
MTB > # as what you previously saw in the SPSS printout:
MTB > #
MTB > # -- Method p = .000
MTB > # -- Grd_CC p = .150
MTB > # -- 2-Way Interaction (Method * Grd_CC) p = .505
MTB > #
MTB > # Let me demonstrate the use of the anova command on
MTB > # this unbalanced design, just to show how the analysis
MTB > # will not continue.
MTB > #
MTB > anova c4 = c2 | c3
* ERROR * Unequal cell counts.
MTB > stop

--------------------------

Disclaimer:

All care was used to prepare the information in this
tutorial.  Even so, the author does not and cannot guarantee
the accuracy of this information.  The author disclaims
liability for any and all injury that may come about from the
use of this tutorial.  As always, students and all others
should check with their advisor(s) and/or other appropriate
professionals for any and all assistance on research design,
analysis, selected levels of significance, and interpretation
of output file(s).

The author is entitled to exclusive distribution of this
tutorial.  Readers have permission to print this tutorial for
individual use, provided that the copyright statement appears
and that there is no redistribution of this tutorial without
permission.

Prepared 980316
Revised  980914

end-of-file 'two_anov.ssi'