Situations with Two or More Independent Variables of Interest
When considering the relationship among three or more variables, an interaction may arise. Interactions describe a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of multiple regression analyses, but they may also be evaluated using two-way ANOVA.
An Example Problem
To determine if prenatal exposure to cocaine alters dendritic spine density within the prefrontal cortex, 20 rats were equally divided between treatment groups that were prenatally exposed to either cocaine or placebo. Further, because any effect of prenatal drug exposure might be evident at one age but not another, animals within each treatment group were further divided into two groups. One subgroup was studied at 4 weeks of age and the other was studied at 12 weeks of age. Thus, our independent variables are treatment (prenatal drug/placebo exposure) and age (4 and 12 weeks of age), and our dependent variable is spine density. The following table shows one possible outcome of such a study:
DENDRITIC SPINE DENSITY

4-Week Control | 4-Week Cocaine | 12-Week Control | 12-Week Cocaine
7.5            | 5.5            | 8.0             | 5.0
8.0            | 3.5            | 10.0            | 4.5
6.0            | 4.5            | 13.0            | 4.0
7.0            | 6.0            | 9.0             | 6.0
6.5            | 5.0            | 8.5             | 4.0
There are three null hypotheses we may want to test. The first two test the effects of each independent variable (or factor) under investigation, and the third tests for an interaction between these two factors:
H_{01}: Prenatal treatment substance (control or cocaine) has no effect on dendritic spine density in rats.
We can state this more formally as μ_{Control} = μ_{Cocaine}, where μ_{Control} and μ_{Cocaine} are the population mean spine densities of rats who were not and were prenatally exposed to cocaine, respectively.
H_{02}: Age has no effect on dendritic spine density.
Or, more formally, μ_{4-Week} = μ_{12-Week}, where μ_{4-Week} and μ_{12-Week} are the population mean spine densities of rats at the ages of 4 and 12 weeks, respectively.
H_{03}: The two factors (treatment and age) are independent; i.e., there is no interaction effect.
Or, μ_{4-Week, Control} - μ_{4-Week, Cocaine} = μ_{12-Week, Control} - μ_{12-Week, Cocaine}, where each μ represents the population mean spine density of rats at the labeled age and prenatal treatment condition; i.e., the effect of prenatal treatment is the same at both ages.
Two-Way ANOVA
A two-way ANOVA is an analysis technique that quantifies how much of the variance in a sample can be accounted for by each of two categorical variables and their interactions. Note that different rats are used in each of the four experimental groups, so a standard two-way ANOVA should be used. If the same rats were used in each experimental condition, one would want to use a two-way repeated-measures ANOVA.
Step 1 is to compute group means (for each cell group), row and column means, and the grand mean (for all observations in the whole experiment):
GROUP MEANS

                       | 4-Week | 12-Week | All Ages
Control                | 7      | 9.7     | 8.35
Cocaine                | 4.9    | 4.7     | 4.8
All Prenatal Exposures | 5.95   | 7.2     | 6.575
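As a quick check on the arithmetic, the cell, row, and column means and the grand mean can be computed in a few lines of Python. This is a minimal sketch using plain lists; the variable names are our own, not part of the original analysis:

```python
# Raw spine-density data, one list per cell group (from the data table above)
data = {
    ("control", "4wk"):  [7.5, 8.0, 6.0, 7.0, 6.5],
    ("cocaine", "4wk"):  [5.5, 3.5, 4.5, 6.0, 5.0],
    ("control", "12wk"): [8.0, 10.0, 13.0, 9.0, 8.5],
    ("cocaine", "12wk"): [5.0, 4.5, 4.0, 6.0, 4.0],
}

def mean(xs):
    return sum(xs) / len(xs)

# Cell (group) means
cell_means = {k: mean(v) for k, v in data.items()}

# Row means (treatment) and column means (age)
row_means = {t: mean(data[(t, "4wk")] + data[(t, "12wk")]) for t in ("control", "cocaine")}
col_means = {a: mean(data[("control", a)] + data[("cocaine", a)]) for a in ("4wk", "12wk")}

# Grand mean over all 20 observations
grand_mean = mean([x for v in data.values() for x in v])
```

Running this reproduces every entry in the GROUP MEANS table above.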
Take a moment to define your variables
Before we go any further, we should define some variables that we'll use to complete the computations needed for the rest of the analysis:
n = number of observations in each group (here, it's 5 since there are 5 subjects per group)
N = total number of observations in the whole experiment (here, 20)
r = number of rows (here 2, one for each prenatal treatment condition)
c = number of columns (here 2, one for each age group)
x̄_{g} = mean of a particular cell group g (for example, x̄_{12-Week, Control} = 9.7)
x̄_{R} = mean of a particular row R (for example, x̄_{Cocaine} = 4.8)
x̄_{C} = mean of a particular column C (for example, x̄_{4-Week} = 5.95)
And finally, to refer to the grand mean (the overall mean of all observations in the experiment), we'll simply use the notation x̄ (here, x̄ = 6.575).
Now that we've calculated all of our means, it would be helpful to also plot the mean data. (It's easier to understand it that way!)
Step 2 is to calculate the sum-of-squares for each individual group (SS_{g}) using this:
SS_{g} = Σ_{i} (x_{i,g} - x̄_{g})^{2}
where x_{i,g} is the ith measurement for group g, and x̄_{g} is the mean of group g. Remember here that when we say "group", we are referring to individual cells (not rows or columns, which we'll deal with next):
For each group, this formula is implemented as follows:
4-Week Control:
{7.5, 8, 6, 7, 6.5}, x̄_{4week, control} = 7
SS_{4week, control} = (7.5 - 7)^{2} + (8 - 7)^{2} + (6 - 7)^{2} + (7 - 7)^{2} + (6.5 - 7)^{2} = 2.5
4-Week Cocaine:
{5.5, 3.5, 4.5, 6, 5}, x̄_{4week, cocaine} = 4.9
SS_{4week, cocaine} = (5.5 - 4.9)^{2} + (3.5 - 4.9)^{2} + (4.5 - 4.9)^{2} + (6 - 4.9)^{2} + (5 - 4.9)^{2} = 3.7
12-Week Control:
{8, 10, 13, 9, 8.5}, x̄_{12week, control} = 9.7
SS_{12week, control} = (8 - 9.7)^{2} + (10 - 9.7)^{2} + (13 - 9.7)^{2} + (9 - 9.7)^{2} + (8.5 - 9.7)^{2} = 15.8
12-Week Cocaine:
{5, 4.5, 4, 6, 4}, x̄_{12week, cocaine} = 4.7
SS_{12week, cocaine} = (5 - 4.7)^{2} + (4.5 - 4.7)^{2} + (4 - 4.7)^{2} + (6 - 4.7)^{2} + (4 - 4.7)^{2} = 2.8
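The four group sums-of-squares can be verified with a short sketch (variable names are our own):

```python
# Sum-of-squares within each cell group: SS_g = sum of squared deviations
# of each observation from its own group mean
groups = {
    "4wk_control":  [7.5, 8.0, 6.0, 7.0, 6.5],
    "4wk_cocaine":  [5.5, 3.5, 4.5, 6.0, 5.0],
    "12wk_control": [8.0, 10.0, 13.0, 9.0, 8.5],
    "12wk_cocaine": [5.0, 4.5, 4.0, 6.0, 4.0],
}

def ss(xs):
    m = sum(xs) / len(xs)              # group mean
    return sum((x - m) ** 2 for x in xs)

ss_g = {name: ss(vals) for name, vals in groups.items()}
```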
Step 3 is to calculate the between-groups sum-of-squares (SS_{B}):
SS_{B} = n Σ_{g} (x̄_{g} - x̄)^{2}
where n is the number of observations per group, x̄_{g} is the mean for group g, and x̄ is the grand mean. So...
SS_{B} = n [(x̄_{4week, control} - x̄)^{2} + (x̄_{4week, cocaine} - x̄)^{2} + (x̄_{12week, control} - x̄)^{2} + (x̄_{12week, cocaine} - x̄)^{2}]
= 5 [(7 - 6.575)^{2} + (4.9 - 6.575)^{2} + (9.7 - 6.575)^{2} + (4.7 - 6.575)^{2}]
= 5 [0.180625 + 2.805625 + 9.765625 + 3.515625]
= 5 [16.2675]
= 81.3375
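In code, the between-groups sum-of-squares is one line once the cell means and grand mean are known (a sketch; names are our own):

```python
# SS_B = n * sum of squared deviations of each cell mean from the grand mean
n = 5                                  # observations per cell group
grand = 6.575                          # grand mean from Step 1
cell_means = [7.0, 4.9, 9.7, 4.7]      # the four cell means from Step 1

ss_between = n * sum((m - grand) ** 2 for m in cell_means)
```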
Step 4 pertains to within-group variance. Here, we'll calculate the sum-of-squares, degrees of freedom, and mean square error within groups. First, compute the within-groups sum-of-squares (SS_{W}) by summing the sum-of-squares you computed for each group in Step 2:
SS_{W} = Σ_{g} SS_{g}
where SS_{g} is the sum-of-squares for group g. So...
SS_{W} = SS_{4week, control} + SS_{4week, cocaine} + SS_{12week, control} + SS_{12week, cocaine}
= 2.5 + 3.7 + 15.8 + 2.8
= 24.8
Next, calculate the within-groups degrees of freedom (df_{W}), using this:
df_{W} = N - rc
where N is the total number of observations in the experiment, r is the number of rows, and c is the number of columns. So...
df_{W} = N - rc
= 20 - (2 * 2)
= 16
And finally, we compute the within-groups mean square error (s_{W}^{2}) by dividing SS_{W} by df_{W}:
s_{W}^{2} = SS_{W} / df_{W}
= 24.8 / 16
= 1.55
Note that SS_{W} is also known as the "residual" or "error" since it quantifies the amount of variability after the condition means are taken into account. The degrees of freedom here are N - rc because there are N data points, but the number of means fit is r*c, giving a total of N - rc variables that are free to vary.
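Step 4 in full is only a few lines of code (a sketch using the per-group sums-of-squares from Step 2; names are our own):

```python
# Within-groups (error) variance metrics
ss_per_group = [2.5, 3.7, 15.8, 2.8]   # SS_g values from Step 2

ss_within = sum(ss_per_group)          # SS_W
N, r, c = 20, 2, 2                     # total observations, rows, columns
df_within = N - r * c                  # df_W
ms_within = ss_within / df_within      # s_W^2, the mean square error
```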
Step 5 pertains to row variance. Here, we'll calculate all the same stuff we did in that last step (the sum-of-squares, degrees of freedom, and mean square error), but for the rows. First, compute the row sum-of-squares (SS_{R}) from the row means you computed in Step 1:
SS_{R} = n_{R} Σ_{R} (x̄_{R} - x̄)^{2}
where n_{R} is the number of observations per row, x̄_{R} is the mean of row R, and x̄ is the grand mean.
SS_{R} = n_{R} [(x̄_{control} - x̄)^{2} + (x̄_{cocaine} - x̄)^{2}]
= 10 [(8.35 - 6.575)^{2} + (4.8 - 6.575)^{2}]
= 10 [3.150625 + 3.150625]
= 10 [6.30125]
= 63.0125
df_{R} = r - 1, where r is the number of rows.
= 2 - 1
= 1
And, just as before, divide the sumofsquares by the degrees of freedom to get the mean square error for the rows:
s_{R}^{2} = SS_{R} / df_{R}
= 63.0125 / 1
= 63.0125
In Step 6, we calculate all of the variance metrics again for the columns (SS_{C}, df_{C}, s_{C}^{2}):
SS_{C} = n_{C} Σ_{C} (x̄_{C} - x̄)^{2}
where n_{C} is the number of observations per column, x̄_{C} is the mean of column C, and x̄ is the grand mean. So...
SS_{C} = n_{C} [(x̄_{4week} - x̄)^{2} + (x̄_{12week} - x̄)^{2}]
= 10 [(5.95 - 6.575)^{2} + (7.2 - 6.575)^{2}]
= 10 [0.390625 + 0.390625]
= 10 [0.78125]
= 7.8125
df_{C} = c - 1, where c is the number of columns.
= 2 - 1
= 1
s_{C}^{2} = SS_{C} / df_{C}
= 7.8125 / 1
= 7.8125
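Steps 5 and 6 apply the same computation to the row (treatment) and column (age) means, so they can be checked together (a sketch; names are our own):

```python
# Row and column sums-of-squares from the marginal means in Step 1
grand = 6.575
n_row = n_col = 10                     # 10 observations per row and per column

row_means = [8.35, 4.8]                # control, cocaine
col_means = [5.95, 7.2]                # 4-week, 12-week

ss_rows = n_row * sum((m - grand) ** 2 for m in row_means)
ss_cols = n_col * sum((m - grand) ** 2 for m in col_means)

df_rows = df_cols = 2 - 1              # r - 1 and c - 1
ms_rows = ss_rows / df_rows            # s_R^2
ms_cols = ss_cols / df_cols            # s_C^2
```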
Step 7 is to calculate the same three variance metrics for the row-column interaction (SS_{RC}, df_{RC}, s_{RC}^{2}). To calculate SS_{RC}, subtract the row and column sums-of-squares (SS_{R} and SS_{C}) from the between-groups sum-of-squares (SS_{B}):
SS_{RC} = SS_{B} - SS_{R} - SS_{C}
= 81.3375 - 63.0125 - 7.8125
= 10.5125
And, for df_{RC} ...
df_{RC} = (r - 1)(c - 1)
= (2 - 1)(2 - 1)
= 1
Finally, s_{RC}^{2}:
s_{RC}^{2} = SS_{RC} / df_{RC}
= 10.5125 / 1
= 10.5125
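The interaction metrics follow directly from the quantities already computed (a sketch; names are our own):

```python
# Interaction sum-of-squares: what remains of the between-groups variance
# after the row and column main effects are removed
ss_between, ss_rows, ss_cols = 81.3375, 63.0125, 7.8125

ss_interaction = ss_between - ss_rows - ss_cols
df_interaction = (2 - 1) * (2 - 1)     # (r - 1)(c - 1)
ms_interaction = ss_interaction / df_interaction
```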
Step 8 is to calculate the total variance metrics for the experiment. As always, start by computing the sum-of-squares (SS_{T}). Since the between-groups sum-of-squares (SS_{B}) already equals SS_{R} + SS_{C} + SS_{RC}, the total is the within-groups sum-of-squares plus the between-groups sum-of-squares:
SS_{T} = SS_{W} + SS_{B}
= 24.8 + 81.3375
= 106.1375
df_{T} = N - 1, where N is the total number of observations in the experiment.
= 20 - 1
= 19
Step 9 is calculating the F-values (F_{obt}) required to determine significance. For two-way ANOVAs, you'll compute three F-values: one for rows, one for columns, and one for the interaction between the two. Each F-value is computed by dividing the mean square of the relevant dimension (R, C, or RC) by the within-groups mean square:
F_{R} = s_{R}^{2} / s_{W}^{2}
= 63.0125 / 1.55
= 40.65323
F_{C} = s_{C}^{2} / s_{W}^{2}
= 7.8125 / 1.55
= 5.040323
F_{RC} = s_{RC}^{2} / s_{W}^{2}
= 10.5125 / 1.55
= 6.782258
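The three F ratios are simple divisions by the within-groups mean square (a sketch; names are our own):

```python
# F_obt = mean square of the effect / within-groups mean square
ms_within = 1.55                        # s_W^2 from Step 4

f_rows = 63.0125 / ms_within            # treatment effect
f_cols = 7.8125 / ms_within             # age effect
f_interaction = 10.5125 / ms_within     # treatment x age interaction
```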
Step 10 is organizing all of the above calculations into a table, along with the appropriate F_{crit} values (looked up in a table like this one). Then we'll compare each F_{obt} value we computed to the F_{crit} values we looked up to draw our conclusions:
F_{crit} (1, 16) _{α=0.05} = 4.49
ANOVA TABLE

Source  | SS       | df | s^{2}   | F_{obt}  | F_{crit} | p
rows    | 63.0125  | 1  | 63.0125 | 40.65323 | 4.49     | p < 0.05
columns | 7.8125   | 1  | 7.8125  | 5.040323 | 4.49     | p < 0.05
r * c   | 10.5125  | 1  | 10.5125 | 6.782258 | 4.49     | p < 0.05
within  | 24.8     | 16 | 1.55    | -        | -        | -
total   | 106.1375 | 19 | -       | -        | -        | -
Both variables (treatment and age) are significant, as indicated by the fact that F_{obt} > F_{crit} for both. Thus, we can reject H_{01} and H_{02} and conclude that dendritic spine density is affected by prenatal cocaine exposure and age. The interaction between the two factors (r * c) is also significant. Thus, we can also reject H_{03} and conclude there is a significant interaction between treatment and age.
To further interpret these results, we can plot the group means for each treatment condition at both ages.
NOTE: Remember that the statistics provided by the ANOVA quantify the effect of each factor (in this case, treatment and age). These statistics do not compare individual condition means, such as whether 4-week control differs from 12-week control. If k = the number of groups, the number of possible comparisons is k * (k - 1) / 2. In the above example, we have 4 groups, so there are (4*3)/2 = 6 possible comparisons between these group means. Statistical testing of these individual comparisons requires a post-hoc analysis that corrects for experiment-wise error rate. If all possible comparisons are of interest, Tukey's HSD (Honestly Significant Difference) test is commonly used.
Tukey's HSD (Honestly Significant Difference) Test
Tukey's test is a single-step, multiple-comparison statistical procedure often used in conjunction with an ANOVA to test which group means are significantly different from one another. It is used in cases where group sizes are equal (the Tukey-Kramer procedure is used if group sizes are unequal) and it compares all possible pairs of means. Note that the Tukey's test formula is very similar to that of the t-test, except that it corrects for experiment-wise error rate. (When there are multiple comparisons being made, the probability of making a type I error (rejecting a true null hypothesis) increases, so Tukey's test corrects for this.) The formula for a Tukey's test is:
q = (Ȳ_{A} - Ȳ_{B}) / sqrt(s_{W}^{2} / n)
where Ȳ_{A} is the larger of the two means being compared, Ȳ_{B} is the smaller, s_{W}^{2} is the mean square error within, and n is the number of data points within each group. Once computed, the q_{obt} value is compared to a q-value from the q distribution. If the q_{obt} value is larger than the q_{crit} value from the distribution, the two means are significantly different.
So, if we wanted to use a Tukey's test to determine whether 4-week control significantly differs from 12-week control, we'd calculate it as follows:
q_{obt} = (x̄_{12week, control} - x̄_{4week, control}) / sqrt(s_{W}^{2} / n)
= (9.7 - 7) / sqrt(1.55 / 5)
= 2.7 / 0.5567764
= 4.849343
The q_{crit} value may be looked up in a chart (like this one) using the appropriate values for k (which represents the number of group means, so 4 here) and df_{W} (16). So here, q_{crit} (4, 16) _{α=0.05} = 4.05. Since q_{obt} > q_{crit} (4.85 > 4.05), we can conclude that the two means are, in fact, (honestly) significantly different.
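The q_{obt} computation above can be reproduced directly (a sketch; names are our own):

```python
import math

# Tukey's q for the 4-week vs. 12-week control comparison
ms_within, n = 1.55, 5                 # s_W^2 from the ANOVA, observations per group
mean_12wk_control, mean_4wk_control = 9.7, 7.0

q_obt = (mean_12wk_control - mean_4wk_control) / math.sqrt(ms_within / n)
```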
Additional group comparisons reveal that spine density does not change significantly with age in the cocaine-exposed group, but is significantly greater in the control group as compared to cocaine-exposed animals at both 4 and 12 weeks of age. Thus, the effect of prenatal drug exposure is magnified with time due to a developmental increase in spine density that occurs only in the control group.
IMPORTANT NOTE: There are various post-hoc tests that can be used, and the choice of which method to apply is a controversial area in statistics. The different tests vary in how conservative they are: more conservative tests are less powerful and have a lower risk of a type I error (rejecting a true null hypothesis); however, this comes at the cost of increasing the risk of a type II error (incorrectly accepting the null hypothesis). Some common methods, listed in order of decreasing power, are: Fisher's LSD, Newman-Keuls, Tukey HSD, Bonferroni, Scheffé. The following provides some guidelines for choosing an appropriate post-hoc procedure.
If all pairwise comparisons are truly relevant, the Tukey method is recommended.
If it is most reasonable to compare all groups against a single control, then the Dunnett test is recommended.
If only a subset of the pairwise comparisons are relevant, then the Bonferroni method is often utilized for those selected comparisons. For example, in the present example one would likely not be interested in comparing the 4week control to the 12week cocaine group, or the 12week control to the 4week cocaine group.
Multiple Regression
Another analysis technique you could use is multiple regression. In multiple regression, we find coefficients for each factor such that we are able to best predict the group means. The multiple regression computes standard errors on the coefficients, meaning that we can determine whether a coefficient is significantly different from zero. For this simple example, multiple regression will give answers identical to the ANOVA, but in more complex cases, multiple regression is a more powerful technique that allows you to include additional nuisance predictors that the analysis controls for before testing for significance of your independent variables.
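To make the regression view concrete, here is a sketch using dummy (0/1) coding for treatment and age plus their product. Because a model with an interaction term is saturated for a 2 x 2 design, its fitted values are exactly the four cell means, so the least-squares coefficients can be read off the means directly (the coding scheme and variable names here are our own illustration, not prescribed by the text):

```python
# Cell means from the ANOVA example
m_4wk_ctrl, m_4wk_coc = 7.0, 4.9
m_12wk_ctrl, m_12wk_coc = 9.7, 4.7

# Model: y = b0 + b1*cocaine + b2*age12 + b3*cocaine*age12
# (cocaine = 1 for cocaine-exposed rats, age12 = 1 for 12-week rats)
b0 = m_4wk_ctrl                        # baseline: 4-week control
b1 = m_4wk_coc - m_4wk_ctrl            # treatment effect at 4 weeks
b2 = m_12wk_ctrl - m_4wk_ctrl          # age effect in controls
b3 = (m_12wk_coc - m_12wk_ctrl) - (m_4wk_coc - m_4wk_ctrl)  # interaction

def predict(cocaine, age12):
    """Fitted spine density for a given treatment/age combination."""
    return b0 + b1 * cocaine + b2 * age12 + b3 * cocaine * age12
```

A nonzero b3 corresponds to the significant interaction found by the ANOVA: the treatment effect at 12 weeks differs from the treatment effect at 4 weeks.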