Analysis of Variance (ANOVA)
Recall that when we wanted to compare two population means, we used the 2-sample t procedures. Now let's extend this to compare k ≥ 3 population means. As with the t-test, we can get a graphical sense of what is going on by looking at side-by-side boxplots. (See Example 12.3, p. 748, along with Figure 12.3, p. 749.)
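A side-by-side boxplot draws each group's five-number summary (min, Q1, median, Q3, max) on a common axis. The short Python sketch below, which is an illustration and not part of the original notes, computes those numbers plus the mean for k = 3 groups. The group labels and data values are hypothetical, and the quartiles use the standard library's "inclusive" method, so other software may report slightly different quartiles.

```python
from statistics import mean, quantiles

# Hypothetical data, invented for illustration: a quantitative response
# measured in k = 3 groups.
groups = {
    "A": [72, 75, 78, 80, 85],
    "B": [65, 70, 71, 74, 76],
    "C": [80, 82, 88, 90, 95],
}

def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a boxplot draws.

    Quartiles use the "inclusive" interpolation method; other software
    may use a different quartile convention.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# One row per group: the numeric counterpart of side-by-side boxplots.
for name, values in groups.items():
    lo, q1, med, q3, hi = five_number_summary(values)
    print(f"{name}: min={lo} Q1={q1} median={med} Q3={q3} max={hi} mean={mean(values):.1f}")
```

With these made-up numbers, group C's middle half (82 to 90) does not overlap group B's (70 to 74). Side-by-side boxplots make that kind of separation visible at a glance.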
1 Basic ANOVA concepts
1.1 The Setting
Generally, we are considering a quantitative response variable as it relates to one or more explanatory variables, usually categorical. Questions that fit this setting:
(i) Which academic department in the sciences gives out the lowest average grades? (Explanatory variable: department; Response variable: student GPAs for individual courses)
(ii) Which kind of promotional campaign leads to the greatest store income at Christmas time? (Explanatory variable: promotion type; Response variable: daily store income)
(iii) How do a person's type of career and marital status relate to the total cost in annual claims he or she is likely to make on his or her health insurance? (Explanatory variables: career and marital status; Response variable: health insurance payouts)
Each value of the explanatory variable (or each combination of values, if there is more than one explanatory variable) represents a population or group. In the Physicians' Health Study of Example 3.3, p. 238, there are two factors (explanatory variables): aspirin (values are "taking it" or "not taking it") and beta carotene (values again are "taking it" or "not taking it"). Together these divide the subjects into four groups corresponding to the four cells of Figure 3.1 (p. 239). Had the response variable for this study been quantitative (like systolic blood pressure) rather than categorical, it would have been an appropriate scenario in which to apply two-way ANOVA.
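To make "each value-pair is a group" concrete, here is a small Python sketch, an illustration rather than anything from the original notes. It crosses two two-level factors into the four cells of a 2 × 2 design and averages a quantitative response within each cell. The factor labels follow the Physicians' Health Study, but the blood-pressure values and the helper name `cell_means` are hypothetical; the actual study recorded a categorical outcome.

```python
from itertools import product
from statistics import mean

# Two categorical factors, each with two levels, mirroring the 2 x 2 design
# of the Physicians' Health Study (aspirin x beta carotene).
aspirin = ["aspirin", "no aspirin"]
beta_carotene = ["beta carotene", "no beta carotene"]

# Each value-pair of the two factors defines one group (cell): 2 x 2 = 4 groups.
cells = list(product(aspirin, beta_carotene))

# Hypothetical quantitative responses (systolic blood pressure, mmHg),
# invented for illustration; they are NOT data from the study.
observations = [
    ("aspirin", "beta carotene", 128), ("aspirin", "beta carotene", 132),
    ("aspirin", "no beta carotene", 125), ("aspirin", "no beta carotene", 131),
    ("no aspirin", "beta carotene", 135), ("no aspirin", "beta carotene", 139),
    ("no aspirin", "no beta carotene", 134), ("no aspirin", "no beta carotene", 140),
]

def cell_means(observations, cells):
    """Group each response by its (factor 1, factor 2) cell and average it."""
    by_cell = {cell: [] for cell in cells}
    for level_a, level_b, response in observations:
        by_cell[(level_a, level_b)].append(response)
    # Skip empty cells rather than failing on mean([]).
    return {cell: mean(ys) for cell, ys in by_cell.items() if ys}

for cell, m in cell_means(observations, cells).items():
    print(cell, m)
```

Each of the four cell means estimates the population mean of one group; two-way ANOVA compares these groups while keeping track of which factor levels define them.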
1.2 Hypotheses of ANOVA
These are always the same. H0 : The (population) means of all groups under consideration are equal. Ha : The (pop.) means are not all