Hypothesis tests may be performed on contingency tables to decide whether effects are present. Effects in a contingency table are defined as relationships between the row and column variables; that is, are the levels of the row variable differentially distributed over the levels of the column variable? Significance in this hypothesis test means that interpretation of the cell frequencies is warranted. Non-significance means that any differences in cell frequencies could be explained by chance.
Hypothesis tests on contingency tables are based on a statistic called Chi-squared. In this chapter, contingency tables will first be reviewed, followed by a discussion of the Chi-squared statistic. The sampling distribution of the Chi-squared statistic will then be presented, preceded by a discussion of the hypothesis test. A complete computational example will conclude the chapter.
REVIEW OF CONTINGENCY TABLES
Frequency tables of two variables presented simultaneously are called contingency tables. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variable as columns, then finding the joint or cell frequency for each cell. The cell frequencies are then summed across both rows and columns. The sums are placed in the margins of the table and are called marginal frequencies. The lower right-hand corner contains the sum of either the row or the column marginal frequencies; both sums must equal N.
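The construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from the chapter: the function name `contingency_table` and the two short lists `group` and `outcome` are hypothetical, and the eight observations are made up for the sketch rather than drawn from any study.

```python
from collections import Counter


def contingency_table(row_values, col_values):
    """Build a contingency table from two equally long lists of levels.

    Returns (row_levels, col_levels, cells, row_totals, col_totals, n).
    """
    if len(row_values) != len(col_values):
        raise ValueError("each subject needs one value on both variables")

    # Joint (cell) frequencies: count every (row level, column level) pair.
    joint = Counter(zip(row_values, col_values))

    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    cells = [[joint[(r, c)] for c in col_levels] for r in row_levels]

    # Marginal frequencies: sum the cells across each row and down each column.
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]

    # The corner value: both sets of marginals must sum to N.
    n = sum(row_totals)
    if n != sum(col_totals):
        raise RuntimeError("row and column marginals disagree")
    return row_levels, col_levels, cells, row_totals, col_totals, n


# Hypothetical data for illustration only (eight made-up subjects).
group = ["a", "a", "b", "b", "b", "c", "a", "c"]
outcome = ["yes", "no", "no", "no", "yes", "no", "yes", "no"]

row_levels, col_levels, cells, row_totals, col_totals, n = contingency_table(
    group, outcome
)

# Print the table with the marginal frequencies in the last row and column.
print("\t".join([""] + col_levels + ["Total"]))
for level, row, total in zip(row_levels, cells, row_totals):
    print("\t".join([level] + [str(f) for f in row] + [str(total)]))
print("\t".join(["Total"] + [str(t) for t in col_totals] + [str(n)]))
```

Sorting the levels alphabetically is only a convenience for the sketch; any fixed ordering of rows and columns gives the same cell and marginal frequencies.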
For example, suppose that a researcher studied the relationship between having AIDS and the sexual preference of individuals. The study resulted in the following data for thirty male subjects:
AIDS | NY | Y | N | N | N | Y | N | N | N | Y | N | N | N | Y | N | N | N | N | N | N | N | Y | N | Y | Y | N | Y | N | Y | N |
M | B | F | F | B | F | F | F | M | F | F | F | F | B | F | F | B | F | M | F | F | M | F | B | M | F | M | F |