A well-designed experiment tells us that changes in the explanatory variable cause changes in the response variable. More exactly, it tells us that this happened for specific subjects in the specific environment of this specific experiment. No doubt we had grander things in mind. We want to proclaim that our new method of teaching math does better for high school students in general or that our new drug beats a placebo for some broad class of patients. Can we generalize our conclusions from our little group of subjects to a wider population?
The first step is to be sure that our findings are statistically significant, that they are too strong to often occur just by chance. That’s important, but it’s a technical detail that the study’s statistician can reassure us about. The serious threat is that the treatments, the subjects, or the environment of our experiment may not be realistic.
For example, a psychologist wants to study the effects of failure and frustration on the relationships among members of a work team. She forms a team of students, brings them to the psychology laboratory, and has them play a game that requires teamwork. The game is rigged so that they lose regularly. The psychologist observes the students through a one-way window and notes the changes in their behavior during an evening of game playing. Playing a game in a laboratory for small stakes, knowing that the session will soon be over, is a long way from working for months developing a new product that never works right and is finally abandoned by your company. Does the behavior of the students in the lab tell us much about the behavior of the team whose product failed? Psychologists do their best to devise realistic experiments for studying human behavior, but lack of realism limits the usefulness of experiments in this area.
When experiments are not fully realistic, statistical analysis of the experimental data cannot tell us how far the results will