Chapter 8 is devoted to dummy (independent) variables. This How To answers common questions on working with and interpreting dummy variables.
Questions:
1) How to include dummy variables in a regression?
2) How to interpret a coefficient on a dummy variable?
3) How to test hypotheses with dummy variables and interaction terms?
4) How to create a double-log functional form with dummy variables?
5) How to interpret a coefficient on a dummy variable with a log dependent variable?
1) How to include dummy variables in a regression?
Example:
You want to include Region of the United States in your earnings function regression. You obtain the variable GMREG from the Current Population Survey (CPS), and it has four possible values that the codebook maps to a region like this:
1 = Northeast
2 = Midwest
3 = South
4 = West
The data are sitting in an Excel file column like this:
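You CANNOT use GMREG directly in a regression: the codes are labels for categories, not quantities, so a single slope on GMREG would impose an arbitrary ordering and equal spacing on the regions.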
To incorporate region as a dummy variable, follow these steps:
1) Create (number of categories – 1) new variables. (In this example, 4 – 1 = 3 new variables.)
In a new column, enter Northeast as the label for the variable.
Use an IF statement to create a 1 if GMREG is 1; otherwise a 0.
=IF(B2=1,1,0)
(where B2 has the value of GMREG for that observation)
Repeat for Midwest and South, each in its own column, testing for that region's GMREG code in the IF statement.
2) Include the (number of categories – 1) new variables in the regression.
The choice of which category to leave out (in this example, West) is arbitrary and does not change the regression's predicted values. The coefficients themselves do depend on the category left out (called the base case), but because you interpret each dummy variable coefficient relative to the base case, the predicted values end up the same. See Section 8.2 for more.
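The two steps above, and the claim that the base case does not matter, can be sketched outside Excel. The Python below is only an illustration, not the chapter's workbook: the earnings figures are made up, the code-to-region mapping is an assumption to check against the codebook, and the helper names (make_dummies, fit_dummies_only, predict) are hypothetical. It relies on the fact that when the only regressors are an intercept and the region dummies, OLS reproduces each region's mean, so the intercept is the base-case mean and each dummy coefficient is that region's mean minus the base-case mean.

```python
from statistics import mean

# Assumed GMREG coding (check the codebook): code -> region name.
REGIONS = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}


def make_dummies(gmreg_values, base):
    """Return one 0/1 column per region, skipping the base (omitted) category.

    Each column is the equivalent of =IF(B2=code,1,0) filled down the sheet.
    """
    return {
        name: [1 if g == code else 0 for g in gmreg_values]
        for code, name in REGIONS.items()
        if name != base
    }


def fit_dummies_only(gmreg_values, y, base):
    """OLS of y on an intercept plus the region dummies, via group means.

    With no other regressors the fit equals each group's mean, so:
    intercept = base-case mean, coefficient = region mean - base-case mean.
    """
    group_mean = {
        name: mean(v for g, v in zip(gmreg_values, y) if g == code)
        for code, name in REGIONS.items()
    }
    intercept = group_mean[base]
    coefs = {n: m - intercept for n, m in group_mean.items() if n != base}
    return intercept, coefs


def predict(intercept, coefs, region):
    # The base case has no dummy, so its prediction is the intercept alone.
    return intercept + coefs.get(region, 0.0)


# Made-up illustrative data: GMREG codes and hourly earnings.
gmreg = [1, 1, 2, 2, 3, 3, 4, 4]
earnings = [20.0, 24.0, 15.0, 17.0, 12.0, 16.0, 18.0, 22.0]

dummies = make_dummies(gmreg, base="West")            # three columns, West omitted
fit_west = fit_dummies_only(gmreg, earnings, base="West")
fit_south = fit_dummies_only(gmreg, earnings, base="South")

print(dummies["Northeast"])
print("West as base: ", fit_west)
print("South as base:", fit_south)
for region in REGIONS.values():
    # Same prediction for every region, whichever category is the base case.
    print(region, predict(*fit_west, region), predict(*fit_south, region))
```

With these made-up numbers, using West as the base case gives an intercept of 20 and a Northeast coefficient of 2, while using South as the base case gives an intercept of 14 and a Northeast coefficient of 8. Both versions predict 22 for Northeast, which is the sense in which the choice of base case has no effect on the results.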
2) How to interpret a coefficient on a dummy variable?
For a single dummy variable without an interaction term, the value of the