Fall 2006
Regression: Testing Assumptions
December 4, 2006
Linearity
The linearity of the regression mean can be examined visually by plots of the residuals against any of the independent variables, or against the predicted values. Chart 1 shows a residual plot that reveals no departures from the assumption of linearity, while Chart 2 reveals nonlinearity. In both cases, the presumed model is that of a simple linear regression, $y_i = \alpha + \beta x_i + \varepsilon_i$, with $n = 30$.

[Chart 1 and Chart 2: residuals (vertical axis) plotted against predicted values (horizontal axis)]
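As a rough illustration of how the quantities in such a residual plot are obtained, the following Python sketch fits a simple linear regression by least squares and lists each predicted value with its residual. The function names and the data are made up for illustration (they are not the data behind Chart 1 or Chart 2); the response is deliberately quadratic in x, so the residuals should trace a curved pattern like the one described for Chart 2.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def residuals_and_fitted(x, y):
    """Return (predicted values, residuals) from the fitted line."""
    alpha, beta = fit_simple_regression(x, y)
    fitted = [alpha + beta * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


# Hypothetical data: the true relationship is quadratic, not linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_curved = [xi ** 2 for xi in x]

fitted, resid = residuals_and_fitted(x, y_curved)
for f, r in zip(fitted, resid):
    # Plotting r against f gives the residual-versus-predicted chart.
    print(f"predicted={f:8.3f}  residual={r:8.3f}")
```

With these data the residuals are positive at both ends of the range and negative in the middle, which is the kind of systematic pattern that signals a departure from linearity.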
A statistical test for linearity can be constructed by adding powers of fitted values to the regression model, and then testing the hypothesis of linearity by testing the hypothesis that the added parameters have values equal to zero. This is known as the RESET test (Ramsey).
The multiple regression model can be written $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$. Least squares estimates of the model parameters are obtained, and powers of the predicted values $\hat{y}_i$ are added to form an augmented model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + \gamma_3 \hat{y}_i^4 + \varepsilon_i$$
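The augmentation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`project`, `reset_f_statistic`) and the data are hypothetical, the least squares fits are done with a simple modified Gram-Schmidt projection, and the augmented columns are assumed to be linearly independent. The sketch fits the original model, adds the second, third and fourth powers of the predicted values, and forms the usual F-statistic for the three added terms with 3 and n − (k + 3) − 1 degrees of freedom.

```python
import math


def project(y, columns, tol=1e-12):
    """Least squares fit of y on the columns (modified Gram-Schmidt).

    Returns (fitted values, error sum of squares)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        scale = math.sqrt(sum(ci * ci for ci in col))
        if norm > tol * scale:  # skip numerically dependent columns
            basis.append([vi / norm for vi in v])
    resid = list(y)
    for q in basis:
        c = sum(ri * qi for ri, qi in zip(resid, q))
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
    fitted = [yi - ri for yi, ri in zip(y, resid)]
    return fitted, sum(ri * ri for ri in resid)


def reset_f_statistic(y, regressors):
    """regressors: list of k columns of length n (intercept added here)."""
    n, k = len(y), len(regressors)
    cols = [[1.0] * n] + [list(c) for c in regressors]
    fitted, sse_r = project(y, cols)              # restricted (original) model
    powers = [[f ** p for f in fitted] for p in (2, 3, 4)]
    _, sse_u = project(y, cols + powers)          # unrestricted (augmented) model
    df_den = n - (k + 3) - 1
    f_stat = ((sse_r - sse_u) / 3) / (sse_u / df_den)
    return f_stat, sse_r, sse_u, df_den


# Hypothetical data with n = 30; the "noise" is a deterministic sine term
# so that the example is reproducible.
x = [0.1 * i for i in range(1, 31)]
noise = [0.1 * math.sin(7 * i) for i in range(1, 31)]
y_linear = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y_curved = [1 + 2 * xi + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]

f_lin, sse_r_lin, sse_u_lin, df_lin = reset_f_statistic(y_linear, [x])
f_cur, sse_r_cur, sse_u_cur, df_cur = reset_f_statistic(y_curved, [x])
print(f"linear data: F = {f_lin:.2f} on (3, {df_lin}) df")
print(f"curved data: F = {f_cur:.2f} on (3, {df_cur}) df")
```

The sketch stops at the F-statistic; a p-value would additionally require the upper tail of the F-distribution, which is not in the Python standard library (one option, if SciPy is available, is `scipy.stats.f.sf`).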
The null hypothesis to be tested is $H_0: \gamma_1 = \gamma_2 = \gamma_3 = 0$. It is tested with the F-statistic $F = \frac{(SSE_R - SSE_U)/3}{SSE_U / [n - (k + 3) - 1]}$, where $SSE_R$ and $SSE_U$ are the error sums of squares of the restricted (original) and unrestricted (augmented) models, referred to the F-distribution with degrees of freedom $[3,\ n - (k + 3) - 1]$. For the example shown in Chart 1, SSE for the restricted (original) model equals 0.65 (df = 28), SSE for the unrestricted (augmented) model equals 0.50 (df = 25), the RESET F-statistic equals 2.52 and the p-value of the test is 0.08. The hypothesis of linearity would not be rejected at the 5% level. For the example in Chart 2, SSE for the original and augmented models equal 0.82 (df = 28) and 0.50 (df = 25), respectively, the RESET F-statistic equals 5.44 and the p-value of the test