One problem that can arise in multiple regression analysis is multicollinearity. Multicollinearity occurs when two or more of the independent variables of a multiple regression model are highly correlated. Technically, if two independent variables are correlated, we have collinearity; when three or more are correlated, we have multicollinearity. In practice, however, the two terms are frequently used interchangeably. The reality of business research is that some correlation between predictors (independent variables) is almost always present. The problem of multicollinearity arises when the intercorrelation between predictor variables is high, and it creates several difficulties, particularly in the interpretation of the analysis (a brief simulation following this list illustrates them):
1. It is difficult, if not impossible, to interpret the estimates of the regression coefficients.
2. Inordinately small t values for the regression coefficients may result.
3. The standard errors of the regression coefficients are inflated.
4. The algebraic sign of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.
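As a brief illustration (a sketch, not from the text), the following Python simulation assumes the numpy and statsmodels libraries and constructs two nearly identical predictors; fitting an ordinary least squares model to them exhibits the inflated standard errors, small t values, and unstable coefficient estimates just listed.

```python
# A minimal simulation: two nearly identical predictors make the
# individual coefficient estimates unreliable even though the model
# as a whole may still fit the data well.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=1)
n = 100

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # x2 tracks x1 almost exactly
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

print(fit.params)   # unstable estimates; may land far from (3, 2)
print(fit.bse)      # inflated standard errors on x1 and x2
print(fit.tvalues)  # hence inordinately small t values
```

Because x1 and x2 carry nearly the same information, least squares cannot reliably apportion the response between them, even though their combined effect is estimated well.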
The problem of multicollinearity can arise in regression analysis in a variety of business research situations. For example, suppose a model is being developed to predict salaries in a given industry. Independent variables such as years of education, age, years in management, experience on the job, and years of tenure with the firm might be considered as predictors. Several of these variables are clearly correlated (virtually all of them have something to do with number of years, or time) and yield redundant information. Suppose a financial regression model is being developed to predict bond market rates from such independent variables as the Dow Jones average, prime interest rates, GNP, the producer price index, and the consumer price index. Several of these predictors are likely to be intercorrelated as well, again yielding redundant information.
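A common diagnostic for this kind of redundancy (a standard technique, not one named in the text) is the variance inflation factor, VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing predictor k on the remaining predictors. The sketch below uses synthetic data standing in for a few of the salary predictors above; the variable names and the cutoff of 10 are illustrative conventions, not from the text.

```python
# Hypothetical salary-model predictors (synthetic data; names invented
# for illustration). A large VIF flags a predictor that the other
# predictors can nearly reproduce.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(seed=7)
n = 200

age = rng.normal(40, 8, size=n)
years_on_job = age - 22 + rng.normal(0, 2, size=n)  # tracks age closely
years_education = rng.normal(16, 2, size=n)         # largely independent

X = add_constant(pd.DataFrame({
    "age": age,
    "years_on_job": years_on_job,
    "years_education": years_education,
}))

# Column 0 is the constant; a VIF well above 10 is a common warning sign.
for k in range(1, X.shape[1]):
    print(X.columns[k], variance_inflation_factor(X.values, k))
```

Here age and years_on_job should each show a large VIF, because each is nearly determined by the other, while years_education should stay near 1.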