Multiple Regression Project:
Forecasting Sales for Proposed New
Sites of Pam and Susan’s Stores
I. Introduction
Pam and Susan’s is a discount department store chain that currently has 250 stores, most of which are located throughout the southern United States. As the company has grown, it has become increasingly important to identify profitable locations. Using census and existing store data, a multiple regression equation will be used to forecast potential sales and thereby determine which proposed new site is likely to be more profitable.
II. Data
The data set has 37 independent variables: 7 categorical variables for competitive type and 30 numerical variables. There are 250 stores, so the sample size is 250. Because sales are given in thousands of dollars ($1,000s), a one-unit change in an x-variable corresponds to a change in predicted sales equal to that variable's coefficient multiplied by $1,000, holding the other variables constant.
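The unit conversion described above can be illustrated with a minimal Python sketch. The function name and the coefficient value below are hypothetical and chosen only for illustration; they are not taken from the fitted model.

```python
def sales_change_in_dollars(coefficient, delta_x=1.0):
    """Convert a regression coefficient to a change in sales in dollars.

    Assumes sales (the y-variable) are recorded in thousands of dollars,
    so the coefficient times the change in x is scaled by 1,000.
    """
    return coefficient * delta_x * 1000


# Hypothetical coefficient for illustration only (not from the report's model).
example_coefficient = 2.5
print(sales_change_in_dollars(example_coefficient))  # prints 2500.0
```

Under this assumption, a coefficient of 2.5 would mean that a one-unit increase in that x-variable is associated with a $2,500 increase in predicted sales, with the other variables held constant.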
III. Results and Discussion
Building a multiple regression model requires a step-by-step approach. Failing to follow such a methodology could lead to inaccurate forecasts of the dependent variable of interest. Below I outline the process and findings used to obtain a multiple regression equation for forecasting potential sales at newly proposed sites of Pam and Susan’s discount department stores.
The initial step in building a multiple regression model is to look for outliers and non-linear relationships between the dependent variable (predicted sales) and the independent variables. For multiple regression to be an accurate forecasting tool, each x-variable should have an at least approximately linear relationship with the y-variable. Table (i) below lists the 10 quantitative x-variables that have the highest correlation with sales. These 10 variables will be used to obtain the final multiple regression equation. Additionally, the problem of multicollinearity can result if the correlation