Purpose of the case
We study three competitors in the hot dog market: Dubuque, Ball Park, and Oscar Mayer. We want to know how a change in the price of one competitor (Ball Park) affects the market share of another competitor (Dubuque).
Origin of the dataset
The dataset comes from "a scanner study conducted at grocery stores located in the western suburbs of Chicago". So we know that the data don't represent the entire hot dog market.
We do not know who collected the data, so there may be errors in the collection (for example, we don't know whether the sample was drawn by a fully random procedure). Moreover, we know nothing about the market shares of the other competitors or about the season in which the weeks were selected (for example, people may eat more hot dogs in summer).
But this is the dataset we have, so we are going to use it, keeping the comments above in mind for the final decision we will have to give.
Scanning of our dataset
We have 113 observations. To get a better view of the data, we can draw a scatter plot for each of the independent variables. This shows how individual points drag the trend line and whether there are outliers.
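As a rough illustration of how a single point can drag a trend line, the sketch below fits a least-squares line to a handful of made-up (price, share) pairs, once with and once without a suspect point. The numbers and the helper name fit_line are assumptions for illustration only; they are not taken from the scanner data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# MADE-UP (price, share) pairs for illustration only; not the scanner data.
prices = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
shares = [5.0, 4.6, 4.2, 3.8, 3.4, 6.0]  # the last point is the suspect one

# Trend line using every point, then again with the suspect point left out.
slope_all, intercept_all = fit_line(prices, shares)
slope_trim, intercept_trim = fit_line(prices[:-1], shares[:-1])

print(f"all points:      slope={slope_all:.3f}, intercept={intercept_all:.3f}")
print(f"without suspect: slope={slope_trim:.3f}, intercept={intercept_trim:.3f}")
```

In this toy example the one unusual point is enough to flip the sign of the slope, which is the kind of effect the scatter plots are meant to reveal.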
I have found one possible outlier (in red). I need to run the multiple regression with and without it. If the output changes substantially, I can consider deleting the outlier, but it is always important to have a reason for deleting it.
Regression
MSHARE = 4.0303 - 7.5977 * PDUB + 2.6223 * PMAY + 3.4727 * PBPREG + 1.0249 * PBPALL
Without the outlier
MSHARE = 4.2352 - 6.9540 * PDUB + 2.0470 * PMAY + 2.8069 * PBPREG + 1.4745 * PBPALL
We can see that there is no big difference between the two equations, and the residual is not noticeably smaller without the outlier.
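To make the comparison concrete, the two fitted equations can be set side by side. The sketch below copies the coefficients from the equations above, reports how much each one shifts when the outlier is dropped, and shows the change in MSHARE that each equation implies for a price change. The 0.10 increase in PBPREG is a hypothetical value chosen only for illustration (the units of the variables are not stated here), and the helper names are assumptions.

```python
# Coefficients copied from the two fitted equations (with and without the outlier).
FULL = {"const": 4.0303, "PDUB": -7.5977, "PMAY": 2.6223,
        "PBPREG": 3.4727, "PBPALL": 1.0249}
NO_OUTLIER = {"const": 4.2352, "PDUB": -6.9540, "PMAY": 2.0470,
              "PBPREG": 2.8069, "PBPALL": 1.4745}


def coefficient_shift(before, after):
    """Change in each coefficient when moving from one fit to the other."""
    return {name: after[name] - before[name] for name in before}


def share_change(coefs, variable, delta):
    """Change in MSHARE implied by changing one price by delta, others held fixed."""
    return coefs[variable] * delta


shift = coefficient_shift(FULL, NO_OUTLIER)

# HYPOTHETICAL price change, for illustration only; not a value from the dataset.
delta = 0.10
change_full = share_change(FULL, "PBPREG", delta)
change_trim = share_change(NO_OUTLIER, "PBPREG", delta)

for name, value in shift.items():
    print(f"{name:7s} shift: {value:+.4f}")
print(f"MSHARE change, all observations: {change_full:+.4f}")
print(f"MSHARE change, without outlier:  {change_trim:+.4f}")
```

Both equations give the same sign for every coefficient, so the direction of each price effect does not depend on whether the outlier is kept.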
Also, Dubuque's MSHARE in red is big because the competitors’ prices have increased and