Homework 1 Solutions
I. An insurance agency is examining the dollar amount of claims from clients who have homeowners insurance. For the 900 people who filed claims, the five-number summary of the amount is:
($8800, $8850, $8900, $9100, $9940).
(a) Would the histogram displaying the data for the 900 claims be nearly bell-shaped? If so, explain how the summary indicates this. If not, determine if the data is skewed left or skewed right, and explain how the summary indicates this.
Ans: The histogram would be skewed to the right. The median (8900) is much closer to the 1st quartile (8850) than to the 3rd quartile (9100), and the maximum (9940) is much farther from the median than the minimum (8800) is.
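The comparison in (a) can be checked numerically. The following is a minimal sketch using only the five-number summary given in the problem; the variable names are illustrative, and the rule it applies (upper distances larger than lower distances suggests right skew) is the informal one used in the answer, not a formal test.

```python
# Five-number summary from the problem statement (dollars).
minimum, q1, median, q3, maximum = 8800, 8850, 8900, 9100, 9940

# Distances from the median to each quartile.
lower_quartile_gap = median - q1   # median to 1st quartile
upper_quartile_gap = q3 - median   # median to 3rd quartile

# Distances from the median to each extreme.
lower_tail = median - minimum
upper_tail = maximum - median

# Informal check: both upper distances exceed the lower ones,
# which is what the answer reads as right skew.
suggests_right_skew = (upper_quartile_gap > lower_quartile_gap
                       and upper_tail > lower_tail)

print(lower_quartile_gap, upper_quartile_gap, lower_tail, upper_tail)
print(suggests_right_skew)
```

The upper half of the data is spread over a much wider range than the lower half, which is the pattern the answer describes.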
(b) Would the boxplot for the data indicate any outliers? Explain why or why not.
Ans: IQR = 9100 - 8850 = 250, so 1.5 × IQR = 375. The inner fences are 8850 - 375 = 8475 and 9100 + 375 = 9475. With a maximum of 9940, which is above the upper fence, there is at least one high outlier. (There are no low outliers because the minimum, 8800, is above the lower fence.)
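The fence arithmetic in (b) can be reproduced with a short script. This is a sketch under the 1.5 × IQR rule used in the answer; the variable names are assumptions made for illustration. Because only the five-number summary is known, the script can confirm that at least one high outlier exists but cannot count how many there are.

```python
# Values taken from the five-number summary in the problem (dollars).
minimum, q1, q3, maximum = 8800, 8850, 9100, 9940

iqr = q3 - q1            # interquartile range
step = 1.5 * iqr         # distance from each quartile to its inner fence

lower_fence = q1 - step
upper_fence = q3 + step

# An extreme value beyond a fence guarantees at least one outlier on that side.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(iqr, step, lower_fence, upper_fence)
print(has_low_outlier, has_high_outlier)
```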
II. A drug manufacturer has hundreds of sales representatives all over the United States. A histogram for yearly sales totals for each representative is roughly bell shaped and symmetric except for 4 high outliers corresponding to representatives in Boston, MA. Their sales totals are at least $60,000 greater than the next highest total. One analyst suggests dropping these 4 totals from the data to get a better summary of the sales across all regions of the country.
(a) If the outliers were to be dropped, which measure of central tendency of the data set would be affected the most – the mean, the median, or the mode? Explain why.
Ans: The median is based on the order of the data, so dropping the high values will most likely not have much effect on this measure. The mode is the most frequent data value, and it is unlikely that any of these 4 outliers would represent the mode. The mean depends