1a)
The histograms of page cost, circulation and median income are all skewed to the right because the majority of the data lies to the left of the mean and the tail on the right side is longer than that on the left side.
b)
There appears to be a strong, roughly logarithmic relationship between page cost and circulation and a weak linear relationship between page cost and median income. In the scatterplot of page cost against percent male, the points appear to be scattered randomly, which suggests that percent male has little or no association with page cost.
2
a) The multiple regression model is statistically useful overall because at least one independent variable is significant, as the F test below shows.
H0: β1 = β2 = β3 = 0 (no linear relationship)
Ha: At least one βi ≠ 0 (at least one X variable affects Y)
F = MSR/MSE = 5177553908/184850796 = 28.009
Reject H0 at the 5% level of significance since p-value ≈ 0.000 < 0.05, where F is based on 3 numerator and 40 denominator degrees of freedom.
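The F ratio in part a) can be checked with a few lines of Python. This is only a sketch: the mean squares are the ones quoted above, while the critical value 2.84 is an assumption taken from a standard F table for 3 and 40 degrees of freedom at the 5% level, not a figure from the regression output.

```python
# Mean squares quoted in the ANOVA figures for part a).
msr = 5177553908  # mean square for regression
mse = 184850796   # mean square error

# Overall F statistic for H0: beta1 = beta2 = beta3 = 0.
f_stat = msr / mse

# Assumption: approximate table value of F(0.05; 3, 40).
f_crit = 2.84

# Reject H0 when the observed F exceeds the critical value.
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.3f}, reject H0: {reject_h0}")
```

Comparing against the critical value leads to the same decision as the p-value rule used in the answer.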
b) pagecost = - 7640 + 5.17 circ - 10.2 percmale + 1.19 medianincome
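As a rough illustration of how the fitted equation in b) would be used, the sketch below wraps it in a function. The function name and the input values are hypothetical, and it assumes circulation is measured in thousands of readers and median income in dollars, as the interpretation of the coefficients suggests.

```python
def predicted_page_cost(circ, percmale, medianincome):
    """Evaluate the fitted equation from part b).

    circ: projected circulation, assumed to be in thousands of readers
    percmale: percent male among the readership
    medianincome: median household income, assumed to be in dollars
    """
    return -7640 + 5.17 * circ - 10.2 * percmale + 1.19 * medianincome


# Hypothetical magazine, not taken from the data set.
example = predicted_page_cost(circ=1000, percmale=50, medianincome=30000)
print(round(example, 2))
```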
c) Holding the other variables constant, page cost is expected to increase by an estimated $5.17 for each additional thousand projected readers, decrease by an estimated $10.20 for each one percentage point increase in males among the predicted readership, and increase by an estimated $1.19 for each dollar increase in median household income. Based on the t tests below, we would recommend keeping circulation and median household income of readership and dropping percent male.
Test H0: β1 = 0
t = b1/sb1 = 5.1653/0.5653 = 9.14
Reject H0 at the 5% level of significance since p-value ≈ 0.000 < 0.05
Test H0: β2 = 0
t = b2/sb2 = -10.24/84.82 = -0.12
Fail to reject H0 at the 5% level of significance since p-value = 0.905 > 0.05
Test H0: β3 = 0
t = b3/sb3 = 1.1877/0.5628 = 2.11
Reject H0 at the 5% level of significance since p-value = 0.041 < 0.05
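The three t ratios in part c) can be reproduced as follows. This is a minimal sketch: the coefficients and standard errors are the ones quoted in the tests, while the critical value 2.021 is an assumption taken from a standard t table (two-sided, 5% level, 40 degrees of freedom).

```python
# (coefficient, standard error) pairs quoted in the t tests for part c).
coefs = {
    "circ": (5.1653, 0.5653),
    "percmale": (-10.24, 84.82),
    "medianincome": (1.1877, 0.5628),
}

# Assumption: table value of t(0.025; 40) for a two-sided 5% test.
t_crit = 2.021

# t statistic for H0: beta_i = 0 is the coefficient over its standard error.
t_stats = {name: b / se for name, (b, se) in coefs.items()}

# Keep the variables whose |t| exceeds the critical value.
keep = [name for name, t in t_stats.items() if abs(t) > t_crit]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
print("keep:", keep)
```

The critical-value comparison gives the same set of retained variables as the p-value comparisons in the answer.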
d) Both regression assumptions of linearity and homoscedasticity are violated because the data