Regression Analysis of Multiple Variables
Neil Bhatt
993569302
Sta 108 P. Burman
11 total pages
The question posed in this analysis is whether pollution has an impact on the mortality rate. The data come from 60 cities (n = 60). The response variable is Y = mortality rate per 100,000 population, and the candidate predictor variables are education, the percentage of the population that is nonwhite, the percentage of the population deemed poor, precipitation, the amount of sulfur dioxide, and the amount of nitrogen dioxide.
Data:
60 Standard Metropolitan Statistical Areas (SMSAs) in the United States, with data obtained for the years 1959-1961. [Source: GC McDonald and JS Ayers, “Some applications of the ‘Chernoff Faces’: a technique for graphically representing multivariate data”, in Graphical Representation of Multivariate Data, Academic Press, 1978.]
Using these data, we construct a scatter plot matrix to inspect visually whether any correlation appears to exist before carrying out calculations.
Data Distribution:
Scatter Plot Matrix
As one can observe, the data appear to cluster along a trend, which suggests a relationship between Y = mortality rate and the candidate predictor variables X that may influence Y.
From this we construct a correlation matrix to express the pairwise relationships numerically.
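As a rough illustration of how a correlation matrix of this kind can be computed, the sketch below calculates the sample Pearson correlation for every pair of variables. It is not the software used for this report. The function names `pearson` and `correlation_matrix` are made up for the example, and the five toy observations are hypothetical values, not the SMSA data.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {row_name: {col_name: r}} for every pair of named columns."""
    names = sorted(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values for five made-up cities (NOT the SMSA data).
toy = {
    "EDUC": [9.0, 10.5, 11.0, 12.0, 12.5],
    "MORTALITY": [1010.0, 980.0, 940.0, 900.0, 870.0],
    "PRECIP": [45.0, 40.0, 38.0, 30.0, 20.0],
}

corr = correlation_matrix(toy)
for name in sorted(corr):
    # One row per variable, columns in the same alphabetical order.
    print(f"{name:<10}", " ".join(f"{corr[name][other]: .4f}" for other in sorted(corr)))
```

Every diagonal entry equals 1 because each variable is perfectly correlated with itself, and the matrix is symmetric because the correlation of X with Y equals that of Y with X. The matrix reported for the actual data has the same two properties.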
Correlation Matrix
                   EDUC    MORTALITY     NONWHITE          NOX         POOR       PRECIP
EDUC          1.0000000  -0.51098130   -0.2087739   0.22440191  -0.40333845   -0.4904252
MORTALITY    -0.5109813   1.00000000    0.6437364  -0.07738105   0.41045399    0.5094924
NONWHITE     -0.2087739   0.64373637    1.0000000   0.01838530   0.70491501    0.4132045
NOX           0.2244019  -0.07738105    0.0183853   1.00000000  -0.10254386   -0.4873207
POOR         -0.4033385   0.41045399    0.7049150  -0.10254386   1.00000000