April 16, 2014
Camesha Vera, Nadia Ferrari, Shuo Sun, Jiashun Hu, Siddhant Dutt, YuXin Xing
Alliant University
IST 6010
Professor Kumar
The US Census Service is need of analysis for some recently released data, pertaining to housing in the Boston Massachusetts area. The reports generated from this analysis will be completed using an Excel spreadsheet. Excel features calculations, graphing tools, pivot tables, “what if “ scenarios, along with other data analysis functions. It has been a very widely applied spreadsheet for these platforms, and is considered the industry standard for spreadsheets.
The US Census is looking to gather the results for the following scenarios. The function used in …show more content…
the Excel tool is known as a pivot table. In data analysis, a pivot table is a data summarization tool found in data visualization programs or business intelligence software. Among other functions, a pivot-table can automatically sort, count, total or give the average of the data stored in one table or spreadsheet. In addition, we will use other functions of excel, such as charts, filters & statistics, all calculated from the given dataset.
The dataset contains information collected from the StatLib Archive (http://lib.stat.cmu.edu/datasets/boston). The dataset contains 506 cases. The data was originally published by Harrison, D. & Rubinfeld, D.L ‘Hedonic prices & the demand for clean air”, J. Environ. Economics & Management, vol, 81-10, 1978
There are 14 attributes in each case of the dataset. They are:
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town.
CHAS Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
AGE proportion of owner-occupied units built prior to 1940
RM average number of rooms per dwelling
DIS weighted distances to five Boston employment centers
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000
TASK 1-The census department would like to display the neighborhoods that are bordering the Charles River with 6-7 bedrooms, that have an average house value of 25.92 ($000). Using multiple criteria, # of rooms & location.
Average of MEDV
Column Labels
Row Labels
0
1
Grand Total
3-4
25.30
25.30
4-5
16.02
16.02
5-6
17.13
22.22
17.49
6-7
21.77
25.92
22.02
7-8
35.96
44.07
36.92
8-9
45.70
35.95
44.20
Grand Total
22.09
28.44
22.53
Solution:
Column labels
CHAS
Row Labels
RM
Values average value of MEDV
TASK 2-The Boston police department is deciding what neighborhoods to allocate funds for public safety and decreasing crime. They would also like to know what effect employment centers have on the crime rate of a neighborhood, and if more should be built.
Row Labels
Average of MEDV
Average of DIS
0.005-5.005
24.49
4.29
5.005-10.005
18.05
2.15
10.005-15.005
14.53
1.84
15.005-20.005
12.02
1.70
20.005-25.005
10.07
1.51
25.005-30.005
10.77
1.61
35.005-40.005
7.95
1.68
40.005-45.005
8.50
1.61
45.005-50.005
7.00
1.66
50.005-55.005
15.00
1.41
65.005-70.005
5.00
1.43
70.005-75.005
8.80
1.80
85.005-90.005
10.40
1.42
Grand Total
22.53
3.80
Solution:
Column labels values Row Labels
CRIM
Values average value of MEDV and DIS
—Note:
The crime rate is lower the further away (DIS) it is from the MEDV
Task 3- The government has determined that the nitric oxide coming off the Charles River is building and has determined that concentrations of 0.45 and above are dangerous.
Find neighborhoods where these levels of concentrations are dangerous. Also human rights group are interested in the correlation between median values of homes and the levels of nitric oxide concentrations
Count of MEDV
Column Labels
Row Labels
0
1
Grand Total
0.35-0.4
7
7
0.4-0.45
117
5
122
0.45-0.5
56
7
63
0.5-0.55
96
5
101
0.55-0.6
45
3
48
0.6-0.65
51
5
56
0.65-0.7
36
1
37
0.7-0.75
47
1
48
0.75-0.8
4
4
8
0.85-0.9
12
4
16
Grand Total
471
35
506
Solution:
Column labels
CHAS
Row Labels
NOX
Values
COUNT OF CHAS
Solution: In this instance 7 tract bound the river and 56
otherwise.
TASK 4- Census would like to know the # pupil-teacher ratio by town
Row Labels
Average of MEDV
Average of TAX
Average of LSTAT
12.6-13.6
36.97
277.00
7.80
13.6-14.6
50.00
226.50
3.07
14.6-15.6
27.06
358.61
11.22
15.6-16.6
25.30
301.80
8.75
16.6-17.6
27.49
306.90
7.91
17.6-18.6
25.79
283.26
9.17
18.6-19.6
23.34
297.32
11.38
19.6-20.6
17.05
608.25
17.16
20.6-21.6
18.08
358.65
14.81
21.6-22.6
19.40
334.00
6.81
Grand Total
22.53
408.24
12.65
Solution:
Column labels
Values
Row Labels
PTRATIO
Values
Average value of MEDV TAX and LSTAT(%)
Task 5-
The Boston housing commission would like to classify homes as either Small, Medium or Large depending on the number of rooms in the house. Create a function that will label the homes as either Sm, Med, or Lrg. The criteria for each are described below:
Solution:
Small homes have less than or equal to 4.5 rooms
Medium homes have more than 4.5 and less than or equal to 6.5 rooms
Large homes have greater than 6.5 rooms
We have presented to you the US Census scenario, presented issues, and solutions to those issues. The use of the Excel Database management system allows us to analyze the Boston Housing data by using the set & solution function.
References
(http://lib.stat.cmu.edu/datasets/boston).
Data Mining for Business Intelligence, 2nd Edition, Shmueli, Patel & Bruce