4. Discuss the benefits and drawbacks of a binary tree versus a bushier tree.

A binary tree has a simpler structure than a bushier tree: each parent node has only two child nodes, which saves storage space. On the other hand, a binary tree may be deeper than a bushier tree, and the grouping of records it produces may be less refined.

5. Construct a classification and regression tree to classify salary based on the other variables. Do as much as you can by hand, before turning to the software.

Data:

No.  Occupation  Gender  Age  Salary   Level
1    Service     Female  45   $48,000  Level 3
2    Service     Male    25   $25,000  Level 1
3    Service     Male    33   $35,000  Level 2
4    Management  Male    25   $45,000  Level 3
5    Management  Female  35   $65,000  Level 4
6    Management  Male    26   $45,000  Level 3
7    Management  Female  45   $70,000  Level 4
8    Sales       Female  40   $50,000  Level 3
9    Sales       Male    30   $40,000  Level 2
10   Staff       Female  50   $40,000  Level 2
11   Staff       Male    25   $25,000  Level 1
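Candidate Splits for t = Root Node

Candidate Split  Left Child Node, tL      Right Child Node, tR
1                Occupation = Service     Occupation = {Management, Sales, Staff}
2                Occupation = Management  Occupation = {Service, Sales, Staff}
3                Occupation = Sales       Occupation = {Service, Management, Staff}
4                Occupation = Staff       Occupation = {Service, Management, Sales}
5                Gender = Female          Gender = Male
6                Age ≤ 25                 Age > 25
7                Age ≤ 26                 Age > 26
8                Age ≤ 30                 Age > 30
9                Age ≤ 33                 Age > 33
10               Age ≤ 35                 Age > 35
11               Age ≤ 40                 Age > 40
12               Age ≤ 45                 Age > 45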
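The list of candidate splits can also be generated mechanically. The following is a minimal sketch, assuming one-versus-rest splits for Occupation (a full search would also consider two-versus-two groupings), a single split for Gender, and a "less than or equal" threshold at every distinct Age except the largest. The function name candidate_splits and the text labels are illustrative only.

```python
# Attribute values copied from the data table.
occupations = ["Service", "Management", "Sales", "Staff"]
genders = ["Female", "Male"]
ages = [45, 25, 33, 25, 35, 26, 45, 40, 30, 50, 25]


def candidate_splits(occupations, genders, ages):
    """Return (left child, right child) descriptions of binary splits."""
    splits = []
    # Categorical attribute with more than two values:
    # one value versus all the others (assumption: no 2-vs-2 groupings).
    for occ in occupations:
        rest = ", ".join(o for o in occupations if o != occ)
        splits.append((f"Occupation = {occ}", f"Occupation = {{{rest}}}"))
    # Two-valued attribute: only one distinct binary split exists.
    splits.append((f"Gender = {genders[0]}", f"Gender = {genders[1]}"))
    # Numeric attribute: threshold at each distinct value except the
    # largest, which would leave the right child empty.
    for a in sorted(set(ages))[:-1]:
        splits.append((f"Age <= {a}", f"Age > {a}"))
    return splits


splits = candidate_splits(occupations, genders, ages)
for i, (left, right) in enumerate(splits, start=1):
    print(f"{i:>2}  {left:<24} {right}")
```

Under these assumptions the sketch produces 4 + 1 + 7 = 12 candidate splits for the root node.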
Values of the Components of the Optimality Measure Φ(s|t) for Each Candidate Split, for the Root Node

Split  PL    PR    P(L=1|tL)  P(L=2|tL)  P(L=3|tL)  P(L=4|tL)  P(L=1|tR)  P(L=2|tR)  P(L=3|tR)  P(L=4|tR)  2PLPR  Φ(s|t)
1      0.27  0.73  0.33       0.33       0.33       0.00       0.13       0.25       0.38       0.25       0.40   0.23
2      0.36  0.64  0.00       0.00       0.50       0.50       0.29       0.43       0.29       0.00
3      0.18  0.82  0.00       0.50       0.50       0.00       0.22       0.22                  0.22
4      0.18  0.82  0.50       0.50       0.00       0.00       0.11       0.22                  0.22
5      0.45  0.55  0.00       0.20       0.40       0.40       0.33       0.33                  0.00
6      0.27  0.73  0.67       0.00       0.33       0.00       0.00       0.38                  0.25
7      0.36  0.64  0.50       0.00       0.50       0.00       0.00       0.43                  0.29
8      0.45  0.55  0.40       0.20       0.40       0.00       0.00       0.33                  0.33
9      0.55  0.45  0.33       0.33       0.33       0.00       0.00       0.20                  0.40
10                            0.29       0.29       0.14       0.00       0.25                  0.25
11                            0.25       0.38       0.13       0.00       0.33                  0.33
12                            0.20       0.40       0.20       0.00       1.00
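The components in this table can be recomputed directly from the data. The sketch below assumes the usual CART optimality measure, Φ(s|t) = 2·PL·PR·Σj |P(L=j|tL) − P(L=j|tR)|, which is consistent with the column headings above. It also assumes the occupation of each record as read from the data table (Service for records 1-3, Management for 4-7, Sales for 8-9, Staff for 10-11). The names records, class_proportions and split_measure are illustrative only.

```python
from collections import Counter

# (occupation, gender, age, level) for records 1-11, from the data table.
# Assumption: occupations are grouped as Service 1-3, Management 4-7,
# Sales 8-9, Staff 10-11.
records = [
    ("Service", "Female", 45, 3), ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2), ("Management", "Male", 25, 3),
    ("Management", "Female", 35, 4), ("Management", "Male", 26, 3),
    ("Management", "Female", 45, 4), ("Sales", "Female", 40, 3),
    ("Sales", "Male", 30, 2), ("Staff", "Female", 50, 2),
    ("Staff", "Male", 25, 1),
]
LEVELS = (1, 2, 3, 4)


def class_proportions(node):
    """Proportion of records at each salary level within a node."""
    counts = Counter(level for *_, level in node)
    return [counts[j] / len(node) for j in LEVELS]


def split_measure(records, goes_left):
    """Return (PL, PR, Phi) for the split defined by goes_left."""
    left = [r for r in records if goes_left(r)]
    right = [r for r in records if not goes_left(r)]
    p_left = len(left) / len(records)
    p_right = len(right) / len(records)
    # Sum over levels of |P(L=j|tL) - P(L=j|tR)|.
    q = sum(abs(a - b) for a, b in
            zip(class_proportions(left), class_proportions(right)))
    return p_left, p_right, 2 * p_left * p_right * q


# The twelve candidate splits, each given by its left-child condition.
splits = [(f"Occupation = {o}", lambda r, o=o: r[0] == o)
          for o in ("Service", "Management", "Sales", "Staff")]
splits.append(("Gender = Female", lambda r: r[1] == "Female"))
splits += [(f"Age <= {a}", lambda r, a=a: r[2] <= a)
           for a in (25, 26, 30, 33, 35, 40, 45)]

results = [(name, *split_measure(records, test)) for name, test in splits]
for i, (name, p_l, p_r, phi) in enumerate(results, start=1):
    print(f"{i:>2}  {name:<24} PL={p_l:.2f}  PR={p_r:.2f}  Phi={phi:.2f}")
```

For candidate split 1 this gives PL = 0.27, PR = 0.73 and Φ(s|t) ≈ 0.23, in line with the first row of the table. Ties between splits are possible, so the maximum should be read from the full printed list rather than assumed.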