a. How would this customer be classified?
A. This customer would be classified as not accepting the personal loan offer. According to the KNN_Output, the model appears to be overfitting: the training classification matrix shows no errors (Class 0 = 0% error, Class 1 = 0% error, Overall = 0% error), while the validation classification matrix shows substantial error (Class 0 = 4.2% error, Class 1 = 55.85% error, Overall = 9.1% error).
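The per-class and overall error percentages quoted from a classification matrix can be recomputed by hand. The following is a minimal sketch of that arithmetic; the function name `error_rates` and the counts in `example` are hypothetical and are not the values from the KNN_Output.

```python
def error_rates(matrix):
    """Return per-class and overall error rates.

    `matrix[actual][predicted]` holds the number of records of class
    `actual` that the model assigned to class `predicted`.
    """
    rates = {}
    total = wrong = 0
    for actual, row in matrix.items():
        n = sum(row.values())        # all records whose true class is `actual`
        misses = n - row[actual]     # those assigned to any other class
        rates[actual] = misses / n
        total += n
        wrong += misses
    rates["overall"] = wrong / total
    return rates


# Hypothetical counts, for illustration only (not taken from the assignment).
example = {
    0: {0: 890, 1: 10},   # actual class 0: 890 correct, 10 misclassified
    1: {0: 60, 1: 40},    # actual class 1: 60 misclassified, 40 correct
}
rates = error_rates(example)
for label, rate in rates.items():
    print(f"class {label}: {rate:.2%} error")
```

In this made-up example the rare class is misclassified most of the time while the overall error stays low, which is the same pattern as the gap between the Class 1 error and the Overall error in the validation matrix.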
b. What is a choice of k that balances between overfitting and ignoring the predictor information?
A. A choice of k that balances between overfitting and ignoring the predictor information is k = 6. After testing various values of k, this value was chosen because it minimizes the validation % error. According to the validation error log for different k, the best k is 6, where the training % error is 7.4% and the validation % error is 8.75%.
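The search over k can be sketched in a few lines. This is only an illustration of the idea, assuming a plain Euclidean-distance, majority-vote k-NN written from scratch; it is not the tool that produced the error log above, and the synthetic data from the hypothetical helper `make_data` stands in for the bank data.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training records nearest to x (Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tied vote, the class seen first among the nearest neighbors wins.
    return votes.most_common(1)[0][0]


def error_rate(train_X, train_y, X, y, k):
    """Fraction of the records in (X, y) that k-NN misclassifies."""
    wrong = sum(
        knn_predict(train_X, train_y, x, k) != label for x, label in zip(X, y)
    )
    return wrong / len(y)


def make_data(n, rng):
    """Hypothetical synthetic data: two overlapping classes, class 1 rarer."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        center = 1.5 if label else 0.0
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y


rng = random.Random(0)
train_X, train_y = make_data(200, rng)
valid_X, valid_y = make_data(120, rng)

# For each k, record (training error, validation error).
results = {}
for k in range(1, 16):
    results[k] = (
        error_rate(train_X, train_y, train_X, train_y, k),
        error_rate(train_X, train_y, valid_X, valid_y, k),
    )

# Pick the k with the lowest validation error (smallest k on ties).
best_k = min(results, key=lambda k: results[k][1])
print("k  train_err  valid_err")
for k, (tr, va) in results.items():
    print(f"{k:<2} {tr:9.3f}  {va:9.3f}")
print("best k:", best_k)
```

With k = 1 each training record is its own nearest neighbor, so the training error is zero by construction; that is why the choice of k is based on the validation error rather than the training error.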
c. Show the classification matrix for the validation data that results from using the best k.
d. Classify the customer using the best k.
A. Using the best k (k = 6), the customer would be classified as not accepting the personal loan.
e. Re-partition the data, this time into training, validation, and test sets (50%: 30%: 20%). Apply the k-NN method with the k chosen above. Compare the classification matrix of the test set with that of the training and validation sets. Comment on the differences and their reason.
A. Across the training, validation, and test matrices, the percentage error increases steadily. This ordering is expected: the training records are the ones the neighbors are drawn from, the validation set was used to choose k, and the test set played no part in building or tuning the model. There does not appear to be overfitting, because the error discrepancies among the three matrices are small: from the training error to the validation error there is a 5.69% difference, and from the validation error to the test error there is a 14.05% difference. Based on the lift chart, the model still appears to make a difference, even though the loan acceptance class has an 82% error rate in the test classification matrix.
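The 50%: 30%: 20% re-partition itself can be sketched as a shuffled three-way split. The function `three_way_split`, the seed, and the 5,000 record indices below are assumptions made for illustration, not the procedure or data used to produce the matrices above.

```python
import random


def three_way_split(records, fractions=(0.5, 0.3, 0.2), seed=1):
    """Shuffle the records and cut them into training, validation, and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]      # whatever remains goes to the test set
    return train, valid, test


# Assumed: 5,000 record indices standing in for the rows of the data set.
train, valid, test = three_way_split(range(5000))
print(len(train), len(valid), len(test))
```

Because the three sets are disjoint, the test-set matrix measures performance on records that influenced neither the neighbors nor the choice of k.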
9.3
i. Compare the tree generated by the CT with the one generated by the RT. Are they