a. Supervised-Deciding whether to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers).
b. Unsupervised-In an online bookstore, making recommendations to customers concerning additional items to buy based on the buying patterns in prior transactions.
c. Supervised-Identifying a network data packet as dangerous (virus, hacker attack) based on comparison to other packets whose threat status is known.
d. Unsupervised-Identifying segments of similar customers.
e. Supervised-Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and non-bankrupt firms.
f. Supervised-Estimating the repair time required for an aircraft based on a trouble ticket.
g. Supervised-Automated sorting of mail by zip code scanning.
h. Unsupervised-Printing of custom discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought previously.
2.3 Consider the sample from a database of credit applicants in Figure 2.13. Comment on the likelihood that it was sampled randomly, and whether it is likely to be a useful sample.
I don’t think the sample was random, because the records appear to be taken at every 8th applicant. A random sample would show irregular gaps between the selected records. I don’t think the sample would be very useful either, because of the types of variables it contains.
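The difference between picking every 8th record and drawing a simple random sample can be sketched in a few lines of Python. This is only an illustration: the applicant IDs 1 to 100, the seed, and all variable names are assumptions, not values from Figure 2.13.

```python
import random

# Hypothetical applicant IDs 1..100 (not the actual data behind Figure 2.13).
ids = list(range(1, 101))

# Systematic sample: every 8th applicant (8, 16, 24, ...).
systematic = ids[7::8]

# Simple random sample of the same size, seeded so the run is repeatable.
rng = random.Random(0)
simple_random = sorted(rng.sample(ids, len(systematic)))


def gaps(sample):
    """Differences between consecutive selected IDs."""
    return [b - a for a, b in zip(sample, sample[1:])]


systematic_gaps = gaps(systematic)
random_gaps = gaps(simple_random)

print("systematic gaps:", systematic_gaps)
print("random gaps:    ", random_gaps)
```

The systematic sample always has a constant gap of 8, which is the kind of regularity that suggests the records were not drawn at random.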
2.5 Using the concept of overfitting, explain why when a model is fit to training data, zero error with those data is not necessarily good.
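Zero error on the training data is not necessarily good, because it usually means the model has fit the noise in those records as well as the underlying relationship. An overfit model like this gives a skewed picture of the data and tends to predict poorly on new records, so its training error is not a true reflection of how well it will perform.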
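A small sketch can make this concrete. It assumes toy data in which the true relationship is y = 2x and the training targets carry alternating +1/-1 noise. The data, the 1-nearest-neighbor "memorizer", and all names are illustrative assumptions rather than anything from the textbook. The memorizer reaches zero training error, yet a plain least-squares line does better on records neither model has seen.

```python
# Toy training data: true relationship y = 2x plus alternating +1/-1 noise
# (an illustrative assumption, chosen so the run is fully deterministic).
train_x = [float(i) for i in range(8)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]

# Validation records the models never saw, taken from the noise-free relationship.
valid_x = [i + 0.25 for i in range(7)]
valid_y = [2 * x for x in valid_x]


def memorizer(x):
    """1-nearest-neighbor: return the target of the closest training record."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


intercept, slope = fit_line(train_x, train_y)


def line(x):
    return intercept + slope * x


memorizer_train_mse = mse(memorizer, train_x, train_y)
memorizer_valid_mse = mse(memorizer, valid_x, valid_y)
line_train_mse = mse(line, train_x, train_y)
line_valid_mse = mse(line, valid_x, valid_y)

print("memorizer: train", memorizer_train_mse, "validation", memorizer_valid_mse)
print("line:      train", line_train_mse, "validation", line_valid_mse)
```

The memorizer reproduces the noise in every training record, so its perfect training score says nothing about new data. The line has a nonzero training error but the lower validation error.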
2.7 A dataset has 1000 records and 50