I created 6 models for this project: DT1, DT2, Reg1, Reg2, Reg3, and NN. After testing, the variables I used to predict “IsBadBuy” in all my models are PurchDate, Auction, VehicleAge, Transmission, WheelType, VehOdo, all of the “MMR” variables, VehBCost, IsOnlineSale, and WarrantyCost. Together, these variables gave me better models (i.e. ROC Area > 0.7).
I used a cut-off of 0.6. When I tried other cut-offs such as 0.5, 0.7, and 0.8, the result was either that I eliminated too many Good Buys or that I accepted too many Bad Buys. Both situations hurt the business (e.g. if we require stronger confidence from the model before flagging a car, we get too many 0s in the result, which means we may accept more Bad Buys by accident). I therefore chose 0.6 as my cut-off to balance the two.
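To illustrate the trade-off, here is a minimal Python sketch of how moving the cut-off shifts errors between the two kinds. The probabilities and labels are made-up toy values, not my project data, and the function name `confusion_counts` is only for illustration. I assume a car is predicted as a Bad Buy (1) when its predicted probability is at or above the cut-off.

```python
def confusion_counts(probs, labels, cutoff):
    """Count TP, FP, FN, TN when predicting Bad Buy (1) if prob >= cutoff."""
    tp = fp = fn = tn = 0
    for prob, actual in zip(probs, labels):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1  # Bad Buy correctly rejected
        elif predicted == 1 and actual == 0:
            fp += 1  # Good Buy eliminated by mistake
        elif predicted == 0 and actual == 1:
            fn += 1  # Bad Buy accepted by mistake
        else:
            tn += 1  # Good Buy correctly accepted
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Toy data (hypothetical): predicted P(IsBadBuy = 1) and the true label.
toy_probs = [0.9, 0.65, 0.55, 0.3, 0.75, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

for cutoff in (0.5, 0.6, 0.8):
    print(cutoff, confusion_counts(toy_probs, toy_labels, cutoff))
```

In this toy example, raising the cut-off reduces the eliminated Good Buys (FP) but increases the accepted Bad Buys (FN), which is the same tension I saw when comparing cut-offs.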
The best model I chose is Reg2 (the forward regression model), for two reasons. First, Reg2 has the largest ROC Area in the Logistic Fit comparison (saved as “Lodistic1~6”), which is 0.7478. Second, it has a relatively low number (the second smallest among all models) in the FalseNegative box of the Contingency Table. For the second reason, I did not use overall accuracy, because I think a FalseNegative damages the business more than a FalsePositive does: accidentally buying a BadBuy means the company has to pay for all the required repair work.
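The two criteria can be sketched in Python as follows. This is only an illustration on made-up toy values, not the output of my models. `roc_area` computes the ROC Area as the share of (Bad Buy, Good Buy) pairs in which the Bad Buy gets the higher score, and `weighted_cost` shows why I look at FalseNegative rather than overall accuracy. The cost weights (1 for a FalsePositive, 5 for a FalseNegative) are assumptions chosen for the example, not figures from the business.

```python
def roc_area(scores, labels):
    """ROC Area as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def weighted_cost(fp, fn, cost_fp=1.0, cost_fn=5.0):
    """Total error cost when a FalseNegative is more expensive (assumed weights)."""
    return fp * cost_fp + fn * cost_fn


# Toy data (hypothetical): predicted P(IsBadBuy = 1) and the true label.
toy_scores = [0.9, 0.65, 0.55, 0.3, 0.75, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(round(roc_area(toy_scores, toy_labels), 4))

# Two hypothetical models with the same number of errors (12 each),
# and therefore the same overall accuracy, but different error types.
print(weighted_cost(fp=10, fn=2))  # mostly FalsePositive errors
print(weighted_cost(fp=2, fn=10))  # mostly FalseNegative errors
```

Under these assumed weights, the second hypothetical model is much more costly even though its accuracy is identical, which is why a low FalseNegative count mattered more to me than accuracy.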
For the Value-added calculation, as we can see in the Contingency tables (Saved as