Assignment 3 – CELL2CELL Case
1. What are the Business Objective(s) and Data Mining Objective(s) for the case?
Business Objectives
To develop a proactive retention program (including incentive plans) to reduce customer churn.
Data Mining Objectives
1. To predict churn accurately
2. To identify the key factors that drive customer churn
2. Based on initial data understanding (using the MultiPlot/StatExplore nodes), what are some initial obvious results?
Data Understanding
1. Using StatExplore to find the relative importance of variables
The StatExplore results show that EQPDAYS is the most important variable, as it has the highest Worth (Sum) and Chi-Square (Sum) values.
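For a binary target such as churn, a chi-square statistic compares the observed churn counts within each level (or bin) of an input against the counts expected if the input had no relationship with churn; a larger value suggests a stronger association. The Python sketch below computes the Pearson chi-square statistic for one binned input. It illustrates the idea only and is not SAS Enterprise Miner's exact StatExplore calculation; it assumes the data are available as (level, churn) pairs with churn coded 0/1.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for a categorical (or binned) input vs a 0/1 target.

    pairs: iterable of (level, target) tuples.
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)                            # count per (level, target) cell
    level_totals = Counter(level for level, _ in pairs)  # row totals
    target_totals = Counter(t for _, t in pairs)         # column totals
    stat = 0.0
    for level, row_total in level_totals.items():
        for t, col_total in target_totals.items():
            expected = row_total * col_total / n         # count expected under independence
            stat += (observed[(level, t)] - expected) ** 2 / expected
    return stat


# Toy data (not from the case): level "A" always churns, level "B" never does.
print(chi_square([("A", 1), ("A", 1), ("B", 0), ("B", 0)]))  # 4.0
```

For an interval input such as EQPDAYS, the values would have to be grouped into bins first; inputs can then be ranked by the resulting statistic.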
2. Choosing which variables to reject
The “csa” variable is already rejected in the data set because it is a string (text) variable, and “churndep” is rejected because its values are missing for the validation set. Other variables must first be analyzed in MultiPlot and in the statistical models before rejecting them. As a general rule, a good input variable is one that both supports the data model and makes business sense.
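The two rejection rules stated above can be written as a small role-assignment function. This is a minimal Python sketch, assuming each customer is a dict keyed by variable name, missing values are stored as None, and a hypothetical `is_validation` list marks which rows belong to the validation sample; it does not reproduce how Enterprise Miner stores variable roles.

```python
def assign_roles(records, target, is_validation):
    """Assign INPUT / REJECTED / TARGET roles using the two rejection rules.

    records: list of dicts, one per customer, None marking a missing value.
    is_validation: hypothetical list of bools, True for validation-sample rows.
    """
    roles = {}
    for var in records[0]:
        if var == target:
            roles[var] = "TARGET"
            continue
        values = [row[var] for row in records]
        validation_values = [v for v, in_val in zip(values, is_validation) if in_val]
        if any(isinstance(v, str) for v in values):
            # Rule 1: text variables (such as "csa") are rejected.
            roles[var] = "REJECTED"
        elif validation_values and all(v is None for v in validation_values):
            # Rule 2: variables missing throughout the validation sample (such as "churndep").
            roles[var] = "REJECTED"
        else:
            # Everything else stays an input until MultiPlot / model results say otherwise.
            roles[var] = "INPUT"
    return roles
```

Only these two mechanical rules are automated here; the business-sense judgment described above still has to be applied by the analyst.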
3. Using MultiPlot to observe the distribution of various variables against the Target
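A MultiPlot bar chart of an interval input against a binary target is, in effect, a comparison of churn rates across ranges of that input. The Python sketch below approximates this numerically by splitting the non-missing values into equal-count bins and reporting the churn rate in each. The equal-count binning and the default of four bins are assumptions for illustration, not MultiPlot's own settings.

```python
def churn_rate_by_bin(values, churn, n_bins=4):
    """Churn rate within equal-count bins of an interval input.

    Missing values (None) are skipped. Returns (low, high, churn_rate) per bin,
    ordered from the lowest to the highest input values.
    """
    pairs = sorted((v, c) for v, c in zip(values, churn) if v is not None)
    size = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * size // n_bins:(b + 1) * size // n_bins]
        if not chunk:  # fewer rows than bins
            continue
        rate = sum(c for _, c in chunk) / len(chunk)
        bins.append((chunk[0][0], chunk[-1][0], rate))
    return bins
```

With toy data (not from the case), `churn_rate_by_bin([10, 20, 30, 40, 50, 60, 70, 80], [0, 0, 0, 0, 1, 0, 1, 1], n_bins=2)` returns `[(10, 40, 0.0), (50, 80, 0.75)]`. An input whose churn rate changes markedly from bin to bin separates churners from non-churners better than one whose rate stays flat.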
From these plots and the table above, we can see that the variables EQPDAYS, MONTHS, MOU, CHANGEM, DROPBLK, DROPVCE, RECCHRGE and REVENUE explain “Churn” better than the other variables.
4. Using StatExplore to search for variables with missing values
As we can see, there are a few missing values for the variables AGE1, AGE2, CHANGEM, CHANGER, DIRECTAS, MOU, OVERAGE, RECCHRGE, REVENUE and ROAM.
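The same missing-value check can be sketched in Python, assuming records are dicts with None marking a missing value (SAS itself displays missing numeric values as a period). For each variable with at least one missing value, the function returns the count and the percentage of rows affected.

```python
from collections import Counter


def missing_summary(records):
    """Return {variable: (missing_count, missing_percent)} for variables with any None."""
    n = len(records)
    counts = Counter()
    for row in records:
        for var, value in row.items():
            if value is None:
                counts[var] += 1
    return {var: (k, 100.0 * k / n) for var, k in sorted(counts.items())}
```

Variables that do not appear in the result have no missing values.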
3. Run at least 6 models in SAS: Decision Trees (binary and three-way), Logistic Regression, Logistic Regression with Transform Variables, Neural Networks, and Neural Networks after variable selection/transform variables.
Initial Data Preparation
1. Partitioning the data
The data needs to