Business Intelligence
3. Classification using SAS Enterprise Miner
In this question you will analyze the JUNKMAIL dataset found in the SASHELP library. Follow the procedure we used for analyzing the HMEQ dataset. Detailed instructions for the HMEQ analysis are given in the emcs.pdf document.
You will need to create and execute the process flow diagram shown above. Further requirements for analyzing JUNKMAIL are as given below:
This data will be used to classify emails as junk mail or not. Create the data source and set the role of every variable, including the target variable, appropriately.
You can use the default values for everything else when creating the data source.
Partition the data into a 60/40 training/validation split, with no data used for testing.
Follow the steps shown in the process diagram.
You will try out four different models as described below:
Regression: This is the default regression model applied to the original data.
Regression – No Model Selection: This is the default regression model applied after transforming the variables as described below.
Regression – Stepwise: This is the regression model using stepwise selection on the transformed data.
Decision Tree: This is the default decision tree model applied to the transformed data.
Transform Variables:
Transform all variables using the log transformation.
Model Comparison: Run with Selection Statistic set to Misclassification Rate.
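For readers who want to see what the requirements above amount to outside of Enterprise Miner, here is a minimal Python sketch of the three ideas involved: a 60/40 partition, a log transformation, and choosing a model by misclassification rate. It is an illustration only, not the code the SAS nodes run. All function names are hypothetical, the log(x + 1) form is an assumption made so that zero-valued inputs stay defined, and the sketch assumes the comparison is made on validation rates.

```python
import math
import random


def partition(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of rows and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def log_transform(value):
    """Natural log of (value + 1); the +1 offset is an assumption of this sketch."""
    return math.log(value + 1.0)


def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def select_best(rates_by_model):
    """Return the model name with the lowest misclassification rate."""
    return min(rates_by_model, key=rates_by_model.get)


if __name__ == "__main__":
    # Toy data, made up purely for illustration (not taken from JUNKMAIL).
    train, valid = partition(range(10))
    print(len(train), len(valid))                               # 6 4
    print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))   # 0.5
```

A lower misclassification rate means fewer emails were assigned to the wrong class, which is why the selection step takes the minimum.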
Now answer the following questions:
1. Which model is selected as the best one by the Model Comparison Node?
Regression on the original data.
2. What is the training misclassification rate for this model? What is the validation misclassification rate?
Training Misclassification Rate: 0.064879
Validation Misclassification Rate: 0.077090
3. What are the four most important variables used?
Exclamation
CapAvg
Remove
HP
4. What is the