4.1 Decision Tree
The Decision Tree procedure creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent (target) variable based on values of independent (predictor) variables. The procedure provides validation tools for exploratory and confirmatory classification analysis.
The procedure can be used for:
Segmentation Identify persons who are likely to be members of a particular group.
Stratification Assign cases into one of several categories, such as high-, medium-, and low-risk groups.
Prediction Create rules and use them to predict future events, such as the likelihood that someone will default on a loan or the potential resale value of a vehicle or home.
Data reduction and variable screening Select a useful subset of predictors from a large set of variables for use in building a formal parametric model.
Interaction identification Identify relationships that pertain only to specific subgroups and specify these in a formal parametric model.
Category merging and discretizing continuous variables Recode grouped predictor categories and continuous variables with minimal loss of information.
Example A bank wants to categorize credit applicants according to whether they represent a reasonable credit risk. Based on various factors, including the known credit ratings of past customers, you can build a model to predict whether future customers are likely to default on their loans.
A tree-based analysis provides some attractive features: It allows you to identify homogeneous groups with high or low risk. It makes it easy to construct rules for making predictions about individual cases.
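To make the idea of splitting cases into homogeneous groups concrete, the following is a minimal sketch of the node-splitting logic behind a CART-style classification tree, applied to a toy credit-risk dataset. The data, the income variable, and the threshold search are illustrative assumptions, not the procedure described above.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 = perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Find the threshold on one numeric predictor that most reduces
    weighted Gini impurity -- the decision made at a single tree node."""
    best_threshold, best_impurity = None, gini(labels)
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if w < best_impurity:
            best_threshold, best_impurity = t, w
    return best_threshold, best_impurity

# Hypothetical data: applicant income (in $1000s) and loan-default outcome.
income = [18, 22, 25, 31, 40, 52, 60, 75]
default = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]

threshold, impurity = best_split(income, default)
print(f"split at income <= {threshold}, weighted Gini = {impurity:.3f}")
```

The chosen threshold separates applicants into a high-risk and a low-risk segment, which is exactly the kind of rule a tree-based analysis produces for individual cases.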
4.1.1 Data Considerations
Data The dependent and independent variables can be:
Nominal A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the