2.1 Basic Concepts of Classification and Prediction
    2.1.1 Definition
    2.1.2 Classification vs. Prediction
    2.1.3 Classification Steps
    2.1.4 Issues of Classification and Prediction
2.2 Decision Tree Induction
    2.2.1 The Algorithm
    2.2.2 Attribute Selection Measures
    2.2.3 Tree Pruning
    2.2.4 Scalability and Decision Tree Induction
2.3 Bayes Classification Methods
2.4 Rule Based Classification
2.5 Lazy Learners
2.6 Prediction
2.7 How to Evaluate and Improve Classification
2.1.1 Definition
Classification is also called Supervised Learning
Supervision
    The training data (observations, measurements, etc.) are used to learn a classifier
    The training data are labeled data
    New data (unlabeled) are classified using the training data
Training data
    Age    Income    Class label
    27     28K       Budget-Spenders
    35     36K       Big-Spenders
    65     45K       Budget-Spenders
Unlabeled data (Age 29, Income 25K) -> Classifier -> output
    Class label: [Budget Spender]
    Numeric value: [Budget Spender (0.8)]
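The example above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which classifier is used, so a k-nearest-neighbor rule is assumed here, and the pairing of ages, incomes, and labels is read off the training table as best it can be recovered. The names `TRAINING` and `classify` are hypothetical. The returned score is simply the share of the k nearest neighbors that vote for the winning label; it is not meant to reproduce the 0.8 shown in the slide.

```python
from collections import Counter

# Labeled training data: (age, income in thousands) -> class label.
# Row pairing is an assumption based on the training table in the slide.
TRAINING = [
    ((27, 28), "Budget-Spenders"),
    ((35, 36), "Big-Spenders"),
    ((65, 45), "Budget-Spenders"),
]


def classify(point, training=TRAINING, k=1):
    """Return (label, score) for an unlabeled point using k nearest neighbors.

    score is the fraction of the k nearest training records that carry
    the winning label (an assumed, very simple confidence measure).
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two records.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(training, key=lambda row: sq_dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


# Unlabeled record from the slide: Age 29, Income 25K.
label, score = classify((29, 25))
print(label, score)  # Budget-Spenders 1.0
```

With `k=3` the same record is still labeled Budget-Spenders, but the score drops to 2/3 because one of the three neighbors is a Big-Spender. This shows the two output forms in the slide: a bare class label, or a label with a numeric value attached.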
Principle
Construct models (functions) based on some training examples
Describe and distinguish classes or concepts for future prediction
Predict some unknown class labels
2.1.2 Classification vs. Prediction
Classification
    Predicts categorical class labels (discrete or nominal)
    Uses the labels of the training data to classify new data
    Example
        Customer profile -> Classifier -> Will buy a computer / Will not
Prediction
    Models continuous-valued functions, i.e., predicts unknown or missing values
    Example
        A marketing manager would like to predict how much a given customer will spend during a sale
        Customer profile -> Numeric Prediction -> 150 Euro
In classification, a model or classifier is constructed to predict categorical labels, such as “safe” or “risky” for loan application data.
Unlike classification, prediction provides ordered values
Regression analysis is used for prediction
Prediction is a short name for numeric prediction
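To make the contrast with classification concrete, here is a minimal sketch of numeric prediction by simple linear regression (ordinary least squares with one input). The data are invented for illustration and do not come from the slides: `past_purchases` and `spend_euro` are hypothetical attributes of a customer profile, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the factor n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: number of past purchases -> amount spent (Euro).
past_purchases = [1, 2, 4, 5]
spend_euro = [50, 100, 200, 250]

slope, intercept = fit_line(past_purchases, spend_euro)

# Predict a continuous value for a new customer with 3 past purchases.
predicted = slope * 3 + intercept
print(f"{predicted:.0f} Euro")  # 150 Euro
```

The output is an ordered, continuous value rather than one of a fixed set of class labels, which is the distinction the slide draws between prediction and classification.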