Data Mining for Business Intelligence
Shmueli, Patel & Bruce
Core Ideas in Data Mining
Classification
Prediction
Association Rules
Data Reduction
Data Visualization and Exploration
Two types of methods: supervised and unsupervised learning
Supervised Learning
Goal: Predict a single “target” or “outcome” variable
Training data, from which the algorithm “learns”: the value of the outcome of interest is known
Apply the model to new data where the value is not known and will be predicted
Methods: Classification and Prediction
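The supervised workflow above (learn from records whose outcome is known, then predict for records whose outcome is not) can be sketched in a few lines. This is a minimal sketch, not the book's method: the records, the 60/40 split, and the helper names `partition` and `predict_1nn` are hypothetical, and 1-nearest-neighbor is swapped in only because it is about the simplest classifier to write by hand.

```python
import math
import random

# Hypothetical labeled records: ((income in $1000s, age), outcome),
# where outcome 1 = purchase and 0 = no purchase. The outcome is known.
labeled = [
    ((80, 40), 1), ((85, 42), 1), ((78, 38), 1), ((82, 45), 1), ((88, 41), 1),
    ((20, 25), 0), ((22, 30), 0), ((18, 27), 0), ((25, 22), 0), ((21, 28), 0),
]

def partition(records, train_frac=0.6, seed=1):
    """Randomly split labeled records into training and holdout sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def predict_1nn(train, x):
    """Predict the outcome of the closest training record (1-nearest neighbor)."""
    _, outcome = min(train, key=lambda rec: math.dist(rec[0], x))
    return outcome

train, holdout = partition(labeled)

# Holdout outcomes are known, so we can check how often the model is right.
correct = sum(predict_1nn(train, x) == y for x, y in holdout)
accuracy = correct / len(holdout)

# A new record whose outcome is unknown: the model supplies a prediction.
new_customer = (83, 43)
print(f"holdout accuracy: {accuracy:.2f}")
print("predicted outcome for new customer:", predict_1nn(train, new_customer))
```

The holdout set is what makes evaluation possible: its outcomes are known but hidden from the model during learning.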
Unsupervised Learning
Goal: Segment data into meaningful segments; detect patterns
There is no target (outcome) variable to predict or classify, so there is no need to partition the data
Methods: Association rules, data reduction & exploration, visualization, clustering
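Clustering shows the unsupervised idea directly: the algorithm receives only input values, with no outcome column, and groups similar records into segments. The sketch below uses plain k-means as one common clustering method (not necessarily the one the book develops); the points are hypothetical, and starting from the first k points is a simplifying assumption.

```python
import math

# Hypothetical unlabeled records (e.g., two customer measurements each).
# There is no outcome column: the algorithm only sees the inputs.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (0.5, 1.2), (9.5, 8.0)]

def kmeans(points, k, max_iter=20):
    """Plain k-means: alternate assigning points and recomputing centroids."""
    # Simplifying assumption: start from the first k points. Practical
    # implementations use random or k-means++ starts and several restarts.
    centroids = list(points[:k])
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

labels, centroids = kmeans(points, k=2)
print("segment labels:", labels)
print("segment centers:", centroids)
```

Because nothing is being predicted, there is no holdout set to score against; whether the segments are meaningful is judged by the analyst.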
Supervised: Classification
Goal: Predict categorical target (outcome) variable
Examples: Purchase/no purchase, fraud/no fraud, creditworthy/not creditworthy…
Target variable is often binary (yes/no)
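With a binary target, many classifiers first estimate the probability of the class of interest and then apply a cutoff to turn that probability into a yes/no label. A minimal sketch, assuming made-up probabilities and a hypothetical helper `classify`; 0.5 is a common default cutoff, not a rule.

```python
# Hypothetical estimated probabilities of "fraud" for five transactions,
# as a classifier might output them.
fraud_prob = [0.92, 0.15, 0.55, 0.40, 0.08]

def classify(probs, cutoff=0.5):
    """Label a record 'fraud' when its estimated probability meets the cutoff."""
    return ["fraud" if p >= cutoff else "no fraud" for p in probs]

default_labels = classify(fraud_prob)           # cutoff 0.5
cautious_labels = classify(fraud_prob, 0.3)     # lower cutoff flags more records
print(default_labels)
print(cautious_labels)
```

Lowering the cutoff catches more fraud cases at the cost of flagging more legitimate transactions, which is a business decision rather than a purely statistical one.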
Prediction
Goal: Predict numerical target (outcome) variable
Examples: sales, revenue, performance
Taken together, classification and prediction constitute predictive analytics
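Predicting a numerical target such as sales can be illustrated with a straight-line fit. This sketch uses ordinary least squares with one predictor, swapped in as one common prediction technique; the advertising and sales figures and the helper `fit_line` are hypothetical.

```python
# Hypothetical data: advertising spend and sales, both in $1000s.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(ad_spend, sales)

# Apply the fitted line to a spend level not in the data.
predicted_sales = intercept + slope * 5.0
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
print(f"predicted sales at spend 5.0: {predicted_sales:.1f}")
```

The output is a number on the target's own scale, which is what separates prediction from classification.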
Unsupervised: Association Rules
Goal: Produce rules that define “what goes with what”
Example: “If X was purchased, Y was also purchased”
Rows are transactions
Used in recommender systems – “Our records show you bought X, you may also like Y”
Amazon.com, Netflix.com
Also called affinity analysis or market basket analysis
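A rule such as “if bread was purchased, milk was also purchased” is usually judged with a few standard measures computed over the transaction rows: support (how often both items appear together), confidence (how often the consequent appears when the antecedent does), and lift (confidence relative to the consequent's overall frequency). A minimal sketch, assuming hypothetical baskets and helper names:

```python
# Hypothetical transactions: each row (set) is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, rows):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def rule_stats(antecedent, consequent, rows):
    """Support, confidence, and lift for 'if antecedent, then consequent'."""
    both = support(antecedent | consequent, rows)
    confidence = both / support(antecedent, rows)
    lift = confidence / support(consequent, rows)
    return both, confidence, lift

sup, conf, lift = rule_stats({"bread"}, {"milk"}, transactions)
print(f"bread -> milk: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

Here milk appears in 75% of bread baskets but in 80% of all baskets, so the lift is below 1: buying bread does not make milk more likely than it already is.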
Unsupervised: Data Reduction
Distillation of complex/large data into simpler/smaller data
Reducing the number of variables/columns (e.g., principal components)
Reducing the number of records/rows (e.g., clustering)
Data Visualization
Graphs and plots of data – histograms, scatterplots
Especially useful to examine relationships between pairs of variables
Also useful to detect problems with data
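A histogram is just a count of values per bin, drawn as bars, and even a text version shows how it exposes data problems. A minimal sketch, assuming hypothetical ages that include a deliberate entry error (999) and a hypothetical helper `histogram_counts`; a plotting library would normally draw the bars.

```python
# Hypothetical customer ages; 999 is a deliberate data-entry error.
ages = [23, 27, 31, 35, 38, 42, 44, 47, 999]
WIDTH = 10  # bin width in years

def histogram_counts(values, start, width):
    """Count how many values fall into each bin [b, b + width)."""
    counts = {}
    for v in values:
        bin_start = start + ((v - start) // width) * width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

counts = histogram_counts(ages, start=20, width=WIDTH)
for bin_start, n in counts.items():
    # Text bars stand in for the bars a plotting library would draw.
    print(f"{bin_start:>4}-{bin_start + WIDTH - 1:<4} {'#' * n}")
```

The lone bar far to the right of the others is the kind of signal that prompts a closer look at the raw records.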
Steps in Data Mining
1. Define/understand purpose
2.