2. List the six steps in CRISP-DM (Cross-Industry Standard Process for Data Mining). (answer in ch5-slide16)
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
The first three steps account for ~85% of total project time.
3. List four ways in which cluster analysis can be used in data mining. (answer in ch5-slide26)
Clustering results may be used to:
1- Identify natural groupings of customers
2- Identify rules for assigning new cases to classes for targeting/diagnostic purposes
3- Provide characterization, definition, labeling of populations
4- Decrease the size and complexity of problems for other data mining methods
5- Identify outliers in a specific domain (e.g., rare-event detection)
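As a study aid, the first two uses (finding natural groupings and assigning new cases) can be illustrated with a minimal k-means sketch. This is not from the slides: the customer data, the starting centroids, and the function names `kmeans` and `assign` are all made up for illustration, and k-means is only one of several clustering methods.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


def assign(point, centroids):
    """Return the index of the centroid nearest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]

# Use 1: identify natural groupings of customers.
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])

# Use 2: assign a new case to one of the discovered groups.
new_customer_group = assign((7, 9), centroids)
```

With this toy data the two groups are the low-spend and high-spend customers, and the new customer at (7, 9) falls into the high-spend group. A point lying far from every centroid would be a candidate outlier (use 5).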
4. List four data mining myths. (answer in ch5-slide38)
Data mining:
1- Provides instant solutions/predictions
2- Is not yet viable for business applications
3- Requires a separate, dedicated database
4- Can only be done by those with advanced degrees
5- Is only for large firms that have lots of customer data
6- Is another name for good old statistics
5. What is a data mart? What is the difference between dependent and independent data marts? (answer in ch8-slide6)