Business Intelligence and Data Mining - Decision Trees

INDIAN INSTITUTE OF MANAGEMENT, INDORE
Post Graduate Programme – Term IV – AY 2012-13
Business Intelligence And Data Mining
Group Assignment on NGO Donations Maximization

Abstract

The problem is to devise a strategy that maximizes the profit from a direct marketing campaign aimed at a selected group of customers while minimizing costs. The exercise requires Business Intelligence tools and techniques to build a model that is trained and tested on historical data from last year’s donation-raising campaign. The model should predict the profitability of a prospective donor, allowing a more targeted campaign at lower cost. The difficulty arises from extremely imbalanced data and from the inverse correlation between the probability of response and the dollar amount a response generates. The data set and problem are those of the KDD-CUP-98 challenge. The solution would apply to any direct marketing campaign for which historical data is available.

Introduction

The KDD-CUP-98 challenge asks for a model, trained and tested on historical data, that predicts which potential donors to contact so as to maximise profit. The model yields a mailing list that targets only valuable customers. Existing models typically predict future response behaviour. The historical database records past mailing campaigns, the customers' responses and the dollar amounts collected. The model should identify the current customers who are likely to respond and maximise net profit (donation amount – mailing cost) over the contacted customers.

The records come from the 1997 Paralyzed Veterans of America fundraising mailing campaign, and only 5% of them are responders. A classifier that labels every record as a non-responder would therefore be 95% accurate while identifying no donors, so accuracy is a poor objective here.

One approach ranks customers by estimated probability of response and selects the top portion of the list. If the top 5% of the list contains 30% of the responders, the lift is 30% / 5% = 6. The drawback is that the ranking does not use the donation amount. In this data the probability of donating and the dollar amount are inversely correlated, because donors who give larger amounts are more cautious. Probability-based ranking therefore tends to rank valuable customers down.

Another method adapts accuracy-based learning to cost-sensitive learning and tries to minimise cost. Because it builds the initial list from the probability of response and considers profitability only afterwards, it tends to ignore valuable customers, who usually respond infrequently.

A tweaked use of association rules gives better results than the methods above. It identifies subsets of attributes that are correlated with the “respond” class and then uses a small subset of the generated association rules to identify potential customers in the current campaign.
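The lift figure, the 95% accuracy baseline and the drawback of probability-based ranking described above can be checked with a few lines of Python. This is a minimal sketch: the function names and the three toy customers (IDs, probabilities and donation amounts) are hypothetical and are not taken from the KDD-CUP-98 data.

```python
import math

def lift(fraction_mailed, fraction_responders_captured):
    """Lift = share of all responders captured / share of the list mailed."""
    return fraction_responders_captured / fraction_mailed

def majority_baseline_accuracy(responder_rate):
    """Accuracy obtained by always predicting the larger class."""
    return max(responder_rate, 1.0 - responder_rate)

# Hypothetical customers: (id, estimated P(respond), donation if they respond).
# Small, frequent donors get high probabilities; the large donor gets a low one.
customers = [
    ("A", 0.20, 5.0),
    ("B", 0.10, 12.0),
    ("C", 0.02, 100.0),
]

# Ranking by probability alone ignores the donation amount.
by_probability = [c[0] for c in sorted(customers, key=lambda c: c[1], reverse=True)]

# Ranking by expected donation (probability * amount) uses both.
by_expected_value = [
    c[0] for c in sorted(customers, key=lambda c: c[1] * c[2], reverse=True)
]

print(lift(0.05, 0.30))                  # top 5% of the list holds 30% of responders
print(majority_baseline_accuracy(0.05))  # 5% responders in the data
print(by_probability)                    # ['A', 'B', 'C']
print(by_expected_value)                 # ['C', 'B', 'A']
```

In this toy case the two rankings are exact opposites: the customer with the largest donation is last by probability and first by expected value, which is the effect of the inverse correlation.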
The solution tries to increase customer value by selecting association rules that raise profitability over the current customers. Negative association rules may also indicate, given some attributes, the chances of not donating. Association rules by themselves do not say how to maximise an objective function, especially when there is an inverse correlation.

The dataset has 191,799 records of customers contacted in the 1997 mailing campaign. Each record has 479 non-target variables and two target variables: respond / not_respond and the actual donation in dollars. 5% of the records are respond records, and the dataset is split 50% for learning and 50% for validation. Customers are evaluated and predicted against a mailing cost of $0.68.

The inverse correlation can take two forms. It can occur across offers to the same customer, which can be reduced by avoiding multiple mailings within a time period. It can also occur across different customers, meaning many small contributors and few big ones. The second type has to be addressed. One option is a two-step procedure: obtain probability estimates from decision trees, then re-rank customers using customer value. This still ignores customer value in the first step.

The other problem is high dimensionality: 481 variables and a small target population make it difficult to identify features of the respond class. The one-attribute-at-a-time “gain criterion” does not search for correlated variables. It works well for maximising the class probability, but not when non-maximum class probabilities are also used to rank customers.

Focussed association rules capture features that are typical of the respond class and not of the not_respond class, i.e. subsets of variables that occur in the respond class but only infrequently in the not_respond class. This prunes the not_respond data, which addresses the scarcity of data in the target class, and also removes variables that are frequent in the not_respond class.
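The idea of focussed association rules can be illustrated with a small brute-force search for attribute combinations that are frequent among responders and infrequent among non-responders. This is only a sketch under stated assumptions: the text gives no thresholds or search algorithm, so the function names, the two support thresholds, the size limit and the toy attribute values are all hypothetical.

```python
from itertools import combinations

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)

def focused_itemsets(respond, not_respond,
                     min_respond_support, max_other_support, max_size=2):
    """Itemsets frequent in the respond class and infrequent in not_respond.

    Brute-force enumeration over items seen in the respond class only,
    so variables that appear only in not_respond are never considered.
    """
    items = sorted(set().union(*respond)) if respond else []
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s_resp = support(candidate, respond)
            s_other = support(candidate, not_respond)
            if s_resp >= min_respond_support and s_other <= max_other_support:
                found.append((combo, s_resp, s_other))
    return found

# Hypothetical toy records; each record is a set of attribute=value items.
respond = [
    frozenset({"age=60+", "gave_last_year=yes"}),
    frozenset({"age=60+", "gave_last_year=yes", "region=N"}),
    frozenset({"age=40-59", "gave_last_year=yes"}),
]
not_respond = [
    frozenset({"age=60+", "region=N"}),
    frozenset({"age=40-59", "region=S"}),
    frozenset({"age=60+", "gave_last_year=yes"}),
    frozenset({"age=20-39", "region=N"}),
]

rules = focused_itemsets(respond, not_respond,
                         min_respond_support=0.6, max_other_support=0.3)
for combo, s_resp, s_other in rules:
    print(combo, round(s_resp, 2), round(s_other, 2))
```

In the toy data "age=60+" alone is rejected because it is also common among non-responders, while the same item combined with "gave_last_year=yes" is kept. Enumerating all combinations is only feasible for a handful of items; with 479 variables a real implementation would need a pruned search.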
The focussed association rules can then be converted into a model that predicts the donation amount for a customer. This involves covering customers with the rules, pruning over-fitting rules and estimating a donation amount for each rule. The assumption is that current customers follow the same class and donation distribution as the historical records. The process has three phases: Rule Generation finds a set of good rules that capture features of responders; Model Building combines the rules into a prediction model for the donation amount; and Model Pruning prunes rules that do not generalize to the entire population.

Our Approach
