Data Mining-East West Airlines

DATA MINING FOR POTENTIAL CUSTOMERS: East-West Airlines/Telcon
Jermaine Paul

12/12/2013

BUSINESS PROBLEM
East-West Airlines (EA) is entering into a partnership with the cellular service provider Telcon, marketing Telcon’s service through direct mail. To support this, an EA dataset is provided so that customers can be categorized by whether they are likely to purchase Telcon’s services through direct mail. If the categorization is accurate, the partnership will save valuable resources by sending offers only to customers who are likely to accept. The EA dataset contains 15 variables, which represent spending activity and flight patterns. The task is to use this data to classify existing customers according to whether or not they would buy Telcon’s service, using the Naïve Bayes classification model. If the model is successful, it can be deployed on future customers to categorize their potential acceptance.
The data mining model chosen for this project is the Naïve Bayes classification model. This model assumes that the predictor variables are conditionally independent of one another given the class, and it is used primarily for classification rather than for predicting numerical values. It works well with large datasets, is simple to set up, and is computationally efficient.
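As a rough illustration of how a Naïve Bayes classifier arrives at a class, the sketch below multiplies a class prior by per-variable conditional probabilities and normalizes the result. It is a minimal sketch, not the procedure used in this report: the two categorical predictors, the toy records, and the Laplace smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each predictor."""
    class_counts = Counter(labels)
    # value_counts[(predictor_index, class)][value] -> number of training rows
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts, n_values):
    """Return P(class | row), treating predictors as independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class); smoothing is an assumption
            score *= (value_counts[(i, c)][value] + 1) / (n_c + n_values[i])
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


# Hypothetical toy records: (flies_often, uses_credit_card) -> 1 if the offer was accepted
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("yes", "yes"), ("no", "no")]
labels = [1, 1, 0, 0, 1, 0]

class_counts, value_counts = train_naive_bayes(rows, labels)
probs = posterior(("yes", "yes"), class_counts, value_counts, n_values=[2, 2])
print(probs)
```

Because the per-variable probabilities are simply multiplied, two strongly correlated predictors effectively count the same evidence twice, which is one reason to reduce correlated variables before fitting the model.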
DATA PREPARATION
The dataset contains 15 variables. Given this number, data reduction is undertaken to identify variables that are correlated and, by extension, to reduce multicollinearity.

From the correlation analysis we see that 4 variables, forming two pairs, are highly correlated. These are:
1) Flight_trans_12mo and Flight_miles_12mo
2) any_cc_miles_12mo and cc1_miles

Data reduction will be undertaken by removing one variable from each pair: Flight_trans_12mo and cc1_miles.
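The correlation screening described here can be sketched as follows. This is only an illustration under stated assumptions: the column values are made up, Bonus_trans is a hypothetical third variable, and the 0.8 cutoff is an assumed threshold, since the report does not state which cutoff or tool was used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlated_pairs(columns, threshold=0.8):
    """List variable pairs whose absolute correlation meets the (assumed) threshold."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(pearson(columns[a], columns[b])) >= threshold]


# Made-up values for illustration only; not taken from the EA dataset
columns = {
    "Flight_miles_12mo": [0, 500, 1200, 0, 3000, 800],
    "Flight_trans_12mo": [0, 1, 3, 0, 7, 2],
    "Bonus_trans": [5, 20, 3, 14, 9, 11],  # hypothetical variable
}

pairs = correlated_pairs(columns)
print(pairs)

# Drop one variable from the correlated pair, as the report does
reduced = {name: values for name, values in columns.items()
           if name != "Flight_trans_12mo"}
print(sorted(reduced))
```

Which member of a correlated pair to drop is a judgment call; the sketch simply mirrors the choice stated in the text.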
MODEL ANALYSIS
The Naïve Bayes classification model will now be applied to the reduced variable dataset. The first step is partitioning the data using standard partitioning in the ratio 60:40 for training and validation data.
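A 60:40 random partition of this kind could look like the sketch below. The function name, the fixed seed, and the placeholder records are assumptions for illustration; the report does not say which tool or seed produced its partition.

```python
import random


def partition(records, train_fraction=0.6, seed=12345):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder records standing in for customer rows
records = list(range(100))
training, validation = partition(records)
print(len(training), len(validation))
```

The model is fitted on the training portion only, and the validation portion is held back to check how well the classifications generalize.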
