Naive Bayes

Chapter 9
Naïve Bayes
David J. Hand

Contents
9.1 Introduction
9.2 Algorithm Description
9.3 Power Despite Independence
9.4 Extensions of the Model
9.5 Software Implementations
9.6 Examples
    9.6.1 Example 1
    9.6.2 Example 2
9.7 Advanced Topics
9.8 Exercises
References

9.1 Introduction
Given a set of objects, each of which belongs to a known class, and each of which has a known vector of variables, our aim is to construct a rule which will allow us to assign future objects to a class, given only the vectors of variables describing the future objects. Problems of this kind, called problems of supervised classification, are ubiquitous, and many methods for constructing such rules have been developed. One very important method is the naïve Bayes…
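The supervised classification problem described in this introduction can be made concrete with a small sketch. The chapter's own algorithm description is not part of this excerpt, so the code below is only the common textbook form of a naïve Bayes rule for categorical variables: it multiplies a class prior by per-variable conditional frequencies, treating the variables as independent within each class. The add-one smoothing, the function names fit and predict, and the toy weather data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def fit(X, y):
    """Count class frequencies and, per class, the value frequencies of each variable."""
    class_counts = Counter(y)
    # value_counts[c][j][v] = number of class-c training objects whose j-th variable equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    levels = defaultdict(set)  # distinct values observed for each variable
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            value_counts[c][j][v] += 1
            levels[j].add(v)
    return class_counts, value_counts, levels


def predict(model, xs, alpha=1.0):
    """Assign xs to the class with the largest smoothed log-posterior score."""
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log of the class prior
        for j, v in enumerate(xs):
            # Smoothed estimate of P(variable j = v | class c); alpha is an assumed choice.
            num = value_counts[c][j][v] + alpha
            den = nc + alpha * len(levels[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical training set: each object is a vector of two categorical variables.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
y = ["out", "out", "in", "in", "out", "in"]

model = fit(X, y)
new_object = ("sunny", "cool")  # a combination that never occurs in the training set
label = predict(model, new_object)
```

Working in log space avoids numerical underflow when many variables are multiplied together, and the smoothing keeps a single unseen value from forcing a class score to zero.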



References
Hand D.J. and Yu K. (2001) Idiot’s Bayes—not so stupid after all? International Statistical Review, 69, 385–398.
Hastie T.J. and Tibshirani R.J. (1990) Generalized Additive Models. London: Chapman and Hall.
Jamain A. and Hand D.J. (2005) The naïve Bayes mystery: A statistical detective story. Pattern Recognition Letters, 26, 1752–1760.
Jamain A. and Hand D.J. (2008) Mining supervised classification performance studies: A meta-analytic investigation. Journal of Classification, 25, 87–112.
Langley P. (1993) Induction of recursive Bayesian classifiers. Proceedings of the Eighth European Conference on Machine Learning, Vienna, Austria: Springer-Verlag, 153–164.
Mani S., Pazzani M.J., and West J. (1997) Knowledge discovery from a breast cancer database. Lecture Notes in Artificial Intelligence, 1211, 130–133.
Metsis V., Androutsopoulos I., and Paliouras G. (2006) Spam filtering with naïve Bayes—which naïve Bayes? CEAS 2006—Third Conference on Email and Anti-Spam, Mountain View, California.
Sahami M., Dumais S., Heckerman D., and Horvitz E. (1998) A Bayesian approach to filtering junk e-mail. In Learning for Text Categorization—Papers from the AAAI Workshop, Madison, Wisconsin, pp. 55–62.
Titterington D.M., Murray G.D., Murray L.S., Spiegelhalter D.J., Skene A.M., Habbema J.D.F., and Gelpke G.J. (1981) Comparison of discrimination techniques applied to a complex data set of head injured patients. Journal of the Royal Statistical Society, Series A, 144, 145–175.

© 2009 by Taylor & Francis Group, LLC
