Naïve Bayes
David J. Hand
Contents
9.1 Introduction
9.2 Algorithm Description
9.3 Power Despite Independence
9.4 Extensions of the Model
9.5 Software Implementations
9.6 Examples
9.6.1 Example 1
9.6.2 Example 2
9.7 Advanced Topics
9.8 Exercises
References
9.1 Introduction
Given a set of objects, each of which belongs to a known class, and each of which has a known vector of variables, our aim is to construct a rule which will allow us to assign future objects to a class, given only the vectors of variables describing the future objects. Problems of this kind, called problems of supervised classification, are ubiquitous, and many methods for constructing such rules have been developed. One very important method is the naïve Bayes
© 2009 by Taylor & Francis Group, LLC