Top 10 algorithms in data mining
Xindong Wu · Vipin Kumar · J. Ross Quinlan · Joydeep Ghosh · Qiang Yang · Hiroshi Motoda · Geoffrey J. McLachlan · Angus Ng · Bing Liu · Philip S. Yu · Zhi-Hua Zhou · Michael Steinbach · David J. Hand · Dan Steinberg
Received: 9 July 2007 / Revised: 28 September 2007 / Accepted: 8 October 2007 / Published online: 4 December 2007
© Springer-Verlag London Limited 2007
Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

X. Wu (✉)
Department of Computer Science, University of Vermont, Burlington, VT, USA
e-mail: xwu@cs.uvm.edu

V. Kumar
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
e-mail: kumar@cs.umn.edu

J. Ross Quinlan
Rulequest Research Pty Ltd, St Ives, NSW, Australia
e-mail: quinlan@rulequest.com

J. Ghosh
Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX 78712, USA
e-mail: ghosh@ece.utexas.edu

Q. Yang
Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China
e-mail: qyang@cs.ust.hk

H. Motoda
AFOSR/AOARD and Osaka University, 7-23-17 Roppongi, Minato-ku, Tokyo 106-0032, Japan
e-mail: motoda@ar.sanken.osaka-u.ac.jp

0 Introduction
In an effort to identify some of