Volume 14, Number 3
Decision Tree Induction & Clustering Techniques In SAS Enterprise Miner, SPSS Clementine, And IBM Intelligent Miner – A Comparative Analysis
Abdullah M. Al Ghoson, Virginia Commonwealth University, USA
ABSTRACT
Decision tree induction and clustering are two of the most prevalent data mining techniques, used separately or together in many business applications. Most commercial data mining software tools provide these two techniques, but few of them satisfy business needs. Many criteria and factors bear on the choice of the most appropriate software for a particular organization. This paper provides a comparative analysis of three popular data mining software tools, SAS® Enterprise Miner, SPSS Clementine, and IBM DB2® Intelligent Miner, based on four main criteria: performance, functionality, usability, and auxiliary task support.

Keywords: data mining, classification, decision tree, clustering, software evaluation, SAS Enterprise Miner, SPSS Clementine, IBM Intelligent Miner, comparative analysis, evaluation criteria.
1. INTRODUCTION
Businesses face challenges such as growth, regulation, globalization, mergers and acquisitions, competition, and economic change, all of which require fast, sound decisions rather than guesswork. Making good decisions requires accurate and clear analysis, such as prediction, estimation, classification, or segmentation, using data mining techniques. Decision tree induction and clustering are two of the most important data mining techniques for finding interesting patterns. Many commercial data mining software packages are on the market, and most of them provide decision tree induction and clustering. Commercial data mining software is expensive, so choosing a package is a crucial and difficult decision. Therefore, the objective of this paper is to help
International Journal of Management & Information Systems – Third Quarter 2010