Bing Liu, Wynne Hsu, Heng-Siew Han and Yiyuan Xia
School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
{liub, whsu, xiayy}@comp.nus.edu.sg

Abstract. Much data mining research has focused on devising techniques to build accurate models and to discover rules from databases. Relatively little attention has been paid to mining changes in databases collected over time. For businesses, knowing what is changing and how it has changed is of crucial importance because it allows them to provide the right products and services to suit changing market needs. If undesirable changes are detected, remedial measures need to be implemented to stop or delay such changes. In many applications, mining for changes can be more important than producing accurate models for prediction. A model, no matter how accurate, can only predict based on patterns mined from old data. That is, a model requires a stable environment; otherwise it will cease to be accurate. However, in many business situations, constant human intervention (i.e., actions) in the environment is a fact of life. In such an environment, building a predictive model is of limited use, and change mining becomes important for understanding the behaviors of customers. In this paper, we study change mining in the context of decision tree classification for real-life applications.
1. Introduction
The world around us changes constantly. Knowing and adapting to changes is an important aspect of our lives. For businesses, knowing what is changing and how it has changed is also crucial. There are two main objectives for mining changes in a business environment:
1. To follow the trends: The key characteristic of this type of application is the word "follow". Companies want to know where the trend is going and do not want to be left behind. They need to analyze customers' changing behaviors in order to provide products and