Scatter/Gather on the Web
Weimao Ke and Xuemei Gong
Laboratory for Information, Network & Computing Studies
College of Information Science and Technology
Drexel University, 3141 Chestnut St, Philadelphia, PA 19104
wk@drexel.edu, xg45@drexel.edu
ABSTRACT
Scatter/Gather is a powerful browsing model for exploratory information seeking. However, its potential at web scale has not been demonstrated because of the scalability challenges of interactive clustering. In previous research, we developed a two-stage method to support on-the-fly Scatter/Gather, in which an offline module pre-computes a hierarchical structure to support constant-time online interaction. In this work, we focus on the offline hierarchy construction and develop a novel distributed approach to hierarchical agglomerative clustering (HAC). Because it relies on JavaScript, which is commonly supported by browsers, the distributed clustering method has the potential to scale with a site's growing traffic. We show in experiments that a moderate increase in the number of parallel processes (running in visitors' browsers) leads to a dramatic decrease in clustering time. This demonstrates great potential for supporting large-scale Scatter/Gather interactions on the web. We also present a preliminary analysis of clustering effectiveness and a related Scatter/Gather prototype for web search.
Keywords: text clustering, Scatter/Gather, distributed computing, parallel clustering, browser server, JavaScript, interactive information retrieval, exploratory search
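The abstract does not spell out how the clustering work is divided among browsers. One plausible decomposition, offered only as a minimal sketch under our own assumptions and not as the method evaluated in this paper, is to farm out the pairwise document similarities (the quadratic-cost part of HAC) as independent chunks, merge the returned values on the server, and run agglomeration there. The function names `makeTasks`, `computeChunk`, `buildMatrix`, and `agglomerate`, the `Map`-based sparse term vectors, cosine similarity, and average linkage are all illustrative choices. The network transport between server and visitors' browsers is omitted; workers are simulated by a plain loop.

```javascript
// Cosine similarity between two sparse term-weight vectors (Map: term -> weight).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) {
    na += w * w;
    const v = b.get(t);
    if (v !== undefined) dot += w * v;
  }
  for (const w of b.values()) nb += w * w;
  return na > 0 && nb > 0 ? dot / Math.sqrt(na * nb) : 0;
}

// Server side (hypothetical scheduling): enumerate all pairs (i, j) with i < j
// and cut them into roughly equal chunks, one per available worker.
function makeTasks(n, numWorkers) {
  const pairs = [];
  for (let i = 0; i < n; i++)
    for (let j = i + 1; j < n; j++) pairs.push([i, j]);
  const size = Math.max(1, Math.ceil(pairs.length / numWorkers));
  const tasks = [];
  for (let s = 0; s < pairs.length; s += size) tasks.push(pairs.slice(s, s + size));
  return tasks;
}

// Client side (hypothetical): what one visitor's browser would compute for a chunk.
function computeChunk(docs, task) {
  return task.map(([i, j]) => [i, j, cosine(docs[i], docs[j])]);
}

// Server side: merge the returned chunks into a symmetric similarity matrix.
function buildMatrix(n, results) {
  const sim = Array.from({ length: n }, () => new Array(n).fill(0));
  for (const chunk of results)
    for (const [i, j, s] of chunk) {
      sim[i][j] = s;
      sim[j][i] = s;
    }
  return sim;
}

// Average linkage: mean pairwise similarity between two clusters' members.
function avgLink(sim, A, B) {
  let total = 0;
  for (const i of A) for (const j of B) total += sim[i][j];
  return total / (A.length * B.length);
}

// Naive HAC: repeatedly merge the most similar pair of clusters into a binary tree.
function agglomerate(sim) {
  let clusters = sim.map((_, i) => ({ members: [i], children: null }));
  while (clusters.length > 1) {
    let best = -Infinity, bi = 0, bj = 1;
    for (let a = 0; a < clusters.length; a++)
      for (let b = a + 1; b < clusters.length; b++) {
        const s = avgLink(sim, clusters[a].members, clusters[b].members);
        if (s > best) { best = s; bi = a; bj = b; }
      }
    const merged = {
      members: [...clusters[bi].members, ...clusters[bj].members],
      children: [clusters[bi], clusters[bj]],
    };
    clusters = clusters.filter((_, k) => k !== bi && k !== bj);
    clusters.push(merged);
  }
  return clusters[0];
}

// Usage: four toy documents; three simulated workers each take one chunk.
const docs = [
  new Map([["scatter", 1], ["gather", 1]]),
  new Map([["scatter", 1], ["browse", 1]]),
  new Map([["cluster", 1], ["javascript", 1]]),
  new Map([["cluster", 1], ["browser", 1]]),
];
const tasks = makeTasks(docs.length, 3);
const results = tasks.map((task) => computeChunk(docs, task)); // in practice: run in parallel on clients
const root = agglomerate(buildMatrix(docs.length, results));
console.log(root.children.map((c) => c.members)); // top split: docs 0,1 vs. docs 2,3
```

Because each chunk depends only on the document vectors and its own list of pairs, chunks can be computed in any order by any number of clients; that independence is the property that would let throughput grow with the number of concurrent visitors. The central agglomeration loop here is the naive cubic-time version and would itself need a faster algorithm at web scale.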
can be conducted without explicit query specification
(Cutting et al., 1992). Based on iterative user selection and interactive text clustering, Scatter/Gather offers a powerful tool for navigating a large, complex information space. It enables the user to explore inherent associations among documents and topics in the data, supporting learning and investigation (Hearst and Pedersen, 1996).
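The iterative select-and-recluster loop can be illustrated on a precomputed binary cluster hierarchy, in the spirit of the two-stage approach mentioned in the abstract. The sketch below is only an illustration under assumed rules: a hypothetical `scatter` produces k clusters by repeatedly splitting the largest splittable node, and a hypothetical `gather` keeps the clusters the user selected, which are then scattered again. Each step touches only a handful of tree nodes, so its cost depends on k rather than on collection size. This is a simplification: the original Scatter/Gather (Cutting et al., 1992) re-clusters the gathered documents instead of re-splitting existing tree nodes.

```javascript
// Scatter: starting from the given hierarchy nodes, repeatedly split the
// largest node that still has children until there are k clusters
// (an assumed heuristic; other split rules are possible).
function scatter(nodes, k) {
  const clusters = [...nodes];
  while (clusters.length < k) {
    let idx = -1;
    for (let i = 0; i < clusters.length; i++) {
      const c = clusters[i];
      if (c.children && (idx < 0 || c.members.length > clusters[idx].members.length)) idx = i;
    }
    if (idx < 0) break; // only leaves remain; nothing left to split
    clusters.splice(idx, 1, ...clusters[idx].children);
  }
  return clusters;
}

// Gather: keep the clusters the user selected, by position in the list.
function gather(clusters, selected) {
  return selected.map((i) => clusters[i]);
}

// Toy precomputed hierarchy over documents 0..5 (hypothetical, built by hand).
const leaf = (d) => ({ members: [d], children: null });
const join = (a, b) => ({ members: [...a.members, ...b.members], children: [a, b] });
const root = join(join(join(leaf(0), leaf(1)), leaf(2)), join(leaf(3), join(leaf(4), leaf(5))));

const round1 = scatter([root], 3);     // clusters: {0,1} {2} {3,4,5}
const picked = gather(round1, [0, 2]); // user selects {0,1} and {3,4,5}
const round2 = scatter(picked, 3);     // clusters: {0,1} {3} {4,5}
```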
However,
REFERENCES
Arthur, H. (2012).
(1998). Learning to extract symbolic knowledge from the world wide web.
Gong, X., Khare, R., and Ke, W. (2012).
(1995). Scatter/Gather as a tool for the navigation of retrieval results.
Hearst, M. A. and Pedersen, J. O. (1996). Reexamining the cluster hypothesis: Scatter/Gather on retrieval
Jain, A. K., Murty, M. N., and Flynn, P. J. (1999). Data clustering: a review.
Ke, W., Mostafa, J., and Liu, Y. (2008). Toward responsive visualization services for scatter/gather browsing. Proceedings of the American Society for Information Science and Technology, 45(1):1–10.
Ke, W., Sugimoto, C. R., and Mostafa, J. (2009). Dynamicity vs. effectiveness: Studying online clustering for scatter/gather
Lovins, J. B. (1968). Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 11:22–31.
Manning, C. D., Raghavan, P., and Schütze, H. (2008).
Witten, I. H. and Frank, E. (2005). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition.