Kumar Ishan, Mohit Gupta, Naresh Kumar, Ankush Mittal
Department of Electronics & Computer Engineering, Indian Institute of Technology Roorkee, India. {kicomuec, mickyuec, naresuec, ankumfec}@iitr.ernet.in
Abstract

The efficiency of a search engine depends largely on how efficiently and precisely it can determine the importance and popularity of a web document. The PageRank and HITS algorithms are widely known approaches to determining the importance and popularity of web pages. Because of the large number of documents available on the World Wide Web, ranking web pages requires a huge amount of computation, which makes it very time consuming. To overcome this issue, researchers have devoted much attention to parallelizing PageRank on PC clusters, grids, and multi-core processors such as the Cell Broadband Engine, but with little or no success. In this paper, we discuss the issues in porting these algorithms to the Compute Unified Device Architecture (CUDA) and introduce an efficient parallel implementation of them on CUDA that exploits the block structure of the web. This implementation not only cuts down the computation time but also significantly reduces the cost of the required hardware (only a few thousand).
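For reference, the two algorithms named above can be sketched sequentially in Python. This is a minimal sketch under stated assumptions, not the authors' CUDA implementation: the function names are hypothetical, the graph is a small in-memory adjacency list, and the damping factor (0.85), the fixed iteration count (50) and the uniform redistribution of rank held by dangling pages are assumed choices. It does not model the block structure of the web.

```python
import math


def pagerank(out_links, d=0.85, iters=50):
    """Pull-style PageRank power iteration on an adjacency list.

    out_links[p] lists the pages that page p links to. Assumed defaults:
    damping factor d = 0.85 and a fixed number of iterations.
    """
    n = len(out_links)
    # Invert the graph once so every page can pull from its in-links.
    in_links = [[] for _ in range(n)]
    for src, dsts in enumerate(out_links):
        for dst in dsts:
            in_links[dst].append(src)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by dangling pages (no out-links) is spread uniformly
        # (an assumed convention), so the scores keep summing to 1.
        dangling = sum(rank[p] for p in range(n) if not out_links[p])
        # Each new score reads only the previous iterate `rank`.
        rank = [
            (1.0 - d) / n
            + d * (dangling / n
                   + sum(rank[q] / len(out_links[q]) for q in in_links[p]))
            for p in range(n)
        ]
    return rank


def _normalise(v):
    """Scale v to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def hits(out_links, iters=50):
    """Kleinberg-style hub/authority iteration with L2 normalisation."""
    n = len(out_links)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # Authority of p: sum of hub scores of the pages linking to p.
        auth = [0.0] * n
        for src, dsts in enumerate(out_links):
            for dst in dsts:
                auth[dst] += hub[src]
        auth = _normalise(auth)
        # Hub of p: sum of authority scores of the pages p links to.
        hub = _normalise([sum(auth[dst] for dst in dsts) for dsts in out_links])
    return hub, auth


if __name__ == "__main__":
    # Toy 4-page web: page 2 is linked to by pages 0, 1 and 3.
    graph = [[1, 2], [2], [0], [2]]
    print("PageRank:", [round(x, 3) for x in pagerank(graph)])
    hub, auth = hits(graph)
    print("hubs:", [round(x, 3) for x in hub])
    print("authorities:", [round(x, 3) for x in auth])
```

Because each page's PageRank update in the list comprehension reads only the previous iterate, the per-page updates are mutually independent; that independence is what would let a GPU port assign one thread per page, although this sketch stays entirely on the CPU.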
1. Introduction

The unceasing growth of the World Wide Web has led to a great deal of research on the page ranking algorithms that search engines use to provide the most relevant results for a user's query. The dynamic and diverse nature of the web graph further compounds the challenge of achieving optimal results. Web link analysis provides a way to order web pages by studying the link structure of the web graph. PageRank and HITS (Hyperlink-Induced Topic Search) are the two most popular such algorithms; current search engines use them, in the same or a modified form, to rank documents based on their link structure. PageRank, originally