Web Usage Mining: A Survey on Pattern Extraction from Web Logs
1S. K. Pani, 2L. Panigrahy, 2V. H. Sankar, 3Bikram Keshari Ratha, 2A. K. Mandal, 2S. K. Padhi
1 P.G. Department of Computer Science, RCMA, Bhubaneswar, Orissa, India
2 Department of Computer Science and Engineering, Konark Institute of Science and Technology, Bhubaneswar, Orissa, India
3 P.G. Department of Computer Science, Utkal University, Bhubaneswar, Orissa, India
E-mail: Subhendu_pani@rediffmail.com; mynamelingaraj@gmail.com; Himashankar.V@gmail.com; vkramus@gmail.com; Sanjaya2004@yahoo.com
Abstract— As the size of the web increases along with the number of users, it is essential for website owners to better understand their customers so that they can provide better service and enhance the quality of their websites. To achieve this, they depend on web access log files, which can be mined to extract interesting patterns so that user behaviour can be understood. This paper presents an overview of web usage mining and provides a survey of the pattern extraction algorithms used for web usage mining.
Keywords— web mining, pattern extraction, usage mining, preprocessing
I. INTRODUCTION
In this world of Information Technology, accessing information
To mine the interesting data from this huge pool, data mining techniques can be applied. However, web data is unstructured or semi-structured, so data mining techniques cannot be applied to it directly. Instead, a separate discipline called web mining has evolved, which can be applied to web data. Web mining is used to discover interesting patterns that can be applied to many real-world problems, such as improving web sites, better understanding visitors' behavior, and product recommendation. Web mining is the use of data mining techniques to automatically discover and extract information from Web documents/services (Etzioni, 1996). Web mining is
International Journal of Instrumentation, Control & Automation (IJICA), Volume 1, Issue 1, 2011