Classified Ads Harvesting Agent and Notification System
Razvi Doomun*, Lollmahamod N., Auleear Nadeem, Mozafar Aukin
Faculty of Engineering, University of Mauritius, Reduit. E-mail: r.doomun@uom.ac.mu
ABSTRACT The shift from an information society to a knowledge society requires rapid information harvesting, reliable search and instantaneous on-demand delivery. Information extraction agents are used to explore and collect data available on the Web in order to exploit such data effectively for business purposes, such as automatic news filtering, advertisement or product search, and price comparison. In this paper, we develop a real-time automatic harvesting agent for adverts posted on the Servihoo web portal, together with an SMS-based notification system. The agent takes as input the URL of the web portal, the object model (i.e., the fields of interest) and a set of extraction rules written using HTML parsing functions, and uses them to extract the latest advert information. The extraction engine executes the extraction rules and stores the information in a database, where it is processed for automatic notification. This intelligent system
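The extraction step summarized above (a portal URL, an object model listing the fields of interest, and extraction rules written with HTML parsing functions, with the results stored in a database) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the markup it expects (a `<div class="ad">` wrapping one `<span>` per field), the names `FIELD_RULES`, `AD_CLASS`, `AdExtractor` and `store_ads`, and the `ads` table schema are all hypothetical, and a sample HTML string stands in for the downloaded Servihoo page.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical object model: each field of interest mapped to the CSS class
# of the <span> assumed to hold it in the portal's markup.
FIELD_RULES = {"category": "category", "title": "title", "price": "price"}
AD_CLASS = "ad"  # assumed class of the <div> wrapping one advert


class AdExtractor(HTMLParser):
    """Applies FIELD_RULES to the page, collecting one dict per advert.

    Assumes each advert <div> contains only <span> children (no nested divs).
    """

    def __init__(self):
        super().__init__()
        self.ads = []        # completed adverts
        self.current = None  # advert being built, None outside an advert
        self.field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and AD_CLASS in classes:
            self.current = {}
        elif self.current is not None and tag == "span":
            for name, css in FIELD_RULES.items():
                if css in classes:
                    self.field = name

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "div" and self.current is not None:
            self.ads.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.current is not None and self.field:
            self.current[self.field] = self.current.get(self.field, "") + data.strip()


def store_ads(conn, ads):
    """Insert adverts; rows already stored by an earlier harvest are skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ads (category TEXT, title TEXT, price TEXT,"
        " UNIQUE (category, title, price))"
    )
    # Missing fields become empty strings so every row supplies all columns.
    rows = [{name: ad.get(name, "") for name in FIELD_RULES} for ad in ads]
    conn.executemany(
        "INSERT OR IGNORE INTO ads VALUES (:category, :title, :price)", rows
    )
    conn.commit()


# Sample markup standing in for the downloaded page (hypothetical).
SAMPLE = """
<div class="ad"><span class="category">Cars</span>
  <span class="title">Toyota Vitz 2004</span><span class="price">Rs 250,000</span></div>
<div class="ad"><span class="category">Jobs</span>
  <span class="title">Web developer wanted</span><span class="price"></span></div>
"""

parser = AdExtractor()
parser.feed(SAMPLE)
parser.close()
conn = sqlite3.connect(":memory:")
store_ads(conn, parser.ads)
store_ads(conn, parser.ads)  # a repeated harvest adds no duplicate rows
print(conn.execute("SELECT COUNT(*) FROM ads").fetchone()[0])  # prints 2
```

Storing adverts behind a `UNIQUE` constraint with `INSERT OR IGNORE` is one simple way to let repeated harvests add only adverts not seen before, which is what a notification step needs; the paper does not say how its database detects new adverts.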
aggregation for information portals, scientific research and business activity monitoring. Much work has been carried out on using agents to aid e-commerce, with most of the attention focused on B2B agents and comparatively little on B2C agents. Sen and Hernandez (2000) note that many e-businesses have “seller's agents”, whose function is to push merchandise or services to customers, and that there are also “buyer's agents”, whose goal is to best serve the user's interests. Maes (1994) discusses how agents acting as “personal assistants” that collaborate with the user can reduce the work the user has to carry out. They can also be used to help with information overload by learning a
International Conference on Information and Communication Technology for the Muslim World (ICT4M 2006), 21-23 November 2006, Kuala Lumpur, Malaysia
References:
Chang C.-H., Hsu C.-N. and Lui S.-C. (2003) Automatic Information Extraction from Semi-Structured Web Pages by Pattern Discovery. Decision Support Systems Journal, 35(1).
Crescenzi V., Mecca G. and Merialdo P. (2001) RoadRunner: Towards Automatic Data Extraction from Large Web Sites. In The VLDB Journal, pages 109–118.
Gao X. and Sterling L. (1999) Semi-Structured Data Extraction from Heterogeneous Sources. In Second International Workshop on Innovative Internet Information Systems (IIIS'99), Copenhagen.
Habegger B. and Quafafou M. (2002) Multi-Pattern Wrappers for Relation Extraction. In Proceedings of the 15th European Conference on Artificial Intelligence, Amsterdam, IOS Press.
Kistler T. and Marais H. (1998) WebL - A Programming Language for the Web. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia.
Kushmerick N. (2000) Wrapper Induction: Efficiency and Expressiveness. Artificial Intelligence.
Laender A., Ribeiro-Neto B., Silva A. and Teixeira J. (2002) A Brief Survey of Web Data Extraction Tools. SIGMOD Record, 31(2), June 2002.
Maes P. (1994) Agents that Reduce Work and Information Overload. Communications of the ACM, 37(7), July 1994.
Marais H. and Rodeheffer T. (1999) Automating the Web with WebL. Dr. Dobb's Journal, January 1999.
Muslea I., Minton S. and Knoblock C. A. (2001) Hierarchical Wrapper Induction for Semi-Structured Information Sources. Journal of Autonomous Agents and Multi-Agent Systems, 4:93–114.
Sahuguet A. and Azavant F. (2000) WysiWyg Web Wrapper Factory (W4F). In Proceedings of the 8th International World Wide Web Conference, A. Mendelzon (Ed.), Elsevier Science, Toronto.
Sen S. and Hernandez K. (2000) A Buyer's Agent. In Proceedings of the Fourth International Conference on Autonomous Agents.
W3C DOM Technical Committee (2003) Document Object Model Technical Reports. http://www.w3.org/DOM/DOMTR
6.0 DISCUSSION
The system developed is an Intelligent Information Harvester and SMS Agent. Once started, it automatically connects to the Servihoo web portal, extracts the latest advert information from the “Petites Annonces” (classified ads) section and downloads it to a database. The downloaded information is then dispatched as SMS messages to registered clients. With such a system, readers of “Petites Annonces” no longer need to visit the Servihoo portal repeatedly and spend time and effort navigating the classified ads section to obtain the latest ad details. Instead, they simply register on the system through the client interface, specify what type of information they want the system to harvest for them, and receive the latest ad details on their mobile phones.
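The harvest-and-notify cycle described in the Discussion above (extract the latest adverts, store them, then send each one by SMS to the registered clients who asked for that type of information) can be sketched as follows. The paper does not describe its matching rules or its SMS gateway, so this is a hypothetical illustration: the `Subscription` record (a category plus optional title keywords), `format_sms`, `dispatch`, the placeholder phone numbers and the `send` callback are all assumptions, with `send` simply recording messages in a list. The 160-character cut reflects the length of a single GSM 7-bit SMS.

```python
from dataclasses import dataclass, field

SMS_LIMIT = 160  # characters in a single GSM 7-bit SMS


@dataclass
class Subscription:
    """What a registered client asked to be notified about (hypothetical)."""
    phone: str                                          # placeholder number
    category: str                                       # e.g. "Cars"
    keywords: list[str] = field(default_factory=list)   # all must appear in the title

    def matches(self, ad: dict) -> bool:
        if ad.get("category", "").lower() != self.category.lower():
            return False
        title = ad.get("title", "").lower()
        return all(k.lower() in title for k in self.keywords)


def format_sms(ad: dict) -> str:
    """Build a one-line message, cut to fit a single SMS."""
    text = f"{ad.get('category', '')}: {ad.get('title', '')}"
    if ad.get("price"):
        text += f" - {ad['price']}"
    return text[:SMS_LIMIT]


def dispatch(new_ads, subscriptions, send) -> int:
    """Send every new advert to each client whose subscription matches.

    `send(phone, text)` stands in for an SMS gateway, which the paper does
    not specify. Returns the number of messages sent.
    """
    sent = 0
    for ad in new_ads:
        for sub in subscriptions:
            if sub.matches(ad):
                send(sub.phone, format_sms(ad))
                sent += 1
    return sent


outbox = []  # collects (phone, text) instead of contacting a real gateway
subs = [Subscription("PHONE-1", "Cars", ["toyota"]), Subscription("PHONE-2", "Jobs")]
new_ads = [
    {"category": "Cars", "title": "Toyota Vitz 2004", "price": "Rs 250,000"},
    {"category": "Cars", "title": "Nissan March 2001", "price": "Rs 180,000"},
    {"category": "Jobs", "title": "Web developer wanted"},
]
sent = dispatch(new_ads, subs, lambda phone, text: outbox.append((phone, text)))
print(sent, outbox)
```

Passing the gateway in as a `send` function keeps the matching logic testable without a mobile network; in a deployed system it would wrap whichever SMS service the operator provides.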