Stéphanie Weiser*, Philippe Laublet**, Jean-Luc Minel*
* MoDyCo, UMR 7114, CNRS, 200 avenue de la République, 92001 Nanterre
** LaLIC, Université Paris-Sorbonne, Maison de la recherche, 28 rue Serpente, 75006 Paris
E-mail: steph.weiser@gmail.com, Philippe.Laublet@paris-sorbonne.fr, jminel@u-paris10.fr
Abstract
This paper presents our work on the detection of temporal information in web pages. The pages examined in this study were taken from the tourism sector, so the temporal information in question is specific to this domain. We highlight the differences between extraction from plain textual data and extraction from the web. These differences mainly concern the spatial arrangement of the text, the use of punctuation and the degree to which traditional syntactic rules are followed. The temporal expressions to be extracted are classified into two kinds: temporal information that concerns one particular event, and repetitive temporal information. We adopt a symbolic approach relying on patterns and rules for the detection, extraction and annotation of temporal expressions; our method is based on the use of transducers. Initial evaluations have shown promising results. Since the visual structure of a web page is very important and often informs users before they have even read the text, a semiotic study is also presented in this paper.
1. Introduction
Semantic Web methods make it possible to build portal applications that rely on ontologies. For these applications, and for many service applications, temporal information is often essential. For example, a tourism web portal needs information about the type of each tourism object and its location in time and space. In addition, the extracted information must be stored in the knowledge base according to the ontology used by the application. In this paper we focus on temporal information in tourism web pages. The temporal