Anoop Sarkar
School of Computing Science,
Simon Fraser University, Canada.
E-mail: anoop@cs.sfu.ca
Abstract
Parsing uncovers the hidden structure of linguistic input. In many applications involving natural language, the underlying predicate-argument structure of sentences can be useful. The syntactic analysis of language provides a means to explicitly discover the various predicate-argument dependencies that may exist in a sentence. In natural language processing, the syntactic analysis of natural language input can range from very low-level, such as simply tagging each word in the sentence with a part of speech, to very high-level, such as recovering a structural analysis that identifies the dependency between each predicate in the sentence and its explicit and implicit arguments. The major bottleneck in parsing natural language is that ambiguity is so pervasive. In syntactic parsing, ambiguity is a particularly difficult problem because the most plausible analysis has to be chosen from an exponentially large number of alternative analyses. From tagging to full parsing, algorithms must be chosen carefully so that they can handle such ambiguity. This chapter explores syntactic analysis methods from tagging to full parsing and the use of supervised machine learning to deal with ambiguity.
1 Parsing Natural Language
In a text-to-speech application, input sentences are to be converted to spoken output that should sound as if it were spoken by a native speaker of the language. Consider the following pair of sentences (imagine them spoken rather than written):
1. He wanted to go for a drive in movie .
2. He wanted to go for a drive in the country .
There is a natural pause between the words ‘drive’ and ‘in’ in sentence 2, which reflects an underlying hidden structure of the sentence. Parsing can provide a structural description that identifies such a break in the intonation. A simpler case occurs in the following