Investigating spoken Arabic digits in speech recognition setting
Yousef Ajami Alotaibi
Computer Engineering Department, College of Computer and Information Sciences, King Saud University, P.O. Box 57168, Riyadh 11574, Saudi Arabia
Received 3 October 2003; received in revised form 18 May 2004; accepted 14 July 2004
Abstract
Arabic is a Semitic language that differs in many ways from European languages such as English. One of these differences is the pronunciation of the 10 digits, zero through nine: except for zero, all Arabic digits are polysyllabic words. In this paper, Arabic digits were investigated from the point of view of the speech recognition problem. An artificial neural network based speech recognition system was designed and tested on automatic Arabic digit recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both multi-speaker and speaker-independent modes. During the recognition process, noise was removed from the digitized speech by means of band-pass filters; the signal was also pre-emphasized, blocked into frames, and windowed with a Hamming window. A time alignment algorithm was used to compensate for differences in utterance lengths and misalignments between phonemes. Frame features were extracted as mel-frequency cepstral coefficients (MFCCs) to reduce the amount of information in the input signal. Finally, the neural network classified the unknown digit. This recognition system achieved 99.5% correct digit recognition in multi-speaker mode and 94.5% in speaker-independent mode. This paper also investigated Arabic digits as "patterns on paper", using spectrogram and waveform information to cross-check the results of the digit recognition system and to try to locate the causes of misrecognized digits. All Arabic digits were described by showing their
E-mail address: yalotaibi@ccis.ksu.edu.sa
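The abstract names pre-emphasis, frame blocking, and Hamming windowing as front-end steps before feature extraction. As a rough illustration only, these three steps might be sketched as follows. The function names, the pre-emphasis coefficient `alpha = 0.95`, and the frame length and hop used in the usage lines are assumptions chosen for the example; the abstract does not state the values used in the actual system, and the band-pass filtering, time alignment, MFCC, and neural network stages are not shown.

```python
import math


def pre_emphasize(signal, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.95 is a commonly used value and is an assumption here,
    not a parameter taken from the paper.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_and_window(signal, frame_len, hop):
    """Block the signal into overlapping frames and window each frame.

    Trailing samples that do not fill a whole frame are dropped.
    frame_len and hop are given in samples (hypothetical parameters).
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(
            [signal[start + i] * window[i] for i in range(frame_len)]
        )
    return frames


# Usage on a short synthetic signal (not real speech data).
samples = [math.sin(0.3 * n) for n in range(64)]
emphasized = pre_emphasize(samples)
frames = frame_and_window(emphasized, frame_len=16, hop=8)
```

Each windowed frame would then be passed to a feature extractor; the overlap between consecutive frames (here half a frame) is a common design choice that keeps frame-to-frame features from changing too abruptly.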