Speech Communication for Interactive Reading Tutors

Speech Communication 49 (2007) 861–873 www.elsevier.com/locate/specom

Highly accurate children’s speech recognition for interactive reading tutors using subword units
Andreas Hagen, Bryan Pellom *, Ronald Cole
Center for Spoken Language Research, University of Colorado at Boulder, 1777 Exposition Drive, Suite #171, Boulder, CO 80301, USA
Received 15 December 2005; received in revised form 20 February 2007; accepted 9 May 2007

Abstract

Speech technology offers great promise in the field of automated literacy and reading tutors for children. In such applications, speech recognition can be used to track the reading position of the child, detect oral reading miscues, assess comprehension of the text being read by estimating whether the prosodic structure of the speech is appropriate to the discourse structure of the story, or engage the child in interactive dialogs to assess and train comprehension. Despite this promise, speech recognition systems exhibit higher error rates for children due to variability in vocal tract length, formant frequency, pronunciation, and grammar. When children are reading out loud, these problems are compounded by speech production behaviors that arise when a child struggles to recognize printed words, such as pauses, repeated syllables, and other phenomena. To overcome these challenges, we present advances in speech recognition that improve accuracy and modeling capability in the context of an interactive literacy tutor for children. Specifically, this paper focuses on a novel set of speech recognition techniques that can be applied to improve oral reading recognition. First, we demonstrate that speech recognition error rates for interactive read-aloud can be reduced by more than 50% through a combination of advances in both statistical language and acoustic modeling. Next, we propose extending our baseline system by introducing a novel token-passing search architecture targeting subword unit based



