noise reduction

Non-negative Matrix Factorization Based Noise Reduction for Noise Robust Automatic Speech Recognition
Seon Man Kim¹, Ji Hun Park¹, Hong Kook Kim¹,*, Sung Joo Lee², and Yun Keun Lee²
¹ School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
{kobem30002,jh_park,hongkook}@gist.ac.kr
² Speech/Language Information Research Center, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
{lee1862,yklee}@etri.re.kr

Abstract. In this paper, we propose a noise reduction method based on non-negative matrix factorization (NMF) for noise-robust automatic speech recognition (ASR). Most noise reduction methods applied to ASR front-ends have been developed to suppress background noise that is assumed to be stationary rather than non-stationary. In contrast, the proposed method attenuates non-target noise with a hybrid approach that combines Wiener filtering and an NMF technique. This is motivated by the fact that Wiener filtering and NMF are suitable for reducing stationary and non-stationary noise, respectively. ASR experiments show that an ASR system employing the proposed approach improves the average word error rate by 11.9%, 22.4%, and 5.2% compared to systems employing the two-stage mel-warped Wiener filter, the minimum mean square error log-spectral amplitude estimator, and NMF with a Wiener postfilter, respectively.
Keywords: Automatic speech recognition (ASR), Non-negative matrix factorization (NMF), Noise reduction, Non-stationary background noise, Wiener filter.
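As a rough illustration of the NMF side of such an approach, the sketch below factorizes a non-negative magnitude spectrogram V into bases W and activations H using the standard multiplicative updates for the Euclidean cost, then builds a Wiener-like soft mask from pre-trained speech and noise bases. This is a generic textbook formulation, not the method proposed in this paper; the function names (`nmf`, `separate`), the Euclidean cost, and all parameter values are assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, W_fixed=None, seed=0, eps=1e-12):
    """Approximate V (freq x frames, non-negative) as W @ H.

    Uses multiplicative updates for the Euclidean cost.
    If W_fixed is given, the bases are kept fixed and only H is updated
    (rank is then taken from W_fixed).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if W_fixed is None:
        W = rng.random((n_freq, rank)) + eps
    else:
        W = np.array(W_fixed, dtype=float)
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        # Activation update: H <- H * (W^T V) / (W^T W H)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if W_fixed is None:
            # Basis update: W <- W * (V H^T) / (W H H^T)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_noisy, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Estimate the speech magnitude from a noisy magnitude spectrogram.

    Assumes W_speech and W_noise were trained beforehand on clean speech
    and on noise, respectively (hypothetical training step, not shown).
    """
    W = np.hstack([W_speech, W_noise])
    _, H = nmf(V_noisy, W.shape[1], n_iter=n_iter, W_fixed=W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech part of the reconstruction
    N = W_noise @ H[k:]    # noise part of the reconstruction
    mask = S / (S + N + eps)  # Wiener-like soft mask, values in [0, 1)
    return mask * V_noisy
```

Under these assumptions, `nmf` would first be run separately on clean-speech and noise-only spectrograms to obtain the two basis matrices, and `separate` would then be applied to each noisy utterance.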

1 Introduction

Automatic speech recognition (ASR) systems often suffer considerably from unexpected background noise [1]. Thus, many frequency-domain noise-robust methods have been reported, such as spectral subtraction [2], minimum mean square error log-spectral amplitude (MMSE-LSA) estimation [3], and Wiener filtering [4][5].
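For orientation, two of the frequency-domain methods named above can be sketched in a few lines. The example assumes a power spectrogram of shape (frequency bins x frames) and a per-bin noise power estimate that is constant over time, i.e. stationary noise; the function names, the spectral floor value, and the simple maximum-likelihood SNR estimate are illustrative assumptions rather than the formulations used in [2] or [4][5].

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract the noise power estimate from each frame.

    A spectral floor (a fraction of the noisy power) keeps the
    result positive where the subtraction would go below zero.
    """
    clean = noisy_power - noise_power[:, None]
    return np.maximum(clean, floor * noisy_power)

def wiener_gain(noisy_power, noise_power, eps=1e-12):
    """Wiener gain G = SNR / (1 + SNR) per time-frequency bin.

    The SNR is estimated as max(noisy / noise - 1, 0), a simple
    maximum-likelihood estimate chosen here for brevity.
    """
    snr = np.maximum(noisy_power / (noise_power[:, None] + eps) - 1.0, 0.0)
    return snr / (1.0 + snr)
```

Both functions rely on a fixed noise estimate per frequency bin, which is why approaches of this kind are better matched to stationary noise than to noise whose spectrum changes from frame to frame.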
In general, conventional front-ends employing



References:
… ASRU, pp. 321–326 (2003)
2. …
3. Ephraim, Y., Malah, D.: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)
… In: IEEE Workshop on ASRU, pp. 67–70 (1999)
5. … 801–809 (2010)
6. … Nature 401, 788–791 (1999)
7. … matrix factorization with priors. In: ICASSP, pp. 4029–4032 (2008)
… 1066–1074 (2007)
10. … speech enhancement in nonstationary noise environments. In: ICASSP, pp. 789–792 (1999)
