Reduction for Noise Robust Automatic Speech Recognition
Seon Man Kim1, Ji Hun Park1, Hong Kook Kim1,*,
Sung Joo Lee2, and Yun Keun Lee2
1 School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea
{kobem30002,jh_park,hongkook}@gist.ac.kr
2 Speech/Language Information Research Center, Electronics and Telecommunications Research Institute, Daejeon 305-700, Korea
{lee1862,yklee}@etri.re.kr
Abstract. In this paper, we propose a noise reduction method based on non-negative matrix factorization (NMF) for noise-robust automatic speech recognition (ASR). Most noise reduction methods applied to ASR front-ends have been developed to suppress background noise that is assumed to be stationary rather than non-stationary. In contrast, the proposed method attenuates non-target noise with a hybrid approach that combines Wiener filtering and an NMF technique. This is motivated by the fact that Wiener filtering and NMF are suited to reducing stationary and non-stationary noise, respectively. ASR experiments show that a system employing the proposed approach reduces the average word error rate by 11.9%, 22.4%, and 5.2% compared to systems employing the two-stage mel-warped Wiener filter, the minimum mean square error log-spectral amplitude estimator, and NMF with a Wiener postfilter, respectively.
Keywords: Automatic speech recognition (ASR), Non-negative matrix factorization (NMF), Noise reduction, Non-stationary background noise, Wiener filter.
1 Introduction
Automatic speech recognition (ASR) systems often suffer considerably from unexpected background noise [1]. Thus, many noise-robust methods operating in the frequency domain have been reported, such as spectral subtraction [2], minimum mean square error log-spectral amplitude (MMSE-LSA) estimation [3], and Wiener filtering [4][5].
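As a rough illustration of the frequency-domain methods cited above, the following sketch computes a per-bin Wiener gain from a noisy power spectrum and a noise power estimate. The function name `wiener_gain`, the `gain_floor` parameter, and the simple maximum-likelihood SNR estimate are assumptions made for illustration only; they are not taken from [4][5] or from the proposed method.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Per-bin Wiener gain for one frame (hypothetical helper, illustration only).

    noisy_power: power spectrum |Y(k)|^2 of the noisy frame, shape (num_bins,)
    noise_power: estimated noise power per bin, same shape
    gain_floor:  assumed lower bound on the gain, used here to limit
                 over-suppression; the value 0.05 is arbitrary
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)

    # Guard against division by zero in bins with no estimated noise.
    safe_noise = np.maximum(noise_power, 1e-12)

    # Maximum-likelihood SNR estimate: max(|Y|^2 / noise - 1, 0).
    snr = np.maximum(noisy_power / safe_noise - 1.0, 0.0)

    # Wiener gain G = SNR / (1 + SNR), bounded below by the floor.
    gain = snr / (1.0 + snr)
    return np.maximum(gain, gain_floor)


if __name__ == "__main__":
    noisy = np.array([4.0, 1.0, 10.0])   # toy noisy power spectrum
    noise = np.array([1.0, 1.0, 1.0])    # toy stationary noise estimate
    print(wiener_gain(noisy, noise))
```

The enhanced magnitude spectrum would be obtained by multiplying the noisy magnitude by this gain in each bin. With a fixed noise estimate, such a gain follows only stationary noise, which is the limitation that motivates combining Wiener filtering with NMF.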
In general, conventional front-ends employing
References
… ASRU, pp. 321–326 (2003)
2. …
3. Ephraim, Y., Malah, D.: Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)
… In: IEEE Workshop on ASRU, pp. 67–70 (1999)
5. … 801–809 (2010)
6. … Nature 401, 788–791 (1999)
7. … matrix factorization with priors. In: ICASSP, pp. 4029–4032 (2008)
… 1066–1074 (2007)
10. … speech enhancement in nonstationary noise environments. In: ICASSP, pp. 789–792 (1999)