IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-34, NO. 4, AUGUST
Speech Analysis/Synthesis Based on a Sinusoidal Representation
Abstract-A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of “birth” and “death” of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction

speech compression. The amplitudes and frequencies of the underlying sine waves are estimated using Kalman filtering techniques, and each sine-wave phase is defined to be the integral of the associated instantaneous frequency. Another sine-wave-based speech compression system is being developed by Almeida and Silva [4]. In contrast to Hedelin’s approach, their system uses a pitch estimate to establish a harmonic set of sine waves. The sine-wave phases are computed at the harmonic frequencies. To compensate for any errors that might be introduced as a result of the harmonic sine-wave representation, a residual waveform is coded along with the underlying sine-wave parameters. In this paper a sinusoidal model for the speech
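The peak-picking step named in the abstract can be illustrated for a single frame. The following is a minimal sketch, not the authors' implementation: the Hamming window, the amplitude threshold `floor`, the test signal, and all function names are assumptions made for illustration. It covers only one frame, so the "birth"/"death" tracking across frames and the cubic phase interpolation are not shown.

```python
import cmath
import math


def periodic_hamming(n_samples):
    """Periodic Hamming window, scaled so its samples sum to 1 (assumed choice)."""
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / n_samples)
         for n in range(n_samples)]
    total = sum(w)
    return [v / total for v in w]


def dft_half(frame):
    """Naive O(N^2) DFT of a real frame, bins 0..N/2; adequate for a sketch."""
    n_samples = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n, x in enumerate(frame))
            for k in range(n_samples // 2 + 1)]


def pick_peaks(frame, sample_rate, floor=1e-3):
    """Return (amplitude, frequency_hz, phase) at each local magnitude maximum.

    `floor` is a hypothetical absolute threshold that rejects numerical noise.
    """
    n_samples = len(frame)
    window = periodic_hamming(n_samples)
    spectrum = dft_half([x * w for x, w in zip(frame, window)])
    mags = [abs(c) for c in spectrum]
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            amplitude = 2.0 * mags[k]             # real sinusoid: half per side
            freq = k * sample_rate / n_samples    # bin-center frequency in Hz
            phase = cmath.phase(spectrum[k])      # phase at the frame start
            peaks.append((amplitude, freq, phase))
    return peaks


def synthesize(peaks, n_samples, sample_rate):
    """Sum of constant-parameter sine waves over one frame."""
    return [sum(a * math.cos(2.0 * math.pi * f * n / sample_rate + p)
                for a, f, p in peaks)
            for n in range(n_samples)]


# Hypothetical test frame: two sinusoids placed exactly on DFT bin centers.
SAMPLE_RATE = 8000.0
FRAME_LEN = 128
frame = [1.0 * math.cos(2.0 * math.pi * 625.0 * n / SAMPLE_RATE + 0.3)
         + 0.5 * math.cos(2.0 * math.pi * 2500.0 * n / SAMPLE_RATE - 1.0)
         for n in range(FRAME_LEN)]
peaks = pick_peaks(frame, SAMPLE_RATE)
resynth = synthesize(peaks, FRAME_LEN, SAMPLE_RATE)
```

The test frame is deliberately placed on bin centers so that the amplitude, frequency, and phase read off the peaks are exact. For real speech the peaks fall between bins and the parameters vary from frame to frame, which is what the tracking and phase-interpolation stages described in the abstract address.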
References:
[1] B. S. Atal and J. R. Remde, “A new model of LPC excitation for producing natural-sounding speech at low bit rates,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Paris, France, 1982, p. 614.
[2] H. Van Trees, Detection, Estimation and Modulation Theory, Part I. New York: Wiley, 1968, ch. 3.
[3] P. Hedelin, “A tone-oriented voice-excited vocoder,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Atlanta, GA, 1981, p. 205.
[4] L. B. Almeida and F. M. Silva, “Variable-frequency synthesis: An improved harmonic coding scheme,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, 1984, p. 27.5.1.
[5] R. J. McAulay and T. F. Quatieri, “Magnitude-only reconstruction using a sinusoidal speech model,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, San Diego, CA, 1984, p. 27.6.1.
[6] J. L. Flanagan, “Parametric coding of speech spectra,” J. Acoust. Soc. Amer., vol. 68, p. 412, 1980.
[7] J. L. Flanagan and S. W. Christensen, “Computer studies on parametric coding of speech spectra,” J. Acoust. Soc. Amer., vol. 68, p. 420, 1980.
[8] T. F. Quatieri and R. J. McAulay, “Speech transformations based on a sinusoidal representation,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, 1985, p. 489.
[9] R. J. McAulay and T. F. Quatieri, “Mid-rate coding based on a sinusoidal representation of speech,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Tampa, FL, 1985, p. 945.

Thomas F. Quatieri (S’73-M’79) was born in Somerville, MA, on January 31, 1952. He received the B.S. degree (summa cum laude) from Tufts University, Medford, MA, in 1973 and the S.M., E.E., and Sc.D. degrees from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1975, 1977, and 1979, respectively. From 1973 to 1975 he was a Teaching Assistant and from 1975 to 1979 a Research Assistant in the area of digital signal processing, both within the Department of Electrical Engineering and Computer Science of M.I.T.
His research for the Master’s degree involved the design of two-dimensional digital filters and for the Sc.D. involved phase estimation with application to speech analysis/synthesis. He is presently a Research Staff Member at the M.I.T. Lincoln Laboratory where he is working on problems in digital signal processing with applications to speech communications and image processing. Dr. Quatieri is the recipient of the 1982 Paper Award of the IEEE Acoustics, Speech, and Signal Processing Society for the best paper by an author under 30 years of age. He is a member of the IEEE Digital Signal Processing Technical Committee and has served on the steering committee for the 1984 Digital Signal Processing Workshop. He is also a member of Tau Beta Pi, Eta Kappa Nu, and Sigma Xi.