Linear Predictive Coding

Jeremy Bradbury, December 5, 2000

0 Outline
I. Proposal
II. Introduction
   A. Speech Coding
   B. Voice Coders
   C. LPC Overview
III. Historical Perspective of Linear Predictive Coding
   A. History of Speech & Audio Compression
   B. History of Speech Synthesis
   C. Analysis/Synthesis Techniques
IV. Human Speech Production
V. LPC Model
VI. LPC Analysis/Encoding
   A. Input Speech
   B. Voiced/Unvoiced Determination
   C. Pitch Period Estimation
   D. Vocal Tract Filter
   E. Transmitting the Parameters
VII. LPC Synthesis/Decoding
VIII. LPC Applications
   A. Telephone Systems
   B. Text-to-Speech Synthesis
   C. Voice Mail Systems
   D. Multimedia
IX. Conclusion
X. References


1 Proposal
Linear predictive coding (LPC) is a digital method for encoding an analog signal in which each value is predicted by a linear function of the signal's past values. It was first proposed as a method for encoding human speech by the United States Department of Defense in Federal Standard 1015, published in 1984. Human speech is produced in the vocal tract, which can be approximated as a tube of varying diameter. The LPC model is based on a mathematical approximation of the vocal tract represented by this tube. At a particular time t, the speech sample s(t) is represented as a linear sum of the p previous samples. The most important aspect of LPC is the linear predictive filter, which determines the value of the next sample from a linear combination of previous samples. Under normal circumstances, speech is sampled at 8000 samples/second with 8 bits used to represent each sample, giving a rate of 64,000 bits/second. Linear predictive coding reduces this to 2400 bits/second. At this reduced rate the speech has a distinctive synthetic sound and there is a noticeable loss of quality; however, the speech is still audible and can still be easily understood. Since there is information loss, linear predictive coding is a lossy form of compression.
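The prediction step described above can be sketched in a few lines of code. This is a minimal illustration, not the LPC-10 algorithm itself: the predictor order p = 8, the synthetic test signal, and the least-squares fit (standard LPC analysis typically uses the autocorrelation method with Levinson-Durbin recursion) are all illustrative choices.

```python
import numpy as np

def lpc_coefficients(signal, p):
    """Estimate p predictor coefficients a1..ap by least squares, so that
    s_hat[t] = a1*s[t-1] + a2*s[t-2] + ... + ap*s[t-p]."""
    n = len(signal)
    # Each row holds the p samples preceding one target sample,
    # most recent first.
    rows = np.array([signal[t - p:t][::-1] for t in range(p, n)])
    targets = signal[p:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coeffs

# A synthetic "speech-like" test signal: a resonance plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(800)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.05 * rng.standard_normal(800)

p = 8
a = lpc_coefficients(signal, p)
# Prediction residual: the part of the signal the linear predictor
# cannot explain. Its energy is far smaller than the signal's energy,
# which is why transmitting filter parameters instead of raw samples
# saves so many bits.
pred = np.array([a @ signal[k - p:k][::-1] for k in range(p, len(signal))])
residual = signal[p:] - pred
print(residual.var() / signal.var())
```

Because the predictor captures most of the signal's structure, the ratio printed at the end is much less than 1; an LPC coder exploits this by encoding only the filter coefficients and a compact description of the excitation.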



References:
[1] V. Hardman and O. Hodson. Internet/Mbone Audio (2000) 5-7.
[2] Scott C. Douglas. Introduction to Adaptive Filters, Digital Signal Processing Handbook (1999) 7-12.
[3] H. V. Poor, C. G. Looney, R. J. Marks II, S. Verdú, J. A. Thomas, and T. M. Cover. Information Theory, The Electrical Engineering Handbook (2000) 56-57.
[4] R. Sproat and J. Olive. Text-to-Speech Synthesis, Digital Signal Processing Handbook (1999) 9-11.
[5] Richard C. Dorf, et al. Broadcasting (2000) 44-47.
[6] Richard V. Cox. Speech Coding (1999) 5-8.
[7] Randy Goldberg and Lance Riek. A Practical Handbook of Speech Coders (1999) Chapter 2: 1-28, Chapter 4: 1-14, Chapter 9: 1-9, Chapter 10: 1-18.
[8] Mark Nelson and Jean-Loup Gailly. Speech Compression, The Data Compression Book (1995) 289-319.
[9] Khalid Sayood. Introduction to Data Compression (2000) 497-509.
[10] Richard Wolfson and Jay Pasachoff. Physics for Scientists and Engineers (1995) 376-377.
