Automatic speech recognition (ASR) transforms the acoustic microstructure of a speech signal into its implicit phonetic macrostructure. In other words, a speech recognition system performs speech-to-text conversion: its output is the text corresponding to the recognized speech.
Typology of ASR systems
Several types of ASR system can be developed, depending on:
• Speaker dependence: speaker-dependent vs. speaker-independent
• Language constraints:
o Isolated word recognition
o Connected word recognition
o Continuous speech recognition
o Keyword spotting
Approaches to ASR
Pattern recognition approach
Pattern training and pattern comparison are the two essential steps in this approach. First, features are measured using a filter bank, linear predictive coding (LPC), or the discrete Fourier transform (DFT). Pattern training then creates a reference pattern derived from an averaging technique. Next, speech patterns are compared using a local distance measure and a global time alignment procedure, dynamic time warping (DTW). The resulting similarity scores decide which reference pattern is the best match.
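The training and comparison steps can be illustrated with a minimal sketch. It assumes features have already been extracted as lists of frames (each frame a list of numbers), that training tokens of a word have equal length so they can be averaged frame by frame, and that Euclidean distance serves as the local distance. The function names average_template and dtw_distance are illustrative and do not come from any particular toolkit.

```python
import math

def average_template(tokens):
    """Frame-by-frame mean of several training tokens of one word.

    Assumption: every token has the same number of frames and the same
    feature dimension (i.e. the tokens are already time-normalized).
    """
    n_tokens = len(tokens)
    return [
        [sum(frame[d] for frame in frames) / n_tokens
         for d in range(len(frames[0]))]
        for frames in zip(*tokens)  # frames at the same time index
    ]

def euclidean(a, b):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b, local_dist=euclidean):
    """Cumulative cost of the best time alignment of two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: cheapest cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: advance in seq_a, in seq_b, or in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# Two training tokens averaged into one reference pattern.
reference = average_template([[[0.0], [2.0]], [[2.0], [4.0]]])
# A time-stretched copy of a pattern aligns with zero cost.
stretched_cost = dtw_distance([[0.0], [1.0], [2.0]],
                              [[0.0], [1.0], [1.0], [2.0]])
```

A lower cumulative cost means a better match. Practical systems usually add path constraints and length normalization, which this sketch omits.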
Acoustic-Phonetic approach
This is also known as the rule-based approach. Knowledge of phonetics and linguistics is used to guide the search process. Usually a set of rules is defined that expresses anything that might help decoding. At each decision point, the possibilities are laid out and the rules are applied to determine which sequences are permitted.
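As a rough illustration of laying out possibilities and filtering them with rules, the sketch below enumerates every phone sequence through a small candidate lattice and keeps only those that pass two toy rules. The rule sets, phone labels, and function names are hypothetical and chosen only for the example; a real system would use a much richer rule base.

```python
from itertools import product

# Hypothetical toy rules, for illustration only.
FORBIDDEN_INITIAL = {"ng"}        # phones that may not start a sequence
FORBIDDEN_PAIRS = {("s", "d")}    # adjacent phone pairs that are not allowed

def permitted(seq):
    """Return True if the phone sequence violates none of the toy rules."""
    if seq and seq[0] in FORBIDDEN_INITIAL:
        return False
    return all((a, b) not in FORBIDDEN_PAIRS for a, b in zip(seq, seq[1:]))

def decode(lattice):
    """lattice: one list of candidate phones per decision point.

    Lays out every possible sequence and keeps the permitted ones.
    """
    return [seq for seq in product(*lattice) if permitted(seq)]

candidates = [["s", "ng"], ["t", "d"], ["aa"]]
sequences = decode(candidates)
```

Exhaustive enumeration grows exponentially with the number of decision points, so it only suits very small lattices; it is used here to keep the rule-filtering idea visible.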
Template-based approach
In this approach, a collection of prototypical speech patterns is stored as reference patterns, which represent the dictionary of candidate words. An unknown spoken utterance is matched against each of these reference templates, and the category of the best-matching pattern is selected. DTW is used to find the best possible alignment.
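The matching step can be sketched as a nearest-template search. The example assumes each utterance is reduced to a one-dimensional feature sequence and uses absolute difference as the local distance; the template values and the names dtw and recognize are made up for illustration.

```python
def dtw(a, b):
    """DTW cost between two 1-D feature sequences (local distance: |x - y|)."""
    inf = float("inf")
    # prev[j]: best cost of aligning the frames of a seen so far with b[:j]
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            curr.append(abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[len(b)]

def recognize(utterance, templates):
    """Score the utterance against every template; lowest cost wins."""
    scores = {word: dtw(utterance, ref) for word, ref in templates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical dictionary of reference templates, one per candidate word.
templates = {
    "yes": [1, 3, 5, 3, 1],
    "no": [5, 4, 2, 1, 1],
}
# A slower rendition of the "yes" pattern.
word, scores = recognize([1, 1, 3, 5, 5, 3, 1], templates)
```

Because every stored template is compared with the input, recognition time grows linearly with the size of the dictionary.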
Stochastic Approach
This approach is based on the use of probabilistic models so that uncertain or incomplete information, such as confusable sounds,