Hidden Markov models

The pronunciation of a word, in all its variant forms, can be seen as a stochastic process, that is, a chain of events: in a sequence of slices through a speech spectrogram, the probabilities at each step depend on the outcome of the preceding step. Each time the process is applied to the word, it generates a slightly different acoustic specification, within the limits of the model. Once a speech recogniser has been provided with Markov models for the words in its vocabulary, it can use them to evaluate how well each model accounts for a new speech event. When someone speaks a word into the recognition system, the acoustic event can be treated as if it were the output of a hidden Markov model. The output of the model (i.e. the event) is known, but the model that produced it is not (i.e. it is hidden), and the job of the recogniser is to reconstruct it, in practice by working out which of its word models most probably generated the event.
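The idea of treating a spoken word as the output of an unknown model can be illustrated with a minimal sketch. It assumes, beyond what the entry states, that each spectrogram slice has already been reduced to a discrete symbol (0 or 1) and that recognition is done by scoring the observed sequence against every word model with the forward algorithm, a standard procedure for hidden Markov models. The word names, the probabilities, and the function names forward_likelihood and recognise are hypothetical and chosen only for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return the probability that the given HMM generates the sequence obs.

    start[s]    : probability of beginning in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start)
    # alpha[s]: probability of the symbols seen so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    # Sum over all states the process could have ended in
    return sum(alpha)


# Two hypothetical word models (illustrative numbers, not real speech data).
# Both move left to right through two states; they differ in what they emit.
WORD_MODELS = {
    "word_a": {  # tends to produce 0s first, then 1s
        "start": [1.0, 0.0],
        "trans": [[0.5, 0.5], [0.0, 1.0]],
        "emit": [[0.9, 0.1], [0.2, 0.8]],
    },
    "word_b": {  # tends to produce 1s first, then 0s
        "start": [1.0, 0.0],
        "trans": [[0.5, 0.5], [0.0, 1.0]],
        "emit": [[0.1, 0.9], [0.8, 0.2]],
    },
}


def recognise(obs):
    """Pick the word whose model assigns the highest probability to obs."""
    return max(
        WORD_MODELS,
        key=lambda word: forward_likelihood(obs, **WORD_MODELS[word]),
    )


if __name__ == "__main__":
    print(recognise([0, 0, 1, 1]))
    print(recognise([1, 1, 0, 0]))
```

The state sequence is never observed in this sketch: the forward pass sums over every path through the states, which is what makes the model "hidden". A real recogniser would work with continuous acoustic features and log probabilities rather than raw products.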

Link
Utrecht Lexicon of Linguistics