Post by William DePalo [MVP VC++]audio
Ok, then there are two options - dictation or command and control.
In any case, what follows is only close to the truth (I've not seen the
source of the recognizer).
For dictation, the engine is looking at the incoming digitized audio and
trying to detect the basic units of speech sound, called phonemes. It
compares the phonemes it finds in the audio stream against those that make
up the words in its dictionary. When it thinks it has a match (deciding
that is quite complicated) it returns a guess as to the most likely word
and repeats until there is silence.
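
To make the dictation loop above a bit more concrete, here is a toy sketch in Python. It is not how SAPI or any real engine works internally: real recognizers score candidates statistically, whereas this simply counts mismatched phonemes. The dictionary, the phoneme symbols, the "SIL" silence marker and the names decode and mismatches are all made up for illustration.

```python
# Toy pronunciation dictionary: word -> phoneme sequence.
# The entries and symbols are invented for this example.
DICTIONARY = {
    "hello": ("HH", "AH", "L", "OW"),
    "world": ("W", "ER", "L", "D"),
    "her": ("HH", "ER"),
}


def mismatches(heard, expected):
    """Count positions where two equal-length phoneme sequences differ."""
    return sum(1 for h, e in zip(heard, expected) if h != e)


def decode(phonemes, max_errors=1):
    """Greedily turn a phoneme stream into words, stopping at silence."""
    words = []
    pos = 0
    while pos < len(phonemes) and phonemes[pos] != "SIL":
        best = None  # (errors, -length, word); smaller tuples are better
        for word, pron in DICTIONARY.items():
            chunk = tuple(phonemes[pos:pos + len(pron)])
            # Skip words that would run past the stream or across silence.
            if len(chunk) < len(pron) or "SIL" in chunk:
                continue
            errors = mismatches(chunk, pron)
            if errors <= max_errors:
                candidate = (errors, -len(pron), word)
                if best is None or candidate < best:
                    best = candidate
        if best is None:
            pos += 1  # no word fits here; skip one phoneme and try again
            continue
        words.append(best[2])  # emit the best guess for this stretch
        pos += -best[1]        # advance past the phonemes just consumed
    return words
```

Calling decode(["HH", "AH", "L", "OW", "W", "ER", "L", "D", "SIL"]) walks the stream word by word and stops at the silence marker. The max_errors allowance stands in, very crudely, for the "deciding that is quite complicated" part.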
For command and control, the task is simplified because the recognizer
needs to compare the audio it hears against only the phrases that it has
in the active grammars.
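
The reason command and control is easier can be sketched the same way. The example below is only an analogy: it matches text rather than audio, and the grammar names, phrases and the recognize function are hypothetical, not part of SAPI. The point is that the candidate set shrinks to whatever phrases are in the currently active grammars.

```python
import difflib

# Hypothetical grammars: name -> phrases the application cares about.
GRAMMARS = {
    "media": ["play music", "stop music", "next track"],
    "window": ["close window", "open file"],
}


def recognize(heard, active, cutoff=0.75):
    """Return the closest phrase from the active grammars, or None."""
    # Only phrases from active grammars are candidates at all.
    phrases = [p for name in active for p in GRAMMARS.get(name, [])]
    matches = difflib.get_close_matches(heard, phrases, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With only the "media" grammar active, a slightly garbled "play musik" still resolves to "play music", while "close window" is rejected outright because nothing in the active set resembles it.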
The nice thing about using SAPI is that your friend needn't be concerned
with the details - SAPI operates at a much higher level. Besides, those
details are far too complicated for most of us without PhDs in the field
and many years of experience.
If your friend has a real need to understand this stuff at that level of
detail she can take a look at this
http://cmusphinx.sourceforge.net/html/cmusphinx.php
which is an open source project out of Carnegie Mellon University, where a
lot of research into natural languages is done, or at this page from
Microsoft Research
http://research.microsoft.com/en-us/groups/srg/
though I don't know if they publish the source (I doubt it).
Regards,
Will