SAPI 5.1 WAV File Format
Hans
2009-12-09 21:45:43 UTC
Apparently, you can simply read the event data off the end of the file,
where SAPI nicely places it for you.

I couldn't find any documentation on the file format anywhere, so I looked
into reading it raw, and it turned out to be pretty simple (so far).

First, you want to familiarize yourself with a basic WAV file's format:

https://ccrma.stanford.edu/courses/422/projects/WaveFormat/

I wondered how SAPI could be dumping data into the file without screwing up
WAV players. I guessed that it was tacking the data on at the end, and it
turns out that's exactly what it does.

Once you read past the 'data' section of the WAV file, you run into a special
4-byte SubChunkID (see the WAV format info at the URL above) that spells
'EVNT'. I can tell you I was happy to see that ;).

The SubChunkID you should encounter is: 45 56 4E 54

After the SubChunkID, the next 4 bytes give the length of the EVNT data
chunk (don't forget your endianness: the value is little-endian, so in hex
the 4 bytes '00 08 00 00' mean 0x00000800, which is 2048).
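
Here is a minimal Python sketch of the search described above. It assumes the EVNT chunk can be reached by walking sub-chunks from offset 12 (just past the 'RIFF' header) to the end of the file, and it ignores the size stored in the RIFF header. The function name find_chunk and the tiny synthetic file are made up for illustration; they are not taken from a real SAPI file.

```python
import struct

def find_chunk(buf, chunk_id, start=12):
    """Return (payload_offset, payload_length) of the first matching chunk, or None."""
    pos = start
    while pos + 8 <= len(buf):
        # Each sub-chunk starts with a 4-byte ID and a 4-byte little-endian length.
        cid, size = struct.unpack_from("<4sI", buf, pos)
        if cid == chunk_id:
            return pos + 8, size
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    return None

# Synthetic stand-in for a SAPI-generated file: fmt, data, then EVNT.
fmt = b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 22050, 44100, 2, 16)
data = b"data" + struct.pack("<I", 4) + b"\x00" * 4
evnt = b"EVNT" + struct.pack("<I", 24) + b"\x01" + b"\x00" * 3 + b"\x01" + b"\x00" * 19
body = b"WAVE" + fmt + data + evnt
wav = b"RIFF" + struct.pack("<I", len(body)) + body

print(find_chunk(wav, b"EVNT"))                          # (56, 24)
print(struct.unpack("<I", bytes.fromhex("00080000"))[0])  # 2048
```

With a real file you would read the bytes with open(path, "rb").read() and pass them in the same way.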

Now for the best part. All events are stored as 24-byte records that
literally represent an SPEVENT structure. The format, so far, is as below:

Event ID is 2 bytes (ENUM)
ParamType is 2 bytes (ENUM)
StreamNumber is 4 bytes (ULONG)
AudioStreamOffset is 8 bytes (ULONGLONG)
wParam is 4 bytes (WPARAM)
lParam is 4 bytes (LONG_PTR)

Here is an example from a WAV file generated by SAPI 5.1 that I'm using to
debug. It is the very first event after the sub chunk ID and the sub chunk
length:

EVENT: 01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ID: 01 00 = SPEI_START_INPUT_STREAM
PARAMTYPE: 00 00 = SPET_LPARAM_IS_UNDEFINED
STREAMNUMBER: 01 00 00 00 = 1
AUDIOSTREAMOFFSET: 00 00 00 00 00 00 00 00 = 0
WPARAM: 00 00 00 00 = 0
LPARAM: 00 00 00 00 = 0
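
The 24-byte layout and the example above can be checked with Python's struct module. This is only a sketch: the SapiEvent tuple and its field names are made up for illustration, and the format string assumes the 32-bit, little-endian layout listed above (4-byte wParam and lParam).

```python
import struct
from collections import namedtuple

# Hypothetical field names mirroring the layout listed above.
SapiEvent = namedtuple(
    "SapiEvent",
    "event_id param_type stream_number audio_offset wparam lparam",
)

# H = 2 bytes, I = 4 bytes unsigned, Q = 8 bytes, i = 4 bytes signed; "<" = little-endian.
EVENT = struct.Struct("<HHIQIi")

raw = bytes.fromhex(
    "01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
event = SapiEvent._make(EVENT.unpack(raw))

print(EVENT.size)  # 24
print(event)
```

Here event_id 1 corresponds to SPEI_START_INPUT_STREAM and param_type 0 to SPET_LPARAM_IS_UNDEFINED, matching the breakdown above.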


Now, some events, such as the SPEI_VOICE_CHANGE event, have a param type
that means extra data is tacked on after the event itself. In this case the
param type is SPET_LPARAM_IS_TOKEN.

An example from the same WAV file is:

EVENT: 03 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 9E 00 00 00 18 00 00 00

ID: 03 00 = SPEI_VOICE_CHANGE
PARAMTYPE: 01 00 = SPET_LPARAM_IS_TOKEN
STREAMNUMBER: 01 00 00 00 = 1
AUDIOSTREAMOFFSET: 00 00 00 00 00 00 00 00 = 0
WPARAM: 9E 00 00 00 = 158
LPARAM: 18 00 00 00 = 24

The event specifies (in wParam) that the next 158 bytes of data represent
the token for the voice, a string denoting registry information in this
case. I don't know what the 24 in lParam indicates:

TOKEN DATA: 158 bytes: 48 00 4B 00 45 00 59 00 5F 00 4C 00 4F 00 43 00 41 00
4C 00 5F 00 4D 00 41 00 43 00 48 00 49 00 4E 00 45 00 5C 00 53 00 4F 00 46
00 54 00 57 00 41 00 52 00 45 00 5C 00 4D 00 69 00 63 00 72 00 6F 00 73 00
6F 00 66 00 74 00 5C 00 53 00 70 00 65 00 65 00 63 00 68 00 5C 00 56 00 6F
00 69 00 63 00 65 00 73 00 5C 00 54 00 6F 00 6B 00 65 00 6E 00 73 00 5C 00
4D 00 53 00 2D 00 41 00 6E 00 6E 00 61 00 2D 00 31 00 30 00 33 00 33 00 2D
00 32 00 30 00 2D 00 44 00 53 00 4B 00 00 00
PADDING: 2 bytes: 00 08
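
The token bytes above read as a NUL-terminated UTF-16LE string. A short Python sketch of the decoding follows; decode_token is a made-up helper name, and the sample bytes are rebuilt from the decoded text rather than copied from the dump.

```python
def decode_token(raw: bytes) -> str:
    """Decode a NUL-terminated UTF-16LE token string, dropping the terminator."""
    return raw.decode("utf-16-le").split("\x00", 1)[0]

# The registry path spelled out by the hex dump above, plus its terminating NUL.
path = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\MS-Anna-1033-20-DSK"
token = (path + "\x00").encode("utf-16-le")

print(len(token))           # 158
print(decode_token(token))
```

The 158 matches the wParam value of the event: 78 characters plus the terminator, at 2 bytes each.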

It is important to realize (and this took me 30 minutes to realize - I know,
slow...) that the token data must end on a 4-byte boundary. Since 158 is
not a multiple of 4, two padding bytes are added to bring the total to 160.
I do not know why they aren't both 00 but hey, I'm just figuring this stuff
out right now.
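
Putting the pieces together, here is a hedged Python sketch of walking the EVNT payload: read a 24-byte event, and if its param type is SPET_LPARAM_IS_TOKEN, take wParam bytes of trailing data and skip ahead to the next 4-byte boundary. The names padded and iter_events are made up for illustration. Only the token case described in this post is handled; other param types may carry trailing data too, which this sketch does not attempt to cover.

```python
import struct

EVENT = struct.Struct("<HHIQIi")   # 24-byte event record, 32-bit layout assumed
SPET_LPARAM_IS_TOKEN = 1

def padded(n, align=4):
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)

def iter_events(payload):
    """Yield (fields, trailing_bytes) for each event in an EVNT payload."""
    pos = 0
    while pos + EVENT.size <= len(payload):
        fields = EVENT.unpack_from(payload, pos)
        pos += EVENT.size
        extra = b""
        if fields[1] == SPET_LPARAM_IS_TOKEN:
            size = fields[4]               # wParam holds the token byte count
            extra = payload[pos:pos + size]
            pos += padded(size)            # skip the token plus its padding
        yield fields, extra

print(padded(158))  # 160

# Small synthetic payload: plain event, token event with padding, plain event.
tok = "HKEY\x00".encode("utf-16-le")       # 10 bytes, so 2 padding bytes follow
plain = EVENT.pack(1, 0, 1, 0, 0, 0)
voice = EVENT.pack(3, 1, 1, 0, len(tok), 24)
for fields, extra in iter_events(plain + voice + tok + b"\x00\x08" + plain):
    print(fields, len(extra))
```

The padding bytes are skipped without being inspected, so it does not matter that they are not both 00.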

If I run into any weirdness, I'll post more, but this seems to be pretty
much it.

Hope this helps someone some day.

Hans :)
Klaus Jakobsen
2020-04-07 13:42:00 UTC
Hi there. Did you find out more about the EVNT format in the WAV files? I can't find any info on it anywhere...