Discussion:
phoneme extraction using stream
metar
2010-09-27 10:24:53 UTC
Hey speech community :)

Since I'm pretty new to the whole speech technology community, I was
looking for an easy and flexible way to extract phonemes from an
audio file.
I came across James N. Anderson's "Quick'n'Dirty Phoneme Extractor",
based on SAPI, which I'm using as a basis.
His project works very nicely for standalone .wav files.

What I'm trying to do is feed SAPI an audio stream that I
receive via a TCP socket, generate the phonemes + timings, and pass
that data on over another TCP link.
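
One way the phonemes + timings could be packaged for the outgoing TCP link is a minimal sketch, assuming the recognizer reports each phoneme's position as a byte offset and length within the audio stream. It converts bytes to milliseconds using the stream's average data rate (44100 bytes/s for 22 kHz, 16-bit mono) and emits one text line per phoneme. PhonemeEvent, bytesToMs, serializeEvent and the "phoneme start_ms duration_ms" line format are all hypothetical; no SAPI result structures are used here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record for one recognized phoneme (not a SAPI type).
// Positions are byte offsets into the PCM audio stream.
struct PhonemeEvent {
    std::string phoneme;      // phoneme label, e.g. "AA"
    std::uint64_t startByte;  // where the phoneme starts in the stream
    std::uint64_t sizeBytes;  // how many bytes of audio it spans
};

// Convert a byte count to milliseconds for a PCM stream with the given
// average data rate (the nAvgBytesPerSec field of its wave format).
std::uint64_t bytesToMs(std::uint64_t bytes, std::uint32_t avgBytesPerSec) {
    assert(avgBytesPerSec != 0);
    return bytes * 1000 / avgBytesPerSec;
}

// Serialize one event as "<phoneme> <start_ms> <duration_ms>\n",
// a hypothetical line format for the outgoing TCP link.
std::string serializeEvent(const PhonemeEvent& e, std::uint32_t avgBytesPerSec) {
    return e.phoneme + " " +
           std::to_string(bytesToMs(e.startByte, avgBytesPerSec)) + " " +
           std::to_string(bytesToMs(e.sizeBytes, avgBytesPerSec)) + "\n";
}
```

A line-based text format keeps the receiving side trivial (split on newlines, then on spaces); a binary format would be more compact but needs agreed-on field sizes and byte order.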

Receiving the raw stream works fine, but how can I pass my stream
over to SAPI?
Is it possible to feed it raw audio? My microphone stream is
raw (no WAV header).
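
Raw audio should be workable as long as its format is declared explicitly, since there is no WAV header for SAPI to read it from; in SAPI that is what the format GUID and WAVEFORMATEX pointer passed to SetBaseStream are for. A minimal sketch of the PCM arithmetic those fields encode, using a hypothetical portable PcmFormat struct that mirrors WAVEFORMATEX's channel count, sample rate, bits per sample, block alignment and average bytes per second:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the PCM-related WAVEFORMATEX fields.
struct PcmFormat {
    std::uint16_t channels;        // nChannels
    std::uint32_t samplesPerSec;   // nSamplesPerSec
    std::uint16_t bitsPerSample;   // wBitsPerSample
    std::uint16_t blockAlign;      // nBlockAlign: bytes per sample frame
    std::uint32_t avgBytesPerSec;  // nAvgBytesPerSec: bytes of audio per second
};

// Fill in the derived fields for uncompressed PCM.
PcmFormat makePcmFormat(std::uint16_t channels, std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample) {
    assert(bitsPerSample % 8 == 0);  // only whole-byte sample sizes here
    PcmFormat f{};
    f.channels = channels;
    f.samplesPerSec = samplesPerSec;
    f.bitsPerSample = bitsPerSample;
    f.blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    f.avgBytesPerSec = samplesPerSec * f.blockAlign;
    return f;
}
```

makePcmFormat(1, 22050, 16) gives a block alignment of 2 and 44100 bytes per second, which is what SPSF_22kHz16BitMono describes. The microphone data has to actually arrive in that format; otherwise the declared format should be changed to match what is sent.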

Thanks for any answers :O
metar
2010-10-12 12:57:24 UTC
Hey there ;)
Too bad that no one has any idea or hint.

Well, I'm still stuck at the stream input.
For testing purposes I'm using a *.wav file, which I loaded into a buffer.
Before I started using a custom audio object, I tested everything
with BindToFile on a local wav file, which worked like a charm.
Now, with the custom object, I always get SPEI_END_SR_STREAM and no
output ;(
So I assume that my Read function is not working correctly?!?
Well, it is reading in something... but look at the size: it's much
larger than the actual input file.

Output:
..
Size of input: 831532
Bytes read: 25427252
OK - SPEI_END_SR_STREAM
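
The test buffer holds the whole .wav file, RIFF header included, while the SpStream is told the data is plain SPSF_22kHz16BitMono samples, so any header bytes that reach the stream would be treated as audio (probably little more than a click for a plain 44-byte header, but some files carry larger extra chunks). A sketch of locating the "data" chunk so that only the PCM samples are passed on, assuming an uncompressed PCM WAV already in the declared format; findWavData and readLE32 are hypothetical helpers, not part of SAPI:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a little-endian 32-bit value (RIFF sizes are little-endian).
static std::uint32_t readLE32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Locate the PCM samples inside an in-memory RIFF/WAVE file.
// On success, sets dataOffset and dataSize and returns true.
bool findWavData(const std::vector<unsigned char>& file,
                 std::size_t& dataOffset, std::size_t& dataSize) {
    if (file.size() < 12 || std::memcmp(file.data(), "RIFF", 4) != 0 ||
        std::memcmp(file.data() + 8, "WAVE", 4) != 0)
        return false;
    std::size_t pos = 12;  // first chunk follows the 12-byte RIFF header
    while (pos + 8 <= file.size()) {
        std::uint32_t chunkSize = readLE32(file.data() + pos + 4);
        if (std::memcmp(file.data() + pos, "data", 4) == 0) {
            dataOffset = pos + 8;
            // Clamp in case the header claims more bytes than the file holds.
            dataSize = std::min<std::size_t>(chunkSize, file.size() - dataOffset);
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos += 8 + static_cast<std::size_t>(chunkSize) + (chunkSize & 1);
    }
    return false;
}
```

The returned offset and size describe the slice of the buffer that would go into the stream; everything before it is header.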


Well, here is a little code snippet with the important parts:


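// COM smart pointers and format helper used below (ATL CComPtr, sphelper.h)
CComPtr<ISpStream>      cpInputStream;
CComPtr<IStream>        cpStreamWithResult;
CComPtr<ISpRecognizer>  cpRecognizer;
CComPtr<ISpRecoContext> cpRecoContext;
CSpStreamFormat         sInputFormat;

if (cpInputStream.CoCreateInstance(CLSID_SpStream) != S_OK)
    ERROR_EXIT("Cannot create SpStream object.\n");

// Creates an IStream on a new, empty block of global memory
// (freed when the stream is released, because fDeleteOnRelease is TRUE).
if (CreateStreamOnHGlobal(NULL, TRUE, &cpStreamWithResult) != S_OK)
    ERROR_EXIT("CreateStreamOnHGlobal failed.\n");

if (sInputFormat.AssignFormat(SPSF_22kHz16BitMono) != S_OK)
    ERROR_EXIT("!!! - Format assign failed.");

// Wrap the memory stream in the SpStream and declare its audio format.
if (cpInputStream->SetBaseStream(cpStreamWithResult,
        sInputFormat.FormatId(), sInputFormat.WaveFormatExPtr()) != S_OK)
    ERROR_EXIT("!!! - SetBaseStream failed.\n");
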
FILE *pFile;
long lSize;
char *mybuffer;

pFile = fopen("D:\\16bit_sign.wav", "rb");
if (pFile == NULL) exit(1);

// Obtain the file size.
fseek(pFile, 0, SEEK_END);
lSize = ftell(pFile);
rewind(pFile);

// Allocate memory to hold the whole file.
mybuffer = (char*) malloc(lSize);
if (mybuffer == NULL) exit(2);

// Copy the file into the buffer (fread returns the item count, 1 on success).
size_t ret = fread(mybuffer, lSize, 1, pFile);
fclose(pFile);
printf("fread returned %lu\n", (unsigned long) ret);
printf("Size of input: %ld\n", lSize);

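// Is there any data? stats.cbSize.QuadPart holds the stream size.
// STATFLAG_NONAME: the stream name is not needed (and would have to be freed).
STATSTG stats;
if (cpStreamWithResult->Stat(&stats, STATFLAG_NONAME) != S_OK)
    ERROR_EXIT("!!! - Stat() error.\n");

// Rewind to the start.
LARGE_INTEGER pos;
pos.QuadPart = 0;

if (cpStreamWithResult->Seek(pos, STREAM_SEEK_SET, NULL) != S_OK)
    ERROR_EXIT("!!! - Seek() error.\n");
printf("Stream starts at: %lld\n", pos.QuadPart);

ULONG cbRead = 0;
// Note: cb is NULL (0), so this requests zero bytes, and the HGlobal stream
// is still empty here because mybuffer was never written into it.
if (cpStreamWithResult->Read(mybuffer, NULL, &cbRead) != S_OK)
    ERROR_EXIT("!!! - Read() error.\n");
// Note: &cbRead passes the address of cbRead, so this prints a pointer value.
printf("Bytes read: %ld\n", &cbRead);
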
// Create the recognition engine.
if (cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer) != S_OK)
    ERROR_EXIT("!!! - Cannot create recognition engine.");

// Set the audio input to our stream object.
if (cpRecognizer->SetInput(cpInputStream, FALSE) != S_OK)
    ERROR_EXIT("!!! - Cannot set input for recognition engine.");

// Create the recognition context.
if (cpRecognizer->CreateRecoContext(&cpRecoContext) != S_OK)
    ERROR_EXIT("Cannot create recognition context.");

// Activate the recognition engine.
if (cpRecognizer->SetRecoState(SPRST_ACTIVE) != S_OK)
    ERROR_EXIT("Cannot set state for recognition engine.");

....
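
A likely explanation for the symptoms: in the snippet, CreateStreamOnHGlobal(NULL, TRUE, ...) creates an empty stream, the file contents only ever land in mybuffer, and the Read call asks that empty stream for zero bytes (NULL as the count). The SpStream therefore has nothing to hand to the recognizer, which would fit the immediate SPEI_END_SR_STREAM. Separately, printf("Bytes read: %ld\n", &cbRead) prints the address of cbRead rather than its value, which would account for the implausible 25427252. A probable fix (untested here) is to call cpStreamWithResult->Write(mybuffer, lSize, &cbWritten), with cbWritten being a ULONG you declare (or write only the WAV data chunk), and then Seek back to the start before SetInput. The portable sketch below models that write, rewind, read order with a hypothetical MemStream class; it is not a COM or SAPI type and only imitates the single shared seek pointer of an IStream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stream with one seek pointer shared by reads and
// writes, loosely imitating an IStream from CreateStreamOnHGlobal.
class MemStream {
public:
    // Write cb bytes at the current position (growing the buffer if
    // needed) and advance the position past them.
    std::size_t write(const void* src, std::size_t cb) {
        if (cb == 0) return 0;
        if (pos_ + cb > data_.size()) data_.resize(pos_ + cb);
        std::memcpy(data_.data() + pos_, src, cb);
        pos_ += cb;
        return cb;
    }
    // Copy up to cb bytes from the current position into dst and advance;
    // returns how many bytes were actually copied.
    std::size_t read(void* dst, std::size_t cb) {
        std::size_t n = std::min(cb, data_.size() - pos_);
        if (n == 0) return 0;
        std::memcpy(dst, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    void seekToStart() { pos_ = 0; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};
```

Reading straight after a write returns nothing, because the position sits at the end of the data; that is why the Seek back to the start matters before recognition begins.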
