The Speech Zone: Simple TTS application

OK, so this is my first post on this blog, which is mainly dedicated to speech-related topics, though it may wander off into other subjects too.

First of all, allow me to introduce myself: I am a software developer (working with embedded systems at the moment) who has some experience with SAPI. Since the documentation for SAPI is not properly maintained, I have wanted for a long time to make a webpage where I can share my knowledge. And what better way to do this than with a blog (it's somewhat fashionable)?

I've had a lot of trouble using SAPI for some advanced stuff, which took lots and lots of tests, and I wouldn't wish for anyone else to go through these trials of patience and nerve-crushing despair (OK, I'm exaggerating, but it was at least annoying at the time).

Good, now that we're done with all that, it's time to get to the good stuff. SAPI (Speech Application Programming Interface) is an interface provided by Microsoft that can be used to develop speech-enabled applications. It provides a level of abstraction between the application and the speech engines (ASR and TTS, both explained below).

There are two ways for an application to become speech enabled: it can speak a piece of text (TTS, Text To Speech), or it can recognize the text of a spoken sentence (ASR, Automatic Speech Recognition). I'm not going to explain how these two work; if you want to find out more, you can use Google. It's enough to say that SAPI provides the tools to use these lower layers (the speech engines) to create complex speech applications.

SAPI is a very cool thing, but don't expect to just call a few methods and get an excellent speech application. If you want to do something complex, there's a lot of stuff that the documentation just "omits", and you have to do a lot of tests to get to the bottom of it.

OK, that's enough praising; it's time to do something practical.

The first step, if you want to create something that uses speech, is to download the Microsoft Speech SDK (it's free). You can download it from the Microsoft site. It installs the binaries and header files for SAPI, some sample applications, a few voices for TTS, and a recognition engine from Microsoft, which is not very good but does the job.

I'm getting bored just talking about the interface, so I'm going to post some code now. It's just a simple console TTS application that tries to speak something:

    #include <windows.h>
    #include <stdio.h>
    #include <sphelper.h>

    int main(void)
    {
        HRESULT hr = E_FAIL;

        // smart pointer that will hold the TTS voice object
        CComPtr<ISpVoice> cpVoice;

        // initialize COM
        hr = ::CoInitialize(NULL);
        if (FAILED(hr))
        {
            printf("CoInitialize FAILED \n");
            return -1;
        }

        // create the ISpVoice object: this is the TTS object
        hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
        if (FAILED(hr))
        {
            printf("CoCreateInstance FAILED \n");
            ::CoUninitialize();
            return -1;
        }

        // set events that you are interested in
        // (not treated as fatal here: this example never reads the events)
        hr = cpVoice->SetInterest(SPFEI_ALL_TTS_EVENTS, SPFEI_ALL_TTS_EVENTS);
        if (FAILED(hr))
        {
            printf("SetInterest FAILED \n");
        }

        // have the voice signal a Win32 event when the TTS engine sends notifications
        hr = cpVoice->SetNotifyWin32Event();
        if (FAILED(hr))
        {
            printf("SetNotifyWin32Event FAILED \n");
            cpVoice.Release();
            ::CoUninitialize();
            return -1;
        }

        // speak some text (synchronous: returns when speaking is done)
        hr = cpVoice->Speak(L"Hello World", SPF_DEFAULT, NULL);
        if (FAILED(hr))
        {
            printf("Speak FAILED \n");
            cpVoice.Release();
            ::CoUninitialize();
            return -1;
        }

        // release the voice before shutting COM down
        cpVoice.Release();

        ::CoUninitialize();

        return 0;
    }

OK, that was all. This should speak "Hello World". Pretty simple, eh? Maybe tomorrow I'll have time to make a more complex application, with event handling and other stuff.
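One more note on the FAILED(hr) checks in the listing above: an HRESULT is a signed 32-bit value, and FAILED simply tests whether it is negative. That matters because some calls return success codes other than S_OK; CoInitialize, for example, returns S_FALSE when COM is already initialized on the calling thread, and that still counts as success. Here is a small portable sketch of that rule. The names HResult, Failed, Succeeded and the k-prefixed constants are made-up stand-ins so the snippet builds without the Windows headers; in a real SAPI program you'd use HRESULT, FAILED and SUCCEEDED from windows.h.

```cpp
#include <cstdint>

// Stand-in for HRESULT (hypothetical name): a signed 32-bit status code.
typedef std::int32_t HResult;

// Stand-in constants; the numeric values match the Windows definitions.
const HResult kS_OK    = 0;                                  // S_OK
const HResult kS_FALSE = 1;                                  // S_FALSE
const HResult kE_FAIL  = static_cast<HResult>(0x80004005u);  // E_FAIL

// Mirrors FAILED(hr): failure codes have the sign bit set, so they are negative.
inline bool Failed(HResult hr) { return hr < 0; }

// Mirrors SUCCEEDED(hr): zero and positive codes are successes.
inline bool Succeeded(HResult hr) { return hr >= 0; }
```

With these, Failed(kS_FALSE) is false and Failed(kE_FAIL) is true, so testing FAILED(hr), as the listing does, is safer than comparing the result against S_OK.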

Monday, 29 September 2008