SALS-SIG Research Seminar

Data-to-Speech Generation in Spoken Dialogue Systems


Speaker:

Esther Klabbers

IPO - Center for Research on User-System Interaction
Technical University Eindhoven
Date: Tuesday 16th December
Time: 11:30am
Place: E6A 357

Abstract:

I will be talking about data-to-speech generation in a spoken dialogue system called OVIS that gives train timetable information. There are two components involved in data-to-speech generation, viz. language generation and speech output generation.

The language generation module takes as input an abstract representation of what the system has to say to the user. It then uses so-called syntactic templates to generate the sentences. These templates contain fixed parts called carriers and variable parts called slots. Prosody is computed by taking into account syntactic, semantic and discourse information.
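
As a rough illustration of the carrier-and-slot idea, and not the OVIS implementation, the following minimal Python sketch fills a template in which plain strings stand for carriers and tagged tuples stand for slots. The template layout, the slot names and the function `fill_template` are assumptions made for this example only; the syntactic structure and the prosody computation are left out.

```python
# Hypothetical sketch: a template is a list of parts.
# A plain string is a carrier (fixed text); a ("slot", name) tuple is a slot.
def fill_template(template, values):
    """Return the sentence obtained by filling every slot from `values`."""
    words = []
    for part in template:
        if isinstance(part, tuple):
            _, slot_name = part
            words.append(values[slot_name])  # variable part
        else:
            words.append(part)               # fixed carrier
    return " ".join(words)


# Illustrative template and slot values, not taken from the system.
DEPARTURE = ["The train to", ("slot", "destination"),
             "leaves at", ("slot", "time")]

sentence = fill_template(DEPARTURE,
                         {"destination": "Eindhoven", "time": "10:15"})
print(sentence)
```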

The speech output generation module takes as input an enriched text created by the language generation module, viz. a text with prosodic markers for accents and phrase boundaries. Speech can be generated by two methods. One is phrase concatenation, which yields high-quality output but is rather inflexible. In our approach, we use several prosodically distinct versions of the slot fillers, e.g. station names, so that variation in accentuation and phrasing can be handled properly. The other is diphone synthesis, which is highly flexible but less natural. Here, we try to improve the segmental quality of the diphones and tailor the prosody to the application. In this talk I will discuss both methods and how they use the information from the language generation module.
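
To make the phrase concatenation idea concrete, here is a minimal Python sketch of choosing among prosodically distinct recordings of a slot filler on the basis of the prosodic markers. It is an assumption-laden illustration, not the actual system: the enriched text is reduced to (text, accented, phrase-final) triples, and the recording inventory, file names and function `select_recordings` are hypothetical.

```python
# Hypothetical inventory. Carriers are recorded once; each slot filler is
# recorded in several prosodic versions, keyed by (text, accented, phrase_final).
CARRIERS = {
    "The train to": "carrier_train_to.wav",
    "leaves at": "carrier_leaves_at.wav",
}
FILLERS = {
    ("Eindhoven", True, True): "eindhoven_accented_final.wav",
    ("Eindhoven", True, False): "eindhoven_accented_medial.wav",
    ("Eindhoven", False, True): "eindhoven_unaccented_final.wav",
    ("Eindhoven", False, False): "eindhoven_unaccented_medial.wav",
}


def select_recordings(enriched):
    """Map (text, accented, phrase_final) triples to recording file names."""
    files = []
    for text, accented, phrase_final in enriched:
        if text in CARRIERS:
            files.append(CARRIERS[text])
        else:
            # Pick the version matching the accent and boundary markers.
            files.append(FILLERS[(text, accented, phrase_final)])
    return files


playlist = select_recordings([
    ("The train to", False, False),
    ("Eindhoven", True, True),  # accented and phrase-final
])
print(playlist)
```

Playing the selected files back to back would then give the concatenated utterance; any real system would also need to handle fillers missing from the inventory.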


Enquiries: sals@mri.mq.edu.au

Last modified: November, 1997