SALS-SIG Research Seminar


A Formal Foundation for Databases of Annotated Speech


Speaker:

Steven Bird

Linguistic Data Consortium, University of Pennsylvania
Date: Friday 28th January 2000
Time: 14:00--15:30
Place: E6A 357, Macquarie University

This seminar is jointly hosted by SALS-SIG and the Department of Computing.

Abstract:

Databases of annotated speech have been a critical component of research in the speech sciences for some years. Today, these corpora are being created and deployed in a rapidly expanding set of languages, disciplines and technologies. A wealth of formats and tools have sprung up around this enterprise, a diversity which at once facilitates and frustrates progress. The linguistic annotation page has drawn attention to the scale of ongoing activity. Despite the explicit formats and well-documented user interfaces which are referenced there, insights about the structure of the annotations themselves are often buried in coding manuals and internal data structures. Today the tools and databases proliferate, with little apparent prospect for interoperability and reusability, and escalating infrastructure costs for resource-rich language technologies.

Recently, Mark Liberman and I have proposed a solution to this problem which does not depend on the imposition of standard tools and formats. We have described a formal framework for speech annotation based on labelled acyclic digraphs (where `annotation' is broadly construed to cover any kind of symbolic description of portions of a pre-existing linguistic object). These `annotation graphs' offer an extremely simple method for representing a wide variety of complex annotation structures. With this model we are not fielding a new entry into the existing bazaar of formats and tools. Rather, we seek to recapitulate a development in the database world 30 years ago, which saw the emergence of the relational model and the so-called ``three-level architecture''. With an API based on the annotation graph model, the burden of re-formatting databases and re-engineering tools is substantially alleviated; instead we have the expected properties of multiple views and data-independence. This facilitates stable progress on new tools and databases, plus a high degree of reusability for existing tools and databases. This is a critical development, given the expense of creating and maintaining these tools and databases. With colleagues at NIST, MITRE, CMU and DGA Paris, we are developing parallel, open-source versions of this API in Tcl/Tk, Perl/Tk, C++ and Java. New task-specific user interfaces and an XML interchange format are in development.
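As a rough illustration of the idea of a labelled acyclic digraph used for annotation, the following Python sketch stores labelled arcs between nodes and refuses any arc that would close a cycle. It is not the API mentioned above: the names `Arc`, `AnnotationGraph`, `add_node`, `add_arc` and `labels`, the `tier` field, and the optional time offsets on nodes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int      # start node id
    dst: int      # end node id
    tier: str     # kind of annotation, e.g. "word" or "phone" (assumed field)
    label: str    # the symbolic description attached to this span


class AnnotationGraph:
    """Hypothetical container: a labelled digraph kept acyclic on insertion."""

    def __init__(self):
        self.times = {}   # node id -> time offset in seconds, or None
        self.arcs = []

    def add_node(self, node, time=None):
        self.times[node] = time

    def add_arc(self, src, dst, tier, label):
        if src not in self.times or dst not in self.times:
            raise KeyError("both endpoints must be added as nodes first")
        # If src is already reachable from dst, the new arc would close a cycle.
        if self._reaches(dst, src):
            raise ValueError("arc would create a cycle")
        self.arcs.append(Arc(src, dst, tier, label))

    def _reaches(self, start, goal):
        # Iterative depth-first search along existing arcs.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(a.dst for a in self.arcs if a.src == node)
        return False

    def labels(self, tier):
        # Labels of one tier, in insertion order.
        return [a.label for a in self.arcs if a.tier == tier]


# Toy data (invented): two phones spanning one word, followed by a second word.
g = AnnotationGraph()
g.add_node(0, 0.0)
g.add_node(1)            # a node without a known time offset
g.add_node(2, 0.35)
g.add_node(3, 0.60)
g.add_arc(0, 1, "phone", "h")
g.add_arc(1, 2, "phone", "iy")
g.add_arc(0, 2, "word", "he")
g.add_arc(2, 3, "word", "said")
```

Different annotation layers share the same nodes here, so a word arc and the phone arcs it spans are related only through the graph structure, not through a format-specific hierarchy.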

In this talk I will present the formalism and show how its algebraic properties are well-suited to databases of annotated speech. I will show how the model can be mapped onto a simple relational structure, permitting the use of existing technologies for efficient storage and access, and facilitating substantive comparison with Emu, a popular speech annotation system developed at Macquarie by Jonathan Harrington and Steve Cassidy. I will briefly describe some common kinds of query and their translation into Datalog, and report on some shortcomings of first-order languages (joint work with Steve Cassidy). Time permitting, I will describe a query language based on Blackburn's Nominal Tense Logic, and some new research on data provenance (with Peter Buneman, Penn) and on computational infrastructure for empirical studies of communicative interaction (with Brian MacWhinney, CMU). The talk will begin with a brief overview of the work of the Linguistic Data Consortium.
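To give a feel for what a mapping onto a simple relational structure might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table schema (`node`, `arc`), the column names, the toy data and the helper `phones_in_word` are assumptions made for this example, not the mapping presented in the talk. The query is a plain conjunctive join with time comparisons, roughly the kind of query that could also be written as a Datalog rule.

```python
import sqlite3

# Assumed schema: one row per node (with a time offset) and one row per arc.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, time REAL);
    CREATE TABLE arc (
        src   INTEGER REFERENCES node(id),
        dst   INTEGER REFERENCES node(id),
        tier  TEXT,
        label TEXT
    );
""")

# Toy data (invented): two phones spanning the word "he", then the word "said".
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(0, 0.0), (1, 0.15), (2, 0.35), (3, 0.60)],
)
conn.executemany(
    "INSERT INTO arc VALUES (?, ?, ?, ?)",
    [
        (0, 1, "phone", "h"),
        (1, 2, "phone", "iy"),
        (0, 2, "word", "he"),
        (2, 3, "word", "said"),
    ],
)


def phones_in_word(conn, word):
    """Return labels of phone arcs lying within the time span of a word arc."""
    rows = conn.execute(
        """
        SELECT p.label
        FROM arc AS w
        JOIN node AS ws ON ws.id = w.src
        JOIN node AS we ON we.id = w.dst
        JOIN arc  AS p  ON p.tier = 'phone'
        JOIN node AS ps ON ps.id = p.src
        JOIN node AS pe ON pe.id = p.dst
        WHERE w.tier = 'word' AND w.label = ?
          AND ps.time >= ws.time AND pe.time <= we.time
        ORDER BY ps.time
        """,
        (word,),
    )
    return [row[0] for row in rows]
```

In this toy data every node carries a time offset, so inclusion can be tested by comparing times alone. Nodes without times would need a reachability test over arcs instead.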

Bio:

Steven Bird is associate director of the Linguistic Data Consortium (LDC), where his primary activity is the development of new models, tools and formats for databases of text, audio and video. He is a principal investigator on two new projects sponsored by the US National Science Foundation, whose shared goal is to provide computational infrastructure for large-scale empirical studies of communicative interaction across the social sciences. With Jonathan Harrington (Macquarie), Steven is editing a special issue of the journal Speech Communication on the topic of speech annotation and corpus tools. Steven also teaches in the computer science and linguistics departments at the University of Pennsylvania.

Steven studied computer science at Melbourne University, where he gained his BSc and MSc before moving to Edinburgh. His PhD thesis (Edinburgh 1990), on a logical model for representing and reasoning about the structure of speech, was published by Cambridge University Press (1995). From 1990-98 Steven conducted postdoctoral research in Edinburgh, focussing on finite-state natural language processing and on computational support for the study of undescribed languages. The latter involved a 2.5-year field trip to Cameroon (West Africa) where, equipped with a Linux laptop and portable phonetic hardware, Steven developed new techniques for collecting, analyzing and disseminating annotated speech data.


Enquiries: sals@mri.mq.edu.au

Last modified: 23rd January 2000