People love to talk, but so far speech has not fulfilled its potential as a mode of interacting with computers. Spontaneous speech is easy for people to produce and understand, yet still quite difficult for machines to process robustly. Part of the problem is that spontaneous speech is highly variable: utterances vary in dialect, pronunciation, perspective, word choice, information packaging, voice quality, and prosody. Much of this variability is meaningful but ignored by current systems. Some of it, we propose, can be implicitly and explicitly shaped by the speech of a conversational partner (whether human or computer).

This 4-year project takes an innovative, multidisciplinary approach to characterizing how speech production and interpretation are coordinated in dialogs. It examines how speakers adapt to both human and computer conversational partners, and it will produce prototype systems that flexibly adapt to a human user over the course of a dialog. Some adaptations make processing easier for both partners and may be fairly automatic, such as converging on the same wording or dialect; others are adjustments made by one partner explicitly "for" the other. Findings from human dialog will be applied to systems that use speech recognition and generation, with two goals: (1) adapting the system's vocabulary, dialect, and perspective to the user's needs whenever feasible (responsive generation), and (2) shaping users to spontaneously adapt their utterances to forms that the system can process more robustly (directive generation).

The project brings together methods and theoretical perspectives from computer science, linguistics, and psychology to advance theories and improve applications. Our methods include controlled experiments; data collection in the lab, in the field, and on the Web; corpus analysis; simulation studies; and prototyping and evaluation of spoken dialog systems. Three applications are planned: a picture-matching game, a PDA-based calendar system, and a telephone-based course evaluation system for Stony Brook's undergraduate community.

Broader impacts: The project will enhance the training of young scientists in computer science, linguistics, and psychology, and will include underrepresented groups in basic research and user-interface engineering. It aims to provide a scientific foundation for developing flexible and robust spoken dialog systems that serve the needs of diverse users.


Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant No. 0325188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.