People love to talk, but so far speech has not fulfilled its
potential as a mode of interacting with computers. Spontaneous speech
is easy for people to produce and understand, but still quite
difficult for machines to process robustly. Part of the problem is
that spontaneous speech is highly variable: Utterances vary in
dialect, pronunciation, perspective, word choice, information
packaging, voice quality, and prosody. Much variability in human
speech is meaningful but ignored by current systems. Some
variability, we propose, can be implicitly and explicitly shaped by
the speech of a partner (whether human or computer).
This 4-year project takes an innovative and multidisciplinary approach
to characterizing how speech production and interpretation are
coordinated in dialogs. It examines how speakers adapt to both human
and computer conversational partners, and will produce prototype
systems that flexibly adapt to a human user over the course of a
dialog. Adaptations include those that make processing easier for both
partners (and that may be fairly automatic), such as converging on the
same wording or dialect; adaptations also include adjustments made by
one partner explicitly "for" the other. Findings from human dialog
will be applied to systems that use speech recognition and generation,
with the goals of (1) adapting the system's vocabulary, dialect, and
perspective to the user's needs whenever feasible (responsive
generation), and (2) shaping users to spontaneously adapt their
utterances to forms that the system can process more robustly
(directive generation).
The project brings together methods and theoretical perspectives from
computer science, linguistics, and psychology to advance theories and
improve applications. Our methods include controlled experiments; data
collection in the lab, in the field, and on the Web; corpus analysis;
simulation studies; and prototyping and evaluation of spoken dialog
systems. Three applications are planned: a picture-matching game, a
PDA-based calendar system, and a telephone-based course evaluation
system for Stony Brook's undergraduate community.
Broader impacts: The project will enhance training of young scientists
in computer science, linguistics, and psychology, and will include
underrepresented groups in basic research and user-interface
engineering. The project aims to provide a scientific foundation for
developing flexible and robust spoken dialog systems that serve the
needs of diverse users.
Acknowledgement
This material is based upon work supported by the National Science
Foundation under Grant No. 0325188. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation.