Nicole Yankelovich




When: 1993-1997.

What: SpeechActs was an early conversational speech application framework that incorporated natural language processing. The system included several telephone-based speech applications, including email, calendar, reminders, weather, and stock quotes, among others.

Where: Speech Applications Group, Sun Microsystems Laboratories.

How: Principal Investigator. Tasks included project/people management, recruiting, user research, interaction design, grammar development, usability testing, creation of design guidelines, paper writing, documentation, presentations, and software demonstrations.

About the Project

I started the Speech Applications group in Sun Labs at the request of the Lab director, recruiting and hiring software engineers and natural language experts to work on the project. We were the first group that was not a speech engine developer to create speech applications using different recognizers and synthesizers as plug-in components.

SpeechActs was the flagship project of the Speech Applications group during my tenure as the Principal Investigator. SpeechActs was a framework for developing telephony speech applications with conversational interaction. This project involved research on fundamental speech user interface issues as well as architectures for speech applications. This 1995 video provides a flavor of the style of interaction that was possible with the system.

Based on my experience conceptualizing, designing, and user testing SpeechActs, we developed design guidelines and practical hints for speech application developers. These are summarized below, but can also be found in my various publications related to speech interface design.

Simulate Conversation

Herb Clark says that “speaking and listening are two parts of a collective activity.” A major design challenge in creating speech applications, therefore, is to simulate the role of the speaker/listener convincingly enough to produce successful communication with the human collaborator. Here are some of the techniques we applied in our speech user interface design effort.

Study Human Dialogs

In most of our application development projects, we begin the design process by studying human-human dialogs in the domain of the application. In one pilot field study, for example, we analyzed telephone conversations between two sales managers and their assistant. The assistant interacted with Sun’s Calendar Manager while talking to the managers. We discovered that, although she had the graphical interface in front of her, neither she nor the managers used the vocabulary from the interface. For example, relative date expressions and anaphoric references were common conversational elements that did not appear in the graphical interface.

Use Conversational Devices

Adhering to conversational conventions helps improve the speech interface. Just as in human-human dialog, grounding the conversation, avoiding explicit prompts, and using discourse cues enhances communication.

For example, the use of the discourse segment pop cue, “What now?”, helps to reorient users after a sub-dialog. Listen again to the fax sub-dialog within the mail application.

Our user studies demonstrated that adding this small prompt did, in fact, reorient users and help them to figure out what to say next.
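The pop cue can be sketched as a small piece of dialog-manager logic. This is a hypothetical illustration, not the SpeechActs implementation: the class and method names are invented, and the idea is simply that when a sub-dialog closes, the manager returns to the parent discourse segment and reorients the user with “What now?”

```python
# Hypothetical sketch of a discourse segment pop cue. When a sub-dialog
# (e.g., faxing a message from within mail) finishes, the dialog manager
# pops back to the parent context and cues the user that focus returned.

class DialogManager:
    def __init__(self):
        # Discourse segment stack; "mail" is the top-level context.
        self.stack = ["mail"]

    def push(self, segment, opening_prompt):
        """Enter a sub-dialog and speak its opening prompt."""
        self.stack.append(segment)
        return opening_prompt

    def pop(self):
        """Close the current sub-dialog and reorient the user."""
        self.stack.pop()
        return "What now?"  # the discourse segment pop cue

dm = DialogManager()
print(dm.push("fax", "What fax number should I use?"))
print(dm.pop())  # back in the mail context: "What now?"
```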

Tailor Feedback

Speaker-independent, continuous speech recognition over the telephone is still quite error prone; therefore, feedback is essential. Verification should be commensurate with the cost of the action. We implicitly verify commands that involve presentation of data, but explicitly verify commands that might destroy data or trigger future events. For example, when reading calendar events for a particular day, the verification of the date is woven into the response. Contrast that with what happens when the user says “so long.”
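This verification policy can be illustrated with a short sketch. It is not the SpeechActs code; the command names, prompts, and wording are invented for illustration, except “Is it OK to hang up?”, which appears in our user study quotes below.

```python
# Illustrative sketch: verification commensurate with cost. Commands that
# only present data are verified implicitly, by weaving the recognized
# value into the response; commands that destroy data or trigger future
# events (like hanging up after "so long") are verified explicitly.

def respond(command, recognized_value=None):
    if command == "hang up":  # user said "so long"
        return "Is it OK to hang up?"  # explicit verification before acting
    if command == "read calendar":
        # Implicit verification: the recognized date is echoed inside
        # the data presentation itself.
        return f"On {recognized_value}, you have: Staff Meeting at 10:00."
    return "What was that?"

print(respond("read calendar", "Wednesday, September 28th"))
print(respond("hang up"))
```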

Design for Errors

Recognition errors are inevitable. In our user study, we found that people become frustrated very quickly with these errors, particularly if the error feedback is repetitive. One user said, “It was repetitive when it didn’t understand what I said–then it turned into a machine.”

We redesigned the error messages using a technique we call progressive assistance. After the redesign, a user said “It gave me the perception that it’s trying to understand what I’m saying.”
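Progressive assistance can be sketched in a few lines. The prompt wording below is invented for illustration; the point is only the mechanism: on consecutive recognition errors for the same prompt, the message escalates from a brief nudge toward concrete guidance instead of repeating verbatim.

```python
# A minimal sketch of progressive assistance (prompt wording is
# hypothetical): each consecutive recognition error produces a more
# helpful message, and the most detailed one repeats thereafter.

ERROR_PROMPTS = [
    "What was that?",
    "Sorry, I still didn't catch that.",
    "Sorry. Try a command like 'read my mail' or 'what's my calendar?'",
]

def error_prompt(consecutive_errors):
    # Clamp the index so further errors keep giving the detailed help.
    index = min(consecutive_errors - 1, len(ERROR_PROMPTS) - 1)
    return ERROR_PROMPTS[index]

for n in (1, 2, 3, 4):
    print(error_prompt(n))
```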

It is also helpful to provide a safety net to take common types of errors into account. For example, in applications that allow users to create recorded messages, users often start speaking the content of their message too soon. In the Office Monitor application, the system asks, “Do you want to leave a message?” Some users will compliantly answer “Yes” or “No.”

More often, however, users just start speaking the message instead of answering “Yes” or “No.” The Office Monitor application was designed to handle both cases. At the prompt, the system turns on both the speech recognizer and the recording mechanism. If the recognizer returns an error, the system assumes that the user spoke a message.
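The safety net amounts to running the recognizer and the recorder in parallel and treating a recognition error as the start of a message. A possible sketch, with invented function and state names (this is not the Office Monitor code):

```python
# Sketch of the safety-net behavior described above. At the yes/no prompt,
# both the recognizer and the recorder are active; if recognition fails,
# the audio captured so far is assumed to be the caller's message.

def handle_message_prompt(recognize, record):
    """recognize() returns 'yes', 'no', or None on a recognition error;
    record() returns the audio captured alongside recognition."""
    audio = record()  # recording starts at the same time as recognition
    result = recognize()
    if result == "yes":
        return ("record_message", None)  # proceed to an explicit recording step
    if result == "no":
        return ("goodbye", None)
    # Recognition error: assume the caller already began speaking the message.
    return ("save_message", audio)

# The caller ignored the yes/no question and just started talking:
state, audio = handle_message_prompt(lambda: None, lambda: b"hi-this-is-pat")
print(state)  # save_message
```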

Taper Presentation of Data

Since speech is such a slow output medium, it is important to be as brief as possible. Tapering the presentation of repetitive data can cut down on the length of the speech output. Here is how extraneous words are eliminated when SpeechActs reads calendar appointments or the status of a stock portfolio.

On Wed Sept 28th, From 10:00 to 11:00, you have, Staff Meeting.
                  From 11:00 to 12:00,           Meeting with Bob. 
                  From 4:00 to 5:00,             Beer Bust... 

Your portfolio status as of an hour ago: 
	Sun   was trading at 28 and 3/8, down 1/4 since yesterday. 
	IBM   was         at 69 and 5/8, down 5/8. 
	Apple was         at 33 and 7/8, down 1/4.

In the recorded demo, listen to how natural it sounds when implied words are dropped after a pattern for presentation has been established.

When tapered presentations are still too long, users are able to interrupt the synthesizer with their voice or with a telephone key.
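Tapering can be implemented by switching from a full template to a shortened one once the presentation pattern is established. A minimal sketch, with template strings invented to match the stock example above:

```python
# A possible sketch of tapering: the first item uses the full phrasing,
# and subsequent items drop the framing words that are now implied.

def taper(items, full_template, short_template):
    lines = []
    for i, item in enumerate(items):
        template = full_template if i == 0 else short_template
        lines.append(template.format(**item))
    return lines

quotes = [
    {"name": "Sun", "price": "28 and 3/8", "change": "down 1/4"},
    {"name": "IBM", "price": "69 and 5/8", "change": "down 5/8"},
]
for line in taper(
    quotes,
    "{name} was trading at {price}, {change} since yesterday.",
    "{name} was at {price}, {change}.",
):
    print(line)
```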

Take Personality into Account

In designing SpeechActs, we did not set out to create a computer character with personality. Our experience, however, suggests that, like it or not, people attribute personality traits to a speech-only system. In a user study conducted in July 1994, we asked participants to complete 22 tasks and to answer a set of questions. When users were asked to describe the personality of SpeechActs, comments included:

“Friendly,” “Benign,” “Quirky,” “Empty.”
“The voice was friendly.”
“The voice wasn’t warm.”
“It doesn’t have a personality.”
“It sounded like a computer talking to me.”
“It sounded like a stuffed shirt.”
“What detracted from it was the electronic reading of the text–the broken words…not fluid.”
“Very choppy–not a soothing tone to listen to.”
“The way it pronounces ‘Hi’ is kind of sleazy.”

“It didn’t seem like I was talking to a computer after a while.”
“It wasn’t argumentative on the negative responses. It tried to be accommodating.”
[After the system asked “Is it OK to hang up?”] “I thought that the machine had given up on me and I had reached my limit and that it wanted to do something else.”

Related Publications

Designing SpeechActs: Issues in speech user interfaces
N Yankelovich, GA Levow, M Marx
Proceedings of the SIGCHI conference on Human factors in computing systems …
How do users know what to say?
N Yankelovich
interactions 3 (6), 32-43
SpeechActs: a spoken-language framework
P Martin, F Crabbe, S Adams, E Baatz, N Yankelovich
Computer 29 (7), 33-40
Conversational speech interfaces
J Lai, N Yankelovich
The human-computer interaction handbook, 698-713
Talking vs taking: Speech access to remote computers
N Yankelovich
Conference companion on Human factors in computing systems, 275-276
Designing the user interface for speech recognition applications
A Mane, S Boyce, D Karis, N Yankelovich
Conference companion on Human factors in computing systems: common ground, 431
Using natural dialogs as the basis for speech interface design
N Yankelovich
Human Factors and Voice Interactive Systems, 255-290
Designing speech user interfaces
N Yankelovich, J Lai
CHI’99 extended abstracts on Human factors in computing systems, 124-125
Conversational speech interfaces and technologies
J Lai, CM Karat, N Yankelovich
Human-Computer Interaction: Design Issues, Solutions, and Applications, 53
SpeechActs & the design of speech interfaces
N Yankelovich
the Adjunct Proceedings of the 1994 ACM Conference on Human Factors and …
Speech user interface design challenges
S Boyce, A Mane, D Karis, N Yankelovich
CHI’97 extended abstracts on Human factors in computing systems: looking to …
SpeechActs (video): a conversational speech system
N Yankelovich
Proceedings of the third ACM international conference on Multimedia, 541-542
SpeechActs: A Conversational Speech System
N Yankelovich
Speech Applications
N Yankelovich, P Martin
Fiscal 1994 project portfolio report, 26


This entry was posted on June 1, 2013 in Speech.