Chapter 8

Conversational Interface Technologies

Clare-Marie Karat, John Vergo, and David Nahamoo
IBM TJ Watson Research

Outline

A Conceptual Framework for Conversational Interface Technologies and Application Development

User Characteristics

Native and Nonnative Speakers

Casual Versus Expert Training

Age

Physical Condition

Education

Conversational Tasks

Composition

Transcription

Transaction

Collaboration

Physical and Social Context of Use

Audio Channel and Device Characteristics

Physical Context of Interaction

Social Context

User-Centered Design Approach to Conversational Technology Applications

Automatic Speech Recognition

How Does Automatic Speech Recognition (ASR) Work?

Current Capabilities and Limitations of Speech Recognition Software

User Interface Guidelines for ASR Application Design

Examples of Successful Applications of Speech Recognition Technology

Commercially Available ASR Tools, Engines, and APIs

Speech Synthesis

How Does Speech Synthesis Work?

Current Capabilities and Limitations of Speech Synthesis Software

User Interface Guidelines for Application Design Using Speech Synthesis

Examples of Successful Applications of Speech Synthesis

Commercially Available Tools, Engines, and APIs

Natural Language Processing (NLP) and Understanding (NLU)

How Do NLP and NLU Work?

Mixed-Initiative Dialogue

VoiceXML

Current Capabilities and Limitations of NLU Applications

User Interface Guidelines for NLU Applications

Examples of Successful Applications of NLU

Commercially Available NLU-Related Tools, Engines, and APIs

Speaker Recognition: Verification, Identification, and Classification

What Is Speaker Recognition, and How Does It Work?

Basic Capabilities and Requirements

User Interface Guidelines for Speaker Verification Applications

Examples of Successful Applications of Speaker Verification Technology

Commercially Available Speaker Verification Tools, Engines, and APIs

Acknowledgements

References

Figures

Figure 8.1: Overview of human–computer interaction model for speech recognition.

Figure 8.2: Model of text-to-speech synthesis.

Figure 8.3: Diagram of a prototypical multimodal conversational system.