Clare-Marie Karat, John Vergo, and David Nahamoo
IBM TJ Watson Research
A Conceptual Framework for Conversational Interface Technologies and Application Development
User Characteristics
Native and Nonnative Speakers
Casual Versus Expert Training
Age
Physical Condition
Education
Conversational Tasks
Composition
Transcription
Transaction
Collaboration
Physical and Social Context of Use
Audio Channel and Device Characteristics
Physical Context of Interaction
Social Context
User-Centered Design Approach to Conversational Technology Applications
Automatic Speech Recognition
How Does Automatic Speech Recognition (ASR) Work?
Current Capabilities and Limitations of Speech Recognition Software
User Interface Guidelines for ASR Application Design
Examples of Successful Applications of Speech Recognition Technology
Commercially Available ASR Tools, Engines, and APIs
Speech Synthesis
How Does Speech Synthesis Work?
Current Capabilities and Limitations of Speech Synthesis Software
User Interface Guidelines for Application Design Using Speech Synthesis
Examples of Successful Applications of Speech Synthesis
Commercially Available Tools, Engines, and APIs
Natural Language Processing (NLP) and Understanding (NLU)
How Do NLP and NLU Work?
Mixed-Initiative Dialogue
VoiceXML
Current Capabilities and Limitations of NLU Applications
User Interface Guidelines for NLU Applications
Examples of Successful Applications of NLU
Commercially Available NLU-Related Tools, Engines, and APIs
Speaker Recognition: Verification, Identification, and Classification
What Is Speaker Recognition, and How Does It Work?
Basic Capabilities and Requirements
User Interface Guidelines for Speaker Verification Applications
Examples of Successful Applications of Speaker Verification Technology
Commercially Available Speaker Verification Tools, Engines, and APIs
Acknowledgements
References
Figure 8.1: Overview of human–computer interaction model for speech recognition.
Figure 8.2: Model of text-to-speech synthesis.
Figure 8.3: Diagram of a prototypical multimodal conversational system.