Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Discrete-Time Processing of Speech Signals

by John R. Deller, Jr.; John H. L. Hansen; John G. Proakis
  • ISBN13: 9780780353862
  • ISBN10: 0780353862

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 1999-10-05
  • Publisher: Wiley-IEEE Press
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $267.67 Save up to $0.34
  • Buy New
    $267.33
    Add to Cart (Free Shipping)

    PRINT ON DEMAND: 2-4 WEEKS. THIS ITEM CANNOT BE CANCELLED OR RETURNED.

Summary

Commercial applications of speech processing and recognition are fast becoming a growth industry that will shape the next decade. Now students and practicing engineers of signal processing can find in a single volume the fundamentals essential to understanding this rapidly developing field. IEEE Press is pleased to publish a classic reissue of Discrete-Time Processing of Speech Signals. Specially featured in this reissue is the addition of valuable World Wide Web links to the latest speech data references.

This landmark book offers a balanced discussion of both the mathematical theory of digital speech signal processing and critical contemporary applications. The authors provide a comprehensive view of all major modern speech processing areas: speech production physiology and modeling, signal analysis techniques, coding, enhancement, quality assessment, and recognition. You will learn the principles needed to understand advanced technologies in speech processing -- from speech coding for communications systems to biomedical applications of speech analysis and recognition.

Ideal for self-study or as a course text, this far-reaching reference book offers an extensive historical context for concepts under discussion, end-of-chapter problems, and practical algorithms. Discrete-Time Processing of Speech Signals is the definitive resource for students, engineers, and scientists in the speech processing field. An Instructor's Manual presenting detailed solutions to all the problems in the book is available upon request from the Wiley Marketing Department.

Author Biography

About the Authors: John R. (Jack) Deller, Jr. is professor of electrical and computer engineering at Michigan State University, where he directs the Speech Processing Laboratory. He received the 1998 IEEE Signal Processing Magazine Best Paper Award and the 1997 IEEE Signal Processing Society Meritorious Service Award for his six-year service as editor-in-chief of the IEEE Signal Processing Magazine. Dr. Deller is the coauthor of Digital Signal Processing and the Microcontroller (Prentice Hall, 1999) and currently serves as associate editor of the IEEE Transactions on Speech and Audio Processing. He is a Fellow of the IEEE.

Table of Contents

Preface to the IEEE Edition xvii(2)
Preface xix(4)
Acronyms and Abbreviations xxiii
I Signal Processing Background 3(96)
1 Propaedeutic
3(96)
1.0 Preamble
3(3)
1.0.1 The Purpose of Chapter 1
3(1)
1.0.2 Please Read This Note on Notation
4(1)
1.0.3 For People Who Never Read Chapter 1 (and Those Who Do)
5(1)
1.1 Review of DSP Concepts and Notation
6(23)
1.1.1 "Normalized Time and Frequency"
6(3)
1.1.2 Singularity Signals
9(1)
1.1.3 Energy and Power Signals
9(1)
1.1.4 Transforms and a Few Related Concepts
10(6)
1.1.5 Windows and Frames
16(4)
1.1.6 Discrete-Time Systems
20(4)
1.1.7 Minimum, Maximum, and Mixed-Phase Signals and Systems
24(5)
1.2 Review of Probability and Stochastic Processes
29(26)
1.2.1 Probability Spaces
30(3)
1.2.2 Random Variables
33(9)
1.2.3 Random Processes
42(10)
1.2.4 Vector-Valued Random Processes
52(3)
1.3 Topics in Statistical Pattern Recognition
55(18)
1.3.1 Distance Measures
56(2)
1.3.2 The Euclidean Metric and "Prewhitening" of Features
58(5)
1.3.3 Maximum Likelihood Classification
63(3)
1.3.4 Feature Selection and Probabilistic Separability Measures
66(4)
1.3.5 Clustering Algorithms
70(3)
1.4 Information and Entropy
73(6)
1.4.1 Definitions
73(4)
1.4.2 Random Sources
77(1)
1.4.3 Entropy Concepts in Pattern Recognition
78(1)
1.5 Phasors and Steady-State Solutions
79(2)
1.6 Onward to Speech Processing
81(4)
1.7 Problems
85(5)
Appendices: Supplemental Bibliography
90(9)
1.A Example Textbooks on Digital Signal Processing
90(1)
1.B Example Textbooks on Stochastic Processes
90(1)
1.C Example Textbooks on Statistical Pattern Recognition
91(1)
1.D Example Textbooks on Information Theory
91(1)
1.E Other Resources on Speech Processing
92(1)
1.E.1 Textbooks
92(1)
1.E.2 Edited Paper Collections
92(1)
1.E.3 Journals
92(1)
1.E.4 Conference Proceedings
93(1)
1.F Example Textbooks on Speech and Hearing Sciences
93(1)
1.G Other Resources on Artificial Neural Networks
94(1)
1.G.1 Textbooks and Monographs
94(1)
1.G.2 Journals
94(1)
1.G.3 Conference Proceedings
95(4)
II Speech Production and Modeling 99(126)
2 Fundamentals of Speech Science
99(52)
2.0 Preamble
99(1)
2.1 Speech Communication
100(1)
2.2 Anatomy and Physiology of the Speech Production System
101(14)
2.2.1 Anatomy
101(3)
2.2.2 The Role of the Vocal Tract and Some Elementary Acoustical Analysis
104(6)
2.2.3 Excitation of the Speech System and the Physiology of Voicing
110(5)
2.3 Phonemics and Phonetics
115(31)
2.3.1 Phonemes Versus Phones
115(1)
2.3.2 Phonemic and Phonetic Transcription
116(1)
2.3.3 Phonemic and Phonetic Classification
117(20)
2.3.4 Prosodic Features and Coarticulation
137(9)
2.4 Conclusions
146(1)
2.5 Problems
146(5)
3 Modeling Speech Production
151(74)
3.0 Preamble
151(1)
3.1 Acoustic Theory of Speech Production
151(36)
3.1.1 History
151(5)
3.1.2 Sound Propagation
156(3)
3.1.3 Source Excitation Model
159(7)
3.1.4 Vocal-Tract Modeling
166(20)
3.1.5 Models for Nasals and Fricatives
186(1)
3.2 Discrete-Time Modeling
187(13)
3.2.1 General Discrete-Time Speech Model
187(5)
3.2.2 A Discrete-Time Filter Model for Speech Production
192(5)
3.2.3 Other Speech Models
197(3)
3.3 Conclusions
200(1)
3.4 Problems
201(2)
3.A Single Lossless Tube Analysis
203(8)
3.A.1 Open and Closed Terminations
203(3)
3.A.2 Impedance Analysis, T-Network, and Two-Port Network
206(5)
3.B Two-Tube Lossless Model of the Vocal Tract
211(6)
3.C Fast Discrete-Time Transfer Function Calculation
217(8)
III Analysis Techniques 225(184)
4 Short-Term Processing of Speech
225(41)
4.1 Introduction
225(1)
4.2 Short-Term Measures from Long-Term Concepts
226(10)
4.2.1 Motivation
226(1)
4.2.2 "Frames" of Speech
227(1)
4.2.3 Approach 1 to the Derivation of a Short-Term Feature and Its Two Computational Forms
227(4)
4.2.4 Approach 2 to the Derivation of a Short-Term Feature and Its Two Computational Forms
231(3)
4.2.5 On the Role of "1/N" and Related Issues
234(2)
4.3 Example Short-Term Features and Applications
236(26)
4.3.1 Short-Term Estimates of Autocorrelation
236(8)
4.3.2 Average Magnitude Difference Function
244(1)
4.3.3 Zero Crossing Measure
245(1)
4.3.4 Short-Term Power and Energy Measures
246(5)
4.3.5 Short-Term Fourier Analysis
251(11)
4.4 Conclusions
262(1)
4.5 Problems
263(3)
5 Linear Prediction Analysis
266(86)
5.0 Preamble
266(1)
5.1 Long-Term LP Analysis by System Identification
267(13)
5.1.1 The All-Pole Model
267(3)
5.1.2 Identification of the Model
270(10)
5.2 How Good Is the LP Model?
280(10)
5.2.1 The "Ideal" and "Almost Ideal" Cases
280(1)
5.2.2 "Nonideal" Cases
281(6)
5.2.3 Summary and Further Discussion
287(3)
5.3 Short-Term LP Analysis
290(41)
5.3.1 Autocorrelation Method
290(2)
5.3.2 Covariance Method
292(4)
5.3.3 Solution Methods
296(29)
5.3.4 Gain Computation
325(2)
5.3.5 A Distance Measure for LP Coefficients
327(2)
5.3.6 Preemphasis of the Speech Waveform
329(2)
5.4 Alternative Representations of the LP Coefficients
331(2)
5.4.1 The Line Spectrum Pair
331(2)
5.4.2 Cepstral Parameters
333(1)
5.5 Applications of LP in Speech Analysis
333(9)
5.5.1 Pitch Estimation
333(3)
5.5.2 Formant Estimation and Glottal Waveform Deconvolution
336(6)
5.6 Conclusions
342(1)
5.7 Problems
343(5)
5.A Proof of Theorem 5.1
348(2)
5.B The Orthogonality Principle
350(2)
6 Cepstral Analysis
352(57)
6.1 Introduction
352(3)
6.2 "Real" Cepstrum
355(31)
6.2.1 Long-Term Real Cepstrum
355(9)
6.2.2 Short-Term Real Cepstrum
364(2)
6.2.3 Example Applications of the stRC to Speech Analysis and Recognition
366(14)
6.2.4 Other Forms and Variations on the stRC Parameters
380(6)
6.3 Complex Cepstrum
386(11)
6.3.1 Long-Term Complex Cepstrum
386(7)
6.3.2 Short-Term Complex Cepstrum
393(1)
6.3.3 Example Application of the stCC to Speech Analysis
394(3)
6.3.4 Variations on the Complex Cepstrum
397(1)
6.4 A Critical Analysis of the Cepstrum and Conclusions
397(4)
6.5 Problems
401(8)
IV Coding, Enhancement and Quality Assessment 409(192)
7 Speech Coding and Synthesis
409(92)
7.1 Introduction
410(1)
7.2 Optimum Scalar and Vector Quantization
410(24)
7.2.1 Scalar Quantization
411(14)
7.2.2 Vector Quantization
425(9)
7.3 Waveform Coding
434(25)
7.3.1 Introduction
434(1)
7.3.2 Time Domain Waveform Coding
435(16)
7.3.3 Frequency Domain Waveform Coding
451(6)
7.3.4 Vector Waveform Quantization
457(2)
7.4 Vocoders
459(29)
7.4.1 The Channel Vocoder
460(2)
7.4.2 The Phase Vocoder
462(1)
7.4.3 The Cepstral (Homomorphic) Vocoder
462(7)
7.4.4 Formant Vocoders
469(2)
7.4.5 Linear Predictive Coding
471(14)
7.4.6 Vector Quantization of Model Parameters
485(3)
7.5 Measuring the Quality of Speech Compression Techniques
488(1)
7.6 Conclusions
489(1)
7.7 Problems
490(4)
7.A Quadrature Mirror Filters
494(7)
8 Speech Enhancement
501(67)
8.1 Introduction
501(3)
8.2 Classification of Speech Enhancement Methods
504(2)
8.3 Short-Term Spectral Amplitude Techniques
506(11)
8.3.1 Introduction
506(1)
8.3.2 Spectral Subtraction
506(10)
8.3.3 Summary of Short-Term Spectral Magnitude Methods
516(1)
8.4 Speech Modeling and Wiener Filtering
517(11)
8.4.1 Introduction
517(1)
8.4.2 Iterative Wiener Filtering
517(4)
8.4.3 Speech Enhancement and All-Pole Modeling
521(3)
8.4.4 Sequential Estimation via EM Theory
524(1)
8.4.5 Constrained Iterative Enhancement
525(2)
8.4.6 Further Refinements to Iterative Enhancement
527(1)
8.4.7 Summary of Speech Modeling and Wiener Filtering
528(1)
8.5 Adaptive Noise Canceling
528(13)
8.5.1 Introduction
528(2)
8.5.2 ANC Formalities and the LMS Algorithm
530(4)
8.5.3 Applications of ANC
534(7)
8.5.4 Summary of ANC Methods
541(1)
8.6 Systems Based on Fundamental Frequency Tracking
541(11)
8.6.1 Introduction
541(1)
8.6.2 Single-Channel ANC
542(3)
8.6.3 Adaptive Comb Filtering
545(4)
8.6.4 Harmonic Selection
549(2)
8.6.5 Summary of Systems Based on Fundamental Frequency Tracking
551(1)
8.7 Performance Evaluation
552(4)
8.7.1 Introduction
552(1)
8.7.2 Enhancement and Perceptual Aspects of Speech
552(2)
8.7.3 Speech Enhancement Algorithm Performance
554(2)
8.8 Conclusions
556(1)
8.9 Problems
557(4)
8.A The INTEL System
561(4)
8.B Addressing Cross-Talk in Dual-Channel ANC
565(3)
9 Speech Quality Assessment
568(33)
9.1 Introduction
568(2)
9.1.1 The Need for Quality Assessment
568(2)
9.1.2 Quality Versus Intelligibility
570(1)
9.2 Subjective Quality Measures
570(10)
9.2.1 Intelligibility Tests
572(3)
9.2.2 Quality Tests
575(5)
9.3 Objective Quality Measures
580(13)
9.3.1 Articulation Index
582(2)
9.3.2 Signal-to-Noise Ratio
584(3)
9.3.3 Itakura Measure
587(1)
9.3.4 Other Measures Based on LP Analysis
588(1)
9.3.5 Weighted-Spectral Slope Measures
589(1)
9.3.6 Global Objective Measures
590(1)
9.3.7 Example Applications
591(2)
9.4 Objective Versus Subjective Measures
593(2)
9.5 Problems
595(6)
V Recognition 601(298)
10 The Speech Recognition Problem
601(22)
10.1 Introduction
601(5)
10.1.1 The Dream and the Reality
601(3)
10.1.2 Discovering Our Ignorance
604(1)
10.1.3 Circumventing Our Ignorance
605(1)
10.2 The "Dimensions of Difficulty"
606(14)
10.2.1 Speaker-Dependent Versus Speaker-Independent Recognition
607(1)
10.2.2 Vocabulary Size
607(1)
10.2.3 Isolated-Word Versus Continuous-Speech Recognition
608(6)
10.2.4 Linguistic Constraints
614(5)
10.2.5 Acoustic Ambiguity and Confusability
619(1)
10.2.6 Environmental Noise
620(1)
10.3 Related Problems and Approaches
620(1)
10.3.1 Knowledge Engineering
620(1)
10.3.2 Speaker Recognition and Verification
621(1)
10.4 Conclusions
621(1)
10.5 Problems
621(2)
11 Dynamic Time Warping
623(54)
11.1 Introduction
623(1)
11.2 Dynamic Programming
624(10)
11.3 Dynamic Time Warping Applied to IWR
634(17)
11.3.1 DTW Problem and Its Solution Using DP
634(4)
11.3.2 DTW Search Constraints
638(11)
11.3.3 Typical DTW Algorithm: Memory and Computational Requirements
649(2)
11.4 DTW Applied to CSR
651(21)
11.4.1 Introduction
651(1)
11.4.2 Level Building
652(8)
11.4.3 The One-Stage Algorithm
660(9)
11.4.4 A Grammar-Driven Connected-Word Recognition System
669(1)
11.4.5 Pruning and Beam Search
670(1)
11.4.6 Summary of Resource Requirements for DTW Algorithms
671(1)
11.5 Training Issues in DTW Algorithms
672(2)
11.6 Conclusions
674(1)
11.7 Problems
674(3)
12 The Hidden Markov Model
677(68)
12.1 Introduction
677(2)
12.2 Theoretical Developments
679(44)
12.2.1 Generalities
679(5)
12.2.2 The Discrete Observation HMM
684(21)
12.2.3 The Continuous Observation HMM
705(4)
12.2.4 Inclusion of State Duration Probabilities in the Discrete Observation HMM
709(6)
12.2.5 Scaling the Forward-Backward Algorithm
715(3)
12.2.6 Training with Multiple Observation Sequences
718(2)
12.2.7 Alternative Optimization Criteria in the Training of HMMs
720(2)
12.2.8 A Distance Measure for HMMs
722(1)
12.3 Practical Issues
723(11)
12.3.1 Acoustic Observations
723(1)
12.3.2 Model Structure and Size
724(4)
12.3.3 Training with Insufficient Data
728(2)
12.3.4 Acoustic Units Modeled by HMMs
730(4)
12.4 First View of Recognition Systems Based on HMMs
734(6)
12.4.1 Introduction
734(1)
12.4.2 IWR Without Syntax
735(3)
12.4.3 CSR by the Connected-Word Strategy Without Syntax
738(2)
12.4.4 Preliminary Comments on Language Modeling Using HMMs
740(1)
12.5 Problems
740(5)
13 Language Modeling
745(60)
13.1 Introduction
745(1)
13.2 Formal Tools for Linguistic Processing
746(8)
13.2.1 Formal Languages
746(3)
13.2.2 Perplexity of a Language
749(2)
13.2.3 Bottom-Up Versus Top-Down Parsing
751(3)
13.3 HMMs, Finite-State Automata, and Regular Grammars
754(5)
13.4 A "Bottom-Up" Parsing Example
759(5)
13.5 Principles of "Top-Down" Recognizers
764(15)
13.5.1 Focus on the Linguistic Decoder
764(6)
13.5.2 Focus on the Acoustic Decoder
770(2)
13.5.3 Adding Levels to the Linguistic Decoder
772(3)
13.5.4 Training the Continuous-Speech Recognizer
775(4)
13.6 Other Language Models
779(10)
13.6.1 N-Gram Statistical Models
779(6)
13.6.2 Other Formal Grammars
785(4)
13.7 IWR As "CSR"
789(1)
13.8 Standard Databases for Speech-Recognition Research
790(1)
13.9 A Survey of Language-Model-Based Systems
791(10)
13.10 Conclusions
801(1)
13.11 Problems
801(4)
14 The Artificial Neural Network
805(94)
14.1 Introduction
805(3)
14.2 The Artificial Neuron
808(5)
14.3 Network Principles and Paradigms
813(24)
14.3.1 Introduction
813(2)
14.3.2 Layered Networks: Formalities and Definitions
815(4)
14.3.3 The Multilayer Perceptron
819(15)
14.3.4 Learning Vector Quantizer
834(3)
14.4 Applications of ANNs in Speech Recognition
837(9)
14.4.1 Presegmented Speech Material
837(2)
14.4.2 Recognizing Dynamic Speech
839(2)
14.4.3 ANNs and Conventional Approaches
841(4)
14.4.4 Language Modeling Using ANNs
845(1)
14.4.5 Integration of ANNs into the Survey Systems of Section 13.9
845(1)
14.5 Conclusions
846(1)
14.6 Problems
847(52)
Index 899

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
