Series Introduction

Preface

vii

I ANALYTICAL BACKGROUND AND TECHNIQUES

(200)

Discrete-Time Signals, Systems, and Transforms

(26)

Signal Sampling

(8)

Sampling basics

(1)

Sampling theorem

(2)

Practical cases of sampling

(5)

Discrete-Time Systems and z-Transforms

(5)

Classification of systems

(3)

Fundamentals of linear time-invariant systems

(1)

z-transforms

(1)

Characterizations of Digital Filters

(4)

Filter transfer functions

(1)

Filters described by difference equations

(1)

Poles and zeros in a digital filter

(3)

Frequency Responses of Digital Filters

(3)

Frequency response as related to pole and zero locations

(1)

Digital resonator

(1)

All-pass filter

(1)

Discrete Fourier Transform

(2)

Definition of DFT

(1)

Frequency range and frequency resolution of the DFT

(1)

Zero-padding technique

(1)

Short-Time Fourier Transform

(3)

Definition of STFT: two alternative views

(1)

STFT magnitude (spectrogram)

(1)

Summary

(1)

Analysis of Discrete-Time Speech Signals

(36)

Time-Frequency Analysis of Speech

(12)

Time-domain and frequency-domain properties of speech

(6)

Joint time-frequency properties of speech

(3)

Spectrogram reading

(2)

Analysis Based on Linear Predictive Coding

(9)

Least-squares estimate of LPC coefficients

(1)

Autocorrelation and covariance methods

(3)

Spectral estimation via LPC

(2)

Pre-emphasis

(1)

Choice of order of the LPC model

(1)

Summary

(1)

Cepstral Analysis of Speech

(3)

Principles

(2)

Mel-frequency cepstral coefficients

(1)

Automatic Extraction and Tracking of Speech Formants

(3)

Formants and vocal tract resonances

(2)

Formant extraction and tracking methods

(1)

Automatic Extraction of Voicing Pitch

(5)

Basics of pitch estimation methods

(1)

Time-domain F0 estimation

(1)

Short-time spectral techniques for F0 estimation

(2)

Auditory Models for Speech Analysis

(2)

Perceptual linear prediction

(1)

Other methods

(1)

Summary

(2)

Probability and Random Processes

(32)

Random Variables, Distributions, and Summary Statistics

(10)

Random variables and their distributions

(1)

Summary statistics: expectations, moments, and covariances

(2)

Common PDF's

(5)

Common PMF's

(2)

Conditioning, Total Probability Theorem, and Bayes' Rule

(7)

Conditional probability, conditional PDF, and conditional independence

(3)

The total probability theorem

(1)

Bayes' rule and its sequential form

(3)

Conditional Expectations

(1)

Discrete-Time Random Processes

(4)

Summary statistics of a random sequence

(1)

Stationary random sequences

(2)

White sequence, Markov sequence, Gauss-Markov sequence, and Wiener sequence

(1)

Markov Chain and Hidden Markov Sequence

(7)

Markov chain as discrete-state Markov sequence

(1)

From Markov chain to hidden Markov sequence

(6)

Summary

(3)

Linear Model and Dynamic System Model

(24)

Linear Model

(3)

Canonical form of the model

(1)

Examples of the linear model

(2)

Likelihood computation

100

(1)

Time-Varying Linear Model

100

(9)

Time-varying linear predictive model

100

(2)

Markov modulated linear predictive model

102

(1)

Markov modulated linear regression model

102

(2)

Speech data and the time-varying linear models

104

(5)

Linear Dynamic System Model

109

(6)

State space formulation of the model

112

(1)

Relationship to high-dimensional linear model

113

(1)

Likelihood computation

114

(1)

Time-Varying Linear Dynamic System Model

115

(1)

From time-invariant model to time-varying model

115

(1)

Likelihood computation

116

(1)

Non-Linear Dynamic System Model

116

(4)

From linear model to nonlinear model

116

(1)

Nonlinearity and its approximations

117

(3)

Summary

120

(1)

Optimization Methods and Estimation Theory

121

(58)

Classical Optimization Techniques

122

(4)

Basic definitions and results

122

(2)

Necessary and sufficient conditions for an optimum

124

(1)

Lagrange multiplier method for constrained optimization

125

(1)

Numerical Methods for Optimization

126

(4)

Methods based on finding roots of equations

126

(2)

Methods based on gradient descent

128

(2)

Dynamic Programming Techniques for Optimization

130

(5)

Principle of optimality

131

(1)

Dynamic programming for the hidden Markov model

132

(2)

Dynamic programming for the trended hidden Markov model

134

(1)

Preliminaries of Estimation Theory

135

(8)

Cramer-Rao lower bound and minimum variance unbiased estimator

136

(2)

Example: MVU estimator for generalized linear model

138

(1)

Sufficient statistic

139

(1)

Best linear unbiased estimator

140

(2)

Method of moments

142

(1)

Least Squares Estimation

143

(6)

Basic LSE procedure

143

(1)

Least squares estimator for the linear model

144

(2)

Order-recursive least squares

146

(1)

Sequential least squares

147

(1)

Nonlinear least squares

148

(1)

Maximum Likelihood Estimation

149

(11)

Basic MLE procedure for fully observed data

149

(4)

MLE for the linear model

153

(1)

EM algorithm: Introduction

153

(4)

EM algorithm example: Markov modulated Poisson process

157

(3)

Estimation of Random Parameters

160

(8)

Minimum mean square error (MMSE) estimator

161

(2)

Bayesian linear model

163

(1)

General Bayesian estimators and MAP estimator

163

(2)

Linear minimum mean square error (LMMSE) estimator

165

(2)

Sequential LMMSE estimator

167

(1)

State Estimation

168

(8)

Generic Kalman filter algorithm

169

(2)

Kalman filter algorithms for the linear state-space system

171

(2)

Extended Kalman filter for nonlinear dynamic systems

173

(3)

Summary

176

(3)

Statistical Pattern Recognition

179

(22)

Bayes' Decision Theory

180

(2)

Bayes' risk and MAP decision rule

180

(1)

Practical issues

181

(1)

Minimum Classification Error Criterion for Recognizer Design

182

(2)

MCE classifier design steps

182

(1)

Optimization of classifier parameters

183

(1)

Hypothesis Testing and the Verification Problem

184

(4)

MAP decision rule and hypothesis testing

184

(1)

Verification problem in pattern recognition

185

(1)

Neyman-Pearson approach to verification

186

(1)

Bayesian approach to verification

187

(1)

Examples of Applications

188

(10)

Discriminative training for HMM

188

(2)

Discriminative training for the trended HMM

190

(3)

Discriminative feature extraction

193

(3)

Bayesian approach to verification using the Gaussian mixture model

196

(2)

Summary

198

(3)

II FUNDAMENTALS OF SPEECH SCIENCE

201

(94)

Phonetic Process

203

(60)

Introduction

203

(1)

Articulatory Phonetics and Speech Generation

203

(15)

Anatomy and physiology of the vocal tract

204

(3)

Major features of speech articulation

207

(3)

Phonemes, coarticulation, and acoustics

210

(3)

Source-filter description of speech production

213

(5)

Acoustic Models of Speech Production

218

(9)

Resonances in a nonuniform vocal tract model

218

(2)

Two-tube vowel models

220

(1)

Three-tube consonant modeling

221

(1)

Speech production involving both poles and zeros

222

(2)

Transmission line analog of the vocal tract

224

(3)

Coarticulation: Its Origins and Models

227

(5)

Effects of coarticulation

228

(1)

Coarticulation effects for different articulators

229

(1)

Invariant features

230

(1)

Effects of coarticulation on duration

231

(1)

Models for coarticulation

231

(1)

Acoustic-Phonetics and Characterization of Speech Signals

232

(6)

Acoustics of vowels

233

(2)

Diphthongs and diphthongization

235

(1)

Glides and liquids

235

(1)

Nasals

235

(1)

Fricatives

236

(1)

Stop consonants

237

(1)

Introduction to Auditory Phonetics

238

(2)

Outer ear

239

(1)

Middle ear

239

(1)

Inner ear

239

(1)

Sound Perception

240

(6)

Thresholds

241

(1)

Just-noticeable differences (JNDs)

241

(1)

Pitch perception

241

(1)

Masking

242

(1)

Critical bands

243

(1)

Nonsimultaneous (temporal) masking

243

(1)

Just-noticeable differences (JNDs) in speech

244

(1)

Timing

245

(1)

Speech Perception

246

(15)

Physical aspects of speech important for perception

246

(1)

Experiments using synthetic speech

247

(1)

Models of speech perception

248

(3)

Vowel perception

251

(2)

Consonant perception

253

(3)

Duration as a phonemic cue

256

(1)

Perception of intonational features

257

(4)

Summary

261

(2)

Phonological Process

263

(32)

Introduction

263

(1)

Phonemes: Minimal Contrastive Units of Speech Sounds

264

(2)

Phonemes and allophones

264

(1)

Phoneme identification

265

(1)

Features: Basic Units of Phonological Representation

266

(4)

Why a phonemic approach is not adequate

266

(1)

Feature systems

267

(3)

Natural classes

270

(1)

Phonological Rules Expressed by Features

270

(4)

Formalization of phonological rules

271

(1)

Common phonological rule types

271

(3)

Feature Geometry: Internal Organization of Speech Sounds

274

(6)

Introduction

274

(1)

From linear phonology to nonlinear phonology

274

(2)

Feature hierarchy

276

(1)

Phonological rules in feature geometry

277

(3)

Articulatory Phonology

280

(8)

Articulatory gestures and task dynamics

281

(4)

Gestural scores

285

(2)

Phonological contrast in articulatory phonology

287

(1)

Syllables: External Organization of Speech Sounds

288

(5)

The representation of syllable structure

288

(2)

Phonological function of the syllable: basic phonotactic unit

290

(3)

Summary

293

(2)

III COMPUTATIONAL PHONOLOGY AND PHONETICS

295

(136)

Computational Phonology

297

(36)

Articulatory Features and a System for Their Specification

298

(4)

Cross-Tier Overlapping of Articulatory Features

302

(10)

Major and secondary articulatory features

302

(2)

Feature assimilation and overlapping examples

304

(3)

Constraining rules for feature overlapping

307

(5)

Constructing Discrete Articulatory States

312

(4)

Motivations from symbolic pronunciation modeling in speech recognition

312

(2)

Articulatory state construction

314

(2)

Use of High-Level Linguistic Constraints

316

(3)

Types of high-level linguistic constraints

316

(1)

A parser for English syllable structure

317

(2)

Implementation of Feature Overlapping Using Linguistic Constraints

319

(12)

Feature specification

320

(1)

A generator of overlapping feature bundles: Overview and examples of its output

321

(3)

Demi-syllable as the rule organizational unit

324

(2)

Phonological rule formulation

326

(5)

Summary

331

(2)

Computational Models for Speech Production

333

(50)

Introduction and Overview of Speech Production Modeling

334

(3)

Two types of speech production modeling and research

334

(2)

Multiple levels of dynamics in human speech production

336

(1)

Modeling Acoustic Dynamics of Speech

337

(17)

Hidden Markov model viewed as a generative model for acoustic dynamics

337

(3)

From stationary state to nonstationary state

340

(1)

ML learning for the trended HMM via the EM algorithm

341

(4)

Example: Model with state-dependent polynomial trends

345

(1)

Recursively-defined acoustic trajectory model using a linear dynamic system

346

(2)

ML learning for linear dynamic system

348

(6)

Modeling Hidden Dynamics of Speech

354

(9)

Derivation of discrete-time hidden-dynamic state equation

355

(2)

Nonlinear state space formulation of hidden dynamic model

357

(1)

Task dynamics, articulatory dynamics, and vocal-tract resonance dynamics

357

(6)

Hidden Dynamic Model Implemented Using Piecewise Linear Approximation

363

(11)

Motivations and a new form of the model formulation

365

(1)

Parameter estimation algorithm

366

(8)

Likelihood-scoring algorithm

374

(1)

A Comprehensive Statistical Generative Model of the Dynamics of Casual Speech

374

(7)

Overlapping model for multi-tiered phonological construct

376

(1)

Segmental target model

377

(1)

Functional model for hidden articulatory dynamics

378

(1)

Functional model for articulatory-to-acoustic mapping

379

(2)

Summary

381

(2)

Computational Models for Auditory Speech Processing

383

(48)

A Computational Model for the Cochlear Function

384

(4)

Introduction

384

(2)

Mathematical formulation of the cochlear model

386

(2)

Frequency-Domain Solution of the Cochlear Model

388

(2)

Time-Domain Solution of the Cochlear Model

390

(2)

Stability Analysis for Time-Domain Solution of the Cochlear Model

392

(7)

Derivation of the stability condition

392

(6)

Application of the stability analysis

398

(1)

Computational Models for Inner Hair Cells and for Synapses to Auditory Nerve Fibers

399

(2)

The inner hair cell model

399

(1)

The synapse model

399

(2)

Interval-Based Speech Feature Extraction from the Cochlear Model Outputs

401

(4)

Inter-peak interval histogram construction

401

(1)

Matching neural and modeled IPIHs for tuning BM-model's parameters

402

(3)

Interval-Histogram Representation for the Speech Sound in Quiet and in Noise

405

(5)

Inter-peak interval histograms for clean speech

406

(2)

Inter-peak interval histograms for noisy speech

408

(2)

Computational Models for Network Structures in the Auditory Pathway

410

(18)

Introduction

411

(2)

Modeling action potential generation in the auditory nerve

413

(2)

Neural-network models central to the auditory nerve

415

(8)

Model simulation with speech inputs

423

(2)

Discussion

425

(3)

Summary

428

(3)

IV SPEECH TECHNOLOGY IN SELECTED AREAS

431

(150)

Speech Recognition

433

(78)

Introduction

433

(4)

The speech recognition problem

434

(1)

ASR system specifications

435

(1)

Dimensions of difficulty

436

(1)

Evaluation measures for speech recognizers

437

(1)

Mathematical Formulation of Speech Recognition

437

(3)

A fundamental equation

437

(1)

Acoustic model, language model, and sequential optimization

438

(1)

Differentially weighting acoustic and language models

439

(1)

Word insertion penalty factor

439

(1)

Acoustic Pre-Processor

440

(3)

What is acoustic pre-processing

440

(1)

Some common acoustic pre-processors

441

(2)

Use of HMMs in Acoustic Modeling

443

(3)

HMMs in ASR applications

443

(1)

Relationships between HMM states and speech units

444

(1)

Construction of context-dependent HMMs

444

(1)

Some advantages of the HMM formulation for ASR

445

(1)

Use of Higher-Order Statistical Models in Acoustic Modeling

446

(5)

Why higher-order models are needed

446

(1)

Stochastic segment models for speech acoustics

447

(2)

Super-segmental, hidden dynamic models

449

(1)

Higher-order pronunciation models

450

(1)

Case Study I: Speech Recognition Using a Hidden Dynamic Model

451

(12)

Model overview

452

(1)

Model formulation

453

(2)

Learning model parameters

455

(4)

Likelihood-scoring algorithm

459

(1)

Experiments on spontaneous speech recognition

459

(4)

Case Study II: Speech Recognition Using HMMs Structured by Locus Equations

463

(15)

Model overview

463

(1)

Model formulation

464

(2)

Learning locus-HMM parameters

466

(8)

Phonetic classification experiments

474

(4)

Robustness of Acoustic Modeling and Recognizer Design

478

(4)

Introduction

478

(1)

Model-space robustness by adaptation

479

(2)

Adaptive training

481

(1)

Case Study III: MAP Approach to Speaker Adaptation Using Trended HMMs

482

(9)

Derivation of MAP estimates for the trended HMM

483

(4)

Speaker adaptation experiments

487

(4)

Case Study IV: Bayesian Adaptive Training for Compensating Acoustic Variability

491

(13)

Background

492

(2)

Overview of the compensation strategy

494

(1)

Bayesian adaptive training algorithm

495

(2)

Robust decoding using Bayesian predictive classification

497

(3)

Experiments on spontaneous speech recognition

500

(4)

Statistical Language Modeling

504

(5)

Introduction

504

(1)

N-gram language modeling

504

(2)

Decision-tree language modeling

506

(1)

Context-free grammar as a language model

506

(1)

Maximum-entropy language modeling

507

(1)

Adaptive language modeling

508

(1)

Summary

509

(2)

Speech Enhancement

511

(48)

Introduction

511

(1)

Classification of Basic Techniques for Speech Enhancement

512

(2)

Classification by what and how information is used

512

(1)

Classification by waveform or feature as the output

512

(1)

Classification by single or multiple sensors

513

(1)

Classification by the general approaches employed

513

(1)

Spectral Subtraction

514

(2)

Wiener Filtering

516

(1)

Use of HMM as the Prior Model for Speech Enhancement

517

(6)

Training AR-HMMs for clean speech and for noise

518

(1)

The MAP enhancement technique

518

(1)

The approximate MAP enhancement technique

519

(1)

The MMSE enhancement technique

520

(2)

Noise adaptation

522

(1)

Case Study I: Implementation and Evaluation of HMM-Based MMSE Enhancement

523

(6)

Double pruning the MMSE filter weights

524

(1)

PDF approximation for noisy speech

524

(2)

Overview of speech enhancement system and experiments

526

(1)

Enhancement results using SNR as an evaluation measure

527

(2)

Enhancement results using subjective evaluation

529

(1)

Case Study II: Use of the Trended HMM for Speech Enhancement

529

(15)

Formulation of the prior model

530

(1)

Derivation of the MMSE estimator using the prior model

531

(5)

Implementation of the MMSE enhancement technique

536

(2)

Approximate MMSE enhancement technique

538

(1)

Diagnostic experiments

539

(4)

Speech waveform enhancement results

543

(1)

Use of Speech Feature Enhancement for Robust Speech Recognition

544

(13)

Roles of speech enhancement in feature-space robust ASR

544

(1)

A statistical model for log-domain acoustic distortion

545

(3)

Use of prior models for clean speech and for noise

548

(2)

Use of the MMSE estimator

550

(1)

MMSE estimator with prior speech model of static features

551

(2)

Estimation with prior speech model for joint static and dynamic features

553

(2)

Implementation issues

555

(2)

Summary

557

(2)

Speech Synthesis

559

(22)

Introduction

559

(2)

Basic Approaches

561

(1)

Choice of Units

562

(1)

Synthesis Methods

563

(8)

Articulatory method for speech synthesis

564

(1)

Spectral method for speech synthesis

565

(4)

Waveform methods for speech synthesis

569

(2)

Phase mismatch

571

(1)

Databases

571

(1)

Case Study: Automatic Unit Selection for Waveform Speech Synthesis

572

(3)

Intonation

575

(2)

Text Processing

577

(2)

Evaluation of Speech Synthesis Output

579

(1)

Summary

580

(1)

References

581

(39)

Index

620

Speech Processing: A Dynamic and Optimization-Oriented Approach

9780824740405

0824740408
