Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Discrete-Time Processing of Speech Signals

by John R. Deller, Jr.; John H. L. Hansen; John G. Proakis
  • ISBN13: 9780780353862
  • ISBN10: 0780353862

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 1999-10-05
  • Publisher: Wiley-IEEE Press
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $267.67 Save up to $0.34
  • Buy New
    $267.33
    Add to Cart (Free Shipping)

    PRINT ON DEMAND: 2-4 WEEKS. THIS ITEM CANNOT BE CANCELLED OR RETURNED.

Summary

Commercial applications of speech processing and recognition are fast becoming a growth industry that will shape the next decade. Now students and practicing engineers of signal processing can find in a single volume the fundamentals essential to understanding this rapidly developing field. IEEE Press is pleased to publish a classic reissue of Discrete-Time Processing of Speech Signals. Specially featured in this reissue is the addition of valuable World Wide Web links to the latest speech data references.

This landmark book offers a balanced discussion of both the mathematical theory of digital speech signal processing and critical contemporary applications. The authors provide a comprehensive view of all major modern speech processing areas: speech production physiology and modeling, signal analysis techniques, coding, enhancement, quality assessment, and recognition. You will learn the principles needed to understand advanced technologies in speech processing -- from speech coding for communications systems to biomedical applications of speech analysis and recognition.

Ideal for self-study or as a course text, this far-reaching reference book offers an extensive historical context for concepts under discussion, end-of-chapter problems, and practical algorithms. Discrete-Time Processing of Speech Signals is the definitive resource for students, engineers, and scientists in the speech processing field. An Instructor's Manual presenting detailed solutions to all the problems in the book is available upon request from the Wiley Marketing Department.

Author Biography

About the Authors: John R. (Jack) Deller, Jr. is professor of electrical and computer engineering at Michigan State University, where he directs the Speech Processing Laboratory. He received the 1998 IEEE Signal Processing Magazine Best Paper Award and the 1997 IEEE Signal Processing Society Meritorious Service Award for his six-year service as editor-in-chief of the IEEE Signal Processing Magazine. Dr. Deller is the coauthor of Digital Signal Processing and the Microcontroller (Prentice Hall, 1999) and currently serves as associate editor of the IEEE Transactions on Speech and Audio Processing. He is a Fellow of the IEEE.

Table of Contents

Preface to the IEEE Edition xvii(2)
Preface xix(4)
Acronyms and Abbreviations xxiii
I Signal Processing Background 3(96)
1 Propaedeutic
3(96)
1.0 Preamble
3(3)
1.0.1 The Purpose of Chapter 1
3(1)
1.0.2 Please Read This Note on Notation
4(1)
1.0.3 For People Who Never Read Chapter 1 (and Those Who Do)
5(1)
1.1 Review of DSP Concepts and Notation
6(23)
1.1.1 "Normalized Time and Frequency"
6(3)
1.1.2 Singularity Signals
9(1)
1.1.3 Energy and Power Signals
9(1)
1.1.4 Transforms and a Few Related Concepts
10(6)
1.1.5 Windows and Frames
16(4)
1.1.6 Discrete-Time Systems
20(4)
1.1.7 Minimum, Maximum, and Mixed-Phase Signals and Systems
24(5)
1.2 Review of Probability and Stochastic Processes
29(26)
1.2.1 Probability Spaces
30(3)
1.2.2 Random Variables
33(9)
1.2.3 Random Processes
42(10)
1.2.4 Vector-Valued Random Processes
52(3)
1.3 Topics in Statistical Pattern Recognition
55(18)
1.3.1 Distance Measures
56(2)
1.3.2 The Euclidean Metric and "Prewhitening" of Features
58(5)
1.3.3 Maximum Likelihood Classification
63(3)
1.3.4 Feature Selection and Probabilistic Separability Measures
66(4)
1.3.5 Clustering Algorithms
70(3)
1.4 Information and Entropy
73(6)
1.4.1 Definitions
73(4)
1.4.2 Random Sources
77(1)
1.4.3 Entropy Concepts in Pattern Recognition
78(1)
1.5 Phasors and Steady-State Solutions
79(2)
1.6 Onward to Speech Processing
81(4)
1.7 Problems
85(5)
Appendices: Supplemental Bibliography
90(9)
1.A Example Textbooks on Digital Signal Processing
90(1)
1.B Example Textbooks on Stochastic Processes
90(1)
1.C Example Textbooks on Statistical Pattern Recognition
91(1)
1.D Example Textbooks on Information Theory
91(1)
1.E Other Resources on Speech Processing
92(1)
1.E.1 Textbooks
92(1)
1.E.2 Edited Paper Collections
92(1)
1.E.3 Journals
92(1)
1.E.4 Conference Proceedings
93(1)
1.F Example Textbooks on Speech and Hearing Sciences
93(1)
1.G Other Resources on Artificial Neural Networks
94(1)
1.G.1 Textbooks and Monographs
94(1)
1.G.2 Journals
94(1)
1.G.3 Conference Proceedings
95(4)
II Speech Production and Modeling 99(126)
2 Fundamentals of Speech Science
99(52)
2.0 Preamble
99(1)
2.1 Speech Communication
100(1)
2.2 Anatomy and Physiology of the Speech Production System
101(14)
2.2.1 Anatomy
101(3)
2.2.2 The Role of the Vocal Tract and Some Elementary Acoustical Analysis
104(6)
2.2.3 Excitation of the Speech System and the Physiology of Voicing
110(5)
2.3 Phonemics and Phonetics
115(31)
2.3.1 Phonemes Versus Phones
115(1)
2.3.2 Phonemic and Phonetic Transcription
116(1)
2.3.3 Phonemic and Phonetic Classification
117(20)
2.3.4 Prosodic Features and Coarticulation
137(9)
2.4 Conclusions
146(1)
2.5 Problems
146(5)
3 Modeling Speech Production
151(74)
3.0 Preamble
151(1)
3.1 Acoustic Theory of Speech Production
151(36)
3.1.1 History
151(5)
3.1.2 Sound Propagation
156(3)
3.1.3 Source Excitation Model
159(7)
3.1.4 Vocal-Tract Modeling
166(20)
3.1.5 Models for Nasals and Fricatives
186(1)
3.2 Discrete-Time Modeling
187(13)
3.2.1 General Discrete-Time Speech Model
187(5)
3.2.2 A Discrete-Time Filter Model for Speech Production
192(5)
3.2.3 Other Speech Models
197(3)
3.3 Conclusions
200(1)
3.4 Problems
201(2)
3.A Single Lossless Tube Analysis
203(8)
3.A.1 Open and Closed Terminations
203(3)
3.A.2 Impedance Analysis, T-Network, and Two-Port Network
206(5)
3.B Two-Tube Lossless Model of the Vocal Tract
211(6)
3.C Fast Discrete-Time Transfer Function Calculation
217(8)
III Analysis Techniques 225(184)
4 Short-Term Processing of Speech
225(41)
4.1 Introduction
225(1)
4.2 Short-Term Measures from Long-Term Concepts
226(10)
4.2.1 Motivation
226(1)
4.2.2 "Frames" of Speech
227(1)
4.2.3 Approach 1 to the Derivation of a Short-Term Feature and Its Two Computational Forms
227(4)
4.2.4 Approach 2 to the Derivation of a Short-Term Feature and Its Two Computational Forms
231(3)
4.2.5 On the Role of "1/N" and Related Issues
234(2)
4.3 Example Short-Term Features and Applications
236(26)
4.3.1 Short-Term Estimates of Autocorrelation
236(8)
4.3.2 Average Magnitude Difference Function
244(1)
4.3.3 Zero Crossing Measure
245(1)
4.3.4 Short-Term Power and Energy Measures
246(5)
4.3.5 Short-Term Fourier Analysis
251(11)
4.4 Conclusions
262(1)
4.5 Problems
263(3)
5 Linear Prediction Analysis
266(86)
5.0 Preamble
266(1)
5.1 Long-Term LP Analysis by System Identification
267(13)
5.1.1 The All-Pole Model
267(3)
5.1.2 Identification of the Model
270(10)
5.2 How Good Is the LP Model?
280(10)
5.2.1 The "Ideal" and "Almost Ideal" Cases
280(1)
5.2.2 "Nonideal" Cases
281(6)
5.2.3 Summary and Further Discussion
287(3)
5.3 Short-Term LP Analysis
290(41)
5.3.1 Autocorrelation Method
290(2)
5.3.2 Covariance Method
292(4)
5.3.3 Solution Methods
296(29)
5.3.4 Gain Computation
325(2)
5.3.5 A Distance Measure for LP Coefficients
327(2)
5.3.6 Preemphasis of the Speech Waveform
329(2)
5.4 Alternative Representations of the LP Coefficients
331(2)
5.4.1 The Line Spectrum Pair
331(2)
5.4.2 Cepstral Parameters
333(1)
5.5 Applications of LP in Speech Analysis
333(9)
5.5.1 Pitch Estimation
333(3)
5.5.2 Formant Estimation and Glottal Waveform Deconvolution
336(6)
5.6 Conclusions
342(1)
5.7 Problems
343(5)
5.A Proof of Theorem 5.1
348(2)
5.B The Orthogonality Principle
350(2)
6 Cepstral Analysis
352(57)
6.1 Introduction
352(3)
6.2 "Real" Cepstrum
355(31)
6.2.1 Long-Term Real Cepstrum
355(9)
6.2.2 Short-Term Real Cepstrum
364(2)
6.2.3 Example Applications of the stRC to Speech Analysis and Recognition
366(14)
6.2.4 Other Forms and Variations on the stRC Parameters
380(6)
6.3 Complex Cepstrum
386(11)
6.3.1 Long-Term Complex Cepstrum
386(7)
6.3.2 Short-Term Complex Cepstrum
393(1)
6.3.3 Example Application of the stCC to Speech Analysis
394(3)
6.3.4 Variations on the Complex Cepstrum
397(1)
6.4 A Critical Analysis of the Cepstrum and Conclusions
397(4)
6.5 Problems
401(8)
IV Coding, Enhancement and Quality Assessment 409(192)
7 Speech Coding and Synthesis
409(92)
7.1 Introduction
410(1)
7.2 Optimum Scalar and Vector Quantization
410(24)
7.2.1 Scalar Quantization
411(14)
7.2.2 Vector Quantization
425(9)
7.3 Waveform Coding
434(25)
7.3.1 Introduction
434(1)
7.3.2 Time Domain Waveform Coding
435(16)
7.3.3 Frequency Domain Waveform Coding
451(6)
7.3.4 Vector Waveform Quantization
457(2)
7.4 Vocoders
459(29)
7.4.1 The Channel Vocoder
460(2)
7.4.2 The Phase Vocoder
462(1)
7.4.3 The Cepstral (Homomorphic) Vocoder
462(7)
7.4.4 Formant Vocoders
469(2)
7.4.5 Linear Predictive Coding
471(14)
7.4.6 Vector Quantization of Model Parameters
485(3)
7.5 Measuring the Quality of Speech Compression Techniques
488(1)
7.6 Conclusions
489(1)
7.7 Problems
490(4)
7.A Quadrature Mirror Filters
494(7)
8 Speech Enhancement
501(67)
8.1 Introduction
501(3)
8.2 Classification of Speech Enhancement Methods
504(2)
8.3 Short-Term Spectral Amplitude Techniques
506(11)
8.3.1 Introduction
506(1)
8.3.2 Spectral Subtraction
506(10)
8.3.3 Summary of Short-Term Spectral Magnitude Methods
516(1)
8.4 Speech Modeling and Wiener Filtering
517(11)
8.4.1 Introduction
517(1)
8.4.2 Iterative Wiener Filtering
517(4)
8.4.3 Speech Enhancement and All-Pole Modeling
521(3)
8.4.4 Sequential Estimation via EM Theory
524(1)
8.4.5 Constrained Iterative Enhancement
525(2)
8.4.6 Further Refinements to Iterative Enhancement
527(1)
8.4.7 Summary of Speech Modeling and Wiener Filtering
528(1)
8.5 Adaptive Noise Canceling
528(13)
8.5.1 Introduction
528(2)
8.5.2 ANC Formalities and the LMS Algorithm
530(4)
8.5.3 Applications of ANC
534(7)
8.5.4 Summary of ANC Methods
541(1)
8.6 Systems Based on Fundamental Frequency Tracking
541(11)
8.6.1 Introduction
541(1)
8.6.2 Single-Channel ANC
542(3)
8.6.3 Adaptive Comb Filtering
545(4)
8.6.4 Harmonic Selection
549(2)
8.6.5 Summary of Systems Based on Fundamental Frequency Tracking
551(1)
8.7 Performance Evaluation
552(4)
8.7.1 Introduction
552(1)
8.7.2 Enhancement and Perceptual Aspects of Speech
552(2)
8.7.3 Speech Enhancement Algorithm Performance
554(2)
8.8 Conclusions
556(1)
8.9 Problems
557(4)
8.A The INTEL System
561(4)
8.B Addressing Cross-Talk in Dual-Channel ANC
565(3)
9 Speech Quality Assessment
568(33)
9.1 Introduction
568(2)
9.1.1 The Need for Quality Assessment
568(2)
9.1.2 Quality Versus Intelligibility
570(1)
9.2 Subjective Quality Measures
570(10)
9.2.1 Intelligibility Tests
572(3)
9.2.2 Quality Tests
575(5)
9.3 Objective Quality Measures
580(13)
9.3.1 Articulation Index
582(2)
9.3.2 Signal-to-Noise Ratio
584(3)
9.3.3 Itakura Measure
587(1)
9.3.4 Other Measures Based on LP Analysis
588(1)
9.3.5 Weighted-Spectral Slope Measures
589(1)
9.3.6 Global Objective Measures
590(1)
9.3.7 Example Applications
591(2)
9.4 Objective Versus Subjective Measures
593(2)
9.5 Problems
595(6)
V Recognition 601(298)
10 The Speech Recognition Problem
601(22)
10.1 Introduction
601(5)
10.1.1 The Dream and the Reality
601(3)
10.1.2 Discovering Our Ignorance
604(1)
10.1.3 Circumventing Our Ignorance
605(1)
10.2 The "Dimensions of Difficulty"
606(14)
10.2.1 Speaker-Dependent Versus Speaker-Independent Recognition
607(1)
10.2.2 Vocabulary Size
607(1)
10.2.3 Isolated-Word Versus Continuous-Speech Recognition
608(6)
10.2.4 Linguistic Constraints
614(5)
10.2.5 Acoustic Ambiguity and Confusability
619(1)
10.2.6 Environmental Noise
620(1)
10.3 Related Problems and Approaches
620(1)
10.3.1 Knowledge Engineering
620(1)
10.3.2 Speaker Recognition and Verification
621(1)
10.4 Conclusions
621(1)
10.5 Problems
621(2)
11 Dynamic Time Warping
623(54)
11.1 Introduction
623(1)
11.2 Dynamic Programming
624(10)
11.3 Dynamic Time Warping Applied to IWR
634(17)
11.3.1 DTW Problem and Its Solution Using DP
634(4)
11.3.2 DTW Search Constraints
638(11)
11.3.3 Typical DTW Algorithm: Memory and Computational Requirements
649(2)
11.4 DTW Applied to CSR
651(21)
11.4.1 Introduction
651(1)
11.4.2 Level Building
652(8)
11.4.3 The One-Stage Algorithm
660(9)
11.4.4 A Grammar-Driven Connected-Word Recognition System
669(1)
11.4.5 Pruning and Beam Search
670(1)
11.4.6 Summary of Resource Requirements for DTW Algorithms
671(1)
11.5 Training Issues in DTW Algorithms
672(2)
11.6 Conclusions
674(1)
11.7 Problems
674(3)
12 The Hidden Markov Model
677(68)
12.1 Introduction
677(2)
12.2 Theoretical Developments
679(44)
12.2.1 Generalities
679(5)
12.2.2 The Discrete Observation HMM
684(21)
12.2.3 The Continuous Observation HMM
705(4)
12.2.4 Inclusion of State Duration Probabilities in the Discrete Observation HMM
709(6)
12.2.5 Scaling the Forward-Backward Algorithm
715(3)
12.2.6 Training with Multiple Observation Sequences
718(2)
12.2.7 Alternative Optimization Criteria in the Training of HMMs
720(2)
12.2.8 A Distance Measure for HMMs
722(1)
12.3 Practical Issues
723(11)
12.3.1 Acoustic Observations
723(1)
12.3.2 Model Structure and Size
724(4)
12.3.3 Training with Insufficient Data
728(2)
12.3.4 Acoustic Units Modeled by HMMs
730(4)
12.4 First View of Recognition Systems Based on HMMs
734(6)
12.4.1 Introduction
734(1)
12.4.2 IWR Without Syntax
735(3)
12.4.3 CSR by the Connected-Word Strategy Without Syntax
738(2)
12.4.4 Preliminary Comments on Language Modeling Using HMMs
740(1)
12.5 Problems
740(5)
13 Language Modeling
745(60)
13.1 Introduction
745(1)
13.2 Formal Tools for Linguistic Processing
746(8)
13.2.1 Formal Languages
746(3)
13.2.2 Perplexity of a Language
749(2)
13.2.3 Bottom-Up Versus Top-Down Parsing
751(3)
13.3 HMMs, Finite-State Automata, and Regular Grammars
754(5)
13.4 A "Bottom-Up" Parsing Example
759(5)
13.5 Principles of "Top-Down" Recognizers
764(15)
13.5.1 Focus on the Linguistic Decoder
764(6)
13.5.2 Focus on the Acoustic Decoder
770(2)
13.5.3 Adding Levels to the Linguistic Decoder
772(3)
13.5.4 Training the Continuous-Speech Recognizer
775(4)
13.6 Other Language Models
779(10)
13.6.1 N-Gram Statistical Models
779(6)
13.6.2 Other Formal Grammars
785(4)
13.7 IWR As "CSR"
789(1)
13.8 Standard Databases for Speech-Recognition Research
790(1)
13.9 A Survey of Language-Model-Based Systems
791(10)
13.10 Conclusions
801(1)
13.11 Problems
801(4)
14 The Artificial Neural Network
805(94)
14.1 Introduction
805(3)
14.2 The Artificial Neuron
808(5)
14.3 Network Principles and Paradigms
813(24)
14.3.1 Introduction
813(2)
14.3.2 Layered Networks: Formalities and Definitions
815(4)
14.3.3 The Multilayer Perceptron
819(15)
14.3.4 Learning Vector Quantizer
834(3)
14.4 Applications of ANNs in Speech Recognition
837(9)
14.4.1 Presegmented Speech Material
837(2)
14.4.2 Recognizing Dynamic Speech
839(2)
14.4.3 ANNs and Conventional Approaches
841(4)
14.4.4 Language Modeling Using ANNs
845(1)
14.4.5 Integration of ANNs into the Survey Systems of Section 13.9
845(1)
14.5 Conclusions
846(1)
14.6 Problems
847(52)
Index 899

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
