Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Speech and Audio Signal Processing: Processing and Perception of Speech and Music

by ;
  • ISBN13: 9780471351542
  • ISBN10: 0471351547

  • Format: Paperback
  • Copyright: 1999-08-01
  • Publisher: Wiley
List Price: $126.40

Summary

Speech and music are the most basic means of adult human communication. As technology advances and increasingly sophisticated tools become available for working with speech and music signals, scientists can study these sounds more effectively and invent new ways of applying them for the benefit of humankind. This book covers the physiology and psychoacoustics of hearing, results from research on pitch and speech perception, vocoding methods, and many aspects of automatic speech recognition (ASR) systems. The authors draw on their own research in these fields as well as on the methods and results of many other contributors.

Table of Contents

Introduction
1(8)
Why We Wrote This Book
1(1)
How to Use This Book
2(2)
A Confession
4(1)
Acknowledgments
4(5)
PART I HISTORICAL BACKGROUND
Synthetic Audio: A Brief History
9(11)
von Kempelen
9(1)
The Voder
9(2)
Teaching the Operator to Make the Voder "Talk"
11(3)
Speech Synthesis after the Voder
14(1)
Music Machines
14(3)
Exercises
17(3)
Speech Analysis and Synthesis Overview
20(19)
Background
20(4)
Transmission of Acoustic Signals
20(1)
Acoustical Telegraphy before Morse Code
21(1)
The Telephone
22(1)
The Channel Vocoder and Bandwidth Compression
22(2)
Voice-Coding Concepts
24(4)
Homer Dudley (1898–1981)
28(7)
Exercises
35(1)
Appendix: Hearing of the Fall of Troy
36(3)
Brief History of Automatic Speech Recognition
39(17)
Radio Rex
39(1)
Digit Recognition
40(2)
Speech Recognition in the 1950s
42(1)
The 1960s
42(3)
Short-Term Spectral Analysis
44(1)
Pattern Matching
44(1)
1971–1976 ARPA Project
45(1)
Achieved by 1976
45(1)
The 1980s in Automatic Speech Recognition
46(4)
Large Corpora Collection
46(1)
Front Ends
47(1)
Hidden Markov Models
47(1)
The Second (D)ARPA Speech-Recognition Program
48(1)
The Return of Neural Nets
49(1)
Knowledge-Based Approaches
50(1)
Recent Work
50(1)
Some Lessons
51(1)
Exercises
52(4)
Speech-Recognition Overview
56(13)
Why Study Automatic Speech Recognition?
56(1)
Why Is Automatic Speech Recognition Hard?
57(2)
Automatic Speech Recognition Dimensions
59(2)
Task Parameters
59(2)
Sample Domain: Letters of the Alphabet
61(1)
Components of Automatic Speech Recognition
61(3)
Final Comments
64(1)
Exercises
65(4)
PART II MATHEMATICAL BACKGROUND
Digital Signal Processing
69(14)
Introduction
69(1)
The z Transform
69(1)
Inverse z Transform
70(1)
Convolution
71(1)
Sampling
72(1)
Linear Difference Equations
73(1)
First-Order Linear Difference Equations
74(1)
Resonance
75(4)
Concluding Comments
79(1)
Exercises
79(4)
Digital Filters and Discrete Fourier Transform
83(20)
Introduction
83(1)
Filtering Concepts
84(4)
Useful Filter Functions
88(2)
Transformations for Digital Filter Design
90(1)
Digital Filter Design with Bilinear Transformation
91(1)
The Discrete Fourier Transform
92(3)
Fast Fourier Transform Methods
95(3)
Relation Between the DFT and Digital Filters
98(2)
Exercises
100(3)
Pattern Classification
103(16)
Introduction
103(2)
Feature Extraction
105(2)
Some Opinions
106(1)
Pattern-Classification Methods
107(6)
Minimum Distance Classifiers
107(2)
Discriminant Functions
109(1)
Generalized Discriminators
110(3)
Exercises
113(1)
Appendix: Multilayer Perceptron Training
114(5)
Definitions
114(1)
Derivation
115(4)
Statistical Pattern Classification
119(18)
Introduction
119(1)
A Few Definitions
119(1)
Class-Related Probability Function
120(1)
Minimum Error Classification
121(1)
Likelihood-Based MAP Classification
122(1)
Approximating a Bayes Classifier
123(2)
Statistically Based Linear Discriminants
125(1)
Discussion
126(1)
Iterative Training: The EM Algorithm
126(6)
Discussion
131(1)
Exercises
132(5)
PART III ACOUSTICS
Wave Basics
137(11)
Introduction
137(1)
The Wave Equation for the Vibrating String
137(2)
Discrete-Time Traveling Waves
139(1)
Boundary Conditions and Discrete Traveling Waves
140(1)
Standing Waves
140(1)
Discrete-Time Models of Acoustic Tubes
141(2)
Acoustic Tube Resonances
143(1)
Relation of Acoustic Tube Resonances to Observed Formant Frequencies
144(2)
Exercises
146(2)
Acoustic Tube Modeling of Speech Production
148(6)
Introduction
148(1)
Acoustic Tube Models of English Phonemes
148(4)
Excitation Mechanisms in Speech Production
152(1)
Exercises
153(1)
Music Production
154(21)
Introduction
154(1)
Sequence of Steps in a Plucked or Bowed String Instrument
155(1)
Vibrations of the Bowed String
155(1)
Frequency-Response Measurements of the Bridge of a Violin
156(3)
Vibrations of the Body of String Instruments: Measurement Methods
159(4)
Radiation Pattern of Bowed String Instruments
163(2)
Some Considerations in Piano Design
165(6)
Brief Discussion of the Trumpet, Trombone, French Horn, and Tuba
171(2)
Exercises
173(2)
Room Acoustics
175(14)
Sound Waves
175(4)
One-Dimensional Wave Equation
176(1)
Spherical Wave Equation
177(1)
Intensity
177(1)
Decibel Sound Levels
178(1)
Typical Power Sources
178(1)
Sound Waves in Rooms
179(5)
Acoustic Reverberation
180(3)
Early Reflections
183(1)
Room Acoustics as a Component in Speech Systems
184(1)
Exercises
185(4)
PART IV AUDITORY PERCEPTION
Ear Physiology
189(16)
Introduction
189(1)
Anatomical Pathways from the Ear to the Perception of Sound
189(2)
The Peripheral Auditory System
191(1)
Hair Cell and Auditory Nerve Functions
192(2)
Properties of the Auditory Nerve
194(7)
Summary and Block Diagram of the Peripheral Auditory System
201(2)
Exercises
203(2)
Psychoacoustics
205(9)
Introduction
205(1)
Sound-Pressure Level and Loudness
206(2)
Frequency Analysis and Critical Bands
208(2)
Masking
210(2)
Summary
212(1)
Exercises
213(1)
Models of Pitch Perception
214(14)
Introduction
214(1)
Historical Review of Pitch-Perception Models
214(5)
Physiological Exploration of Place Versus Periodicity
219(1)
Results from Psychoacoustic Testing and Models
220(4)
Summary
224(2)
Exercises
226(2)
Speech Perception
228(18)
Introduction
228(1)
Vowel Perception: Psychoacoustics and Physiology
228(3)
The Confusion Matrix
231(3)
Perceptual Cues for Plosives
234(1)
Physiological Studies of Two Voiced Plosives
235(2)
Motor Theories of Speech Perception
237(2)
Neural Firing Patterns for Connected Speech Stimuli
239(1)
Concluding Thoughts
240(3)
Exercises
243(3)
Human Speech Recognition
246(11)
Introduction
246(1)
The Articulation Index and Human Recognition
246(2)
The Big Idea
246(1)
The Experiments
247(1)
Discussion
248(1)
Comparisons between Human and Machine Speech Recognizers
248(4)
Concluding Thoughts
252(1)
Exercises
253(4)
PART V SPEECH FEATURES
The Auditory System as a Filter Bank
257(14)
Introduction
257(1)
Review of Fletcher's Critical Band Experiments
257(2)
Relation Between Threshold Measurements and Hypothesized Filter Shapes
259(5)
Gamma-Tone Filters, Roex Filters, and Auditory Models
264(2)
Other Considerations in Filter-Bank Design
266(2)
Speech Spectrum Analysis Using the FFT
268(1)
Conclusions
269(1)
Exercises
269(2)
The Cepstrum as a Spectral Analyzer
271(9)
Introduction
271(1)
A Historical Note
271(1)
The Real Cepstrum
272(1)
The Complex Cepstrum
273(2)
Application of Cepstral Analysis to Speech Signals
275(2)
Concluding Thoughts
277(1)
Exercises
278(2)
Linear Prediction
280(15)
Introduction
280(1)
The Predictive Model
280(4)
Properties of the Representation
284(2)
Getting the Coefficients
286(2)
Related Representations
288(1)
Concluding Discussion
289(2)
Exercises
291(4)
PART VI AUTOMATIC SPEECH RECOGNITION
Feature Extraction for ASR
295(14)
Introduction
295(1)
Common Feature Vectors
295(5)
Dynamic Features
300(1)
Strategies for Robustness
300(5)
Robustness to Convolutional Error
300(4)
Robustness to Additive Noise
304(1)
Caveats
304(1)
Auditory Models
305(1)
Multichannel Input
305(1)
Discussion
306(1)
Exercises
306(3)
Linguistic Categories for Speech Recognition
309(15)
Introduction
309(1)
Phones and Phonemes
309(2)
Overview
309(1)
What Makes a Phone?
310(1)
What Makes a Phoneme?
310(1)
Phonetic and Phonemic Alphabets
311(1)
Articulatory Features
312(5)
Overview
312(1)
Consonants
312(4)
Vowels
316(1)
Why Use Features?
316(1)
Subword Units as Categories for ASR
317(1)
Phonological Models for ASR
317(1)
Context-Dependent Phones
318(1)
Other Subword Units
319(1)
Properties in Fluent Speech
320(1)
Phrases
320(1)
Some Issues in Phonological Modeling
320(1)
Exercises
321(3)
Deterministic Sequence Recognition for ASR
324(13)
Introduction
324(1)
Isolated Word Recognition
325(8)
Linear Time Warp
326(1)
Dynamic Time Warp
327(4)
Distances
331(1)
End-Point Detection
331(2)
Connected Word Recognition
333(1)
Segmental Approaches
334(1)
Discussion
335(1)
Exercises
336(1)
Statistical Sequence Recognition
337(14)
Introduction
337(1)
Stating the Problem
338(2)
Parametrization and Probability Estimation
340(9)
Markov Models
341(2)
Hidden Markov Model
343(1)
HMMs for Speech Recognition
344(1)
Estimation of P(X|M)
345(4)
Conclusion
349(1)
Exercises
350(1)
Statistical Model Training
351(16)
Introduction
351(1)
HMM Training
352(3)
Forward-Backward Training
355(3)
Optimal Parameters for Emission Probability Estimators
358(2)
Gaussian Density Functions
358(1)
Example: Training with Discrete Densities
359(1)
Viterbi Training
360(3)
Example: Training with Gaussian Density Functions
362(1)
Example: Training with Discrete Densities
362(1)
Local Acoustic Probability Estimators for ASR
363(1)
Discrete Probabilities
363(1)
Gaussian Densities
363(1)
Tied Mixtures of Gaussians
364(1)
Independent Mixtures of Gaussians
364(1)
Neural Networks
364(1)
Initialization
364(1)
Smoothing
365(1)
Conclusion
366(1)
Exercises
366(1)
Discriminant Acoustic Probability Estimation
367(13)
Introduction
367(1)
Discriminant Training
368(6)
Maximum Mutual Information
369(1)
Corrective Training
369(1)
Generalized Probabilistic Descent
370(1)
Direct Estimation of Posteriors
371(3)
HMM–ANN Based ASR
374(2)
MLP Architecture
374(1)
MLP Training
374(1)
Embedded Training
375(1)
Other Applications of ANNs to ASR
376(1)
Exercises
377(1)
Appendix: Posterior Probability Proof
377(3)
Speech Recognition and Understanding
380(15)
Introduction
380(1)
Phonological Models
381(2)
Language Models
383(4)
n-Gram Statistics
385(1)
Smoothing
386(1)
Decoding with Acoustic and Language Models
387(1)
A Complete System
388(1)
Accepting Realistic Input
389(2)
Concluding Comments
391(4)
PART VII SYNTHESIS AND CODING
Speech Synthesis
395(20)
Introduction
395(1)
Parametric Source–Filter Synthesis
396(7)
Formant Synthesizers
397(2)
Other Source–Filter Synthesizer Structures
399(3)
Talking Chips
402(1)
Concatenative Methods
403(2)
Speculation
405(1)
Exercises
406(1)
Appendix: Synthesizer Examples
406(4)
The Klatt Recordings
406(1)
Development of Speech Synthesizers
407(2)
Segmental Synthesis by Rule
409(1)
Synthesis by Rule of Segments and Sentence Prosody
410(1)
Fully Automatic Text-to-Speech Conversion
410(5)
The van Santen Recordings
411(4)
Pitch Detection
415(16)
Introduction
415(1)
A Note on Nomenclature
415(1)
Pitch Detection, Perception, and Articulation
416(1)
The Voicing Decision
416(2)
Some Difficulties in Pitch Detection
418(1)
Signal Processing to Improve Pitch Detection
418(4)
Pattern-Recognition Methods for Pitch Detection
422(4)
Median Smoothing to Fix Errors in Pitch Estimation
426(2)
Exercises
428(3)
Vocoders
431(20)
Introduction
431(1)
Standards for Digital Speech Coding
431(1)
Design Considerations in Channel Vocoder Filter Banks
431(3)
Energy Measurements in a Channel Vocoder
434(2)
A Vocoder Design for Spectral Envelope Estimation
436(1)
Bit Saving in Channel Vocoders
436(4)
Design of the Excitation Parameters for a Channel Vocoder
440(2)
LPC Vocoders
442(1)
Cepstral Vocoders
443(1)
Design Comparisons
443(3)
Vocoder Standardization
446(1)
Exercises
447(4)
Low-Rate Vocoders
451(12)
Introduction
451(1)
The Frame-Fill Concept
452(2)
Pattern Matching or Vector Quantization
454(1)
The Kang–Coulter 600-bps Vocoder
455(1)
Segmentation Methods for Bandwidth Reduction
456(5)
Exercises
461(2)
Medium-Rate and High-Rate Vocoders
463(28)
Introduction
463(1)
Voice Excitation and Spectral Flattening
463(1)
Voice-Excited Channel Vocoder
464(2)
Voice-Excited and Error-Signal-Excited LPC Vocoders
466(2)
Waveform Coding with Predictive Methods
468(2)
Adaptive Predictive Coding of Speech
470(1)
Subband Coding
471(1)
Multipulse LPC Vocoders
472(2)
Code-Excited Linear Predictive Coding
474(4)
Modifications to CELP
476(1)
Non-Gaussian Codebook Sequences
476(1)
Low-Delay CELP
476(2)
Reducing Codebook Search Time in CELP
478(7)
Filter Simplification
478(1)
Speeding Up the Search
479(2)
Multiresolution Codebook Search
481(1)
Partial Sequence Elimination
482(1)
Tree-Structured Delta Codebooks
482(1)
Adaptive Codebooks
483(1)
Linear Combination Codebooks
484(1)
Vector Sum Excited Linear Prediction
485(1)
Adaptive Transform Coding
485(1)
Conclusions
485(1)
Exercises
486(5)
PART VIII OTHER APPLICATIONS
Speech Transformations
491(16)
Introduction
491(1)
Time-Scale Modification
491(3)
Transformation without Explicit Pitch Detection
494(1)
Transformations in Analysis-Synthesis Systems
495(3)
Hybrid Systems
498(1)
Speech Modification in Phase Vocoders
498(1)
Speech Transformations without Pitch Extraction
499(3)
Frequency Compression and Gender Transformation
501(1)
The Sine Transform Coder as a Transformation Algorithm
502(2)
Voice Modification to Emulate a Target Voice
504(1)
Exercises
505(2)
Some Aspects of Computer Music Synthesis
507(14)
Introduction
507(1)
Some Examples of Acoustically Generated Musical Sounds
507(2)
Music Synthesis Concepts
509(2)
Analysis-Based Synthesis
511(3)
Other Techniques for Music Synthesis
514(2)
Reverberation
516(1)
Several Examples of Synthesis
517(2)
Exercises
519(1)
Acknowledgment
519(2)
Speaker Verification
521(10)
Introduction
521(1)
Acoustic Parameters
522(1)
Similarity Measures
523(2)
Text-Dependent Speaker Verification
525(1)
Text-Independent Speaker Verification
526(1)
Text-Prompted Speaker Verification
527(1)
Identification, Verification, and the Decision Threshold
528(1)
Exercises
529(2)
Index 531

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
