Speech Communications Human and Machine

by Douglas O’Shaughnessy
  • ISBN13: 9780780334496
  • ISBN10: 0780334493

  • Edition: 2nd
  • Format: Hardcover
  • Copyright: 1999-11-30
  • Publisher: Wiley-IEEE Press
  • List Price: $290.07
  • Buy New: $289.62 (save $0.45), free shipping
  • Print on demand: 2-4 weeks. This item cannot be cancelled or returned.

Summary

"Today the wireless communications industry is heavily dependent upon advanced speech coding techniques, while the integration of personal computers and voice technology is poised for growth. In this revised and updated second edition, a timely overview of the science of speech processing helps you keep pace with these rapidly developing advances. Students of electrical engineering, along with computer scientists, systems engineers, linguists, audiologists, and psychologists, will find in this one concise volume an interdisciplinary introduction to speech communication. This reference book addresses how humans generate and interpret speech and how machines simulate human speech performance and code speech for efficient transmission. With a skillful blending of the basic principles and technical detail underlying speech communication, this broad-based book offers you essential insights into the field. You will learn state-of-the-art techniques to analyze, code, recognize, and synthesize speech. In addition, you will gain a better understanding of the limits of today's technology and an informed view of future trends for speech research. SPEECH COMMUNICATIONS brings you an integrated approach to human and machine speech production and perception that is unmatched in the field. This book is complete with up-to-date references and Web addresses that will lead you to a wealth of resources for your own research into speech communication."

Author Biography

Douglas O’Shaughnessy is a professor at INRS-Telecommunication at the University of Quebec, Montreal, Canada. He is associate editor of the Journal of the Acoustical Society of America, and just completed a four-year term as an associate editor of the IEEE Transactions on Speech and Audio Processing. Dr. O’Shaughnessy’s research team is currently working on voice dialogs in English and French. He is a senior member of the Institute of Electrical and Electronics Engineers (IEEE), and a Fellow of the Acoustical Society of America.

Table of Contents

Preface xvii
Acknowledgments xxi
Acronyms in Speech Communications xxiii
Important Developments in Speech Communications xxv
Introduction
1(8)
What Is Speech Communication?
1(1)
Developments in Speech Communication
1(1)
Outline of the Book
2(5)
Production of Speech
3(1)
Sound Perception
3(1)
Speech Analysis
3(1)
Speech Coding
4(1)
Speech Enhancement
5(1)
Speech Synthesis
6(1)
Speech and Speaker Recognition
6(1)
Other Topics
7(2)
Review of Mathematics for Speech Processing
9(26)
Mathematical Preliminaries
9(3)
Number Representations
9(1)
Matrix Arithmetic
10(2)
Signals and Linear Systems
12(4)
Simple Signals
13(3)
Filtering and Convolution
16(1)
Frequency Analysis
16(3)
Fourier Transform
17(1)
Spectra and Correlation
18(1)
Laplace Transform: Poles and Zeros
18(1)
Circuits
19(1)
Discrete-Time Signals and Systems
20(5)
Sampling
20(2)
Frequency Transforms of Discrete-Time Signals
22(1)
Decimation and Interpolation
23(2)
Filters
25(4)
Bandpass Filters
26(1)
Digital Filters
26(1)
Difference Equations and Filter Structures
27(2)
Probability and Statistics
29(4)
Probability Densities and Histograms
30(1)
Averages and Variances
31(1)
Gaussian Probability Density
31(1)
Joint Probability
32(1)
Noise
33(1)
Summary
33(2)
Speech Production and Acoustic Phonetics
35(74)
Introduction
35(2)
Anatomy and Physiology of the Speech Organs
37(11)
The Lungs and the Thorax
38(1)
Larynx and Vocal Folds (Cords)
39(6)
Vocal Tract
45(3)
Articulatory Phonetics
48(8)
Manner of Articulation
50(2)
Structure of the Syllable
52(1)
Voicing
52(1)
Place of Articulation
53(2)
Phonemes in Other Languages
55(1)
Articulatory Models
55(1)
Acoustic Phonetics
56(12)
Spectrograms
56(1)
Vowels
57(3)
Diphthongs
60(2)
Glides and Liquids
62(2)
Nasals
64(1)
Fricatives
65(1)
Stops (Plosives)
65(2)
Variants of Normal Speech
67(1)
Acoustic Theory of Speech Production
68(20)
Acoustics of the Excitation Source
68(2)
Acoustics of the Vocal Tract
70(8)
Transmission Line Analog of the Vocal Tract
78(8)
Effects of Losses in the Vocal Tract
86(1)
Radiation at the Lips
87(1)
Model of Glottal Excitation
87(1)
Quantal Theory of Speech Production
88(1)
Practical Vocal Tract Models for Speech Analysis and Synthesis
88(7)
Articulatory Model
89(4)
Terminal-Analog Model
93(2)
Coarticulation
95(6)
Where Does Coarticulation Occur?
96(1)
Coarticulation Effects for Different Articulators
96(2)
Invariant Features
98(2)
Effects of Coarticulation on Duration
100(1)
Models for Coarticulation
100(1)
Prosody (Suprasegmentals)
101(6)
Duration
102(1)
Effects of Stress and Speaking Rate
103(1)
Fundamental Frequency (F0)
104(3)
Conclusion
107(2)
Problems
107(2)
Hearing
109(32)
Introduction
109(1)
Anatomy and Physiology of the Ear
109(10)
Outer Ear
110(1)
Middle Ear
111(1)
Inner Ear
111(2)
Basilar Membrane (BM) Behavior
113(2)
Electrical Activity in the Auditory Neurons
115(4)
Adaptation
119(1)
Sound Perception
119(12)
Auditory Psychophysics
120(1)
Thresholds
120(2)
Just-Noticeable Differences (JNDs)
122(1)
Pitch Perception
123(2)
Masking
125(2)
Critical Bands
127(1)
Nonsimultaneous or Temporal Masking
128(2)
Origins of Masking
130(1)
Release from Masking (‡)
130(1)
Sound Localization (‡)
131(1)
Response of the Ear to Complex Stimuli
131(7)
Speech Stimuli (‡)
132(1)
Masking Due to Complex Stimuli (‡)
132(1)
Adaptation (‡)
133(1)
Just-Noticeable Differences (JNDs) in Speech
134(2)
Timing
136(1)
Separating Sound Sources
137(1)
Conclusion
138(3)
Problems
138(3)
Speech Perception
141(32)
Introduction
141(1)
Perceptually-Important Features of Speech Signals
142(2)
Synthetic vs. Natural Speech
143(1)
Redundancy in Speech
143(1)
Models of Speech Perception
144(6)
Categorical Perception
144(1)
Distinctive Features
145(1)
Active Models
146(2)
Passive Models
148(2)
Vowel Perception
150(2)
Perceived Formant Location in Synthetic Vowels
150(1)
Context Normalization
150(1)
Coarticulation Effects
151(1)
Consonant Perception
152(8)
Perception of the Manner of Articulation Feature
152(2)
Perception of Place of Articulation of Consonants
154(4)
Perception of Voicing in Obstruents
158(2)
Duration as a Phonemic Cue
160(2)
Manner Cues
160(1)
Place Cues
161(1)
Speaking Rate Effects
161(1)
Intonation: Perception of Prosody
162(6)
Stress: Lexical and Sentential
163(1)
Acoustic Correlates of Stress
164(1)
Perception of Syntactic Features
165(3)
Perceptually Relevant Pitch Movements
168(1)
Other Aspects of Speech Perception (‡)
168(3)
Adaptation Studies
168(1)
Dichotic Studies
169(1)
Phase Effects
169(1)
Word and Syllable Effects
170(1)
Perception of Distorted Speech
170(1)
Speech Perception by the Handicapped
171(1)
Conclusion
171(2)
Problems
172(1)
Speech Analysis
173(56)
Introduction
173(1)
Short-Time Speech Analysis
174(5)
Windowing
175(1)
Spectra of Windows: Wide- and Narrow-Band Spectrograms
176(3)
Time-Domain Parameters
179(6)
Signal Analysis in the Time Domain
179(1)
Short-Time Average Energy and Magnitude
180(1)
Short-Time Average Zero-Crossing Rate (ZCR)
181(2)
Short-Time Autocorrelation Function
183(2)
Frequency-Domain (Spectral) Parameters
185(7)
Filter-Bank Analysis
186(1)
Short-Time Fourier Transform Analysis
186(2)
Spectral Displays
188(1)
Formant Estimation and Tracking
189(2)
Other Spectral Methods (‡)
191(1)
Energy Separation (‡)
191(1)
Linear Predictive Coding (LPC) Analysis
192(18)
Basic Principles of LPC
192(1)
Least-Squares Autocorrelation Method
193(3)
Least-Squares Covariance Method
196(1)
Computational Considerations
197(1)
Spectral Estimation via LPC
198(4)
Updating the LPC Model Sample by Sample
202(1)
Transversal Predictors
202(1)
Lattice LPC Models
203(3)
Window Considerations
206(1)
Modifications to Standard LPC
207(1)
Emphasizing Low Frequencies
208(1)
Pole-Zero LPC Models
209(1)
Cepstral Analysis
210(5)
Mathematical Details of Cepstral Analysis
211(1)
Applications for the Cepstrum
212(2)
Mel-Scale Cepstrum
214(1)
Other Spectral Estimation Methods (‡)
215(2)
Karhunen-Loeve Transform (KLT)
215(1)
Wavelets
216(1)
Wigner Distribution
216(1)
Other Recent Techniques
217(1)
F0 ("Pitch") Estimation
217(5)
Time-Domain F0 Techniques
218(1)
Short-Time Spectral Techniques
219(3)
Robust Analysis
222(1)
Reduction of Information
222(3)
Taking Advantage of Gradual Vocal Tract Motion
223(1)
Smoothing: Linear and Nonlinear
224(1)
Summary
225(4)
Problems
226(3)
Coding of Speech Signals
229(94)
Introduction
229(3)
Coding Noise
229(1)
Applications
230(1)
Quality
230(1)
Classes of Coders
231(1)
Chapter Overview
231(1)
Quantization
232(13)
Quantization Error or Noise
233(2)
Bit Protection
235(1)
Signal-to-Noise Ratio (SNR)
236(1)
Nonuniform Quantization
237(3)
Relationship of Bandwidth and Noise to Coding Rate
240(1)
Vector Quantization (VQ)
241(4)
Speech Redundancies (Characteristics to Exploit)
245(1)
Measures to Evaluate Speech Quality
245(1)
Time-Domain Waveform Coding
246(12)
Basic Time-Adaptive Waveform Coding
247(3)
Exploiting Properties of the Spectral Envelope
250(3)
Exploiting the Periodicity of Voiced Speech
253(3)
Exploiting Auditory Limitations (Noise Shaping)
256(2)
Linear Predictive Coding (LPC)
258(20)
Linear Delta Modulation (LDM)
260(2)
Adaptive Delta Modulation (ADM)
262(2)
Adaptive Differential Pulse-Code Modulation (ADPCM)
264(2)
Linear Predictive Coders (LPC) vs. Linear Predictive Analysis-by-Synthesis (LPAS)
266(1)
Equivalent Forms for LPC Coefficients
267(1)
Line Spectrum Pairs/Frequencies
268(1)
Parameter Updating and Transmission
269(1)
Variable-Frame-Rate (VFR) Transmission
269(1)
Transmission Details
270(1)
Time-Varying LPC Coefficients (‡)
270(1)
Different Excitation Models
271(6)
Waveform Interpolation
277(1)
Spectral (Frequency-Domain) Coders
278(17)
Filter Bank Analysis
278(2)
Sub-Band Coders (SBC)
280(4)
Adaptive Transform Coders (ATC)
284(5)
Harmonic Coding
289(6)
Other Vocoders (Non-LP Source Voice Coders) (‡)
295(7)
Phase Vocoder
296(2)
Channel Vocoders
298(1)
Excitation for Vocoders
299(1)
Homomorphic (Cepstral) Vocoder
300(1)
Other Vocoders
301(1)
Vector Quantization (VQ) Coders
302(11)
Split VQ Coders
303(1)
Gain/Shape Vector Quantization
303(1)
Other Types of Vector Quantization
304(3)
Vector Quantization of LPC
307(1)
Code-Excited Linear Prediction (CELP)
308(4)
Very-Low-Rate LPC Vocoders
312(1)
Network and Application Considerations
313(5)
Packet Transmission
314(1)
Time Assignment Speech Interpolation (TASI)
315(1)
Embedded Coding
316(1)
Tandeming of Coders (‡)
317(1)
Hardware Implementation: Integrated Circuits
318(1)
Summary
319(4)
Problems
320(3)
Speech Enhancement
323(14)
Introduction
323(1)
Background
324(1)
Nature of Interfering Sounds
325(2)
Speech Enhancement (SE) Techniques
327(1)
Spectral Subtraction and Filtering
327(1)
Harmonic Filtering
327(1)
Parametric Resynthesis
327(1)
Spectral Subtraction (SS)
328(1)
Filtering and Adaptive Noise Cancellation
329(4)
Filtering
329(1)
Multi-Microphone Adaptive Noise Cancellation (ANC)
330(3)
Methods Involving Fundamental Frequency Tracking
333(2)
Enhancement by Resynthesis
335(1)
Summary
336(1)
Speech Synthesis
337(30)
Introduction
337(1)
Principles of Speech Synthesis
338(8)
Types of Stored Speech Units to Concatenate
339(2)
Memory Size
341(2)
Synthesis Method
343(1)
Limited-Text (Voice-Response) Systems
343(1)
Unrestricted-Text (TTS) Systems
344(2)
Synthesizer Methods
346(14)
Articulatory Synthesis (‡)
346(3)
Formant Synthesis
349(3)
Linear Predictive Coding (LPC) Synthesis
352(1)
Specifying Parameter Trajectories (‡)
353(2)
Intraframe Parameter Updating (‡)
355(1)
Excitation Modeling (‡)
356(3)
Waveform Concatenation
359(1)
Synthesis of Intonation
360(3)
Speech Synthesis for Different Speakers
363(1)
Speech Synthesis in Other Languages
364(1)
Evaluation of TTS Systems
364(1)
Practical Speech Synthesis
365(1)
Conclusion
365(2)
Problems
366(1)
Automatic Speech Recognition
367(70)
Introduction
367(5)
ASR Search: Vast and Expensive
367(2)
Variability in Speech Signals
369(1)
Segmenting Speech into Smaller Units
369(1)
Performance Evaluation
370(1)
Databases for Speech Recognition
371(1)
Basic Pattern Recognition Approach
372(3)
Pattern Recognition Methods
372(1)
Different Viewpoints Toward ASR
373(2)
Preprocessing
375(1)
Parametric Representation
375(3)
Parameters Used in Recognition
377(1)
Feature Extraction
377(1)
Evaluating the Similarity of Speech Patterns
378(8)
Frame-Based Distance Measures
379(5)
Making ASR Decisions
384(2)
Accommodating Both Spectral and Temporal Variability
386(16)
Segmenting Speech into Smaller Units
387(2)
Dynamic Time Warping
389(10)
Applying Vector Quantization to ASR
399(3)
Networks for Speech Recognition
402(10)
Hidden Markov Models (HMMs)
402(10)
Adapting to Variability in Speech
412(5)
Intraspeaker Variability (Speaker Freedom)
412(1)
Interspeaker Variability (Everybody's Different)
412(3)
Environmental Variability (Noise Robustness)
415(2)
Language Models (LMs)
417(3)
Grammars in ASR
417(1)
Integrating Language Models into ASR
418(2)
Search Design
420(4)
Efficient Searches
420(3)
Allowing Vocabulary Freedom
423(1)
Out-of-Vocabulary Words
424(1)
Artificial Neural Networks
424(3)
Training ANNs
425(1)
Accommodating Timing in ANNs
426(1)
Expert-System Approach to ASR
427(5)
Segmenting Speech into Syllables
427(1)
Segmentation of Continuous Speech into Phones
428(2)
Labeling Phones
430(1)
Phonological Rules
431(1)
Using Prosodics to Aid Recognition
432(1)
Commercial Systems
432(1)
Summary of Current ASR Design
433(1)
Conclusion
433(4)
Problems
434(3)
Speaker Recognition
437(24)
Introduction
437(1)
Verification vs. Recognition
438(1)
Recognition Techniques
439(11)
Model Evaluation
440(1)
Text Dependence
440(1)
Statistical vs. Dynamic Features
441(2)
Stochastic Models
443(1)
Vector Quantization
443(2)
Similarity and Distance Measures
445(2)
Cepstral Analysis
447(1)
Orthogonal LPC Parameters (‡)
447(2)
Neural Network Approaches
449(1)
Features that Distinguish Speakers
450(3)
Measures of the Effectiveness of Features for Recognition
450(1)
Techniques to Choose Features
451(1)
Spectral Features
452(1)
Prosodic Features
452(1)
System Design
453(3)
Data Collection
453(1)
Sequential Decision Strategy
454(1)
Multiple-Stage Recognition
454(1)
Effects of Different Communication Channels (‡)
455(1)
Language and Accent Identification
456(1)
Speaker Recognition by Humans
457(1)
Conclusion
458(3)
Problems
458(3)
Appendix: Computer Sites for Help on Speech Communication 461(8)
References 469(68)
Index 537(10)
About the Author 547

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
