Preface

Acknowledgments

xviii

Statistical Speech Recognition

(66)

Part I: Theory

Introduction

(4)

The theoretical part

(2)

The implementation part

(1)

ASR: A Simple Explanation abc

(2)

Statistical Speech Recognition abc

(10)

The deterministic approach

(1)

The stochastic framework

(1)

Stochastic model simplifications

(3)

Hidden Markov Models: an introduction

(3)

The search algorithm

(1)

ASR Structure and Chapter Organization ***

(2)

Problems, Applications and Products in ASR

(2)

The ASR as an Information Theory Problem ***

(6)

The layered model

(2)

The L-language

(1)

The information theory approach

(4)

Part II: Implementation

Introduction

(3)

The Crisis of Software Development *** ***

(2)

Object oriented versus procedural programming

(1)

A Safe C++ Programming Style ***

(10)

What must not be used in C++ ***

(3)

Syntax and programming style ***

(1)

Conventions in writing code ***

(2)

Basic design rules: the modules ***

(2)

The Program Services ***

(2)

The Memory Service ***

(10)

Dynamic memory management: a C++ without pointers ***

(4)

Object list resizing: the C++ ``realloc'' ***

(1)

The root class: a user guide

(3)

Implementation details of the container classes ***

(1)

Diagnostics ***

(6)

Defensive programming ***

(3)

The diagnostic system (User guide)

(2)

Implementation details of the diagnostic system ***

(1)

Topics Related to Large Projects *** ***

(6)

Portability and defined options (files compatib.h, defopt.h, Boolean.h)

(2)

Test routines and version history

(1)

Time efficiency, development speed and code readability

(3)

Speech Database

(54)

Part I: Theory

Introduction

(1)

Statistical Notes

(1)

Speech Variability ***

(4)

Design of Speech Databases ***

(8)

Some practical aspects of building a corpus

(2)

Transcriptions of speech

(2)

Speech Database Sources

(6)

TIMIT and ATIS databases ***

(5)

Part II: Implementation

Introduction

(2)

The Configuration Manager ***

(5)

The configuration file ***

(1)

The option configuration file *.opt ***

(1)

The configuration class ***

(2)

Introducing new parameters ***

(1)

The String Class ***

(5)

The String class usage ***

(1)

String class techniques ***

(3)

The Structure of an ASR-Oriented Database

100

(4)

The options of the database manager ***

103

(1)

The Database Manager Class Hierarchy

104

(9)

The sound file hierarchy (file soundfil.h)

105

(2)

The label file hierarchy

107

(3)

Soundlabelled File and Dbase Voc classes (file soundlab.*)

110

(3)

Safe Polymorphism *** ***

113

(5)

Implementation

115

(3)

Adding new database standards ***

118

(1)

Hints on Class Design *** ***

118

(3)

Speech Signal Analysis

121

(46)

Part I: Theory

Introduction

122

(1)

Analog to Digital Voice Conversion ***

122

(3)

Physical features of speech signals ***

124

(1)

Feature Extraction ***

125

(9)

Signal preprocessing

125

(1)

Windowing

126

(3)

Spectral analysis

129

(1)

Filter bank processing

130

(2)

Log energy computation

132

(1)

Mel frequency cepstrum computation

132

(2)

Delta coefficients and energy

134

(1)

Cepstrum Analysis

134

(2)

Robustness in Speech Recognition ***

136

(5)

Additive noise and linear distortion model

137

(1)

Cepstral compensation

138

(2)

Channel equalization and speech enhancement

140

(1)

Model adaptation techniques

141

(1)

Distortion Measures ***

141

(3)

Part II: Implementation

Introduction

144

(2)

Feature Extraction: a DSP Approach ***

146

(3)

Configuration of the feature manager

148

(1)

Hierarchy Design

149

(5)

Adding a new block ***

153

(1)

Safe Interfacing C-Style Routines *** ***

154

(4)

The Mathematical Classes ***

158

(9)

Overview of mathematical classes ***

160

(1)

Mathematical vectors in C++ ***

161

(2)

Advanced techniques for vector implementation ***

163

(4)

HMMs and Initialization

167

(44)

Part I: Theory

Introduction

168

(1)

Markov Chains ***

168

(2)

Hidden Markov Models ***

170

(5)

Alternative definitions for HMMs

171

(1)

Processes suitable for HMM modeling

171

(1)

Example: two-state HMM

172

(3)

Application of the HMM to Speech Modeling

175

(9)

Probability inference in HMMs: forward and backward procedure

179

(1)

Observation densities

180

(2)

HMM initialization

182

(2)

Automatic Segmentation

184

(1)

Clustering of Feature Vectors

185

(3)

HMMs versus Probabilistic, Bayesian and Neural Networks ***

188

(4)

Part II: Implementation

Introduction

192

(1)

HMM Phoneme Models

193

(1)

HMM Estimation Procedure ***

194

(4)

Implementation Details

198

(4)

HMM model implementation

198

(2)

Algorithm implementation

200

(2)

Sectionate

202

(1)

Module Options

202

(2)

Array Class ***

204

(3)

Array class reference guide ***

205

(2)

Diagonal Arrays ***

207

(4)

Diagonal Array user guide

207

(1)

Diagonal array: implementation details ***

208

(3)

HMM Training

211

(32)

Part I: Theory

Introduction

212

(1)

MLE and MAP Training

213

(1)

Utterance Probabilities and HMM Parameters

214

(1)

The Maximization Procedure: MLE Case

215

(6)

Parameter estimation formulas for MLE ***

217

(2)

Summary of MLE procedure

219

(2)

Maximum a Posteriori (MAP) Estimation

221

(7)

Explicit formulas for MAP

223

(1)

Estimate of the hyper-parameters

223

(1)

Use of MAP procedure

224

(4)

MAP Applications

228

(1)

Part II: Implementation

Introduction

229

(1)

Single Model Training

230

(4)

Single model re-estimation algorithm

231

(3)

Model Simultaneous Training

234

(4)

Algorithm description

234

(2)

Algorithm optimizations

236

(2)

Logarithmic Math

238

(1)

Class Organization

238

(1)

Labels: Hints for Further Developments

239

(1)

Option Description

240

(3)

Language Models

243

(60)

Part I: Theory

Introduction

244

(1)

The Language Models

245

(1)

Languages as Information Sources

246

(3)

The M-gram Model

249

(2)

Perplexity

251

(1)

Rare Events Estimation: Smoothing

252

(7)

Good-Turing smoothing

254

(2)

Linear interpolation model

256

(1)

Non-linear interpolation

257

(2)

Smoothing Methods: Examples

259

(7)

Perplexity: an example

265

(1)

Word Automatic Clustering ***

266

(7)

Estimation accuracy of class-based and word-based probabilities

267

(3)

Clustering algorithms

270

(1)

Equivalent word merging algorithm

270

(3)

Adaptive Language Models ***

273

(1)

Appendix

274

(3)

Part II: Implementation

Introduction

277

(2)

Language Training Model Options

279

(2)

Hierarchy Design

281

(3)

Database Interface and Counts

284

(11)

Segmented database (TIMIT)

285

(3)

Non-segmented databases

288

(4)

Sorted list and binary search ***

292

(3)

Clustering

295

(5)

Merging algorithm

297

(2)

Perplexity computation for class-based language models

299

(1)

Transition Probability Computation

300

(3)

Recognition

303

(38)

Part I: Theory

Introduction

304

(1)

Static Structure of the Recognizer ***

305

(4)

Static structure of a phonetic recognizer

306

(1)

Static structure of a word recognizer

307

(2)

The Search Algorithms ***

309

(4)

The Viterbi algorithm

310

(1)

Algorithm description

311

(2)

Dynamic Structure Implementation ***

313

(2)

Example of the Viterbi Algorithm

315

(5)

Use of the logarithms

320

(1)

Variants of the Static Structures ***

320

(6)

Part II: Implementation

Introduction

326

(1)

Static Structure

327

(6)

Graph implementation *** ***

328

(1)

Layered graph

329

(4)

Dynamic Structure

333

(5)

Memory management

333

(2)

Representing trees

335

(1)

Hypothesis management

336

(2)

Recognition Options

338

(3)

Evaluation and Parameter Setting

341

(22)

Part I: Theory

Introduction

342

(3)

Evaluation conditions

343

(2)

Speech Data Retrieval

345

(1)

Feature Parameters

345

(1)

Initialization

346

(2)

Training

348

(4)

Grammar

352

(1)

Phonetic grammar

352

(1)

Word language model

353

(1)

Recognition

353

(6)

Phonetic recognition

354

(2)

Word recognition

356

(3)

Part II: Implementation

Evaluation

359

(1)

Options of the evaluator

359

(1)

RES Specifications

360

(3)

Econometric Appendix: The Behavior of Financial Time Series

Introduction

363

(1)

The state of the art: the homogeneous rational expectation hypothesis approach

364

(4)

Descriptive evidence from ``blue chips'' in Italy, USA, UK and Germany

368

(2)

Autoregressive conditional heteroskedasticity models for stock returns

370

(2)

Research frontiers on statistical properties of stock returns: the HMM approach

372

(1)

Conclusions

373

(1)

Empirical results

374

(1)

Diagnostics

374

(13)

References

387

(12)

Index

399

Speech Recognition Theory and C++ Implementation

9780471977308

0471977306