
Rent More, Save More! Use code: ECRENTAL

5% off 1 book, 7% off 2 books, 10% off 3+ books

Introduction to Machine Learning

by Ethem Alpaydin
  • ISBN13: 9780262012119
  • ISBN10: 0262012111

  • Format: Hardcover
  • Copyright: 2004-10-01
  • Publisher: MIT Press

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping on Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll in the eCampus.com Rewards Program.
List Price: $57.00 (save up to $14.25)
  • Buy Used: $42.75
    Free shipping. Usually ships in 2-4 business days.

Summary

The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning already exist, including systems that analyze past sales data to predict customer behavior, recognize faces or speech, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data.

Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. It discusses many methods based in different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining, in order to present a unified treatment of machine learning problems and solutions. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program. The book can be used by advanced undergraduates and graduate students who have completed courses in computer programming, probability, calculus, and linear algebra. It will also be of interest to engineers in the field who are concerned with the application of machine learning methods.

After an introduction that defines machine learning and gives examples of machine learning applications, the book covers supervised learning, Bayesian decision theory, parametric methods, multivariate methods, dimensionality reduction, clustering, nonparametric methods, decision trees, linear discrimination, multilayer perceptrons, local models, hidden Markov models, assessing and comparing classification algorithms, combining multiple learners, and reinforcement learning.
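To make that idea concrete, the sketch below shows a program being fitted from example data rather than written by hand. It is only an illustration, not code from the book: it assumes scikit-learn is available, and the toy credit-risk data, feature choice, and use of a k-nearest-neighbor classifier are made up for demonstration.

    # A minimal supervised-learning sketch (assumes scikit-learn is installed).
    # The "program" that labels a new customer is learned from past examples.
    from sklearn.neighbors import KNeighborsClassifier

    # Toy example data: (income, savings) of past customers and their known risk labels.
    X = [[60, 40], [85, 70], [30, 10], [45, 15], [90, 80], [25, 5]]
    y = ["low-risk", "low-risk", "high-risk", "high-risk", "low-risk", "high-risk"]

    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X, y)                      # learn a classification rule from the examples
    print(model.predict([[50, 30]]))     # apply the learned rule to a new customer

Swapping in another learner covered by the book (a decision tree, a multilayer perceptron, a support vector machine) would change only the model line; the fit-then-predict pattern stays the same.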

Table of Contents

Series Foreword xiii
Figures xv
Tables xxiii
Preface xxv
Acknowledgments xxvii
Notations xxix
Introduction 1(16)
What Is Machine Learning? 1(2)
Examples of Machine Learning Applications 3(9)
Learning Associations 3(1)
Classification 4(4)
Regression 8(2)
Unsupervised Learning 10(1)
Reinforcement Learning 11(1)
Notes 12(2)
Relevant Resources 14(1)
Exercises 15(1)
References 16(1)
Supervised Learning 17(22)
Learning a Class from Examples 17(5)
Vapnik-Chervonenkis (VC) Dimension 22(2)
Probably Approximately Correct (PAC) Learning 24(1)
Noise 25(2)
Learning Multiple Classes 27(2)
Regression 29(3)
Model Selection and Generalization 32(3)
Dimensions of a Supervised Machine Learning Algorithm 35(1)
Notes 36(1)
Exercises 37(1)
References 38(1)
Bayesian Decision Theory 39(22)
Introduction 39(2)
Classification 41(2)
Losses and Risks 43(2)
Discriminant Functions 45(1)
Utility Theory 46(1)
Value of Information 47(1)
Bayesian Networks 48(7)
Influence Diagrams 55(1)
Association Rules 56(1)
Notes 57(1)
Exercises 57(1)
References 58(3)
Parametric Methods 61(24)
Introduction 61(1)
Maximum Likelihood Estimation 62(2)
Bernoulli Density 62(1)
Multinomial Density 63(1)
Gaussian (Normal) Density 64(1)
Evaluating an Estimator: Bias and Variance 64(3)
The Bayes' Estimator 67(2)
Parametric Classification 69(4)
Regression 73(3)
Tuning Model Complexity: Bias/Variance Dilemma 76(3)
Model Selection Procedures 79(3)
Notes 82(1)
Exercises 82(1)
References 83(2)
Multivariate Methods 85(20)
Multivariate Data 85(1)
Parameter Estimation 86(1)
Estimation of Missing Values 87(1)
Multivariate Normal Distribution 88(4)
Multivariate Classification 92(6)
Tuning Complexity 98(1)
Discrete Features 99(1)
Multivariate Regression 100(2)
Notes 102(1)
Exercises 102(1)
References 103(2)
Dimensionality Reduction 105(28)
Introduction 105(1)
Subset Selection 106(2)
Principal Components Analysis 108(8)
Factor Analysis 116(5)
Multidimensional Scaling 121(3)
Linear Discriminant Analysis 124(3)
Notes 127(3)
Exercises 130(1)
References 130(3)
Clustering 133(20)
Introduction 133(1)
Mixture Densities 134(1)
k-Means Clustering 135(4)
Expectation-Maximization Algorithm 139(5)
Mixtures of Latent Variable Models 144(1)
Supervised Learning after Clustering 145(1)
Hierarchical Clustering 146(3)
Choosing the Number of Clusters 149(1)
Notes 149(1)
Exercises 150(1)
References 150(3)
Nonparametric Methods 153(20)
Introduction 153(1)
Nonparametric Density Estimation 154(5)
Histogram Estimator 155(2)
Kernel Estimator 157(1)
k-Nearest Neighbor Estimator 158(1)
Generalization to Multivariate Data 159(2)
Nonparametric Classification 161(1)
Condensed Nearest Neighbor 162(2)
Nonparametric Regression: Smoothing Models 164(4)
Running Mean Smoother 165(1)
Kernel Smoother 166(1)
Running Line Smoother 167(1)
How to Choose the Smoothing Parameter 168(1)
Notes 169(1)
Exercises 170(1)
References 170(3)
Decision Trees 173(24)
Introduction 173(2)
Univariate Trees 175(7)
Classification Trees 176(4)
Regression Trees 180(2)
Pruning 182(3)
Rule Extraction from Trees 185(1)
Learning Rules from Data 186(4)
Multivariate Trees 190(2)
Notes 192(3)
Exercises 195(1)
References 195(2)
Linear Discrimination 197(32)
Introduction 197(2)
Generalizing the Linear Model 199(1)
Geometry of the Linear Discriminant 200(4)
Two Classes 200(2)
Multiple Classes 202(2)
Pairwise Separation 204(1)
Parametric Discrimination Revisited 205(1)
Gradient Descent 206(2)
Logistic Discrimination 208(8)
Two Classes 208(3)
Multiple Classes 211(5)
Discrimination by Regression 216(2)
Support Vector Machines 218(9)
Optimal Separating Hyperplane 218(3)
The Nonseparable Case: Soft Margin Hyperplane 221(2)
Kernel Functions 223(2)
Support Vector Machines for Regression 225(2)
Notes 227(1)
Exercises 227(1)
References 228(1)
Multilayer Perceptrons 229(46)
Introduction 229(4)
Understanding the Brain 230(1)
Neural Networks as a Paradigm for Parallel Processing 231(2)
The Perceptron 233(3)
Training a Perceptron 236(3)
Learning Boolean Functions 239(2)
Multilayer Perceptrons 241(3)
MLP as a Universal Approximator 244(1)
Backpropagation Algorithm 245(7)
Nonlinear Regression 246(2)
Two-Class Discrimination 248(2)
Multiclass Discrimination 250(2)
Multiple Hidden Layers 252(1)
Training Procedures 252(7)
Improving Convergence 252(1)
Overtraining 253(1)
Structuring the Network 254(3)
Hints 257(2)
Tuning the Network Size 259(3)
Bayesian View of Learning 262(1)
Dimensionality Reduction 263(3)
Learning Time 266(2)
Time Delay Neural Networks 266(1)
Recurrent Networks 267(1)
Notes 268(2)
Exercises 270(1)
References 271(4)
Local Models 275(30)
Introduction 275(1)
Competitive Learning 276(8)
Online k-Means 276(5)
Adaptive Resonance Theory 281(1)
Self-Organizing Maps 282(2)
Radial Basis Functions 284(6)
Incorporating Rule-Based Knowledge 290(1)
Normalized Basis Functions 291(2)
Competitive Basis Functions 293(3)
Learning Vector Quantization 296(1)
Mixture of Experts 296(4)
Cooperative Experts 299(1)
Competitive Experts 300(1)
Hierarchical Mixture of Experts 300(1)
Notes 301(1)
Exercises 302(1)
References 302(3)
Hidden Markov Models 305(22)
Introduction 305(1)
Discrete Markov Processes 306(3)
Hidden Markov Models 309(2)
Three Basic Problems of HMMs 311(1)
Evaluation Problem 311(4)
Finding the State Sequence 315(2)
Learning Model Parameters 317(3)
Continuous Observations 320(1)
The HMM with Input 321(1)
Model Selection in HMM 322(1)
Notes 323(2)
Exercises 325(1)
References 325(2)
Assessing and Comparing Classification Algorithms 327(24)
Introduction 327(3)
Cross-Validation and Resampling Methods 330(3)
K-Fold Cross-Validation 331(1)
5x2 Cross-Validation 331(1)
Bootstrapping 332(1)
Measuring Error 333(1)
Interval Estimation 334(4)
Hypothesis Testing 338(1)
Assessing a Classification Algorithm's Performance 339(2)
Binomial Test 340(1)
Approximate Normal Test 341(1)
Paired t Test 341(1)
Comparing Two Classification Algorithms 341(4)
McNemar's Test 342(1)
K-Fold Cross-Validated Paired t Test 342(1)
5 x 2 cv Paired t Test 343(1)
5 x 2 cv Paired F Test 344(1)
Comparing Multiple Classification Algorithms: Analysis of Variance 345(3)
Notes 348(1)
Exercises 349(1)
References 350(1)
Combining Multiple Learners 351(22)
Rationale 351(3)
Voting 354(3)
Error-Correcting Output Codes 357(3)
Bagging 360(1)
Boosting 360(3)
Mixture of Experts Revisited 363(1)
Stacked Generalization 364(2)
Cascading 366(2)
Notes 368(1)
Exercises 369(1)
References 370(3)
Reinforcement Learning 373(24)
Introduction 373(2)
Single State Case: K-Armed Bandit 375(1)
Elements of Reinforcement Learning 376(3)
Model-Based Learning 379(1)
Value Iteration 379(1)
Policy Iteration 380(1)
Temporal Difference Learning 380(7)
Exploration Strategies 381(1)
Deterministic Rewards and Actions 382(1)
Nondeterministic Rewards and Actions 383(2)
Eligibility Traces 385(2)
Generalization 387(2)
Partially Observable States 389(2)
Notes 391(2)
Exercises 393(1)
References 394(3)
A Probability 397(12)
A.1 Elements of Probability 397(2)
A.1.1 Axioms of Probability 398(1)
A.1.2 Conditional Probability 398(1)
A.2 Random Variables 399(4)
A.2.1 Probability Distribution and Density Functions 399(1)
A.2.2 Joint Distribution and Density Functions 400(1)
A.2.3 Conditional Distributions 400(1)
A.2.4 Bayes' Rule 401(1)
A.2.5 Expectation 401(1)
A.2.6 Variance 402(1)
A.2.7 Weak Law of Large Numbers 403(1)
A.3 Special Random Variables 403(4)
A.3.1 Bernoulli Distribution 403(1)
A.3.2 Binomial Distribution 404(1)
A.3.3 Multinomial Distribution 404(1)
A.3.4 Uniform Distribution 404(1)
A.3.5 Normal (Gaussian) Distribution 405(1)
A.3.6 Chi-Square Distribution 406(1)
A.3.7 t Distribution 407(1)
A.3.8 F Distribution 407(1)
A.4 References 407(2)
Index 409

Supplemental Materials

What is included with this book?

New copies of this book include any supplemental materials advertised. Please check the title of the book to determine whether it should include any access cards, study guides, lab manuals, CDs, etc.

Used, Rental, and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included, even if the title states that it comes with access cards, study guides, lab manuals, CDs, etc.
