Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

The Elements of Statistical Learning

by Trevor Hastie; Robert Tibshirani; Jerome Friedman
  • ISBN13: 9780387952840
  • ISBN10: 0387952845

  • Format: Hardcover
  • Copyright: 2001-10-01
  • Publisher: Springer Verlag
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $104.00
Save up to $73.96
  • Digital
    $65.08
    Add to Cart

Summary

During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry.

The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Table of Contents

Preface vii
Introduction 1(8)
Overview of Supervised Learning 9(32)
Introduction 9(1)
Variable Types and Terminology 9(2)
Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors 11(7)
Linear Models and Least Squares 11(3)
Nearest-Neighbor Methods 14(2)
From Least Squares to Nearest Neighbors 16(2)
Statistical Decision Theory 18(4)
Local Methods in High Dimensions 22(6)
Statistical Models, Supervised Learning and Function Approximation 28(4)
A Statistical Model for the Joint Distribution Pr(X, Y) 28(1)
Supervised Learning 29(1)
Function Approximation 29(3)
Structured Regression Models 32(1)
Difficulty of the Problem 32(1)
Classes of Restricted Estimators 33(4)
Roughness Penalty and Bayesian Methods 34(1)
Kernel Methods and Local Regression 34(1)
Basis Functions and Dictionary Methods 35(2)
Model Selection and the Bias-Variance Tradeoff 37(4)
Bibliographic Notes 39(1)
Exercises 39(2)
Linear Methods for Regression 41(38)
Introduction 41(1)
Linear Regression Models and Least Squares 42(8)
Example: Prostate Cancer 47(2)
The Gauss-Markov Theorem 49(1)
Multiple Regression from Simple Univariate Regression 50(5)
Multiple Outputs 54(1)
Subset Selection and Coefficient Shrinkage 55(20)
Subset Selection 55(2)
Prostate Cancer Data Example (Continued) 57(2)
Shrinkage Methods 59(7)
Methods Using Derived Input Directions 66(2)
Discussion: A Comparison of the Selection and Shrinkage Methods 68(5)
Multiple Outcome Shrinkage and Selection 73(2)
Computational Considerations 75(4)
Bibliographic Notes 75(1)
Exercises 75(4)
Linear Methods for Classification 79(36)
Introduction 79(2)
Linear Regression of an Indicator Matrix 81(3)
Linear Discriminant Analysis 84(11)
Regularized Discriminant Analysis 90(1)
Computations for LDA 91(1)
Reduced-Rank Linear Discriminant Analysis 91(4)
Logistic Regression 95(10)
Fitting Logistic Regression Models 98(2)
Example: South African Heart Disease 100(2)
Quadratic Approximations and Inference 102(1)
Logistic Regression or LDA? 103(2)
Separating Hyperplanes 105(10)
Rosenblatt's Perceptron Learning Algorithm 107(1)
Optimal Separating Hyperplanes 108(3)
Bibliographic Notes 111(1)
Exercises 111(4)
Basis Expansions and Regularization 115(50)
Introduction 115(2)
Piecewise Polynomials and Splines 117(9)
Natural Cubic Splines 120(2)
Example: South African Heart Disease (Continued) 122(2)
Example: Phoneme Recognition 124(2)
Filtering and Feature Extraction 126(1)
Smoothing Splines 127(7)
Degrees of Freedom and Smoother Matrices 129(5)
Automatic Selection of the Smoothing Parameters 134(3)
Fixing the Degrees of Freedom 134(1)
The Bias-Variance Tradeoff 134(3)
Nonparametric Logistic Regression 137(1)
Multidimensional Splines 138(6)
Regularization and Reproducing Kernel Hilbert Spaces 144(4)
Spaces of Functions Generated by Kernels 144(2)
Examples of RKHS 146(2)
Wavelet Smoothing 148(17)
Wavelet Bases and the Wavelet Transform 150(3)
Adaptive Wavelet Filtering 153(2)
Bibliographic Notes 155(1)
Exercises 155(5)
Appendix: Computational Considerations for Splines 160(1)
Appendix: B-splines 160(3)
Appendix: Computations for Smoothing Splines 163(2)
Kernel Methods 165(28)
One-Dimensional Kernel Smoothers 165(7)
Local Linear Regression 168(3)
Local Polynomial Regression 171(1)
Selecting the Width of the Kernel 172(2)
Local Regression in ℝ^p 174(1)
Structured Local Regression Models in ℝ^p 175(4)
Structured Kernels 177(1)
Structured Regression Functions 177(2)
Local Likelihood and Other Models 179(3)
Kernel Density Estimation and Classification 182(4)
Kernel Density Estimation 182(2)
Kernel Density Classification 184(1)
The Naive Bayes Classifier 184(2)
Radial Basis Functions and Kernels 186(2)
Mixture Models for Density Estimation and Classification 188(2)
Computational Considerations 190(3)
Bibliographic Notes 190(1)
Exercises 190(3)
Model Assessment and Selection 193(32)
Introduction 193(1)
Bias, Variance and Model Complexity 193(3)
The Bias-Variance Decomposition 196(4)
Example: Bias-Variance Tradeoff 198(2)
Optimism of the Training Error Rate 200(3)
Estimates of In-Sample Prediction Error 203(2)
The Effective Number of Parameters 205(1)
The Bayesian Approach and BIC 206(2)
Minimum Description Length 208(2)
Vapnik-Chervonenkis Dimension 210(4)
Example (Continued) 212(2)
Cross-Validation 214(3)
Bootstrap Methods 217(8)
Example (Continued) 220(2)
Bibliographic Notes 222(1)
Exercises 222(3)
Model Inference and Averaging 225(32)
Introduction 225(1)
The Bootstrap and Maximum Likelihood Methods 225(6)
A Smoothing Example 225(4)
Maximum Likelihood Inference 229(2)
Bootstrap versus Maximum Likelihood 231(1)
Bayesian Methods 231(4)
Relationship Between the Bootstrap and Bayesian Inference 235(1)
The EM Algorithm 236(7)
Two-Component Mixture Model 236(4)
The EM Algorithm in General 240(1)
EM as a Maximization-Maximization Procedure 241(2)
MCMC for Sampling from the Posterior 243(3)
Bagging 246(4)
Example: Trees with Simulated Data 247(3)
Model Averaging and Stacking 250(3)
Stochastic Search: Bumping 253(4)
Bibliographic Notes 254(1)
Exercises 255(2)
Additive Models, Trees, and Related Methods 257(42)
Generalized Additive Models 257(9)
Fitting Additive Models 259(2)
Example: Additive Logistic Regression 261(5)
Summary 266(1)
Tree-Based Methods 266(13)
Background 266(1)
Regression Trees 267(3)
Classification Trees 270(2)
Other Issues 272(3)
Spam Example (Continued) 275(4)
PRIM: Bump Hunting 279(4)
Spam Example (Continued) 282(1)
MARS: Multivariate Adaptive Regression Splines 283(7)
Spam Example (Continued) 287(1)
Example (Simulated Data) 288(1)
Other Issues 289(1)
Hierarchical Mixtures of Experts 290(3)
Missing Data 293(2)
Computational Considerations 295(4)
Bibliographic Notes 295(1)
Exercises 296(3)
Boosting and Additive Trees 299(48)
Boosting Methods 299(4)
Outline of this Chapter 302(1)
Boosting Fits an Additive Model 303(1)
Forward Stagewise Additive Modeling 304(1)
Exponential Loss and AdaBoost 305(1)
Why Exponential Loss? 306(2)
Loss Functions and Robustness 308(4)
"Off-the-Shelf" Procedures for Data Mining 312(2)
Example: Spam Data 314(2)
Boosting Trees 316(3)
Numerical Optimization 319(4)
Steepest Descent 320(1)
Gradient Boosting 320(2)
MART 322(1)
Right-Sized Trees for Boosting 323(1)
Regularization 324(7)
Shrinkage 326(2)
Penalized Regression 328(2)
Virtues of the L1 Penalty (Lasso) over L2 330(1)
Interpretation 331(4)
Relative Importance of Predictor Variables 331(2)
Partial Dependence Plots 333(2)
Illustrations 335(12)
California Housing 335(4)
Demographics Data 339(1)
Bibliographic Notes 340(4)
Exercises 344(3)
Neural Networks 347(24)
Introduction 347(1)
Projection Pursuit Regression 347(3)
Neural Networks 350(3)
Fitting Neural Networks 353(2)
Some Issues in Training Neural Networks 355(4)
Starting Values 355(1)
Overfitting 356(2)
Scaling of the Inputs 358(1)
Number of Hidden Units and Layers 358(1)
Multiple Minima 359(1)
Example: Simulated Data 359(3)
Example: ZIP Code Data 362(4)
Discussion 366(1)
Computational Considerations 367(4)
Bibliographic Notes 367(2)
Exercises 369(2)
Support Vector Machines and Flexible Discriminants 371(40)
Introduction 371(1)
The Support Vector Classifier 371(6)
Computing the Support Vector Classifier 373(2)
Mixture Example (Continued) 375(2)
Support Vector Machines 377(13)
Computing the SVM for Classification 377(3)
The SVM as a Penalization Method 380(1)
Function Estimation and Reproducing Kernels 381(3)
SVMs and the Curse of Dimensionality 384(1)
Support Vector Machines for Regression 385(2)
Regression and Kernels 387(2)
Discussion 389(1)
Generalizing Linear Discriminant Analysis 390(1)
Flexible Discriminant Analysis 391(6)
Computing the FDA Estimates 394(3)
Penalized Discriminant Analysis 397(2)
Mixture Discriminant Analysis 399(12)
Example: Waveform Data 402(4)
Bibliographic Notes 406(1)
Exercises 406(5)
Prototype Methods and Nearest-Neighbors 411(26)
Introduction 411(1)
Prototype Methods 411(4)
K-means Clustering 412(2)
Learning Vector Quantization 414(1)
Gaussian Mixtures 415(1)
k-Nearest-Neighbor Classifiers 415(12)
Example: A Comparative Study 420(2)
Example: k-Nearest-Neighbors and Image Scene Classification 422(1)
Invariant Metrics and Tangent Distance 423(4)
Adaptive Nearest-Neighbor Methods 427(5)
Example 430(1)
Global Dimension Reduction for Nearest-Neighbors 431(1)
Computational Considerations 432(5)
Bibliographic Notes 433(1)
Exercises 433(4)
Unsupervised Learning 437(72)
Introduction 437(2)
Association Rules 439(14)
Market Basket Analysis 440(1)
The Apriori Algorithm 441(3)
Example: Market Basket Analysis 444(3)
Unsupervised as Supervised Learning 447(2)
Generalized Association Rules 449(2)
Choice of Supervised Learning Method 451(1)
Example: Market Basket Analysis (Continued) 451(2)
Cluster Analysis 453(27)
Proximity Matrices 455(1)
Dissimilarities Based on Attributes 455(2)
Object Dissimilarity 457(2)
Clustering Algorithms 459(1)
Combinatorial Algorithms 460(1)
K-means 461(2)
Gaussian Mixtures as Soft K-means Clustering 463(1)
Example: Human Tumor Microarray Data 463(3)
Vector Quantization 466(2)
K-medoids 468(2)
Practical Issues 470(2)
Hierarchical Clustering 472(8)
Self-Organizing Maps 480(5)
Principal Components, Curves and Surfaces 485(9)
Principal Components 485(6)
Principal Curves and Surfaces 491(3)
Independent Component Analysis and Exploratory Projection Pursuit 494(8)
Latent Variables and Factor Analysis 494(2)
Independent Component Analysis 496(4)
Exploratory Projection Pursuit 500(1)
A Different Approach to ICA 500(2)
Multidimensional Scaling 502(7)
Bibliographic Notes 503(1)
Exercises 504(5)
References 509(14)
Author Index 523(4)
Index 527

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
