


Gaussian Processes for Machine Learning

by Rasmussen, Carl Edward; Williams, Christopher K. I.
  • ISBN13: 9780262182539
  • ISBN10: 026218253X
  • Format: Hardcover
  • Copyright: 2005-11-23
  • Publisher: The MIT Press

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks!
  • We Buy This Book Back!
    In-Store Credit: $10.76
    Check/Direct Deposit: $10.25
    PayPal: $10.25
List Price: $53.33. Save up to $22.93.
  • Rent Book: $30.40 (Free Shipping)

    TERM | PRICE | DUE
    Usually ships in 3-5 business days.
    *This item is part of an exclusive publisher rental program and requires an additional convenience fee. This fee will be reflected in the shopping cart.

How To: Textbook Rental

Looking to rent a book? Rent Gaussian Processes for Machine Learning [ISBN: 9780262182539] for the semester, quarter, or short term, or search our site for other textbooks by Rasmussen, Carl Edward; Williams, Christopher K. I. Renting a textbook can save you up to 90% compared with the cost of buying.

Summary

Winner, 2009 DeGroot Prize for the best book in statistical science, awarded by the International Society for Bayesian Analysis. Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
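
For readers who want a concrete sense of the subject, below is a minimal sketch of the kind of Gaussian process regression prediction the book develops (its regression chapter derives the predictive equations and a Cholesky-based algorithm). This is not the book's accompanying code; the squared-exponential kernel, its hyperparameters, the noise level, and the toy data are assumptions chosen only for illustration.

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
        """Squared-exponential (RBF) covariance function between two input sets."""
        sq_dists = (
            np.sum(X1 ** 2, axis=1)[:, None]
            + np.sum(X2 ** 2, axis=1)[None, :]
            - 2.0 * X1 @ X2.T
        )
        return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

    def gp_predict(X_train, y_train, X_test, noise_var=0.01):
        """Posterior mean and variance of a zero-mean GP regressor (Cholesky-based)."""
        K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
        K_star = rbf_kernel(X_train, X_test)
        L = np.linalg.cholesky(K)                      # K = L L^T
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        mean = K_star.T @ alpha                        # predictive mean
        v = np.linalg.solve(L, K_star)
        var = rbf_kernel(X_test, X_test).diagonal() - np.sum(v ** 2, axis=0)
        return mean, var                               # variance of the latent function

    # Toy usage on synthetic 1-d data (purely illustrative).
    rng = np.random.default_rng(0)
    X = np.linspace(0.0, 5.0, 20)[:, None]
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)
    X_test = np.linspace(0.0, 5.0, 100)[:, None]
    mu, var = gp_predict(X, y, X_test)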

Author Biography

Carl Edward Rasmussen is a Lecturer at the Department of Engineering, University of Cambridge, and Adjunct Research Scientist at the Max Planck Institute for Biological Cybernetics, Tübingen.

Christopher K. I. Williams is Professor of Machine Learning and Director of the Institute for Adaptive and Neural Computation in the School of Informatics, University of Edinburgh.

Table of Contents

Series Foreword xi
Preface xiii
Symbols and Notation xvii
Introduction 1(6)
A Pictorial Introduction to Bayesian Modelling 3(2)
Roadmap 5(2)
Regression 7(26)
Weight-space View 7(6)
The Standard Linear Model 8(3)
Projections of Inputs into Feature Space 11(2)
Function-space View 13(6)
Varying the Hyperparameters 19(2)
Decision Theory for Regression 21(1)
An Example Application 22(2)
Smoothing, Weight Functions and Equivalent Kernels 24(3)
Incorporating Explicit Basis Functions 27(2)
Marginal Likelihood 29(1)
History and Related Work 29(1)
Exercises 30(3)
Classification 33(46)
Classification Problems 34(3)
Decision Theory for Classification 35(2)
Linear Models for Classification 37(2)
Gaussian Process Classification 39(2)
The Laplace Approximation for the Binary GP Classifier 41(7)
Posterior 42(2)
Predictions 44(1)
Implementation 45(2)
Marginal Likelihood 47(1)
Multi-class Laplace Approximation 48(4)
Implementation 51(1)
Expectation Propagation 52(8)
Predictions 56(1)
Marginal Likelihood 57(1)
Implementation 57(3)
Experiments 60(12)
A Toy Problem 60(2)
One-dimensional Example 62(1)
Binary Handwritten Digit Classification Example 63(7)
10-class Handwritten Digit Classification Example 70(2)
Discussion 72(2)
Appendix: Moment Derivations 74(1)
Exercises 75(4)
Covariance Functions 79(26)
Preliminaries 79(2)
Mean Square Continuity and Differentiability 81(1)
Examples of Covariance Functions 81(15)
Stationary Covariance Functions 82(7)
Dot Product Covariance Functions 89(1)
Other Non-stationary Covariance Functions 90(4)
Making New Kernels from Old 94(2)
Eigenfunction Analysis of Kernels 96(3)
An Analytic Example 97(1)
Numerical Approximation of Eigenfunctions 98(1)
Kernels for Non-vectorial Inputs 99(3)
String Kernels 100(1)
Fisher Kernels 101(1)
Exercises 102(3)
Model Selection and Adaptation of Hyperparameters 105(24)
The Model Selection Problem 106(2)
Bayesian Model Selection 108(3)
Cross-validation 111(1)
Model Selection for GP Regression 112(12)
Marginal Likelihood 112(4)
Cross-validation 116(2)
Examples and Discussion 118(6)
Model Selection for GP Classification 124(4)
Derivatives of the Marginal Likelihood for Laplace's Approximation 125(2)
Derivatives of the Marginal Likelihood for EP 127(1)
Cross-validation 127(1)
Example 128(1)
Exercises 128(1)
Relationships between GPs and Other Models 129(22)
Reproducing Kernel Hilbert Spaces 129(3)
Regularization 132(4)
Regularization Defined by Differential Operators 133(2)
Obtaining the Regularized Solution 135(1)
The Relationship of the Regularization View to Gaussian Process Prediction 135(1)
Spline Models 136(5)
A 1-d Gaussian Process Spline Construction 138(3)
Support Vector Machines 141(5)
Support Vector Classification 141(4)
Support Vector Regression 145(1)
Least-squares Classification 146(3)
Probabilistic Least-squares Classification 147(2)
Relevance Vector Machines 149(1)
Exercises 150(1)
Theoretical Perspectives 151(20)
The Equivalent Kernel 151(4)
Some Specific Examples of Equivalent Kernels 153(2)
Asymptotic Analysis 155(4)
Consistency 155(2)
Equivalence and Orthogonality 157(2)
Average-case Learning Curves 159(2)
PAC-Bayesian Analysis 161(4)
The PAC Framework 162(1)
PAC-Bayesian Analysis 163(1)
PAC-Bayesian Analysis of GP Classification 164(1)
Comparison with Other Supervised Learning Methods 165(3)
Appendix: Learning Curve for the Ornstein-Uhlenbeck Process 168(1)
Exercises 169(2)
Approximation Methods for Large Datasets 171(18)
Reduced-rank Approximations of the Gram Matrix 171(3)
Greedy Approximation 174(1)
Approximations for GPR with Fixed Hyperparameters 175(10)
Subset of Regressors 175(2)
The Nyström Method 177(1)
Subset of Datapoints 177(1)
Projected Process Approximation 178(2)
Bayesian Committee Machine 180(1)
Iterative Solution of Linear Systems 181(1)
Comparison of Approximate GPR Methods 182(3)
Approximations for GPC with Fixed Hyperparameters 185(1)
Approximating the Marginal Likelihood and its Derivatives 185(2)
Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel 187(1)
Exercises 187(2)
Further Issues and Conclusions 189(10)
Multiple Outputs 190(1)
Noise Models with Dependencies 190(1)
Non-Gaussian Likelihoods 191(1)
Derivative Observations 191(1)
Prediction with Uncertain Inputs 192(1)
Mixtures of Gaussian Processes 192(1)
Global Optimization 193(1)
Evaluation of Integrals 193(1)
Student's t Process 194(1)
Invariances 194(2)
Latent Variable Models 196(1)
Conclusions and Future Directions 196(3)
Appendix A Mathematical Background 199(8)
Joint, Marginal and Conditional Probability 199(1)
Gaussian Identities 200(1)
Matrix Identities 201(1)
Matrix Derivatives 202(1)
Matrix Norms 202(1)
Cholesky Decomposition 202(1)
Entropy and Kullback-Leibler Divergence 203(1)
Limits 204(1)
Measure and Integration 204(1)
Lp Spaces 205(1)
Fourier Transforms 205(1)
Convexity 206(1)
Appendix B Gaussian Markov Processes 207(14)
Fourier Analysis 208(3)
Sampling and Periodization 209(2)
Continuous-time Gaussian Markov Processes 211(3)
Continuous-time GMPs on R 211(2)
The Solution of the Corresponding SDE on the Circle 213(1)
Discrete-time Gaussian Markov Processes 214(3)
Discrete-time GMPs on Z 214(1)
The Solution of the Corresponding Difference Equation on PN 215(2)
The Relationship Between Discrete-time and Sampled Continuous-time GMPs 217(1)
Markov Processes in Higher Dimensions 218(3)
Appendix C Datasets and Code 221(2)
Bibliography 223(16)
Author Index 239(6)
Subject Index 245

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
