Preface | p. v |
Variability, Information, and Prediction | p. 1 |
The Curse of Dimensionality | p. 3 |
The Two Extremes | p. 4 |
Perspectives on the Curse | p. 5 |
Sparsity | p. 6 |
Exploding Numbers of Models | p. 8 |
Multicollinearity and Concurvity | p. 9 |
The Effect of Noise | p. 10 |
Coping with the Curse | p. 11 |
Selecting Design Points | p. 11 |
Local Dimension | p. 12 |
Parsimony | p. 17 |
Two Techniques | p. 18 |
The Bootstrap | p. 18 |
Cross-Validation | p. 27 |
Optimization and Search | p. 32 |
Univariate Search | p. 32 |
Multivariate Search | p. 33 |
General Searches | p. 34 |
Constraint Satisfaction and Combinatorial Search | p. 35 |
Notes | p. 38 |
Hammersley Points | p. 38 |
Edgeworth Expansions for the Mean | p. 39 |
Bootstrap Asymptotics for the Studentized Mean | p. 41 |
Exercises | p. 43 |
Local Smoothers | p. 53 |
Early Smoothers | p. 55 |
Transition to Classical Smoothers | p. 59 |
Global Versus Local Approximations | p. 60 |
LOESS | p. 64 |
Kernel Smoothers | p. 67 |
Statistical Function Approximation | p. 68 |
The Concept of Kernel Methods and the Discrete Case | p. 73 |
Kernels and Stochastic Designs: Density Estimation | p. 78 |
Stochastic Designs: Asymptotics for Kernel Smoothers | p. 81 |
Convergence Theorems and Rates for Kernel Smoothers | p. 86 |
Kernel and Bandwidth Selection | p. 90 |
Linear Smoothers | p. 95 |
Nearest Neighbors | p. 96 |
Applications of Kernel Regression | p. 100 |
A Simulated Example | p. 100 |
Ethanol Data | p. 102 |
Exercises | p. 107 |
Spline Smoothing | p. 117 |
Interpolating Splines | p. 117 |
Natural Cubic Splines | p. 123 |
Smoothing Splines for Regression | p. 126 |
Model Selection for Spline Smoothing | p. 129 |
Spline Smoothing Meets Kernel Smoothing | p. 130 |
Asymptotic Bias, Variance, and MISE for Spline Smoothers | p. 131 |
Ethanol Data Example - Continued | p. 133 |
Splines Redux: Hilbert Space Formulation | p. 136 |
Reproducing Kernels | p. 138 |
Constructing an RKHS | p. 141 |
Direct Sum Construction for Splines | p. 146 |
Explicit Forms | p. 149 |
Nonparametrics in Data Mining and Machine Learning | p. 152 |
Simulated Comparisons | p. 154 |
What Happens with Dependent Noise Models? | p. 157 |
Higher Dimensions and the Curse of Dimensionality | p. 159 |
Notes | p. 163 |
Sobolev Spaces: Definition | p. 163 |
Exercises | p. 164 |
New Wave Nonparametrics | p. 171 |
Additive Models | p. 172 |
The Backfitting Algorithm | p. 173 |
Concurvity and Inference | p. 177 |
Nonparametric Optimality | p. 180 |
Generalized Additive Models | p. 181 |
Projection Pursuit Regression | p. 184 |
Neural Networks | p. 189 |
Backpropagation and Inference | p. 192 |
Barron's Result and the Curse | p. 197 |
Approximation Properties | p. 198 |
Barron's Theorem: Formal Statement | p. 200 |
Recursive Partitioning Regression | p. 202 |
Growing Trees | p. 204 |
Pruning and Selection | p. 207 |
Regression | p. 208 |
Bayesian Additive Regression Trees: BART | p. 210 |
MARS | p. 210 |
Sliced Inverse Regression | p. 215 |
ACE and AVAS | p. 218 |
Notes | p. 220 |
Proof of Barron's Theorem | p. 220 |
Exercises | p. 224 |
Supervised Learning: Partition Methods | p. 231 |
Multiclass Learning | p. 233 |
Discriminant Analysis | p. 235 |
Distance-Based Discriminant Analysis | p. 236 |
Bayes Rules | p. 241 |
Probability-Based Discriminant Analysis | p. 245 |
Tree-Based Classifiers | p. 249 |
Splitting Rules | p. 249 |
Logic Trees | p. 253 |
Random Forests | p. 254 |
Support Vector Machines | p. 262 |
Margins and Distances | p. 262 |
Binary Classification and Risk | p. 265 |
Prediction Bounds for Function Classes | p. 268 |
Constructing SVM Classifiers | p. 271 |
SVM Classification for Nonlinearly Separable Populations | p. 279 |
SVMs in the General Nonlinear Case | p. 282 |
Some Kernels Used in SVM Classification | p. 288 |
Kernel Choice, SVMs and Model Selection | p. 289 |
Support Vector Regression | p. 290 |
Multiclass Support Vector Machines | p. 293 |
Neural Networks | p. 294 |
Notes | p. 296 |
Hoeffding's Inequality | p. 296 |
VC Dimension | p. 297 |
Exercises | p. 300 |
Alternative Nonparametrics | p. 307 |
Ensemble Methods | p. 308 |
Bayes Model Averaging | p. 310 |
Bagging | p. 312 |
Stacking | p. 316 |
Boosting | p. 318 |
Other Averaging Methods | p. 326 |
Oracle Inequalities | p. 328 |
Bayes Nonparametrics | p. 334 |
Dirichlet Process Priors | p. 334 |
Pólya Tree Priors | p. 336 |
Gaussian Process Priors | p. 338 |
The Relevance Vector Machine | p. 344 |
RVM Regression: Formal Description | p. 345 |
RVM Classification | p. 349 |
Hidden Markov Models - Sequential Classification | p. 352 |
Notes | p. 354 |
Proof of Yang's Oracle Inequality | p. 354 |
Proof of Lecué's Oracle Inequality | p. 357 |
Exercises | p. 359 |
Computational Comparisons | p. 365 |
Computational Results: Classification | p. 366 |
Comparison on Fisher's Iris Data | p. 366 |
Comparison on Ripley's Data | p. 369 |
Computational Results: Regression | p. 376 |
Vapnik's sinc Function | p. 377 |
Friedman's Function | p. 389 |
Conclusions | p. 392 |
Systematic Simulation Study | p. 397 |
No Free Lunch | p. 400 |
Exercises | p. 402 |
Unsupervised Learning: Clustering | p. 405 |
Centroid-Based Clustering | p. 408 |
K-Means Clustering | p. 409 |
Variants | p. 412 |
Hierarchical Clustering | p. 413 |
Agglomerative Hierarchical Clustering | p. 414 |
Divisive Hierarchical Clustering | p. 422 |
Theory for Hierarchical Clustering | p. 426 |
Partitional Clustering | p. 430 |
Model-Based Clustering | p. 432 |
Graph-Theoretic Clustering | p. 447 |
Spectral Clustering | p. 452 |
Bayesian Clustering | p. 458 |
Probabilistic Clustering | p. 458 |
Hypothesis Testing | p. 461 |
Computed Examples | p. 463 |
Ripley's Data | p. 465 |
Iris Data | p. 475 |
Cluster Validation | p. 480 |
Notes | p. 484 |
Derivatives of Functions of a Matrix | p. 484 |
Kruskal's Algorithm: Proof | p. 484 |
Prim's Algorithm: Proof | p. 485 |
Exercises | p. 485 |
Learning in High Dimensions | p. 493 |
Principal Components | p. 495 |
Main Theorem | p. 496 |
Key Properties | p. 498 |
Extensions | p. 500 |
Factor Analysis | p. 502 |
Finding Λ and Ψ | p. 504 |
Finding K | p. 506 |
Estimating Factor Scores | p. 507 |
Projection Pursuit | p. 508 |
Independent Components Analysis | p. 511 |
Main Definitions | p. 511 |
Key Results | p. 513 |
Computational Approach | p. 515 |
Nonlinear PCs and ICA | p. 516 |
Nonlinear PCs | p. 517 |
Nonlinear ICA | p. 518 |
Geometric Summarization | p. 518 |
Measuring Distances to an Algebraic Shape | p. 519 |
Principal Curves and Surfaces | p. 520 |
Supervised Dimension Reduction: Partial Least Squares | p. 523 |
Simple PLS | p. 523 |
PLS Procedures | p. 524 |
Properties of PLS | p. 526 |
Supervised Dimension Reduction: Sufficient Dimensions in Regression | p. 527 |
Visualization I: Basic Plots | p. 531 |
Elementary Visualization | p. 534 |
Projections | p. 541 |
Time Dependence | p. 543 |
Visualization II: Transformations | p. 546 |
Chernoff Faces | p. 546 |
Multidimensional Scaling | p. 547 |
Self-Organizing Maps | p. 553 |
Exercises | p. 560 |
Variable Selection | p. 569 |
Concepts from Linear Regression | p. 570 |
Subset Selection | p. 572 |
Variable Ranking | p. 575 |
Overview | p. 577 |
Traditional Criteria | p. 578 |
Akaike Information Criterion (AIC) | p. 580 |
Bayesian Information Criterion (BIC) | p. 583 |
Choices of Information Criteria | p. 585 |
Cross-Validation | p. 587 |
Shrinkage Methods | p. 599 |
Shrinkage Methods for Linear Models | p. 601 |
Grouping in Variable Selection | p. 615 |
Least Angle Regression | p. 617 |
Shrinkage Methods for Model Classes | p. 620 |
Cautionary Notes | p. 631 |
Bayes Variable Selection | p. 632 |
Prior Specification | p. 635 |
Posterior Calculation and Exploration | p. 643 |
Evaluating Evidence | p. 647 |
Connections Between Bayesian and Frequentist Methods | p. 650 |
Computational Comparisons | p. 653 |
The n > p Case | p. 653 |
When p > n | p. 665 |
Notes | p. 667 |
Code for Generating Data in Section 10.5 | p. 667 |
Exercises | p. 671 |
Multiple Testing | p. 679 |
Analyzing the Hypothesis Testing Problem | p. 681 |
A Paradigmatic Setting | p. 681 |
Counts for Multiple Tests | p. 684 |
Measures of Error in Multiple Testing | p. 685 |
Aspects of Error Control | p. 687 |
Controlling the Familywise Error Rate | p. 690 |
One-Step Adjustments | p. 690 |
Stepwise p-Value Adjustments | p. 693 |
PCER and PFER | p. 695 |
Null Domination | p. 696 |
Two Procedures | p. 697 |
Controlling the Type I Error Rate | p. 702 |
Adjusted p-Values for PFER/PCER | p. 706 |
Controlling the False Discovery Rate | p. 707 |
FDR and Other Measures of Error | p. 709 |
The Benjamini-Hochberg Procedure | p. 710 |
A BH Theorem for a Dependent Setting | p. 711 |
Variations on BH | p. 713 |
Controlling the Positive False Discovery Rate | p. 719 |
Bayesian Interpretations | p. 719 |
Aspects of Implementation | p. 723 |
Bayesian Multiple Testing | p. 727 |
Fully Bayes: Hierarchical | p. 728 |
Fully Bayes: Decision Theory | p. 731 |
Notes | p. 736 |
Proof of the Benjamini-Hochberg Theorem | p. 736 |
Proof of the Benjamini-Yekutieli Theorem | p. 739 |
References | p. 743 |
Index | p. 773 |