Preface | p. v |
Variability, Information, and Prediction | p. 1 |
The Curse of Dimensionality | p. 3 |
The Two Extremes | p. 4 |
Perspectives on the Curse | p. 5 |
Sparsity | p. 6 |
Exploding Numbers of Models | p. 8 |
Multicollinearity and Concurvity | p. 9 |
The Effect of Noise | p. 10 |
Coping with the Curse | p. 11 |
Selecting Design Points | p. 11 |
Local Dimension | p. 12 |
Parsimony | p. 17 |
Two Techniques | p. 18 |
The Bootstrap | p. 18 |
Cross-Validation | p. 27 |
Optimization and Search | p. 32 |
Univariate Search | p. 32 |
Multivariate Search | p. 33 |
General Searches | p. 34 |
Constraint Satisfaction and Combinatorial Search | p. 35 |
Notes | p. 38 |
Hammersley Points | p. 38 |
Edgeworth Expansions for the Mean | p. 39 |
Bootstrap Asymptotics for the Studentized Mean | p. 41 |
Exercises | p. 43 |
Local Smoothers | p. 53 |
Early Smoothers | p. 55 |
Transition to Classical Smoothers | p. 59 |
Global Versus Local Approximations | p. 60 |
LOESS | p. 64 |
Kernel Smoothers | p. 67 |
Statistical Function Approximation | p. 68 |
The Concept of Kernel Methods and the Discrete Case | p. 73 |
Kernels and Stochastic Designs: Density Estimation | p. 78 |
Stochastic Designs: Asymptotics for Kernel Smoothers | p. 81 |
Convergence Theorems and Rates for Kernel Smoothers | p. 86 |
Kernel and Bandwidth Selection | p. 90 |
Linear Smoothers | p. 95 |
Nearest Neighbors | p. 96 |
Applications of Kernel Regression | p. 100 |
A Simulated Example | p. 100 |
Ethanol Data | p. 102 |
Exercises | p. 107 |
Spline Smoothing | p. 117 |
Interpolating Splines | p. 117 |
Natural Cubic Splines | p. 123 |
Smoothing Splines for Regression | p. 126 |
Model Selection for Spline Smoothing | p. 129 |
Spline Smoothing Meets Kernel Smoothing | p. 130 |
Asymptotic Bias, Variance, and MISE for Spline Smoothers | p. 131 |
Ethanol Data Example - Continued | p. 133 |
Splines Redux: Hilbert Space Formulation | p. 136 |
Reproducing Kernels | p. 138 |
Constructing an RKHS | p. 141 |
Direct Sum Construction for Splines | p. 146 |
Explicit Forms | p. 149 |
Nonparametrics in Data Mining and Machine Learning | p. 152 |
Simulated Comparisons | p. 154 |
What Happens with Dependent Noise Models? | p. 157 |
Higher Dimensions and the Curse of Dimensionality | p. 159 |
Notes | p. 163 |
Sobolev Spaces: Definition | p. 163 |
Exercises | p. 164 |
New Wave Nonparametrics | p. 171 |
Additive Models | p. 172 |
The Backfitting Algorithm | p. 173 |
Concurvity and Inference | p. 177 |
Nonparametric Optimality | p. 180 |
Generalized Additive Models | p. 181 |
Projection Pursuit Regression | p. 184 |
Neural Networks | p. 189 |
Backpropagation and Inference | p. 192 |
Barron's Result and the Curse | p. 197 |
Approximation Properties | p. 198 |
Barron's Theorem: Formal Statement | p. 200 |
Recursive Partitioning Regression | p. 202 |
Growing Trees | p. 204 |
Pruning and Selection | p. 207 |
Regression | p. 208 |
Bayesian Additive Regression Trees: BART | p. 210 |
MARS | p. 210 |
Sliced Inverse Regression | p. 215 |
ACE and AVAS | p. 218 |
Notes | p. 220 |
Proof of Barron's Theorem | p. 220 |
Exercises | p. 224 |
Supervised Learning: Partition Methods | p. 231 |
Multiclass Learning | p. 233 |
Discriminant Analysis | p. 235 |
Distance-Based Discriminant Analysis | p. 236 |
Bayes Rules | p. 241 |
Probability-Based Discriminant Analysis | p. 245 |
Tree-Based Classifiers | p. 249 |
Splitting Rules | p. 249 |
Logic Trees | p. 253 |
Random Forests | p. 254 |
Support Vector Machines | p. 262 |
Margins and Distances | p. 262 |
Binary Classification and Risk | p. 265 |
Prediction Bounds for Function Classes | p. 268 |
Constructing SVM Classifiers | p. 271 |
SVM Classification for Nonlinearly Separable Populations | p. 279 |
SVMs in the General Nonlinear Case | p. 282 |
Some Kernels Used in SVM Classification | p. 288 |
Kernel Choice, SVMs and Model Selection | p. 289 |
Support Vector Regression | p. 290 |
Multiclass Support Vector Machines | p. 293 |
Neural Networks | p. 294 |
Notes | p. 296 |
Hoeffding's Inequality | p. 296 |
VC Dimension | p. 297 |
Exercises | p. 300 |
Alternative Nonparametrics | p. 307 |
Ensemble Methods | p. 308 |
Bayes Model Averaging | p. 310 |
Bagging | p. 312 |
Stacking | p. 316 |
Boosting | p. 318 |
Other Averaging Methods | p. 326 |
Oracle Inequalities | p. 328 |
Bayes Nonparametrics | p. 334 |
Dirichlet Process Priors | p. 334 |
Pólya Tree Priors | p. 336 |
Gaussian Process Priors | p. 338 |
The Relevance Vector Machine | p. 344 |
RVM Regression: Formal Description | p. 345 |
RVM Classification | p. 349 |
Hidden Markov Models - Sequential Classification | p. 352 |
Notes | p. 354 |
Proof of Yang's Oracle Inequality | p. 354 |
Proof of Lecué's Oracle Inequality | p. 357 |
Exercises | p. 359 |
Computational Comparisons | p. 365 |
Computational Results: Classification | p. 366 |
Comparison on Fisher's Iris Data | p. 366 |
Comparison on Ripley's Data | p. 369 |
Computational Results: Regression | p. 376 |
Vapnik's sinc Function | p. 377 |
Friedman's Function | p. 389 |
Conclusions | p. 392 |
Systematic Simulation Study | p. 397 |
No Free Lunch | p. 400 |
Exercises | p. 402 |
Unsupervised Learning: Clustering | p. 405 |
Centroid-Based Clustering | p. 408 |
K-Means Clustering | p. 409 |
Variants | p. 412 |
Hierarchical Clustering | p. 413 |
Agglomerative Hierarchical Clustering | p. 414 |
Divisive Hierarchical Clustering | p. 422 |
Theory for Hierarchical Clustering | p. 426 |
Partitional Clustering | p. 430 |
Model-Based Clustering | p. 432 |
Graph-Theoretic Clustering | p. 447 |
Spectral Clustering | p. 452 |
Bayesian Clustering | p. 458 |
Probabilistic Clustering | p. 458 |
Hypothesis Testing | p. 461 |
Computed Examples | p. 463 |
Ripley's Data | p. 465 |
Iris Data | p. 475 |
Cluster Validation | p. 480 |
Notes | p. 484 |
Derivatives of Functions of a Matrix | p. 484 |
Kruskal's Algorithm: Proof | p. 484 |
Prim's Algorithm: Proof | p. 485 |
Exercises | p. 485 |
Learning in High Dimensions | p. 493 |
Principal Components | p. 495 |
Main Theorem | p. 496 |
Key Properties | p. 498 |
Extensions | p. 500 |
Factor Analysis | p. 502 |
Finding Λ and Ψ | p. 504 |
Finding K | p. 506 |
Estimating Factor Scores | p. 507 |
Projection Pursuit | p. 508 |
Independent Components Analysis | p. 511 |
Main Definitions | p. 511 |
Key Results | p. 513 |
Computational Approach | p. 515 |
Nonlinear PCs and ICA | p. 516 |
Nonlinear PCs | p. 517 |
Nonlinear ICA | p. 518 |
Geometric Summarization | p. 518 |
Measuring Distances to an Algebraic Shape | p. 519 |
Principal Curves and Surfaces | p. 520 |
Supervised Dimension Reduction: Partial Least Squares | p. 523 |
Simple PLS | p. 523 |
PLS Procedures | p. 524 |
Properties of PLS | p. 526 |
Supervised Dimension Reduction: Sufficient Dimensions in Regression | p. 527 |
Visualization I: Basic Plots | p. 531 |
Elementary Visualization | p. 534 |
Projections | p. 541 |
Time Dependence | p. 543 |
Visualization II: Transformations | p. 546 |
Chernoff Faces | p. 546 |
Multidimensional Scaling | p. 547 |
Self-Organizing Maps | p. 553 |
Exercises | p. 560 |
Variable Selection | p. 569 |
Concepts from Linear Regression | p. 570 |
Subset Selection | p. 572 |
Variable Ranking | p. 575 |
Overview | p. 577 |
Traditional Criteria | p. 578 |
Akaike Information Criterion (AIC) | p. 580 |
Bayesian Information Criterion (BIC) | p. 583 |
Choices of Information Criteria | p. 585 |
Cross-Validation | p. 587 |
Shrinkage Methods | p. 599 |
Shrinkage Methods for Linear Models | p. 601 |
Grouping in Variable Selection | p. 615 |
Least Angle Regression | p. 617 |
Shrinkage Methods for Model Classes | p. 620 |
Cautionary Notes | p. 631 |
Bayes Variable Selection | p. 632 |
Prior Specification | p. 635 |
Posterior Calculation and Exploration | p. 643 |
Evaluating Evidence | p. 647 |
Connections Between Bayesian and Frequentist Methods | p. 650 |
Computational Comparisons | p. 653 |
The n > p Case | p. 653 |
When p > n | p. 665 |
Notes | p. 667 |
Code for Generating Data in Section 10.5 | p. 667 |
Exercises | p. 671 |
Multiple Testing | p. 679 |
Analyzing the Hypothesis Testing Problem | p. 681 |
A Paradigmatic Setting | p. 681 |
Counts for Multiple Tests | p. 684 |
Measures of Error in Multiple Testing | p. 685 |
Aspects of Error Control | p. 687 |
Controlling the Familywise Error Rate | p. 690 |
One-Step Adjustments | p. 690 |
Stepwise p-Value Adjustments | p. 693 |
PCER and PFER | p. 695 |
Null Domination | p. 696 |
Two Procedures | p. 697 |
Controlling the Type I Error Rate | p. 702 |
Adjusted p-Values for PFER/PCER | p. 706 |
Controlling the False Discovery Rate | p. 707 |
FDR and Other Measures of Error | p. 709 |
The Benjamini-Hochberg Procedure | p. 710 |
A BH Theorem for a Dependent Setting | p. 711 |
Variations on BH | p. 713 |
Controlling the Positive False Discovery Rate | p. 719 |
Bayesian Interpretations | p. 719 |
Aspects of Implementation | p. 723 |
Bayesian Multiple Testing | p. 727 |
Fully Bayes: Hierarchical | p. 728 |
Fully Bayes: Decision Theory | p. 731 |
Notes | p. 736 |
Proof of the Benjamini-Hochberg Theorem | p. 736 |
Proof of the Benjamini-Yekutieli Theorem | p. 739 |
References | p. 743 |
Index | p. 773 |