Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

An Elementary Introduction to Statistical Learning Theory

by Kulkarni, Sanjeev; Harman, Gilbert
  • ISBN13:

    9780470641835

  • ISBN10:

    0470641835

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 2011-08-02
  • Publisher: Wiley
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $139.68 (save up to $0.70)
  • Buy New
    $138.98
    Add to Cart (Free Shipping)

    PRINT ON DEMAND: 2-4 WEEKS. THIS ITEM CANNOT BE CANCELLED OR RETURNED.

Summary

A joint endeavor from leading researchers in the fields of philosophy and electrical engineering, An Elementary Introduction to Statistical Learning Theory provides a broad and accessible introduction to the rapidly evolving field of statistical pattern recognition and statistical learning theory. Exploring topics that are not often covered in introductory-level books on statistical learning theory, including PAC learning, VC dimension, and simplicity, the authors present readers at the upper-undergraduate and graduate levels with the basic theory behind contemporary machine learning and uniquely suggest that it serves as an excellent framework for philosophical thinking about inductive inference.

Author Biography

Sanjeev Kulkarni, PhD, is Professor in the Department of Electrical Engineering at Princeton University, where he is also an affiliated faculty member in the Department of Operations Research and Financial Engineering and the Department of Philosophy. Dr. Kulkarni has published widely on statistical pattern recognition, nonparametric estimation, machine learning, information theory, and other areas. A Fellow of the IEEE, he was awarded Princeton University's President's Award for Distinguished Teaching in 2007. Gilbert Harman, PhD, is James S. McDonnell Distinguished University Professor in the Department of Philosophy at Princeton University. A Fellow of the Cognitive Science Society, he is the author of more than fifty published articles in his areas of research interest, which include ethics, statistical learning theory, psychology of reasoning, and logic.

Table of Contents

Preface p. xiii
Introduction: Classification, Learning, Features and Applications p. 1
Scope p. 1
Why Machine Learning? p. 2
Some Applications p. 3
Image Recognition p. 3
Speech Recognition p. 3
Medical Diagnosis p. 4
Statistical Arbitrage p. 4
Measurements, Features, and Feature Vectors p. 4
The Need for Probability p. 5
Supervised Learning p. 5
Summary p. 6
Appendix: Induction p. 6
Questions p. 7
References p. 8
Probability p. 10
Probability of Some Basic Events p. 10
Probabilities of Compound Events p. 12
Conditional Probability p. 13
Drawing Without Replacement p. 14
A Classic Birthday Problem p. 15
Random Variables p. 15
Expected Value p. 16
Variance p. 17
Summary p. 19
Appendix: Interpretations of Probability p. 19
Questions p. 20
References p. 21
Probability Densities p. 23
An Example in Two Dimensions p. 23
Random Numbers in [0,1] p. 23
Density Functions p. 24
Probability Densities in Higher Dimensions p. 27
Joint and Conditional Densities p. 28
Expected Value and Variance p. 28
Laws of Large Numbers p. 29
Summary p. 30
Appendix: Measurability p. 30
Questions p. 32
References p. 32
The Pattern Recognition Problem p. 34
A Simple Example p. 34
Decision Rules p. 35
Success Criterion p. 37
The Best Classifier: Bayes Decision Rule p. 37
Continuous Features and Densities p. 38
Summary p. 39
Appendix: Uncountably Many p. 39
Questions p. 40
References p. 41
The Optimal Bayes Decision Rule p. 43
Bayes Theorem p. 43
Bayes Decision Rule p. 44
Optimality and Some Comments p. 45
An Example p. 47
Bayes Theorem and Decision Rule with Densities p. 48
Summary p. 49
Appendix: Defining Conditional Probability p. 50
Questions p. 50
References p. 53
Learning from Examples p. 55
Lack of Knowledge of Distributions p. 55
Training Data p. 56
Assumptions on the Training Data p. 57
A Brute Force Approach to Learning p. 59
Curse of Dimensionality, Inductive Bias, and No Free Lunch p. 60
Summary p. 61
Appendix: What Sort of Learning? p. 62
Questions p. 63
References p. 64
The Nearest Neighbor Rule p. 65
The Nearest Neighbor Rule p. 65
Performance of the Nearest Neighbor Rule p. 66
Intuition and Proof Sketch of Performance p. 67
Using More Neighbors p. 69
Summary p. 70
Appendix: When People Use Nearest Neighbor Reasoning p. 70
Who Is a Bachelor? p. 70
Legal Reasoning p. 71
Moral Reasoning p. 71
Questions p. 72
References p. 73
Kernel Rules p. 74
Motivation p. 74
A Variation on Nearest Neighbor Rules p. 75
Kernel Rules p. 76
Universal Consistency of Kernel Rules p. 79
Potential Functions p. 80
More General Kernels p. 81
Summary p. 82
Appendix: Kernels, Similarity, and Features p. 82
Questions p. 83
References p. 84
Neural Networks: Perceptrons p. 86
Multilayer Feedforward Networks p. 86
Neural Networks for Learning and Classification p. 87
Perceptrons p. 89
Threshold p. 90
Learning Rule for Perceptrons p. 90
Representational Capabilities of Perceptrons p. 92
Summary p. 94
Appendix: Models of Mind p. 95
Questions p. 96
References p. 97
Multilayer Networks p. 99
Representation Capabilities of Multilayer Networks p. 99
Learning and Sigmoidal Outputs p. 101
Training Error and Weight Space p. 104
Error Minimization by Gradient Descent p. 105
Backpropagation p. 106
Derivation of Backpropagation Equations p. 109
Derivation for a Single Unit p. 110
Derivation for a Network p. 111
Summary p. 113
Appendix: Gradient Descent and Reasoning toward Reflective Equilibrium p. 113
Questions p. 114
References p. 115
PAC Learning p. 116
Class of Decision Rules p. 117
Best Rule from a Class p. 118
Probably Approximately Correct Criterion p. 119
PAC Learning p. 120
Summary p. 122
Appendix: Identifying Indiscernibles p. 122
Questions p. 123
References p. 123
VC Dimension p. 125
Approximation and Estimation Errors p. 125
Shattering p. 126
VC Dimension p. 127
Learning Result p. 128
Some Examples p. 129
Application to Neural Nets p. 132
Summary p. 133
Appendix: VC Dimension and Popper Dimension p. 133
Questions p. 134
References p. 135
Infinite VC Dimension p. 137
A Hierarchy of Classes and Modified PAC Criterion p. 138
Misfit Versus Complexity Trade-Off p. 138
Learning Results p. 139
Inductive Bias and Simplicity p. 140
Summary p. 141
Appendix: Uniform Convergence and Universal Consistency p. 141
Questions p. 142
References p. 143
The Function Estimation Problem p. 144
Estimation p. 144
Success Criterion p. 145
Best Estimator: Regression Function p. 146
Summary p. 147
Appendix: Regression Toward the Mean p. 147
Questions p. 148
References p. 149
Learning Function Estimation p. 150
Review of the Function Estimation/Regression Problem p. 150
Nearest Neighbor Rules p. 151
Kernel Methods p. 151
Neural Network Learning p. 152
Estimation with a Fixed Class of Functions p. 153
Shattering, Pseudo-Dimension, and Learning p. 154
Conclusion p. 156
Appendix: Accuracy, Precision, Bias, and Variance in Estimation p. 156
Questions p. 157
References p. 158
Simplicity p. 160
Simplicity in Science p. 160
Explicit Appeals to Simplicity p. 160
Is the World Simple? p. 161
Mistaken Appeals to Simplicity p. 161
Implicit Appeals to Simplicity p. 161
Ordering Hypotheses p. 162
Two Kinds of Simplicity Orderings p. 162
Two Examples p. 163
Curve Fitting p. 163
Enumerative Induction p. 164
Simplicity as Simplicity of Representation p. 165
Fix on a Particular System of Representation? p. 166
Are Fewer Parameters Simpler? p. 167
Pragmatic Theory of Simplicity p. 167
Simplicity and Global Indeterminacy p. 168
Summary p. 169
Appendix: Basic Science and Statistical Learning Theory p. 169
Questions p. 170
References p. 170
Support Vector Machines p. 172
Mapping the Feature Vectors p. 173
Maximizing the Margin p. 175
Optimization and Support Vectors p. 177
Implementation and Connection to Kernel Methods p. 179
Details of the Optimization Problem p. 180
Rewriting Separation Conditions p. 180
Equation for Margin p. 181
Slack Variables for Nonseparable Examples p. 181
Reformulation and Solution of Optimization p. 182
Summary p. 183
Appendix: Computation p. 184
Questions p. 185
References p. 186
Boosting p. 187
Weak Learning Rules p. 187
Combining Classifiers p. 188
Distribution on the Training Examples p. 189
The AdaBoost Algorithm p. 190
Performance on Training Data p. 191
Generalization Performance p. 192
Summary p. 194
Appendix: Ensemble Methods p. 194
Questions p. 195
References p. 196
Bibliography p. 197
Author Index p. 203
Subject Index p. 207
Table of Contents provided by Ingram. All Rights Reserved.

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
