Preface | p. xiii |
Introduction: Classification, Learning, Features and Applications | p. 1 |
Scope | p. 1 |
Why Machine Learning? | p. 2 |
Some Applications | p. 3 |
Image Recognition | p. 3 |
Speech Recognition | p. 3 |
Medical Diagnosis | p. 4 |
Statistical Arbitrage | p. 4 |
Measurements, Features, and Feature Vectors | p. 4 |
The Need for Probability | p. 5 |
Supervised Learning | p. 5 |
Summary | p. 6 |
Appendix: Induction | p. 6 |
Questions | p. 7 |
References | p. 8 |
Probability | p. 10 |
Probability of Some Basic Events | p. 10 |
Probabilities of Compound Events | p. 12 |
Conditional Probability | p. 13 |
Drawing Without Replacement | p. 14 |
A Classic Birthday Problem | p. 15 |
Random Variables | p. 15 |
Expected Value | p. 16 |
Variance | p. 17 |
Summary | p. 19 |
Appendix: Interpretations of Probability | p. 19 |
Questions | p. 20 |
References | p. 21 |
Probability Densities | p. 23 |
An Example in Two Dimensions | p. 23 |
Random Numbers in [0,1] | p. 23 |
Density Functions | p. 24 |
Probability Densities in Higher Dimensions | p. 27 |
Joint and Conditional Densities | p. 28 |
Expected Value and Variance | p. 28 |
Laws of Large Numbers | p. 29 |
Summary | p. 30 |
Appendix: Measurability | p. 30 |
Question | p. 32 |
References | p. 32 |
The Pattern Recognition Problem | p. 34 |
A Simple Example | p. 34 |
Decision Rules | p. 35 |
Success Criterion | p. 37 |
The Best Classifier: Bayes Decision Rule | p. 37 |
Continuous Features and Densities | p. 38 |
Summary | p. 39 |
Appendix: Uncountably Many | p. 39 |
Questions | p. 40 |
References | p. 41 |
The Optimal Bayes Decision Rule | p. 43 |
Bayes Theorem | p. 43 |
Bayes Decision Rule | p. 44 |
Optimality and Some Comments | p. 45 |
An Example | p. 47 |
Bayes Theorem and Decision Rule with Densities | p. 48 |
Summary | p. 49 |
Appendix: Defining Conditional Probability | p. 50 |
Questions | p. 50 |
References | p. 53 |
Learning from Examples | p. 55 |
Lack of Knowledge of Distributions | p. 55 |
Training Data | p. 56 |
Assumptions on the Training Data | p. 57 |
A Brute Force Approach to Learning | p. 59 |
Curse of Dimensionality, Inductive Bias, and No Free Lunch | p. 60 |
Summary | p. 61 |
Appendix: What Sort of Learning? | p. 62 |
Questions | p. 63 |
References | p. 64 |
The Nearest Neighbor Rule | p. 65 |
The Nearest Neighbor Rule | p. 65 |
Performance of the Nearest Neighbor Rule | p. 66 |
Intuition and Proof Sketch of Performance | p. 67 |
Using More Neighbors | p. 69 |
Summary | p. 70 |
Appendix: When People Use Nearest Neighbor Reasoning | p. 70 |
Who Is a Bachelor? | p. 70 |
Legal Reasoning | p. 71 |
Moral Reasoning | p. 71 |
Questions | p. 72 |
References | p. 73 |
Kernel Rules | p. 74 |
Motivation | p. 74 |
A Variation on Nearest Neighbor Rules | p. 75 |
Kernel Rules | p. 76 |
Universal Consistency of Kernel Rules | p. 79 |
Potential Functions | p. 80 |
More General Kernels | p. 81 |
Summary | p. 82 |
Appendix: Kernels, Similarity, and Features | p. 82 |
Questions | p. 83 |
References | p. 84 |
Neural Networks: Perceptrons | p. 86 |
Multilayer Feedforward Networks | p. 86 |
Neural Networks for Learning and Classification | p. 87 |
Perceptrons | p. 89 |
Threshold | p. 90 |
Learning Rule for Perceptrons | p. 90 |
Representational Capabilities of Perceptrons | p. 92 |
Summary | p. 94 |
Appendix: Models of Mind | p. 95 |
Questions | p. 96 |
References | p. 97 |
Multilayer Networks | p. 99 |
Representation Capabilities of Multilayer Networks | p. 99 |
Learning and Sigmoidal Outputs | p. 101 |
Training Error and Weight Space | p. 104 |
Error Minimization by Gradient Descent | p. 105 |
Backpropagation | p. 106 |
Derivation of Backpropagation Equations | p. 109 |
Derivation for a Single Unit | p. 110 |
Derivation for a Network | p. 111 |
Summary | p. 113 |
Appendix: Gradient Descent and Reasoning toward Reflective Equilibrium | p. 113 |
Questions | p. 114 |
References | p. 115 |
PAC Learning | p. 116 |
Class of Decision Rules | p. 117 |
Best Rule from a Class | p. 118 |
Probably Approximately Correct Criterion | p. 119 |
PAC Learning | p. 120 |
Summary | p. 122 |
Appendix: Identifying Indiscernibles | p. 122 |
Questions | p. 123 |
References | p. 123 |
VC Dimension | p. 125 |
Approximation and Estimation Errors | p. 125 |
Shattering | p. 126 |
VC Dimension | p. 127 |
Learning Result | p. 128 |
Some Examples | p. 129 |
Application to Neural Nets | p. 132 |
Summary | p. 133 |
Appendix: VC Dimension and Popper Dimension | p. 133 |
Questions | p. 134 |
References | p. 135 |
Infinite VC Dimension | p. 137 |
A Hierarchy of Classes and Modified PAC Criterion | p. 138 |
Misfit Versus Complexity Trade-Off | p. 138 |
Learning Results | p. 139 |
Inductive Bias and Simplicity | p. 140 |
Summary | p. 141 |
Appendix: Uniform Convergence and Universal Consistency | p. 141 |
Questions | p. 142 |
References | p. 143 |
The Function Estimation Problem | p. 144 |
Estimation | p. 144 |
Success Criterion | p. 145 |
Best Estimator: Regression Function | p. 146 |
Summary | p. 147 |
Appendix: Regression Toward the Mean | p. 147 |
Questions | p. 148 |
References | p. 149 |
Learning Function Estimation | p. 150 |
Review of the Function Estimation/Regression Problem | p. 150 |
Nearest Neighbor Rules | p. 151 |
Kernel Methods | p. 151 |
Neural Network Learning | p. 152 |
Estimation with a Fixed Class of Functions | p. 153 |
Shattering, Pseudo-Dimension, and Learning | p. 154 |
Conclusion | p. 156 |
Appendix: Accuracy, Precision, Bias, and Variance in Estimation | p. 156 |
Questions | p. 157 |
References | p. 158 |
Simplicity | p. 160 |
Simplicity in Science | p. 160 |
Explicit Appeals to Simplicity | p. 160 |
Is the World Simple? | p. 161 |
Mistaken Appeals to Simplicity | p. 161 |
Implicit Appeals to Simplicity | p. 161 |
Ordering Hypotheses | p. 162 |
Two Kinds of Simplicity Orderings | p. 162 |
Two Examples | p. 163 |
Curve Fitting | p. 163 |
Enumerative Induction | p. 164 |
Simplicity as Simplicity of Representation | p. 165 |
Fix on a Particular System of Representation? | p. 166 |
Are Fewer Parameters Simpler? | p. 167 |
Pragmatic Theory of Simplicity | p. 167 |
Simplicity and Global Indeterminacy | p. 168 |
Summary | p. 169 |
Appendix: Basic Science and Statistical Learning Theory | p. 169 |
Questions | p. 170 |
References | p. 170 |
Support Vector Machines | p. 172 |
Mapping the Feature Vectors | p. 173 |
Maximizing the Margin | p. 175 |
Optimization and Support Vectors | p. 177 |
Implementation and Connection to Kernel Methods | p. 179 |
Details of the Optimization Problem | p. 180 |
Rewriting Separation Conditions | p. 180 |
Equation for Margin | p. 181 |
Slack Variables for Nonseparable Examples | p. 181 |
Reformulation and Solution of Optimization | p. 182 |
Summary | p. 183 |
Appendix: Computation | p. 184 |
Questions | p. 185 |
References | p. 186 |
Boosting | p. 187 |
Weak Learning Rules | p. 187 |
Combining Classifiers | p. 188 |
Distribution on the Training Examples | p. 189 |
The AdaBoost Algorithm | p. 190 |
Performance on Training Data | p. 191 |
Generalization Performance | p. 192 |
Summary | p. 194 |
Appendix: Ensemble Methods | p. 194 |
Questions | p. 195 |
References | p. 196 |
Bibliography | p. 197 |
Author Index | p. 203 |
Subject Index | p. 207 |