Foreword | p. xi |
Preface | p. xiii |
Introduction | |
Introduction and Problem Formulation | p. 3 |
Machine Learning under Covariate Shift | p. 3 |
Quick Tour of Covariate Shift Adaptation | p. 5 |
Problem Formulation | p. 7 |
Function Learning from Examples | p. 7 |
Loss Functions | p. 8 |
Generalization Error | p. 9 |
Covariate Shift | p. 9 |
Models for Function Learning | p. 10 |
Specification of Models | p. 13 |
Structure of This Book | p. 14 |
Part II: Learning under Covariate Shift | p. 14 |
Part III: Learning Causing Covariate Shift | p. 17 |
Learning under Covariate Shift | |
Function Approximation | p. 21 |
Importance-Weighting Techniques for Covariate Shift Adaptation | p. 22 |
Importance-Weighted ERM | p. 22 |
Adaptive IWERM | p. 23 |
Regularized IWERM | p. 23 |
Examples of Importance-Weighted Regression Methods | p. 25 |
Squared Loss: Least-Squares Regression | p. 26 |
Absolute Loss: Least-Absolute Regression | p. 30 |
Huber Loss: Huber Regression | p. 31 |
Deadzone-Linear Loss: Support Vector Regression | p. 33 |
Examples of Importance-Weighted Classification Methods | p. 35 |
Squared Loss: Fisher Discriminant Analysis | p. 36 |
Logistic Loss: Logistic Regression Classifier | p. 38 |
Hinge Loss: Support Vector Machine | p. 39 |
Exponential Loss: Boosting | p. 40 |
Numerical Examples | p. 40 |
Regression | p. 40 |
Classification | p. 41 |
Summary and Discussion | p. 45 |
Model Selection | p. 47 |
Importance-Weighted Akaike Information Criterion | p. 47 |
Importance-Weighted Subspace Information Criterion | p. 50 |
Input Dependence vs. Input Independence in Generalization Error Analysis | p. 51 |
Approximately Correct Models | p. 53 |
Input-Dependent Analysis of Generalization Error | p. 54 |
Importance-Weighted Cross-Validation | p. 64 |
Numerical Examples | p. 66 |
Regression | p. 66 |
Classification | p. 69 |
Summary and Discussion | p. 70 |
Importance Estimation | p. 73 |
Kernel Density Estimation | p. 73 |
Kernel Mean Matching | p. 75 |
Logistic Regression | p. 76 |
Kullback-Leibler Importance Estimation Procedure | p. 78 |
Algorithm | p. 78 |
Model Selection by Cross-Validation | p. 81 |
Basis Function Design | p. 82 |
Least-Squares Importance Fitting | p. 83 |
Algorithm | p. 83 |
Basis Function Design and Model Selection | p. 84 |
Regularization Path Tracking | p. 85 |
Unconstrained Least-Squares Importance Fitting | p. 87 |
Algorithm | p. 87 |
Analytic Computation of Leave-One-Out Cross-Validation | p. 88 |
Numerical Examples | p. 88 |
Setting | p. 90 |
Importance Estimation by KLIEP | p. 90 |
Covariate Shift Adaptation by IWLS and IWCV | p. 92 |
Experimental Comparison | p. 94 |
Summary | p. 101 |
Direct Density-Ratio Estimation with Dimensionality Reduction | p. 103 |
Density Difference in Hetero-Distributional Subspace | p. 103 |
Characterization of Hetero-Distributional Subspace | p. 104 |
Identifying Hetero-Distributional Subspace | p. 106 |
Basic Idea | p. 106 |
Fisher Discriminant Analysis | p. 108 |
Local Fisher Discriminant Analysis | p. 109 |
Using LFDA for Finding Hetero-Distributional Subspace | p. 112 |
Density-Ratio Estimation in the Hetero-Distributional Subspace | p. 113 |
Numerical Examples | p. 113 |
Illustrative Example | p. 113 |
Performance Comparison Using Artificial Data Sets | p. 117 |
Summary | p. 121 |
Relation to Sample Selection Bias | p. 125 |
Heckman's Sample Selection Model | p. 125 |
Distributional Change and Sample Selection Bias | p. 129 |
The Two-Step Algorithm | p. 131 |
Relation to Covariate Shift Approach | p. 134 |
Applications of Covariate Shift Adaptation | p. 137 |
Brain-Computer Interface | p. 137 |
Background | p. 137 |
Experimental Setup | p. 138 |
Experimental Results | p. 140 |
Speaker Identification | p. 142 |
Background | p. 142 |
Formulation | p. 142 |
Experimental Results | p. 144 |
Natural Language Processing | p. 149 |
Formulation | p. 149 |
Experimental Results | p. 151 |
Perceived Age Prediction from Face Images | p. 152 |
Background | p. 152 |
Formulation | p. 153 |
Incorporating Characteristics of Human Age Perception | p. 153 |
Experimental Results | p. 155 |
Human Activity Recognition from Accelerometric Data | p. 157 |
Background | p. 157 |
Importance-Weighted Least-Squares Probabilistic Classifier | p. 157 |
Experimental Results | p. 160 |
Sample Reuse in Reinforcement Learning | p. 165 |
Markov Decision Problems | p. 165 |
Policy Iteration | p. 166 |
Value Function Approximation | p. 167 |
Sample Reuse by Covariate Shift Adaptation | p. 168 |
On-Policy vs. Off-Policy | p. 169 |
Importance Weighting in Value Function Approximation | p. 170 |
Automatic Selection of the Flattening Parameter | p. 174 |
Sample Reuse Policy Iteration | p. 175 |
Robot Control Experiments | p. 176 |
Learning Causing Covariate Shift | |
Active Learning | p. 183 |
Preliminaries | p. 183 |
Setup | p. 183 |
Decomposition of Generalization Error | p. 185 |
Basic Strategy of Active Learning | p. 188 |
Population-Based Active Learning Methods | p. 188 |
Classical Method of Active Learning for Correct Models | p. 189 |
Limitations of Classical Approach and Countermeasures | p. 190 |
Input-Independent Variance-Only Method | p. 191 |
Input-Dependent Variance-Only Method | p. 193 |
Input-Independent Bias-and-Variance Approach | p. 195 |
Numerical Examples of Population-Based Active Learning Methods | p. 198 |
Setup | p. 198 |
Accuracy of Generalization Error Estimation | p. 200 |
Obtained Generalization Error | p. 202 |
Pool-Based Active Learning Methods | p. 204 |
Classical Active Learning Method for Correct Models and Its Limitations | p. 204 |
Input-Independent Variance-Only Method | p. 205 |
Input-Dependent Variance-Only Method | p. 206 |
Input-Independent Bias-and-Variance Approach | p. 207 |
Numerical Examples of Pool-Based Active Learning Methods | p. 209 |
Summary and Discussion | p. 212 |
Active Learning with Model Selection | p. 215 |
Direct Approach and the Active Learning/Model Selection Dilemma | p. 215 |
Sequential Approach | p. 216 |
Batch Approach | p. 218 |
Ensemble Active Learning | p. 219 |
Numerical Examples | p. 220 |
Setting | p. 220 |
Analysis of Batch Approach | p. 221 |
Analysis of Sequential Approach | p. 222 |
Comparison of Obtained Generalization Error | p. 222 |
Summary and Discussion | p. 223 |
Applications of Active Learning | p. 225 |
Design of Efficient Exploration Strategies in Reinforcement Learning | p. 225 |
Efficient Exploration with Active Learning | p. 225 |
Reinforcement Learning Revisited | p. 226 |
Decomposition of Generalization Error | p. 228 |
Estimating Generalization Error for Active Learning | p. 229 |
Designing Sampling Policies | p. 230 |
Active Learning in Policy Iteration | p. 231 |
Robot Control Experiments | p. 232 |
Wafer Alignment in Semiconductor Exposure Apparatus | p. 234 |
Conclusions | |
Conclusions and Future Prospects | p. 241 |
Conclusions | p. 241 |
Future Prospects | p. 242 |
Appendix: List of Symbols and Abbreviations | p. 243 |
Bibliography | p. 247 |
Index | p. 259 |