Preface | p. VII |
Introduction | p. 1 |
An Overview of Learning and Optimization | p. 1 |
Problem Description | p. 1 |
Optimal Policies | p. 5 |
Fundamental Limitations of Learning and Optimization | p. 12 |
A Sensitivity-Based View of Learning and Optimization | p. 17 |
Problem Formulations in Different Disciplines | p. 19 |
Perturbation Analysis (PA) | p. 21 |
Markov Decision Processes (MDPs) | p. 26 |
Reinforcement Learning (RL) | p. 31 |
Identification and Adaptive Control (I&AC) | p. 34 |
Event-Based Optimization and Potential Aggregation | p. 37 |
A Map of the Learning and Optimization World | p. 41 |
Terminology and Notation | p. 42 |
Problems | p. 44 |
Four Disciplines in Learning and Optimization | |
Perturbation Analysis | p. 51 |
Perturbation Analysis of Markov Chains | p. 52 |
Constructing a Perturbed Sample Path | p. 53 |
Perturbation Realization Factors and Performance Potentials | p. 57 |
Performance Derivative Formulas | p. 64 |
Gradients with Discounted Reward Criteria | p. 68 |
Higher-Order Derivatives and the MacLaurin Series | p. 74 |
Performance Sensitivities of Markov Processes | p. 83 |
Performance Sensitivities of Semi-Markov Processes | p. 90 |
Fundamentals for Semi-Markov Processes | p. 90 |
Performance Sensitivity Formulas | p. 95 |
Perturbation Analysis of Queueing Systems | p. 102 |
Constructing a Perturbed Sample Path | p. 105 |
Perturbation Realization | p. 115 |
Performance Derivatives | p. 121 |
Remarks on Theoretical Issues | p. 125 |
Other Methods | p. 132 |
Problems | p. 137 |
Learning and Optimization with Perturbation Analysis | p. 147 |
The Potentials | p. 148 |
Numerical Methods | p. 148 |
Learning Potentials from Sample Paths | p. 151 |
Coupling | p. 156 |
Performance Derivatives | p. 161 |
Estimating through Potentials | p. 161 |
Learning Directly | p. 162 |
Optimization with PA | p. 172 |
Gradient Methods and Stochastic Approximation | p. 172 |
Optimization with Long Sample Paths | p. 174 |
Applications | p. 177 |
Problems | p. 177 |
Markov Decision Processes | p. 183 |
Ergodic Chains | p. 185 |
Policy Iteration | p. 186 |
Bias Optimality | p. 192 |
MDPs with Discounted Rewards | p. 201 |
Multi-Chains | p. 203 |
Policy Iteration | p. 205 |
Bias Optimality | p. 216 |
MDPs with Discounted Rewards | p. 226 |
The nth-Bias Optimization | p. 228 |
nth-Bias Difference Formulas | p. 229 |
Optimality Equations | p. 232 |
Policy Iteration | p. 240 |
nth-Bias Optimal Policy Spaces | p. 244 |
Problems | p. 246 |
Sample-Path-Based Policy Iteration | p. 253 |
Motivation | p. 254 |
Convergence Properties | p. 258 |
Convergence of Potential Estimates | p. 259 |
Sample Paths with a Fixed Number of Regenerative Periods | p. 260 |
Sample Paths with Increasing Lengths | p. 267 |
"Fast" Algorithms | p. 277 |
The Algorithm That Stops in a Finite Number of Periods | p. 278 |
With Stochastic Approximation | p. 282 |
Problems | p. 284 |
Reinforcement Learning | p. 289 |
Stochastic Approximation | p. 290 |
Finding the Zeros of a Function Recursively | p. 291 |
Estimating Mean Values | p. 297 |
Temporal Difference Methods | p. 298 |
TD Methods for Potentials | p. 298 |
Q-Factors and Other Extensions | p. 308 |
TD Methods for Performance Derivatives | p. 313 |
TD Methods and Performance Optimization | p. 318 |
PA-Based Optimization | p. 318 |
Q-Learning | p. 321 |
Optimistic On-Line Policy Iteration | p. 325 |
Value Iteration | p. 327 |
Summary of the Learning and Optimization Methods | p. 330 |
Problems | p. 333 |
Adaptive Control Problems as MDPs | p. 341 |
Control Problems and MDPs | p. 342 |
Control Systems Modelled as MDPs | p. 342 |
A Comparison of the Two Approaches | p. 345 |
MDPs with Continuous State Spaces | p. 353 |
Operators on Continuous Spaces | p. 354 |
Potentials and Policy Iteration | p. 359 |
Linear Control Systems and the Riccati Equation | p. 363 |
The LQ Problem | p. 363 |
The JLQ Problem | p. 368 |
On-Line Optimization and Adaptive Control | p. 373 |
Discretization and Estimation | p. 374 |
Discussion | p. 379 |
Problems | p. 381 |
The Event-Based Optimization - A New Approach | |
Event-Based Optimization of Markov Systems | p. 387 |
An Overview | p. 388 |
Summary of Previous Chapters | p. 388 |
An Overview of the Event-Based Approach | p. 390 |
Events Associated with Markov Chains | p. 398 |
The Event and Event Space | p. 400 |
The Probabilities of Events | p. 403 |
The Basic Ideas Illustrated by Examples | p. 407 |
Classification of Three Types of Events | p. 410 |
Event-Based Optimization | p. 414 |
The Problem Formulation | p. 414 |
Performance Difference Formulas | p. 417 |
Performance Derivative Formulas | p. 420 |
Optimization | p. 425 |
Learning: Estimating Aggregated Potentials | p. 429 |
Aggregated Potentials | p. 429 |
Aggregated Potentials in the Event-Based Optimization | p. 432 |
Applications and Examples | p. 434 |
Manufacturing | p. 434 |
Service Rate Control | p. 438 |
General Applications | p. 444 |
Problems | p. 446 |
Constructing Sensitivity Formulas | p. 455 |
Motivation | p. 455 |
Markov Chains on the Same State Space | p. 456 |
Event-Based Systems | p. 464 |
Sample-Path Construction | p. 464 |
Parameterized Systems: An Example | p. 467 |
Markov Chains with Different State Spaces | p. 470 |
One Is a Subspace of the Other | p. 470 |
A More General Case | p. 478 |
Summary | p. 482 |
Problems | p. 483 |
Appendices: Mathematical Background | |
Probability and Markov Processes | p. 491 |
Probability | p. 491 |
Markov Processes | p. 498 |
Problems | p. 504 |
Stochastic Matrices | p. 507 |
Canonical Form | p. 507 |
Eigenvalues | p. 508 |
The Limiting Matrix | p. 511 |
Problems | p. 516 |
Queueing Theory | p. 519 |
Single-Server Queues | p. 519 |
Queueing Networks | p. 524 |
Some Useful Techniques | p. 536 |
Problems | p. 538 |
Notation and Abbreviations | p. 543 |
References | p. 547 |
Index | p. 563 |
Table of Contents provided by Ingram. All Rights Reserved. |
The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.
The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.