
Reinforcement Learning: An Introduction

by Richard S. Sutton and Andrew G. Barto

  • ISBN13: 9780262193986
  • ISBN10: 0262193981

  • Format: Hardcover
  • Copyright: 1998-03-01
  • Publisher: MIT Press


Summary

Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
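As a taste of the material covered in Part I and Part II, the n-armed bandit problem of Chapter 2 can be sketched in a few lines: an agent repeatedly chooses among actions with unknown reward distributions, estimating action values by sample averages (the incremental implementation of Section 2.5) and selecting actions epsilon-greedily. The function name, reward distributions, and parameter values below are illustrative assumptions, not taken from the book:

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian n-armed bandit.

    true_means: the (hidden) mean reward of each arm.
    Returns the estimated action values and the average reward per step.
    """
    rng = random.Random(seed)
    n = len(true_means)
    q = [0.0] * n        # estimated action values
    counts = [0] * n     # number of times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)                    # explore: random arm
        else:
            a = max(range(n), key=lambda i: q[i])   # exploit: greedy arm
        r = rng.gauss(true_means[a], 1.0)           # noisy reward
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]              # incremental sample average
        total_reward += r
    return q, total_reward / steps

q, avg = epsilon_greedy_bandit([0.2, 0.8, 0.5])
```

With enough steps the estimates converge toward the true means and the greedy choice settles on the best arm, illustrating the evaluative-feedback theme of Chapter 2.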

Table of Contents

Series Foreword xiii(2)
Preface xv
I The Problem 1(86)
1 Introduction
3(22)
1.1 Reinforcement Learning
3(3)
1.2 Examples
6(1)
1.3 Elements of Reinforcement Learning
7(3)
1.4 An Extended Example: Tic-Tac-Toe
10(5)
1.5 Summary
15(1)
1.6 History of Reinforcement Learning
16(7)
1.7 Bibliographical Remarks
23(2)
2 Evaluative Feedback
25(26)
2.1 An n-Armed Bandit Problem
26(1)
2.2 Action-Value Methods
27(3)
2.3 Softmax Action Selection
30(1)
2.4 Evaluation Versus Instruction
31(5)
2.5 Incremental Implementation
36(2)
2.6 Tracking a Nonstationary Problem
38(1)
2.7 Optimistic Initial Values
39(2)
2.8 Reinforcement Comparison
41(2)
2.9 Pursuit Methods
43(2)
2.10 Associative Search
45(1)
2.11 Conclusions
46(2)
2.12 Bibliographical and Historical Remarks
48(3)
3 The Reinforcement Learning Problem
51(36)
3.1 The Agent-Environment Interface
51(5)
3.2 Goals and Rewards
56(1)
3.3 Returns
57(3)
3.4 Unified Notation for Episodic and Continuing Tasks
60(1)
3.5 The Markov Property
61(5)
3.6 Markov Decision Processes
66(2)
3.7 Value Functions
68(7)
3.8 Optimal Value Functions
75(5)
3.9 Optimality and Approximation
80(1)
3.10 Summary
81(2)
3.11 Bibliographical and Historical Remarks
83(4)
II Elementary Solution Methods 87(74)
4 Dynamic Programming
89(22)
4.1 Policy Evaluation
90(3)
4.2 Policy Improvement
93(4)
4.3 Policy Iteration
97(3)
4.4 Value Iteration
100(3)
4.5 Asynchronous Dynamic Programming
103(2)
4.6 Generalized Policy Iteration
105(2)
4.7 Efficiency of Dynamic Programming
107(1)
4.8 Summary
108(1)
4.9 Bibliographical and Historical Remarks
109(2)
5 Monte Carlo Methods
111(22)
5.1 Monte Carlo Policy Evaluation
112(4)
5.2 Monte Carlo Estimation of Action Values
116(2)
5.3 Monte Carlo Control
118(4)
5.4 On-Policy Monte Carlo Control
122(2)
5.5 Evaluating One Policy While Following Another
124(2)
5.6 Off-Policy Monte Carlo Control
126(2)
5.7 Incremental Implementation
128(1)
5.8 Summary
129(2)
5.9 Bibliographical and Historical Remarks
131(2)
6 Temporal-Difference Learning
133(28)
6.1 TD Prediction
133(5)
6.2 Advantages of TD Prediction Methods
138(3)
6.3 Optimality of TD(0)
141(4)
6.4 Sarsa: On-Policy TD Control
145(3)
6.5 Q-Learning: Off-Policy TD Control
148(3)
6.6 Actor-Critic Methods
151(2)
6.7 R-Learning for Undiscounted Continuing Tasks
153(3)
6.8 Games, Afterstates, and Other Special Cases
156(1)
6.9 Summary
157(1)
6.10 Bibliographical and Historical Remarks
158(3)
III A Unified View 161(130)
7 Eligibility Traces
163(30)
7.1 n-Step TD Prediction
164(5)
7.2 The Forward View of TD(λ)
169(4)
7.3 The Backward View of TD(λ)
173(3)
7.4 Equivalence of Forward and Backward Views
176(3)
7.5 Sarsa(λ)
179(3)
7.6 Q(λ)
182(3)
7.7 Eligibility Traces for Actor-Critic Methods
185(1)
7.8 Replacing Traces
186(3)
7.9 Implementation Issues
189(1)
7.10 Variable λ
189(1)
7.11 Conclusions
190(1)
7.12 Bibliographical and Historical Remarks
191(2)
8 Generalization and Function Approximation
193(34)
8.1 Value Prediction with Function Approximation
194(3)
8.2 Gradient-Descent Methods
197(3)
8.3 Linear Methods
200(10)
8.4 Control with Function Approximation
210(6)
8.5 Off-Policy Bootstrapping
216(4)
8.6 Should We Bootstrap?
220(2)
8.7 Summary
222(1)
8.8 Bibliographical and Historical Remarks
223(4)
9 Planning and Learning
227(28)
9.1 Models and Planning
227(3)
9.2 Integrating Planning, Acting, and Learning
230(5)
9.3 When the Model Is Wrong
235(3)
9.4 Prioritized Sweeping
238(4)
9.5 Full vs. Sample Backups
242(4)
9.6 Trajectory Sampling
246(4)
9.7 Heuristic Search
250(2)
9.8 Summary
252(2)
9.9 Bibliographical and Historical Remarks
254(1)
10 Dimensions of Reinforcement Learning
255(6)
10.1 The Unified View
255(3)
10.2 Other Frontier Dimensions
258(3)
11 Case Studies
261(30)
11.1 TD-Gammon
261(6)
11.2 Samuel's Checkers Player
267(3)
11.3 The Acrobot
270(4)
11.4 Elevator Dispatching
274(5)
11.5 Dynamic Channel Allocation
279(4)
11.6 Job-Shop Scheduling
283(8)
References 291(22)
Summary of Notation 313(2)
Index 315
