1 Introduction to DBMS Implementation

(20)

1.1 Introducing: The Megatron 2000 Database System

(4)

1.1.1 Megatron 2000 Implementation Details

(2)

1.1.2 How Megatron 2000 Executes Queries

(1)

1.1.3 What's Wrong With Megatron 2000?

(1)

1.2 Overview of a Database Management System

(5)

1.2.1 Data-Definition Language Commands

(2)

1.2.2 Overview of Query Processing

(1)

1.2.3 Main-Memory Buffers and the Buffer Manager

(1)

1.2.4 Transaction Processing

(1)

1.2.5 The Query Processor

(1)

1.3 Outline of This Book

(3)

1.3.1 Prerequisites

(1)

1.3.2 Storage-Management Overview

(1)

1.3.3 Query-Processing Overview

(1)

1.3.4 Transaction-Processing Overview

(1)

1.3.5 Information Integration Overview

(1)

1.4 Review of Database Models and Languages

(5)

1.4.1 Relational Model Review

(1)

1.4.2 SQL Review

(3)

1.4.3 Relational and Object-Oriented Data

(1)

1.5 Summary of Chapter 1

(1)

1.6 References for Chapter 1

(1)

2 Data Storage

(62)

2.1 The Memory Hierarchy

(8)

2.1.1 Cache

(1)

2.1.2 Main Memory

(1)

2.1.3 Virtual Memory

(1)

2.1.4 Secondary Storage

(2)

2.1.5 Tertiary Storage

(1)

2.1.6 Volatile and Nonvolatile Storage

(1)

2.1.7 Exercises for Section 2.1

(1)

2.2 Disks

(10)

2.2.1 Mechanics of Disks

(2)

2.2.2 The Disk Controller

(1)

2.2.3 Disk Storage Characteristics

(2)

2.2.4 Disk Access Characteristics

(4)

2.2.5 Writing Blocks

(1)

2.2.6 Modifying Blocks

(1)

2.2.7 Exercises for Section 2.2

(1)

2.3 Using Secondary Storage Effectively

(9)

2.3.1 The I/O Model of Computation

(1)

2.3.2 Sorting Data in Secondary Storage

(1)

2.3.3 Merge-Sort

(1)

2.3.4 Two-Phase, Multiway Merge-Sort

(3)

2.3.5 Extension of Multiway Merging to Larger Relations

(1)

2.3.6 Exercises for Section 2.3

(1)

2.4 Improving the Access Time of Secondary Storage

(14)

2.4.1 Organizing Data by Cylinders

(1)

2.4.2 Using Multiple Disks

(1)

2.4.3 Mirroring Disks

(1)

2.4.4 Disk Scheduling and the Elevator Algorithm

(4)

2.4.5 Prefetching and Large-Scale Buffering

(1)

2.4.6 Summary of Strategies and Tradeoffs

(2)

2.4.7 Exercises for Section 2.4

(2)

2.5 Disk Failures

(4)

2.5.1 Intermittent Failures

(1)

2.5.2 Checksums

(1)

2.5.3 Stable Storage

(1)

2.5.4 Error-Handling Capabilities of Stable Storage

(1)

2.5.5 Exercises for Section 2.5

(1)

2.6 Recovery from Disk Crashes

(13)

2.6.1 The Failure Model for Disks

(1)

2.6.2 Mirroring as a Redundancy Technique

(1)

2.6.3 Parity Blocks

(4)

2.6.4 An Improvement: RAID 5

(1)

2.6.5 Coping With Multiple Disk Crashes

(4)

2.6.6 Exercises for Section 2.6

(3)

2.7 Summary of Chapter 2

(2)

2.8 References for Chapter 2

(1)

3 Representing Data Elements

(40)

3.1 Data Elements and Fields

(7)

3.1.1 Representing Relational Database Elements

(1)

3.1.2 Representing Objects

(1)

3.1.3 Representing Data Elements

(4)

3.2 Records

(6)

3.2.1 Building Fixed-Length Records

(2)

3.2.2 Record Headers

(1)

3.2.3 Packing Fixed-Length Records into Blocks

(1)

3.2.4 Exercises for Section 3.2

(1)

3.3 Representing Block and Record Addresses

(12)

3.3.1 Client-Server Systems

(1)

3.3.2 Logical and Structured Addresses

(1)

3.3.3 Pointer Swizzling

(5)

3.3.4 Returning Blocks to Disk

104

(1)

3.3.5 Pinned Records and Blocks

105

(1)

3.3.6 Exercises for Section 3.3

105

(3)

3.4 Variable-Length Data and Records

108

(8)

3.4.1 Records With Variable-Length Fields

108

(1)

3.4.2 Records With Repeating Fields

109

(2)

3.4.3 Variable-Format Records

111

(1)

3.4.4 Records That Do Not Fit in a Block

112

(2)

3.4.5 BLOBS

114

(1)

3.4.6 Exercises for Section 3.4

115

(1)

3.5 Record Modifications

116

(4)

3.5.1 Insertion

116

(2)

3.5.2 Deletion

118

(1)

3.5.3 Update

119

(1)

3.5.4 Exercises for Section 3.5

119

(1)

3.6 Summary of Chapter 3

120

(2)

3.7 References for Chapter 3

122

(1)

4 Index Structures

123

(64)

4.1 Indexes on Sequential Files

124

(18)

4.1.1 Sequential Files

124

(1)

4.1.2 Dense Indexes

125

(3)

4.1.3 Sparse Indexes

128

(1)

4.1.4 Multiple Levels of Index

129

(2)

4.1.5 Indexes With Duplicate Search Keys

131

(2)

4.1.6 Managing Indexes During Data Modifications

133

(7)

4.1.7 Exercises for Section 4.1

140

(2)

4.2 Secondary Indexes

142

(12)

4.2.1 Design of Secondary Indexes

142

(2)

4.2.2 Applications of Secondary Indexes

144

(1)

4.2.3 Indirection in Secondary Indexes

145

(3)

4.2.4 Document Retrieval and Inverted Indexes

148

(3)

4.2.5 Exercises for Section 4.2

151

(3)

4.3 B-Trees

154

(16)

4.3.1 The Structure of B-trees

154

(3)

4.3.2 Applications of B-trees

157

(2)

4.3.3 Lookup in B-Trees

159

(1)

4.3.4 Range Queries

160

(1)

4.3.5 Insertion Into B-Trees

161

(2)

4.3.6 Deletion From B-Trees

163

(3)

4.3.7 Efficiency of B-Trees

166

(1)

4.3.8 Exercises for Section 4.3

167

(3)

4.4 Hash Tables

170

(14)

4.4.1 Secondary-Storage Hash Tables

171

(1)

4.4.2 Insertion Into a Hash Table

172

(1)

4.4.3 Hash-Table Deletion

172

(1)

4.4.4 Efficiency of Hash Table Indexes

173

(1)

4.4.5 Extensible Hash Tables

174

(1)

4.4.6 Insertion Into Extensible Hash Tables

175

(2)

4.4.7 Linear Hash Tables

177

(3)

4.4.8 Insertion Into Linear Hash Tables

180

(2)

4.4.9 Exercises for Section 4.4

182

(2)

4.5 Summary of Chapter 4

184

(1)

4.6 References for Chapter 4

185

(2)

5 Multidimensional Indexes

187

(50)

5.1 Applications Needing Multiple Dimensions

188

(9)

5.1.1 Geographic Information Systems

188

(1)

5.1.2 Data Cubes

189

(1)

5.1.3 Multidimensional Queries in SQL

190

(2)

5.1.4 Executing Range Queries Using Conventional Indexes

192

(1)

5.1.5 Executing Nearest-Neighbor Queries Using Conventional Indexes

193

(2)

5.1.6 Other Limitations of Conventional Indexes

195

(1)

5.1.7 Overview of Multidimensional Index Structures

195

(1)

5.1.8 Exercises for Section 5.1

196

(1)

5.2 Hash-Like Structures for Multidimensional Data

197

(12)

5.2.1 Grid Files

198

(1)

5.2.2 Lookup in a Grid File

198

(1)

5.2.3 Insertion Into Grid Files

199

(2)

5.2.4 Performance of Grid Files

201

(3)

5.2.5 Partitioned Hash Functions

204

(1)

5.2.6 Comparison of Grid Files and Partitioned Hashing

205

(1)

5.2.7 Exercises for Section 5.2

206

(3)

5.3 Tree-Like Structures for Multidimensional Data

209

(16)

5.3.1 Multiple-Key Indexes

209

(2)

5.3.2 Performance of Multiple-Key Indexes

211

(1)

5.3.3 kd-Trees

212

(1)

5.3.4 Operations on kd-Trees

213

(3)

5.3.5 Adapting kd-Trees to Secondary Storage

216

(1)

5.3.6 Quad Trees

217

(2)

5.3.7 R-Trees

219

(1)

5.3.8 Operations on R-trees

219

(3)

5.3.9 Exercises for Section 5.3

222

(3)

5.4 Bitmap Indexes

225

(8)

5.4.1 Motivation for Bitmap Indexes

225

(2)

5.4.2 Compressed Bitmaps

227

(2)

5.4.3 Operating on Run-Length-Encoded Bit-Vectors

229

(1)

5.4.4 Managing Bitmap Indexes

230

(2)

5.4.5 Exercises for Section 5.4

232

(1)

5.5 Summary of Chapter 5

233

(1)

5.6 References for Chapter 5

234

(3)

6 Query Execution

237

(92)

6.1 An Algebra for Queries

240

(17)

6.1.1 Union, Intersection, and Difference

241

(1)

6.1.2 The Selection Operator

242

(2)

6.1.3 The Projection Operator

244

(1)

6.1.4 The Product of Relations

245

(1)

6.1.5 Joins

246

(2)

6.1.6 Duplicate Elimination

248

(1)

6.1.7 Grouping and Aggregation

248

(3)

6.1.8 The Sorting Operator

251

(1)

6.1.9 Expression Trees

252

(2)

6.1.10 Exercises for Section 6.1

254

(3)

6.2 Introduction to Physical-Query-Plan Operators

257

(7)

6.2.1 Scanning Tables

257

(1)

6.2.2 Sorting While Scanning Tables

258

(1)

6.2.3 The Model of Computation for Physical Operators

258

(1)

6.2.4 Parameters for Measuring Costs

259

(1)

6.2.5 I/O Cost for Scan Operators

260

(1)

6.2.6 Iterators for Implementation of Physical Operators

261

(3)

6.3 One-Pass Algorithms for Database Operations

264

(10)

6.3.1 One-Pass Algorithms for Tuple-at-a-Time Operations

266

(1)

6.3.2 One-Pass Algorithms for Unary, Full-Relation Operations

267

(3)

6.3.3 One-Pass Algorithms for Binary Operations

270

(3)

6.3.4 Exercises for Section 6.3

273

(1)

6.4 Nested-Loop Joins

274

(5)

6.4.1 Tuple-Based Nested-Loop Join

275

(1)

6.4.2 An Iterator for Tuple-Based Nested-Loop Join

275

(1)

6.4.3 A Block-Based Nested-Loop Join Algorithm

275

(3)

6.4.4 Analysis of Nested-Loop Join

278

(1)

6.4.5 Summary of Algorithms so Far

278

(1)

6.4.6 Exercises for Section 6.4

278

(1)

6.5 Two-Pass Algorithms Based on Sorting

279

(12)

6.5.1 Duplicate Elimination Using Sorting

280

(2)

6.5.2 Grouping and Aggregation Using Sorting

282

(1)

6.5.3 A Sort-Based Union Algorithm

283

(1)

6.5.4 Sort-Based Algorithms for Intersection and Difference

284

(2)

6.5.5 A Simple Sort-Based Join Algorithm

286

(1)

6.5.6 Analysis of Simple Sort-Join

287

(1)

6.5.7 A More Efficient Sort-Based Join

288

(1)

6.5.8 Summary of Sort-Based Algorithms

289

(1)

6.5.9 Exercises for Section 6.5

289

(2)

6.6 Two-Pass Algorithms Based on Hashing

291

(8)

6.6.1 Partitioning Relations by Hashing

292

(1)

6.6.2 A Hash-Based Algorithm for Duplicate Elimination

293

(1)

6.6.3 A Hash-Based Algorithm for Grouping and Aggregation

293

(1)

6.6.4 Hash-Based Algorithms for Union, Intersection, and Difference

294

(1)

6.6.5 The Hash-Join Algorithm

294

(1)

6.6.6 Saving Some Disk I/O's

295

(2)

6.6.7 Summary of Hash-Based Algorithms

297

(1)

6.6.8 Exercises for Section 6.6

298

(1)

6.7 Index-Based Algorithms

299

(8)

6.7.1 Clustering and Nonclustering Indexes

299

(1)

6.7.2 Index-Based Selection

300

(3)

6.7.3 Joining by Using an Index

303

(1)

6.7.4 Joins Using a Sorted Index

304

(2)

6.7.5 Exercises for Section 6.7

306

(1)

6.8 Buffer Management

307

(6)

6.8.1 Buffer Management Architecture

307

(1)

6.8.2 Buffer Management Strategies

308

(2)

6.8.3 The Relationship Between Physical Operator Selection and Buffer Management

310

(2)

6.8.4 Exercises for Section 6.8

312

(1)

6.9 Algorithms Using More Than Two Passes

Preface This book was designed for CS245, the second course in the database sequence at Stanford. Here, the first database course, CS145, covers database design and programming, for which the bookA First Course in Database Systemsby Jeff Ullman and Jennifer Widom, Prentice-Hall, 1997, was written. The CS245 course then covers implementation of a DBMS, notably storage structures, query processing, and transaction management. Use of the Book We're on a quarter system at Stanford, so the principal course using this book - CS245 - is only ten weeks long. In the Winter of 1999, Hector Garcia-Molina used a "beta" version of this book, and covered the following parts: Sections 2.1-2.4, all of Chapters 3 and 4, Sections 5.1 and 5.2, Sections 6.1--6.7, Sections 7.1-7.4, all of Chapter 8, Chapter 9 except for Section 9.8, Sections 10.1-10.3, Section 11.1, and Section 11.5. The balance of Chapters 6 and 7 (query optimization) is covered in an advanced course, CS346, where students implement their own DBMS. Other portions of the book that are not covered in CS245 may appear in another advanced course, CS347, which talks about distributed databases and advanced transaction processing. Schools that are on the semester system have the opportunity to combine the use of this book with its predecessor:A First Course in Database Systems.We recommend using that book in the first semester, coupled with a database-application programming project. The second semester could cover most or all of the content of this book. An advantage to splitting the study of databases into two courses is that students not planning to specialize in DBMS construction can take only the first course and be able to use databases in whatever branch of Computer Science they enter. Prerequisites The course on which the book is based is rarely taken before the senior year, so we expect the reader to have a fairly broad background in the traditional areas of Computer Science. We assume that the reader has learned something about database programming, especially SQL. It is helpful to know about relational algebra and to have some familiarity with basic data structures. Likewise, some knowledge of file systems and operating systems is useful. Exercises The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point. Some of the exercises or parts are marked with a star. For these exercises, we shall endeavor to maintain solutions accessible through the book's Web page. These solutions are publicly available and should be used for self-testing. Note that in a few cases, one exerciseBasks for modification or adaptation of your solution to another exerciseA. If certain parts ofAhave Web-published solutions, then you should expect the corresponding parts ofBto have solutions as well. Support on the World-Wide Web The book's home page is http://www-db.stanford.edu/~ullman/dbsi.html Here you will find solutions to starred exercises, errata as we learn of them, and backup materials. We hope to make available the notes for each offering of CS245 and relevant portions of other database courses, as we teach them, including homeworks, exams, and solutions. Acknowledgements Thanks go to Brad Adelberg, Karen Butler, Ed Chang, Surajit Chaudhuri, Rada Chirkova, Tom Dienstbier, Xavier Faz, Tracy Fujieda, Luis Gravano, Ben Holzman, Fabien Modoux, Peter Mork, Ken Ross, Mema Roussopolous, and Jonathan Ullman for assistance gathering material and/or discovering errors in earlier drafts of this work. Remaining errors are ours, of course. H. G.-M. J. D. U. J. W. Stanford, CA

Amazon no longer offers textbook rentals. We do!

Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Database System Implementation

9780130402646

0130402648

Supplemental Materials

Summary

Table of Contents

Supplemental Materials

Excerpts

Rewards Program