Introduction to Parallel Computing

by Grama, Ananth; Karypis, George; Kumar, Vipin; Gupta, Anshul
  • ISBN13: 9780201648652
  • ISBN10: 0201648652

  • Edition: 2nd
  • Format: Hardcover
  • Copyright: 2003-01-16
  • Publisher: Pearson
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • We Buy This Book Back!
    In-Store Credit: $5.83
    Check/Direct Deposit: $5.55
    PayPal: $5.55
List Price: $173.32

Summary

Introduction to Parallel Computing, Second Edition
Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar

Increasingly, parallel processing is being seen as the only cost-effective method for the fast solution of computationally large and data-intensive problems. The emergence of inexpensive parallel computers such as commodity desktop multiprocessors and clusters of workstations or PCs has made such parallel methods generally applicable, as have software standards for portable parallel programming. This sets the stage for substantial growth in parallel software.

Data-intensive applications such as transaction processing and information retrieval, data mining and analysis, and multimedia services have provided a new challenge for the modern generation of parallel platforms. Emerging areas such as computational biology and nanotechnology have implications for algorithms and systems development, while changes in architectures, programming models, and applications have implications for how parallel platforms are made available to users in the form of grid-based services.

This book takes into account these new developments as well as covering the more traditional problems addressed by parallel computers. Where possible it employs an architecture-independent view of the underlying platforms and designs algorithms for an abstract model. Message Passing Interface (MPI), POSIX threads, and OpenMP have been selected as programming models, and the evolving application mix of parallel computing is reflected in various examples throughout the book.

  • Provides a complete end-to-end source on almost every aspect of parallel computing (architectures, programming paradigms, algorithms, and standards).
  • Covers both traditional computer science algorithms (sorting, searching, graph, and dynamic programming algorithms) as well as scientific computing algorithms (matrix computations, FFT).
  • Covers MPI, Pthreads, and OpenMP, the three most widely used standards for writing portable parallel programs.
  • The modular nature of the text makes it suitable for a wide variety of undergraduate- and graduate-level courses, including parallel computing, parallel programming, design and analysis of parallel algorithms, and high performance computing.

Ananth Grama is Associate Professor of Computer Sciences at Purdue University, working on various aspects of parallel and distributed systems and applications. Anshul Gupta is a member of the research staff at the IBM T. J. Watson Research Center; his research areas are parallel algorithms and scientific computing. George Karypis is Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota, working on parallel algorithm design, graph partitioning, data mining, and bioinformatics. Vipin Kumar is Professor in the Department of Computer Science and Engineering and the Director of the Army High Performance Computing Research Center at the University of Minnesota. His research interests are in the areas of high performance computing, parallel algorithms for scientific computing problems, and data mining.
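
The summary names MPI, Pthreads, and OpenMP as the book's programming models. As a flavor of the directive-based OpenMP style it refers to, here is a minimal sketch that is not taken from the book; the array size, fill values, and compile command are illustrative assumptions. It sums an array in parallel using a reduction clause:

    /* Illustrative OpenMP example (not from the book): each thread sums a
     * slice of the array, and reduction(+:sum) combines the per-thread
     * partial sums. Compile with, e.g., gcc -fopenmp sum.c */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000   /* assumed problem size, for illustration only */

    int main(void) {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;                  /* fill with known values */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];                 /* expected result: N */

        printf("sum = %.0f using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }

The table of contents entry "OpenMP: a Standard for Directive Based Parallel Programming" below covers this model in detail, alongside the MPI and Pthreads chapters.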

Table of Contents

Preface xvii
Acknowledgments xix
Introduction to Parallel Computing 1(10)
Motivating Parallelism 2(2)
The Computational Power Argument -- from Transistors to FLOPS 2(1)
The Memory/Disk Speed Argument 3(1)
The Data Communication Argument 4(1)
Scope of Parallel Computing 4(2)
Applications in Engineering and Design 4(1)
Scientific Applications 5(1)
Commercial Applications 5(1)
Applications in Computer Systems 6(1)
Organization and Contents of the Text 6(2)
Bibliographic Remarks 8(3)
Problems 9(2)
Parallel Programming Platforms 11(74)
Implicit Parallelism: Trends in Microprocessor Architectures 12(4)
Pipelining and Superscalar Execution 12(3)
Very Long Instruction Word Processors 15(1)
Limitations of Memory System Performance* 16(8)
Improving Effective Memory Latency Using Caches 17(1)
Impact of Memory Bandwidth 18(3)
Alternate Approaches for Hiding Memory Latency 21(2)
Tradeoffs of Multithreading and Prefetching 23(1)
Dichotomy of Parallel Computing Platforms 24(7)
Control Structure of Parallel Platforms 25(2)
Communication Model of Parallel Platforms 27(4)
Physical Organization of Parallel Platforms 31(22)
Architecture of an Ideal Parallel Computer 31(1)
Interconnection Networks for Parallel Computers 32(1)
Network Topologies 33(10)
Evaluating Static Interconnection Networks 43(1)
Evaluating Dynamic Interconnection Networks 44(1)
Cache Coherence in Multiprocessor Systems 45(8)
Communication Costs in Parallel Machines 53(10)
Message Passing Costs in Parallel Computers 53(8)
Communication Costs in Shared-Address-Space Machines 61(2)
Routing Mechanisms for Interconnection Networks 63(2)
Impact of Process-Processor Mapping and Mapping Techniques 65(9)
Mapping Techniques for Graphs 66(7)
Cost-Performance Tradeoffs 73(1)
Bibliographic Remarks 74(11)
Problems 76(9)
Principles of Parallel Algorithm Design 85(62)
Preliminaries 86(9)
Decomposition, Tasks, and Dependency Graphs 86(3)
Granularity, Concurrency, and Task-Interaction 89(4)
Processes and Mapping 93(1)
Processes versus Processors 94(1)
Decomposition Techniques 95(15)
Recursive Decomposition 95(2)
Data Decomposition 97(8)
Exploratory Decomposition 105(2)
Speculative Decomposition 107(2)
Hybrid Decompositions 109(1)
Characteristics of Tasks and Interactions 110(5)
Characteristics of Tasks 110(2)
Characteristics of Inter-Task Interactions 112(3)
Mapping Techniques for Load Balancing 115(17)
Schemes for Static Mapping 117(13)
Schemes for Dynamic Mapping 130(2)
Methods for Containing Interaction Overheads 132(7)
Maximizing Data Locality 132(2)
Minimizing Contention and Hot Spots 134(1)
Overlapping Computations with Interactions 135(1)
Replicating Data or Computations 136(1)
Using Optimized Collective Interaction Operations 137(1)
Overlapping Interactions with Other Interactions 138(1)
Parallel Algorithm Models 139(3)
The Data-Parallel Model 139(1)
The Task Graph Model 140(1)
The Work Pool Model 140(1)
The Master-Slave Model 141(1)
The Pipeline or Producer-Consumer Model 141(1)
Hybrid Models 142(1)
Bibliographic Remarks 142(5)
Problems 143(4)
Basic Communication Operations 147(48)
One-to-All Broadcast and All-to-One Reduction 149(8)
Ring or Linear Array 149(3)
Mesh 152(1)
Hypercube 153(1)
Balanced Binary Tree 153(1)
Detailed Algorithms 154(2)
Cost Analysis 156(1)
All-to-All Broadcast and Reduction 157(9)
Linear Array and Ring 158(2)
Mesh 160(1)
Hypercube 161(3)
Cost Analysis 164(2)
All-Reduce and Prefix-Sum Operations 166(1)
Scatter and Gather 167(3)
All-to-All Personalized Communication 170(9)
Ring 173(1)
Mesh 174(1)
Hypercube 175(4)
Circular Shift 179(5)
Mesh 179(2)
Hypercube 181(3)
Improving the Speed of Some Communication Operations 184(3)
Splitting and Routing Messages in Parts 184(2)
All-Port Communication 186(1)
Summary 187(1)
Bibliographic Remarks 188(7)
Problems 190(5)
Analytical Modeling of Parallel Programs 195(38)
Sources of Overhead in Parallel Programs 195(2)
Performance Metrics for Parallel Systems 197(8)
Execution Time 197(1)
Total Parallel Overhead 197(1)
Speedup 198(4)
Efficiency 202(1)
Cost 203(2)
The Effect of Granularity on Performance 205(3)
Scalability of Parallel Systems 208(10)
Scaling Characteristics of Parallel Programs 209(3)
The Isoefficiency Metric of Scalability 212(5)
Cost-Optimality and the Isoefficiency Function 217(1)
A Lower Bound on the Isoefficiency Function 217(1)
The Degree of Concurrency and the Isoefficiency Function 218(1)
Minimum Execution Time and Minimum Cost-Optimal Execution Time 218(3)
Asymptotic Analysis of Parallel Programs 221(1)
Other Scalability Metrics 222(4)
Bibliographic Remarks 226(7)
Problems 228(5)
Programming Using the Message-Passing Paradigm 233(46)
Principles of Message-Passing Programming 233(2)
The Building Blocks: Send and Receive Operations 235(5)
Blocking Message Passing Operations 236(3)
Non-Blocking Message Passing Operations 239(1)
MPI: the Message Passing Interface 240(10)
Starting and Terminating the MPI Library 242(1)
Communicators 242(1)
Getting Information 243(1)
Sending and Receiving Messages 244(4)
Example: Odd-Even Sort 248(2)
Topologies and Embedding 250(5)
Creating and Using Cartesian Topologies 251(2)
Example: Cannon's Matrix-Matrix Multiplication 253(2)
Overlapping Communication with Computation 255(5)
Non-Blocking Communication Operations 255(5)
Collective Communication and Computation Operations 260(12)
Barrier 260(1)
Broadcast 260(1)
Reduction 261(2)
Prefix 263(1)
Gather 263(1)
Scatter 264(1)
All-to-All 265(1)
Example: One-Dimensional Matrix-Vector Multiplication 266(2)
Example: Single-Source Shortest-Path 268(2)
Example: Sample Sort 270(2)
Groups and Communicators 272(4)
Example: Two-Dimensional Matrix-Vector Multiplication 274(2)
Bibliographic Remarks 276(3)
Problems 277(2)
Programming Shared Address Space Platforms 279(58)
Thread Basics 280(1)
Why Threads? 281(1)
The POSIX Thread API 282(1)
Thread Basics: Creation and Termination 282(5)
Synchronization Primitives in Pthreads 287(11)
Mutual Exclusion for Shared Variables 287(7)
Condition Variables for Synchronization 294(4)
Controlling Thread and Synchronization Attributes 298(3)
Attributes Objects for Threads 299(1)
Attributes Objects for Mutexes 300(1)
Thread Cancellation 301(1)
Composite Synchronization Constructs 302(8)
Read-Write Locks 302(5)
Barriers 307(3)
Tips for Designing Asynchronous Programs 310(1)
OpenMP: a Standard for Directive Based Parallel Programming 311(21)
The OpenMP Programming Model 312(3)
Specifying Concurrent Tasks in OpenMP 315(7)
Synchronization Constructs in OpenMP 322(5)
Data Handling in OpenMP 327(1)
OpenMP Library Functions 328(2)
Environment Variables in OpenMP 330(1)
Explicit Threads versus OpenMP Based Programming 331(1)
Bibliographic Remarks 332(5)
Problems 332(5)
Dense Matrix Algorithms 337(42)
Matrix-Vector Multiplication 337(8)
Rowwise 1-D Partitioning 338(3)
2-D Partitioning 341(4)
Matrix-Matrix Multiplication 345(7)
A Simple Parallel Algorithm 346(1)
Cannon's Algorithm 347(2)
The DNS Algorithm 349(3)
Solving a System of Linear Equations 352(19)
A Simple Gaussian Elimination Algorithm 353(13)
Gaussian Elimination with Partial Pivoting 366(3)
Solving a Triangular System: Back-Substitution 369(1)
Numerical Considerations in Solving Systems of Linear Equations 370(1)
Bibliographic Remarks 371(8)
Problems 372(7)
Sorting 379(50)
Issues in Sorting on Parallel Computers 380(2)
Where the Input and Output Sequences are Stored 380(1)
How Comparisons are Performed 380(2)
Sorting Networks 382(12)
Bitonic Sort 384(3)
Mapping Bitonic Sort to a Hypercube and a Mesh 387(7)
Bubble Sort and its Variants 394(5)
Odd-Even Transposition 395(3)
Shellsort 398(1)
Quicksort 399(13)
Parallelizing Quicksort 401(1)
Parallel Formulation for a CRCW PRAM 402(2)
Parallel Formulation for Practical Architectures 404(7)
Pivot Selection 411(1)
Bucket and Sample Sort 412(2)
Other Sorting Algorithms 414(2)
Enumeration Sort 414(1)
Radix Sort 415(1)
Bibliographic Remarks 416(13)
Problems 419(10)
Graph Algorithms 429(40)
Definitions and Representation 429(3)
Minimum Spanning Tree: Prim's Algorithm 432(4)
Single-Source Shortest Paths: Dijkstra's Algorithm 436(1)
All-Pairs Shortest Paths 437(8)
Dijkstra's Algorithm 438(2)
Floyd's Algorithm 440(5)
Performance Comparisons 445(1)
Transitive Closure 445(1)
Connected Components 446(4)
A Depth-First Search Based Algorithm 446(4)
Algorithms for Sparse Graphs 450(12)
Finding a Maximal Independent Set 451(4)
Single-Source Shortest Paths 455(7)
Bibliographic Remarks 462(7)
Problems 465(4)
Search Algorithms for Discrete Optimization Problems 469(46)
Definitions and Examples 469(5)
Sequential Search Algorithms 474(4)
Depth-First Search Algorithms 474(4)
Best-First Search Algorithms 478(1)
Search Overhead Factor 478(2)
Parallel Depth-First Search 480(16)
Important Parameters of Parallel DFS 482(3)
A General Framework for Analysis of Parallel DFS 485(3)
Analysis of Load-Balancing Schemes 488(2)
Termination Detection 490(2)
Experimental Results 492(3)
Parallel Formulations of Depth-First Branch-and-Bound Search 495(1)
Parallel Formulations of IDA* 496(1)
Parallel Best-First Search 496(5)
Speedup Anomalies in Parallel Search Algorithms 501(4)
Analysis of Average Speedup in Parallel DFS 502(3)
Bibliographic Remarks 505(10)
Problems 510(5)
Dynamic Programming 515(22)
Overview of Dynamic Programming 515(3)
Serial Monadic DP Formulations 518(5)
The Shortest-Path Problem 518(2)
The 0/1 Knapsack Problem 520(3)
Nonserial Monadic DP Formulations 523(3)
The Longest-Common-Subsequence Problem 523(3)
Serial Polyadic DP Formulations 526(1)
Floyd's All-Pairs Shortest-Paths Algorithm 526(1)
Nonserial Polyadic DP Formulations 527(3)
The Optimal Matrix-Parenthesization Problem 527(3)
Summary and Discussion 530(1)
Bibliographic Remarks 531(6)
Problems 532(5)
Fast Fourier Transform 537(28)
The Serial Algorithm 538(3)
The Binary-Exchange Algorithm 541(12)
A Full Bandwidth Network 541(7)
Limited Bandwidth Network 548(3)
Extra Computations in Parallel FFT 551(2)
The Transpose Algorithm 553(7)
Two-Dimensional Transpose Algorithm 553(3)
The Generalized Transpose Algorithm 556(4)
Bibliographic Remarks 560(5)
Problems 562(3)
Appendix A Complexity of Functions and Order Analysis 565(4)
A.1 Complexity of Functions 565(1)
A.2 Order Analysis of Functions 566(3)
Bibliography 569(42)
Author Index 611(10)
Subject Index 621

Supplemental Materials

What is included with this book?

A New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine whether it should include any access cards, study guides, lab manuals, CDs, etc.

Used, Rental, and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states that it includes access cards, study guides, lab manuals, CDs, etc.
