Programming Massively Parallel Processors

  • Format: Paperback
  • Copyright: 2010-01-22
  • Publisher: Morgan Kaufmann Pub
List Price: $69.95



Multi-core processors are no longer the future of computing; they are the present-day reality. A typical mass-produced CPU features multiple processor cores, while a GPU (Graphics Processing Unit) may have hundreds or even thousands of cores. With the rise of multi-core architectures has come the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.

Programming Massively Parallel Processors: A Hands-on Approach shows students and professionals alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs.

  • Describes computational thinking techniques that enable you to approach problems in ways that are amenable to high-performance parallel computing.
  • Utilizes CUDA (Compute Unified Device Architecture), NVIDIA's software development tool created specifically for massively parallel environments.
  • Shows you how to achieve both high performance and high reliability using the CUDA programming model as well as OpenCL.
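To give a flavor of the CUDA programming model the book teaches, here is a minimal vector-addition example (an illustrative sketch, not code from the book): a kernel marked `__global__` runs one thread per output element, and the host allocates device memory, copies data across, and launches the kernel over a grid of thread blocks.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Each thread computes one element of C = A + B.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        C[i] = A[i] + B[i];
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host arrays.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device arrays and host-to-device transfer.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per element, 256 threads per block.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    // Copy the result back and spot-check it.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

The data-transfer, kernel-launch, and `blockIdx`/`threadIdx` indexing patterns shown here correspond to the topics covered in the CUDA chapters listed in the table of contents below.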

Author Biography

David B. Kirk: Chief Scientist and NVIDIA Fellow at NVIDIA, a leader in visual computing technologies.

Wen-mei W. Hwu: Walter J. Sanders III Advanced Micro Devices Endowed Chair in Electrical and Computer Engineering in the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign.

Table of Contents

Preface p. xi
Acknowledgments p. xvii
Dedication p. xix
Introduction p. 1
GPUs as Parallel Computers p. 2
Architecture of a Modern GPU p. 8
Why More Speed or Parallelism? p. 10
Parallel Programming Languages and Models p. 13
Overarching Goals p. 15
Organization of the Book p. 16
History of GPU Computing p. 21
Evolution of Graphics Pipelines p. 21
The Era of Fixed-Function Graphics Pipelines p. 22
Evolution of Programmable Real-Time Graphics p. 26
Unified Graphics and Computing Processors p. 29
GPGPU: An Intermediate Step p. 31
GPU Computing p. 32
Scalable GPUs p. 33
Recent Developments p. 34
Future Trends p. 34
Introduction to CUDA p. 39
Data Parallelism p. 39
CUDA Program Structure p. 41
A Matrix-Matrix Multiplication Example p. 42
Device Memories and Data Transfer p. 46
Kernel Functions and Threading p. 51
Summary p. 56
Function Declarations p. 56
Kernel Launch p. 56
Predefined Variables p. 56
Runtime API p. 57
CUDA Threads p. 59
CUDA Thread Organization p. 59
Using blockIdx and threadIdx p. 64
Synchronization and Transparent Scalability p. 68
Thread Assignment p. 70
Thread Scheduling and Latency Tolerance p. 71
Summary p. 74
Exercises p. 74
CUDA Memories p. 77
Importance of Memory Access Efficiency p. 78
CUDA Device Memory Types p. 79
A Strategy for Reducing Global Memory Traffic p. 83
Memory as a Limiting Factor to Parallelism p. 90
Summary p. 92
Exercises p. 93
Performance Considerations p. 95
More on Thread Execution p. 96
Global Memory Bandwidth p. 103
Dynamic Partitioning of SM Resources p. 111
Data Prefetching p. 113
Instruction Mix p. 115
Thread Granularity p. 116
Measured Performance and Summary p. 118
Exercises p. 120
Floating Point Considerations p. 125
Floating-Point Format p. 126
Normalized Representation of M p. 126
Excess Encoding of E p. 127
Representable Numbers p. 129
Special Bit Patterns and Precision p. 134
Arithmetic Accuracy and Rounding p. 135
Algorithm Considerations p. 136
Summary p. 138
Exercises p. 138
Application Case Study: Advanced MRI Reconstruction p. 141
Application Background p. 142
Iterative Reconstruction p. 144
Computing FHd p. 148
Determine the Kernel Parallelism Structure p. 149
Getting Around the Memory Bandwidth Limitation p. 156
Using Hardware Trigonometry Functions p. 163
Experimental Performance Tuning p. 166
Final Evaluation p. 167
Exercises p. 170
Application Case Study: Molecular Visualization and Analysis p. 173
Application Background p. 174
A Simple Kernel Implementation p. 176
Instruction Execution Efficiency p. 180
Memory Coalescing p. 182
Additional Performance Comparisons p. 185
Using Multiple GPUs p. 187
Exercises p. 188
Parallel Programming and Computational Thinking p. 191
Goals of Parallel Programming p. 192
Problem Decomposition p. 193
Algorithm Selection p. 196
Computational Thinking p. 202
Exercises p. 204
A Brief Introduction to OpenCL p. 205
Background p. 205
Data Parallelism Model p. 207
Device Architecture p. 209
Kernel Functions p. 211
Device Management and Kernel Launch p. 212
Electrostatic Potential Map in OpenCL p. 214
Summary p. 219
Exercises p. 220
Conclusion and Future Outlook p. 221
Goals Revisited p. 221
Memory Architecture Evolution p. 223
Large Virtual and Physical Address Spaces p. 223
Unified Device Memory Space p. 224
Configurable Caching and Scratch Pad p. 225
Enhanced Atomic Operations p. 226
Enhanced Global Memory Access p. 226
Kernel Execution Control Evolution p. 227
Function Calls within Kernel Functions p. 227
Exception Handling in Kernel Functions p. 227
Simultaneous Execution of Multiple Kernels p. 228
Interruptible Kernels p. 228
Core Performance p. 229
Double-Precision Speed p. 229
Better Control Flow Efficiency p. 229
Programming Environment p. 230
A Bright Outlook p. 230
Matrix Multiplication Host-Only Version Source Code p. 233
matrixmul.cu p. 233
matrixmul_gold.cpp p. 237
matrixmul.h p. 238
assist.h p. 239
Expected Output p. 243
GPU Compute Capabilities p. 245
GPU Compute Capability Tables p. 245
Memory Coalescing Variations p. 246
Index p. 251
Table of Contents provided by Ingram. All Rights Reserved.
