
Compiler Optimizations for Scalable Parallel Systems: Languages, Compilation Techniques, and Run Time Systems

by Santosh Pande; Dharma P. Agrawal
  • ISBN13: 9783540419457
  • ISBN10: 3540419454

  • Format: Paperback
  • Copyright: 2001-11-01
  • Publisher: Springer-Verlag

Summary

Scalable parallel systems or, more generally, distributed memory systems offer a challenging model of computing and pose fascinating problems regarding compiler optimization, ranging from language design to run time systems. Research in this area is foundational to many challenges, from memory hierarchy optimizations to communication optimization. This unique, handbook-like monograph assesses the state of the art in the area in a systematic and comprehensive way. The 21 coherent chapters by leading researchers provide complete coverage of all relevant aspects of compiler optimization for scalable parallel systems. The book is divided into five parts covering languages, analysis, communication optimizations, code generation, and run time systems. It will serve as a landmark source of education, information, and reference for students, practitioners, professionals, and researchers who are interested in updating their knowledge of, or are active in, parallel computing.

Table of Contents

Preface v
Santosh Pande
Dharma P. Agrawal
Introduction xxi
Santosh Pande
Dharma P. Agrawal
Compiling for Distributed Memory Multiprocessors xxi
Motivation xxi
Complexity xxii
Outline of the Monograph xxii
Future Directions xxvii
Section I: Languages
High Performance Fortran 2.0
3(42)
Ken Kennedy
Charles Koelbel
Introduction
3(1)
History and Overview of HPF
3(4)
Data Mapping
7(11)
Basic Language Features
7(6)
Advanced Topics
13(5)
Data Parallelism
18(16)
Basic Language Features
19(10)
Advanced Topics
29(5)
Task Parallelism
34(5)
EXTRINSIC Procedures
34(3)
The TASK_REGION Directive
37(2)
Input and Output
39(2)
Summary and Future Outlook
41(4)
The Sisal Project: Real World Functional Programming
45(28)
Jean-Luc Gaudiot
Tom DeBoni
John Feo
Wim Bohm
Walid Najjar
Patrick Miller
Introduction
45(1)
The Sisal Language: A Short Tutorial
46(3)
An Early Implementation: The Optimizing Sisal Compiler
49(4)
Update in Place and Copy Elimination
49(1)
Build in Place
50(1)
Reference Counting Optimization
51(1)
Vectorization
51(1)
Loop Fusion, Double Buffering Pointer Swap, and Inversion
51(2)
Sisal90
53(5)
The Foreign Language Interface
54(4)
A Prototype Distributed-Memory SISAL Compiler
58(4)
Base Compiler
59(1)
Rectangular Arrays
59(1)
Block Messages
60(1)
Multiple Alignment
60(1)
Results
61(1)
Further Work
62(1)
Architecture Support for Multithreaded Execution
62(7)
Blocking and Non-blocking Models
63(1)
Code Generation
64(4)
Summary of Performance Results
68(1)
Conclusions and Future Research
69(4)
HPC++ and the HPC++Lib Toolkit
73(36)
Dennis Gannon
Peter Beckman
Elizabeth Johnson
Todd Green
Mike Levine
Introduction
73(1)
The HPC++ Programming and Execution Model
74(4)
Level 1 HPC++
75(1)
The Parallel Standard Template Library
76(1)
Parallel Iterators
77(1)
Parallel Algorithms
77(1)
Distributed Containers
78(1)
A Simple Example: The Spanning Tree of a Graph
78(4)
Multi-threaded Programming
82(14)
Synchronization
84(8)
Examples of Multi-threaded Computations
92(4)
Implementing the HPC++ Parallel Loop Directives
96(3)
Multi-context Programming and Global Pointers
99(6)
Remote Function and Member Calls
101(2)
Using Corba IDL to Generate Proxies
103(2)
The SPMD Execution Model
105(1)
Barrier Synchronization and Collective Operations
105(1)
Conclusion
106(3)
A Concurrency Abstraction Model for Avoiding Inheritance Anomaly in Object-Oriented Programs
109(32)
Sandeep Kumar
Dharma P. Agrawal
Introduction
109(4)
Approaches to Parallelism Specification
113(2)
Issues in Designing a COOPL
113(1)
Issues in Designing Libraries
114(1)
What Is the Inheritance Anomaly?
115(5)
State Partitioning Anomaly (SPA)
116(2)
History Sensitiveness of Acceptable States Anomaly (HSASA)
118(1)
State Modification Anomaly (SMA)
118(1)
Anomaly A
119(1)
Anomaly B
120(1)
What Is the Reusability of Sequential Classes?
120(1)
A Framework for Specifying Parallelism
121(1)
Previous Approaches
122(1)
The Concurrency Abstraction Model
123(3)
The CORE Language
126(3)
Specifying a Concurrent Region
126(1)
Defining an AC
126(1)
Defining a Parallel Block
127(2)
Synchronization Schemes
129(1)
Illustrations
129(4)
Reusability of Sequential Classes
130(1)
Avoiding the Inheritance Anomaly
131(2)
The Implementation Approach
133(1)
Conclusions and Future Directions
134(7)
Section II: Analysis
Loop Parallelization Algorithms
141(32)
Alain Darte
Yves Robert
Frederic Vivien
Introduction
141(1)
Input and Output of Parallelization Algorithms
142(3)
Input: Dependence Graph
143(1)
Output: Nested Loops
144(1)
Dependence Abstractions
145(4)
Dependence Graphs and Distance Sets
145(2)
Polyhedral Reduced Dependence Graphs
147(1)
Definition and Simulation of Classical Dependence Representations
148(1)
Allen and Kennedy's Algorithm
149(3)
Algorithm
150(1)
Power and Limitations
151(1)
Wolf and Lam's Algorithm
152(4)
Purpose
153(1)
Theoretical Interpretation
153(1)
The General Algorithm
154(1)
Power and Limitations
155(1)
Darte and Vivien's Algorithm
156(11)
Another Algorithm Is Needed
156(2)
Polyhedral Dependences: A Motivating Example
158(2)
Illustrating Example
160(2)
Uniformization Step
162(1)
Scheduling Step
162(3)
Schematic Explanations
165(1)
Power and Limitations
166(1)
Feautrier's Algorithm
167(2)
Conclusion
169(4)
Array Dataflow Analysis
173(48)
Paul Feautrier
Introduction
173(3)
Exact Array Dataflow Analysis
176(14)
Notations
176(1)
The Program Model
176(5)
Data Flow Analysis
181(8)
Summary of the Algorithm
189(1)
Related Work
190(1)
Approximate Array Dataflow Analysis
190(14)
From ADA to FADA
191(4)
Introducing Parameters
195(2)
Taking Properties of Parameters into Account
197(4)
Eliminating Parameters
201(1)
Related Work
202(2)
Analysis of Complex Statements
204(4)
What Is a Complex Statement?
204(2)
ADA in the Presence of Complex Statements
206(1)
Procedure Calls as Complex Statements
206(2)
Applications of ADA and FADA
208(6)
Program Comprehension and Debugging
209(2)
Parallelization
211(1)
Array Expansion and Array Privatization
212(2)
Conclusions
214(7)
Appendix: Mathematical Tools
214(1)
Polyhedra and Polytopes
214(1)
Z-modules
215(1)
Z-polyhedra
216(1)
Parametric Problems
216(5)
Interprocedural Analysis Based on Guarded Array Regions
221(26)
Zhiyuan Li
Junjie Gu
Gyungho Lee
Introduction
221(2)
Preliminary
223(3)
Traditional Flow-Insensitive Summaries
223(2)
Array Data Flow Summaries
225(1)
Guarded Array Regions
226(6)
Operations on GARs
228(2)
Predicate Operations
230(2)
Constructing Summary GARs Interprocedurally
232(6)
Hierarchical Supergraph
232(1)
Summary Algorithms
233(2)
Expansions
235(3)
Implementation Considerations
238(2)
Symbolic Analysis
238(1)
Region Numbering
239(1)
Range Operations
240(1)
Application to Array Privatization and Preliminary Experimental Results
240(3)
Array Privatization
241(1)
Preliminary Experimental Results
241(2)
Related Work
243(1)
Conclusion
244(3)
Automatic Array Privatization
247(38)
Peng Tu
David Padua
Introduction
247(1)
Background
248(2)
Algorithm for Array Privatization
250(11)
Data Flow Framework
250(2)
Inner Loop Abstraction
252(4)
An Example
256(1)
Profitability of Privatization
257(1)
Last Value Assignment
258(3)
Demand-Driven Symbolic Analysis
261(16)
Gated Single Assignment
263(1)
Demand-Driven Backward Substitution
264(2)
Backward Substitution in the Presence of Gating Functions
266(1)
Examples of Backward Substitution
267(2)
Bounds of Symbolic Expression
269(1)
Comparison of Symbolic Expressions
269(3)
Recurrence and the μ Function
272(1)
Bounds of Monotonic Variables
273(1)
Index Array
274(1)
Conditional Data Flow Analysis
275(1)
Implementation and Experiments
276(1)
Related Work
277(8)
Section III: Communication Optimizations
Optimal Tiling for Minimizing Communication in Distributed Shared-Memory Multiprocessors
285(54)
Anant Agarwal
David Kranz
Rajeev Barua
Venkat Natarajan
Introduction
285(4)
Contributions and Related Work
286(2)
Overview of the Paper
288(1)
Problem Domain and Assumptions
289(3)
Program Assumptions
289(2)
System Model
291(1)
Loop Partitions and Data Partitions
292(3)
A Framework for Loop and Data Partitioning
295(19)
Loop Tiles in the Iteration Space
296(2)
Footprints in the Data Space
298(2)
Size of a Footprint for a Single Reference
300(4)
Size of the Cumulative Footprint
304(7)
Minimizing the Size of the Cumulative Footprint
311(3)
General Case of G
314(4)
G Is Invertible, but Not Unimodular
314(2)
Columns of G Are Dependent and the Rows Are Independent
316(1)
The Rows of G Are Dependent
316(2)
Other System Environments
318(4)
Coherence-Related Cache Misses
318(2)
Effect of Cache Line Size
320(1)
Data Partitioning in Distributed-Memory Multicomputers
320(2)
Combined Loop and Data Partitioning in DSMs
322(6)
The Cost Model
322(3)
The Multiple Loops Heuristic Method
325(3)
Implementation and Results
328(6)
Algorithm Simulator Experiments
330(1)
Experiments on the Alewife Multiprocessor
330(4)
Conclusions
334(5)
A Formulation of Loop Tiles Using Bounding Hyperplanes
337(1)
Synchronization References
337(2)
Communication-Free Partitioning of Nested Loops
339(46)
Kuei-Ping Shih
Chua-Huang Huang
Jang-Ping Sheu
Introduction
339(2)
Fundamentals of Array References
341(6)
Iteration Spaces and Data Spaces
342(1)
Reference Functions
343(1)
Properties of Reference Functions
343(4)
Loop-Level Partitioning
347(18)
Iteration and Data Spaces Partitioning - Uniformly Generated References
347(6)
Hyperplane Partitioning of Data Space
353(6)
Hyperplane Partitioning of Iteration and Data Spaces
359(6)
Statement-Level Partitioning
365(12)
Affine Processor Mapping
366(6)
Hyperplane Partitioning
372(5)
Comparisons and Discussions
377(4)
Conclusions
381(4)
Solving Alignment Using Elementary Linear Algebra
385(28)
Vladimir Kotlyar
David Bau
Induprakas Kodukula
Keshav Pingali
Paul Stodghill
Introduction
385(3)
Linear Alignment
388(5)
Equational Constraints
388(2)
Reduction to Null Space Computation
390(1)
Remarks
391(1)
Reducing the Solution Basis
392(1)
Affine Alignment
393(3)
Encoding Affine Constraints as Linear Constraints
393(3)
Replication
396(2)
Formulation of Replication
397(1)
Heuristics
398(4)
Lessons from Some Common Computational Kernels
399(3)
Implications for Alignment Heuristic
402(1)
Conclusion
402(11)
Reducing the Solution Matrix
404(1)
Unrelated Constraints
404(1)
General Procedure
405(3)
A Comment on Affine Encoding
408(5)
A Compilation Method for Communication-Efficient Partitioning of DOALL Loops
413(32)
Santosh Pande
Tareq Bali
Introduction
413(1)
DOALL Partitioning
414(7)
Motivating Example
415(4)
Our Approach
419(2)
Terms and Definitions
421(2)
Example
422(1)
Problem
423(4)
Compatibility Subsets
423(1)
Cyclic Directions
424(3)
Communication Minimization
427(2)
Algorithm: Maximal Compatibility Subsets
427(1)
Algorithm: Maximal Fibonacci Sequence
428(1)
Data Partitioning
428(1)
Partition Merging
429(3)
Granularity Adjustment
431(1)
Load Balancing
431(1)
Mapping
432(1)
Example: Texture Smoothing Code
432(3)
Performance on Cray T3D
435(10)
Conclusions
440(5)
Compiler Optimization of Dynamic Data Distributions for Distributed-Memory Multicomputers
445(40)
Daniel J. Palermo
Eugene W. Hodges IV
Prithviraj Banerjee
Introduction
445(2)
Related Work
447(2)
Dynamic Distribution Selection
449(13)
Motivation for Dynamic Distributions
449(1)
Overview of the Dynamic Distribution Approach
450(1)
Phase Decomposition
451(6)
Phase and Phase Transition Selection
457(5)
Data Redistribution Analysis
462(3)
Reaching Distributions and the Distribution Flow Graph
462(1)
Computing Reaching Distributions
463(1)
Representing Distribution Sets
464(1)
Interprocedural Redistribution Analysis
465(7)
Distribution Synthesis
467(1)
Redistribution Synthesis
468(3)
Static Distribution Assignment (SDA)
471(1)
Results
472(8)
Synthetic HPF Redistribution Example
473(2)
2-D Alternating Direction Implicit (ADI2D) Iterative Method
475(3)
Shallow Water Weather Prediction Benchmark
478(2)
Conclusions
480(5)
A Framework for Global Communication Analysis and Optimizations
485(40)
Manish Gupta
Introduction
485(2)
Motivating Example
487(1)
Available Section Descriptor
488(6)
Representation of ASD
490(2)
Computing Generated Communication
492(2)
Data Flow Analysis
494(11)
Data Flow Variables and Equations
495(3)
Decomposition of Bidirectional Problem
498(1)
Overall Data-Flow Procedure
499(6)
Communication Optimizations
505(3)
Elimination of Redundant Communication
505(1)
Reduction in Volume of Communication
506(1)
Movement of Communication for Subsumption and for Hiding Latency
507(1)
Extensions: Communication Placement
508(2)
Operations on Available Section Descriptors
510(6)
Operations on Bounded Regular Section Descriptors
512(2)
Operations on Mapping Function Descriptors
514(2)
Preliminary Implementation and Results
516(3)
Related Work
519(2)
Global Communication Optimizations
519(1)
Data Flow Analysis and Data Descriptors
520(1)
Conclusions
521(4)
Tolerating Communication Latency through Dynamic Thread Invocation in a Multithreaded Architecture
525(28)
Andrew Sohn
Yuetsu Kodama
Jui-Yuan Ku
Mitsuhisa Sato
Yoshinori Yamaguchi
Introduction
525(2)
Multithreading Principles and Their Realization
527(8)
The Principle
527(3)
The EM-X Multithreaded Distributed-Memory Multiprocessor
530(3)
Architectural Support for Fine-Grain Multithreading
533(2)
Designing Multithreaded Algorithms
535(5)
Multithreaded Bitonic Sorting
535(3)
Multithreaded Fast Fourier Transform
538(2)
Overlapping Analysis
540(4)
Analysis of Switches
544(3)
Conclusions
547(6)
Section IV: Code Generation
Advanced Code Generation for High Performance Fortran
553(44)
Vikram Adve
John Mellor-Crummey
Introduction
553(3)
Background: The Code Generation Problem for HPF
556(5)
Communication Analysis and Code Generation for HPF
556(2)
Previous Approaches to Communication Analysis and Code Generation
558(3)
An Integer Set Framework for Data-Parallel Compilation
561(4)
Primitive Components of the Framework
561(1)
Implementation of the Framework
562(3)
Computation Partitioning
565(8)
Computation Partitioning Models
565(2)
Code Generation to Realize Computation Partitions
567(6)
Communication Code Generation
573(11)
Communication Generation with Message Vectorization and Coalescing
577(4)
Recognizing In-Place Communication
581(1)
Implementing Loop-Splitting for Reducing Communication Overhead
582(2)
Control Flow Simplification
584(6)
Motivation
584(4)
Overview of Algorithm
588(1)
Evaluation and Discussion
589(1)
Conclusions
590(7)
Integer Lattice Based Methods for Local Address Generation for Block-Cyclic Distributions
597(52)
J. Ramanujam
Introduction
597(2)
Background and Related Work
599(4)
Related Work on One-Level Mapping
600(2)
Related Work on Two-Level Mapping
602(1)
A Lattice Based Approach for Address Generation
603(2)
Assumptions
603(1)
Lattices
604(1)
Determination of Basis Vectors
605(9)
Basis Determination Algorithm
607(2)
Extremal Basis Vectors
609(3)
Improvements to the Algorithm for s > k
612(1)
Complexity
613(1)
Address Sequence Generation by Lattice Enumeration
614(2)
Optimization of Loop Enumeration: GO-LEFT and GO-RIGHT
616(4)
Implementation
620(1)
Experimental Results for One-Level Mapping
620(6)
Address Sequence Generation for Two-Level Mapping
626(2)
Problem Statement
626(2)
Algorithms for Two-Level Mapping
628(7)
Itable: An Algorithm That Constructs a Table of Offsets
629(2)
Optimization of the Itable Method
631(3)
Search-Based Algorithms
634(1)
Experimental Results for Two-Level Mapping
635(3)
Other Problems in Code Generation
638(3)
Communication Generation
639(1)
Union and Difference of Regular Sections
640(1)
Code Generation for Complex Subscripts
640(1)
Data Structures for Runtime Efficiency
640(1)
Array Redistribution
641(1)
Summary and Conclusions
641(8)
Section V: Task Parallelism, Dynamic Data Structures and Run Time Systems
A Duplication Based Compile Time Scheduling Method for Task Parallelism
649(34)
Sekhar Darbha
Dharma P. Agrawal
Introduction
649(3)
STDS Algorithm
652(12)
Complexity Analysis
663(1)
Illustration of the STDS Algorithm
664(6)
Performance of the STDS Algorithm
670(10)
CRC Is Satisfied
670(2)
Application of Algorithm for Random Data
672(2)
Application of Algorithm to Practical DAGs
674(1)
Scheduling of Diamond DAGs
675(5)
Comparison with Other Algorithms
680(1)
Conclusions
680(3)
SPMD Execution in the Presence of Dynamic Data Structures
683(26)
Rajiv Gupta
Introduction
683(1)
Language Support for Regular Data Structures
684(9)
Processor Structures
685(1)
Dynamic Data Structures
685(3)
Name Generation and Distribution Strategies
688(1)
Examples
689(4)
Compiler Support for Regular Data Structures
693(10)
Representing Pointers and Data Structures
693(1)
Translation of Pointer Operations
694(9)
Supporting Irregular Data Structures
703(2)
Compile-Time Optimizations
705(1)
Related Work
706(3)
Supporting Dynamic Data Structures with Olden
709(42)
Martin C. Carlisle
Anne Rogers
Introduction
709(2)
Programming Model
711(4)
Programming Language
711(1)
Data Layout
711(3)
Marking Available Parallelism
714(1)
Execution Model
715(7)
Handling Remote References
715(3)
Introducing Parallelism
718(1)
A Simple Example
719(3)
Selecting Between Mechanisms
722(9)
Using Local Path Lengths
723(1)
Update Matrices
724(2)
The Heuristic
726(5)
Experimental Results
731(4)
Comparison with Other Published Work
733(1)
Heuristic Results
733(2)
Summary
735(1)
Profiling in Olden
735(4)
Verifying Local Path Lengths
736(3)
Related Work
739(6)
Gupta's Work
741(1)
Object-Oriented Systems
741(2)
Extensions of C with Fork-Join Parallelism
743(1)
Other Related Work
744(1)
Conclusions
745(6)
Runtime and Compiler Support for Irregular Computations
751(28)
Raja Das
Yuan-Shin Hwang
Joel Saltz
Alan Sussman
Introduction
751(2)
Overview of the CHAOS Runtime System
753(5)
Compiler Transformations
758(11)
Transformation Example
759(4)
Definitions
763(2)
Transformation Algorithm
765(4)
Experiments
769(6)
Hand Parallelization with CHAOS
769(4)
Compiler Parallelization Using CHAOS
773(2)
Conclusions
775(4)
Author Index 779
