Optimizing Compilers for Modern Architectures

by Randy Allen and Ken Kennedy
  • ISBN13: 9781558602861
  • ISBN10: 1558602860

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 2001-09-26
  • Publisher: Elsevier Science

Summary

Modern computer architectures built around high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to generate efficient code and to realize that potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue.

The basis for all the methods presented in the book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. Data dependence enables compiler writers to build compilers that automatically transform simple, sequential programs into forms that can exploit the special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its application to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling.

The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. For computational scientists and engineers driven to obtain the best possible performance from their complex applications, they also offer cookbook explanations for transforming programs by hand. The approaches presented are based on research conducted over the past two decades, emphasizing the strategies implemented in research prototypes at Rice University and in several associated commercial systems. Randy Allen and Ken Kennedy have provided an indispensable resource for researchers, practicing professionals, and graduate students engaged in designing and optimizing compilers for modern computer architectures.
  • Offers a guide to the simple, practical algorithms and approaches that are most effective in real-world, high-performance microprocessor and parallel systems.
  • Demonstrates each transformation in worked examples.
  • Examines how two case-study compilers implement the theories and practices described in each chapter.
  • Presents the most complete treatment of memory hierarchy issues of any compiler text.
  • Illustrates ordering relationships with dependence graphs throughout the book.
  • Applies the techniques to a variety of languages, including Fortran 77, C, hardware definition languages, Fortran 90, and High Performance Fortran.
  • Provides extensive references to the most sophisticated algorithms known in research.
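The central idea the summary describes, data dependence, can be illustrated with a small sketch (this example is not from the book; the function names and data are hypothetical). A statement such as `a[i] = a[i-1] + b[i]` carries a true dependence of distance 1 from one iteration to the next, so a transformation that reads all right-hand sides before performing any write (the semantics of a Fortran 90 array assignment, or of naive vectorization) changes the result:

```python
def sequential(a, b):
    """Run the loop in original, sequential order."""
    a = a[:]
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]  # reads the value written on iteration i-1
    return a

def vectorized(a, b):
    """Array-assignment semantics: every RHS is read before any write."""
    a = a[:]
    rhs = [a[i - 1] + b[i] for i in range(1, len(a))]  # all old values
    a[1:] = rhs
    return a

a = [1, 0, 0, 0]
b = [0, 1, 1, 1]
print(sequential(a, b))  # [1, 2, 3, 4]
print(vectorized(a, b))  # [1, 2, 1, 1]
```

Because the two versions disagree, the loop-carried dependence makes the vectorization illegal as-is; dependence analysis is what lets a compiler detect this and either reject the transformation or repair it with the techniques the book covers (loop distribution, node splitting, and so on).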

Table of Contents

Preface xvii
Compiler Challenges for High-Performance Architectures 1(34)
Overview and Goals 1(3)
Pipelining 4(7)
Pipelined Instruction Units 4(2)
Pipelined Execution Units 6(1)
Parallel Functional Units 7(1)
Compiling for Scalar Pipelines 8(3)
Vector Instructions 11(3)
Vector Hardware Overview 11(1)
Compiling for Vector Pipelines 12(2)
Superscalar and VLIW Processors 14(3)
Multiple-Issue Instruction Units 14(1)
Compiling for Multiple-Issue Processors 15(2)
Processor Parallelism 17(4)
Overview of Processor Parallelism 17(2)
Compiling for Asynchronous Parallelism 19(2)
Memory Hierarchy 21(2)
Overview of Memory Systems 21(1)
Compiling for Memory Hierarchy 22(1)
A Case Study: Matrix Multiplication 23(5)
Advanced Compiler Technology 28(3)
Dependence 28(2)
Transformations 30(1)
Summary 31(1)
Case Studies 31(1)
Historical Comments and References 32(3)
Exercises 33(2)
Dependence: Theory and Practice 35(38)
Introduction 35(1)
Dependence and Its Properties 36(20)
Load-Store Classification 37(1)
Dependence in Loops 38(3)
Dependence and Transformations 41(4)
Distance and Direction Vectors 45(4)
Loop-Carried and Loop-Independent Dependences 49(7)
Simple Dependence Testing 56(3)
Parallelization and Vectorization 59(10)
Parallelization 59(1)
Vectorization 60(3)
An Advanced Vectorization Algorithm 63(6)
Summary 69(1)
Case Studies 70(1)
Historical Comments and References 70(3)
Exercises 71(2)
Dependence Testing 73(62)
Introduction 73(6)
Background and Terminology 74(5)
Dependence Testing Overview 79(2)
Subscript Partitioning 80(1)
Merging Direction Vectors 81(1)
Single-Subscript Dependence Tests 81(30)
ZIV Test 82(1)
SIV Tests 82(12)
Multiple Induction-Variable Tests 94(17)
Testing in Coupled Groups 111(10)
The Delta Test 112(8)
More Powerful Multiple-Subscript Tests 120(1)
An Empirical Study 121(2)
Putting It All Together 123(6)
Summary 129(1)
Case Studies 130(1)
Historical Comments and References 131(4)
Exercises 132(3)
Preliminary Transformations 135(36)
Introduction 135(3)
Information Requirements 138(1)
Loop Normalization 138(3)
Data Flow Analysis 141(14)
Definition-Use Chains 142(3)
Dead Code Elimination 145(1)
Constant Propagation 146(2)
Static Single-Assignment Form 148(7)
Induction-Variable Exposure 155(11)
Forward Expression Substitution 156(2)
Induction-Variable Substitution 158(4)
Driving the Substitution Process 162(4)
Summary 166(1)
Case Studies 167(1)
Historical Comments and References 168(3)
Exercises 169(2)
Enhancing Fine-Grained Parallelism 171(68)
Introduction 171(2)
Loop Interchange 173(11)
Safety of Loop Interchange 174(3)
Profitability of Loop Interchange 177(2)
Loop Interchange and Vectorization 179(5)
Scalar Expansion 184(11)
Scalar and Array Renaming 195(7)
Node Splitting 202(3)
Recognition of Reductions 205(4)
Index-Set Splitting 209(5)
Threshold Analysis 209(2)
Loop Peeling 211(1)
Section-Based Splitting 212(2)
Run-Time Symbolic Resolution 214(2)
Loop Skewing 216(5)
Putting It All Together 221(5)
Complications of Real Machines 226(4)
Summary 230(1)
Case Studies 231(5)
PFC 231(1)
Ardent Titan Compiler 232(2)
Vectorization Performance 234(2)
Historical Comments and References 236(3)
Exercises 237(2)
Creating Coarse-Grained Parallelism 239(80)
Introduction 239(1)
Single-Loop Methods 240(31)
Privatization 240(5)
Loop Distribution 245(1)
Alignment 246(3)
Code Replication 249(5)
Loop Fusion 254(17)
Perfect Loop Nests 271(17)
Loop Interchange for Parallelization 271(4)
Loop Selection 275(4)
Loop Reversal 279(1)
Loop Skewing for Parallelization 280(4)
Unimodular Transformations 284(1)
Profitability-Based Parallelization Methods 285(3)
Imperfectly Nested Loops 288(9)
Multilevel Loop Fusion 289(3)
A Parallel Code Generation Algorithm 292(5)
An Extended Example 297(1)
Packaging of Parallelism 298(11)
Strip Mining 300(1)
Pipeline Parallelism 301(3)
Scheduling Parallel Work 304(2)
Guided Self-Scheduling 306(3)
Summary 309(1)
Case Studies 310(5)
PFC and ParaScope 310(1)
Ardent Titan Compiler 311(4)
Historical Comments and References 315(4)
Exercises 316(3)
Handling Control Flow 319(62)
Introduction 319(1)
If-Conversion 320(30)
Definition 321(1)
Branch Classification 322(1)
Forward Branches 323(4)
Exit Branches 327(6)
Backward Branches 333(3)
Complete Forward Branch Removal 336(2)
Simplification 338(5)
Iterative Dependences 343(5)
If-Reconstruction 348(2)
Control Dependence 350(26)
Constructing Control Dependence 352(3)
Control Dependence in Loops 355(1)
An Execution Model for Control Dependences 356(3)
Application of Control Dependence to Parallelization 359(17)
Summary 376(1)
Case Studies 376(1)
Historical Comments and References 377(4)
Exercises 378(3)
Improving Register Usage 381(88)
Introduction 381(1)
Scalar Register Allocation 381(6)
Data Dependence for Register Reuse 383(1)
Loop-Carried and Loop-Independent Reuse 384(2)
A Register Allocation Example 386(1)
Scalar Replacement 387(16)
Pruning the Dependence Graph 387(5)
Simple Replacement 392(1)
Handling Loop-Carried Dependences 392(1)
Dependences Spanning Multiple Iterations 393(1)
Eliminating Scalar Copies 394(1)
Moderating Register Pressure 395(1)
Scalar Replacement Algorithm 396(4)
Experimental Data 400(3)
Unroll-and-Jam 403(12)
Legality of Unroll-and-Jam 406(3)
Unroll-and-Jam Algorithm 409(3)
Effectiveness of Unroll-and-Jam 412(3)
Loop Interchange for Register Reuse 415(5)
Considerations for Loop Interchange 417(2)
Loop Interchange Algorithm 419(1)
Loop Fusion for Register Reuse 420(33)
Profitable Loop Fusion for Reuse 421(3)
Loop Alignment for Fusion 424(4)
Fusion Mechanics 428(6)
A Weighted Loop Fusion Algorithm 434(16)
Multilevel Loop Fusion for Register Reuse 450(3)
Putting It All Together 453(3)
Ordering the Transformations 453(1)
An Example: Matrix Multiplication 454(2)
Complex Loop Nests 456(9)
Loops with If Statements 457(2)
Trapezoidal Loops 459(6)
Summary 465(1)
Case Studies 465(1)
Historical Comments and References 466(3)
Exercises 467(2)
Managing Cache 469(42)
Introduction 469(2)
Loop Interchange for Spatial Locality 471(6)
Blocking 477(14)
Unaligned Data 478(2)
Legality of Blocking 480(2)
Profitability of Blocking 482(1)
A Simple Blocking Algorithm 483(2)
Blocking with Skewing 485(1)
Fusion and Alignment 486(3)
Blocking in Combination with Other Transformations 489(1)
Effectiveness 490(1)
Cache Management in Complex Loop Nests 491(4)
Triangular Cache Blocking 491(1)
Special-Purpose Transformations 492(3)
Software Prefetching 495(12)
A Software Prefetching Algorithm 496(10)
Effectiveness of Software Prefetching 506(1)
Summary 507(1)
Case Studies 508(1)
Historical Comments and References 509(2)
Exercises 510(1)
Scheduling 511(38)
Introduction 511(1)
Instruction Scheduling 512(24)
Machine Model 514(1)
Straight-Line Graph Scheduling 515(1)
List Scheduling 516(2)
Trace Scheduling 518(6)
Scheduling in Loops 524(12)
Vector Unit Scheduling 536(7)
Chaining 537(3)
Coprocessors 540(3)
Summary 543(1)
Case Studies 543(3)
Historical Comments and References 546(3)
Exercises 547(2)
Interprocedural Analysis and Optimization 549(56)
Introduction 549(1)
Interprocedural Analysis 550(42)
Interprocedural Problems 550(6)
Interprocedural Problem Classification 556(4)
Flow-Insensitive Side Effect Analysis 560(8)
Flow-Insensitive Alias Analysis 568(5)
Constant Propagation 573(5)
Kill Analysis 578(3)
Symbolic Analysis 581(4)
Array Section Analysis 585(3)
Call Graph Construction 588(4)
Interprocedural Optimization 592(3)
Inline Substitution 592(2)
Procedure Cloning 594(1)
Hybrid Optimizations 595(1)
Managing Whole-Program Compilation 595(4)
Summary 599(1)
Case Studies 600(2)
Historical Comments and References 602(3)
Exercises 604(1)
Dependence in C and Hardware Design 605(50)
Introduction 605(1)
Optimizing C 606(11)
Pointers 608(2)
Naming and Structures 610(2)
Loops 612(1)
Scoping and Statics 613(1)
Dialect 613(2)
Miscellaneous 615(2)
Hardware Design 617(34)
Hardware Description Languages 619(3)
Optimizing Simulation 622(17)
Synthesis Optimization 639(12)
Summary 651(1)
Case Studies 652(1)
Historical Comments and References 652(3)
Exercises 653(2)
Compiling Array Assignments 655(34)
Introduction 655(1)
Simple Scalarization 656(4)
Scalarization Transformations 660(8)
Loop Reversal 660(1)
Input Prefetching 661(5)
Loop Splitting 666(2)
Multidimensional Scalarization 668(15)
Simple Scalarization in Multiple Dimensions 670(1)
Outer Loop Prefetching 671(3)
Loop Interchange for Scalarization 674(3)
General Multidimensional Scalarization 677(4)
A Scalarization Example 681(2)
Considerations for Vector Machines 683(1)
Postscalarization Interchange and Fusion 684(2)
Summary 686(1)
Case Studies 687(1)
Historical Comments and References 687(2)
Exercises 688(1)
Compiling High Performance Fortran 689(46)
Introduction 689(5)
HPF Compiler Overview 694(4)
Basic Loop Compilation 698(12)
Distribution Propagation and Analysis 698(2)
Iteration Partitioning 700(4)
Communication Generation 704(6)
Optimization 710(18)
Communication Vectorization 710(6)
Overlapping Communication and Computation 716(1)
Alignment and Replication 717(2)
Pipelining 719(2)
Identification of Common Recurrences 721(1)
Storage Management 722(4)
Handling Multiple Dimensions 726(2)
Interprocedural Optimization for HPF 728(1)
Summary 729(1)
Case Studies 729(3)
Historical Comments and References 732(3)
Exercises 733(2)
Appendix Fundamentals of Fortran 90 735(8)
Introduction 735(1)
Lexical Properties 735(1)
Array Assignment 736(5)
Library Functions 741(1)
Further Reading 742(1)
References 743(22)
Index 765
