Architecture-Independent Loop Parallelisation

Architecture-independent programming and automatic parallelisation have long been regarded as two different means of alleviating the prohibitive costs of parallel software development. Building on recent advances in both areas, Architecture-Independent Loop Parallelisation proposes a unified approach to the parallelisation of scientific computing code. This novel approach is based on the bulk-synchronous parallel model of computation, and succeeds in automatically generating parallel code that is architecture-independent, scalable, and of analytically predictable performance.

Glossary of Notations
List of Figures

Introduction
  Motivation
  Parallelisation Approach Proposed in the Book
  Organisation of the Book

The Bulk-Synchronous Parallel Model
  Introduction
  Bulk-Synchronous Parallel Computers
  The BSP Programming Model
  The BSP Cost Model
  Assessing the Efficiency of BSP Code
  The Development of BSP Applications
  BSP Pseudocode

Data Dependence Analysis and Code Transformation
  Introduction
  Data Dependence
  Definition
  Data Dependence Representation
  Dependence Tests
  Dependence Graphs
  Directed Acyclic Graphs
  Code Transformation Techniques
  Generalities
  Loop Parallelisation
  Loop Interchange and Loop Permutation
  Loop Distribution
  Loop Skewing, Wavefront Scheduling, and Iteration Space Tiling
  Other Transformations for High-Performance Computing

Communication Overheads in Loop Nest Scheduling
  Introduction
  Related Work
  Communication Overheads Due to Input Data
  The Footprint Size of a Pure-Input Array
  Input Communication Overheads Due to Input/Output Arrays
  Inter-Tile Communication Overheads
  Summary

Template-Matching Parallelisation
  Introduction
  Related Work
  Communication-Free Scheduling
  Scheduling Loop Nests Comprising Fully Parallel Loops
  Scheduling Loop Nests with no Fully Parallel Loop
  Improving the Load Balancing of Communication-Free Scheduling
  Wavefront Block Scheduling
  Scheduling Fully Permutable Loop Nests
  Extension to Generic Uniform-Dependence Loop Nests
  Improving the Load Balancing of Wavefront Block Scheduling
  Iterative Scheduling
  Description of the Technique
  Extension to Generic Loops and Load Balancing
  Comparison with Wavefront Block Scheduling
  Reduction Scheduling
  Recurrence Scheduling
  Scheduling Broadcast Loop Nests
  Definition of a Broadcast Loop Nest
  Scheduling Through Broadcast Implementation
  Scheduling Through Broadcast Elimination
  Comparison of the Two Approaches
  Summary

Generic Loop Nest Parallelisation
  Introduction
  Related Work
  Data Dependence Analysis
  Potential Parallelism Identification
  Data and Computation Partitioning
  Communication and Synchronisation Generation
  Performance Analysis
  Summary

A Strategy and a Tool for Architecture-Independent Loop Parallelisation
  Introduction
  Related Work
  A Two-Phase Strategy for Loop Nest Parallelisation
  BSPscheduler: an Architecture-Independent Loop Paralleliser
  The Structure of the Parallelisation Tool
  The User Interface
  The Parser Module
  The Dependence Analysis Module
  The Scheduling Modules
  The Code Generation Module
  Summary

The Effectiveness of Architecture-Independent Loop Parallelisation
  Introduction
  Matrix-Vector and Matrix-Matrix Multiplication
  LU Decomposition
  Algebraic Path Problem
  Finite Difference Iteration on a Cartesian Grid
  Merging
  Summary

Conclusions
  Summary of Contributions and Concluding Remarks
  Future work directions

Appendix A. Theorem proofs
Appendix B. Syntax of the BSPscheduler input language
Appendix C. Syntax of the BSPscheduler output language
Appendix D. Automatically generated code for Example 7.5
Bibliography
Index
