Advanced Backend Code Optimization

by ;
  • ISBN13:


  • ISBN10:


  • Format: Hardcover
  • Copyright: 2014-06-23
  • Publisher: Iste/Hermes Science Pub

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, PO's, Marketplace items, eBooks and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now
List Price: $154.00 Save up to $15.40
  • Rent Book $138.60
    Add to Cart Free Shipping


Supplemental Materials

What is included with this book?

  • The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.
  • The Rental copy of this book is not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.


This book is a summary of more than a decade of research in the area of backend optimization. It contains the latest fundamental research results in this field. While existing books are often more oriented toward Masters students, this book is aimed more towards professors and researchers as it contains more advanced subjects.
It is unique in the sense that it contains information that has not previously been covered by other books in the field, with chapters on phase ordering in optimizing compilation; register saturation in instruction level parallelism; code size reduction for software pipelining; memory hierarchy effects and instruction level parallelism.
Other chapters provide the latest research results in well-known topics such as register need, and software pipelining and periodic register allocation.

Table of Contents

1 Prologue

1.1 Introduction

2 Phase Ordering in Optimising Compilation

2.1 Introduction to the Phase Ordering Problem

2.2 Background on Phase Ordering

2.2.1 Performance Modelling and Prediction

2.2.2 Some Attempts in Phase Ordering

2.3 Towards a Theoretical Model for Phase Ordering Problem

2.3.1 Decidability Results

2.3.2 Another Formulation of the Phase Ordering Problem

2.4 Examples of Decidable Simplified Cases

2.4.1 Models with Compilation Costs

2.4.2 One-pass Generative Compilers

2.5 Compiler Optimisation Parameters Space Exploration

2.5.1 Towards a Theoretical model

2.5.2 Examples of Simplified Decidable Cases

2.6 Conclusion on Phase Ordering

3 The Register Need

3.1 Data Dependence Graph and Processor Models

3.2 The Acyclic Register Need

3.3 The Periodic Register Need

3.3.1 Software Pipelining, Periodic Scheduling, Cyclic Scheduling

3.3.2 The Circular Lifetime Intervals

3.4 Computing the Periodic Register Need

3.5 Some Results on the Periodic Register Need

3.5.1 Minimal Periodic Register Need vs. Initiation Interval

3.5.2 Computing the Periodic Register Sufficiency

3.5.3 Stage Scheduling under Register Constraints

3.6 Conclusion on the Register Requirement

4 The Register Saturation

4.1 Motivations on the Register Saturation Concept

4.2 Computing the Acyclic Register Saturation

4.2.1 Characterising the Register Saturation

4.2.2 Efficient Algorithmic Heuristic for RS Computation

4.2.3 Experimental Efficiency of Greedy-k

4.3 Computing the Periodic Register Saturation

4.4 Conclusion on the Register Saturation

5 Spill Code Reduction

5.1 Introduction on Register Constraints in Software Pipelining

5.2 Related Work in Periodic Register Allocation

5.3 SIRA: Schedule Independant Register Allocation

5.3.1 Reuse Graphs

5.3.2 DDG Associated to Reuse Graph

5.3.3 Exact SIRA with Integer Linear Programming

5.3.4 SIRA with Fixed Reuse Edges

5.4 SIRALINA: An Efficient Polynomial Heuristic for SIRA

5.5 Experimental Results with SIRA

5.6 Conclusion on Spill Code Reduction

6 Exploiting the Register Access Delays

6.1 Problem Description of DDG Cycles with Non-positive Distances .

6.2 Eliminating Non-Positive Cycles

6.3 Experimental Results on Eliminating Non-Positive Cycles

6.4 Conclusion on Non-Positive Cycles Elimination

7 Loop Unrolling Degree Minimisation

7.1 Introduction

7.2 Unroll Degree Minimisation of Unscheduled Loops

7.2.1 Problem Description of Unroll Factor Minimisation for Unscheduled Loops

7.2.2 Algorithmic Solution for Unroll Factor Minimisation: Single Register Type

7.2.3 Solution for LCM Problem

7.2.4 Unroll Factor Minimisation in Presence of Multiple Register Types

7.2.5 Solution for Minimal Loop Unrolling

7.3 Unroll Degree Minimisation of Scheduled Loops

7.4 Experimental Results

7.5 Conclusion on Loop Unroll Degree Minimisation

8 Memory Hierarchy E↵ects and ILP

8.1 Problem of Memory Disambiguation at Runtime

8.1.1 Introduction

8.1.2 Related Work

8.1.3 Experimental Environment

8.1.4 Experimentation Methodology

8.1.5 Experimental Study of Cache Behavior

8.1.6 The E↵ectiveness of Load/Store Vectorisation

8.1.7 Conclusion on Memory Disambiguation Mechanisms

8.2 Data Preloading and Prefetching

8.2.1 Introduction

8.2.2 Related Work

8.2.3 Problems of Optimising Cache E↵ects at the Instruction Level

8.2.4 Target Processor Description

8.2.5 Our Methodology of Instruction-Level Code Optimisation

8.2.6 Experimental Results

8.2.7 Conclusion on Pre-fetching and Pre-Loading

9 Statistical Performance Analysis

9.1 Code Performance Variation

9.2 The Speedup-Test Protocole

9.2.1 The Observed Speedups

9.2.2 The Speedup of the Observed Average Execution Time

9.2.3 The Speedup of the Observed Median Execution Time, as well as Individual Runs

9.3 Discussion and Conclusion on the Speedup-Test

10 Epilogue

10.1 Problem of Instruction Selection

10.2 Perspectives on Code Optimisation for Multi-Core Processors

10.3 General Conclusion

A Benchmarks Presentation 125

A.1 Qualitative Benchmarks Presentation

A.2 Quantitative Benchmarks Presentation

A.3 Changing the Architectural Configuration of the Processor

B Experiments on Register Saturation

B.1 The Acyclic Register Saturation

B.1.1 On the Oprimal RS Computation

B.1.2 On the Accuracy of Greedy-k Heuristic vs. Optimal RS

B.1.3 Greedy-k Execution Times

B.2 The Periodic Register Saturation

B.2.1 Optimal PRS Computation

B.2.2 Approximate PRS Computation with Heuristic

C Experiments on SIRA

C.1 Efficiency of SIRALINA on Standalone DDG

C.1.1 Naming conventions for register optimisation orders

C.1.2 Experimental efficiency of SIRALINA

C.1.3 Measuring the Increase of the MII

C.1.4 Efficiency of SIRALINA Execution Times

C.2 Efficiency of SIRALINA plugged Inside Industrial Compiler

C.2.1 Static Performance Results

C.2.2 Execution Time Performance Results

D Experiments on Non-Positive Cycles Elimination 151

D.1 Experimental Setup

D.1.1 Heuristics Nomenclature

D.1.2 Empirical Efficiency Measures

D.2 Comparison of the Heuristics Execution Times

D.2.1 Time to Minimise Register Pressure for a fixed II

D.3 Convergence of the Proactive Heuristic (Iterative SIRALINA)

D.4 Qualitative Analysis of the Heuristics

D.4.1 Number of Saved Registers

D.4.2 Proportion of Success

D.4.3 Increase of the MII when Success

D.5 Conclusion on Non-Positive Cycles Elimination Strategy

E Experiments on Unroll Degree Minimisation

E.1 Standalone Experiments with Single Register types

E.1.1 Experiments with Unscheduled Loops

E.1.2 Results on Randomly Generated DDG

E.1.3 Experiments on Real DDG

E.1.4 Experiments with Scheduled Loops

E.2 Experiments with Multiple Register Types

F Experiments on Preloading and Prefetching

Rewards Program

Write a Review