A Communication Framework for Fault-Tolerant Parallel Execution | p. 1 |
The STAPL pList | p. 16 |
Hardware Support for OpenMP Collective Operations | p. 31 |
Loop Transformation Recipes for Code Generation and Auto-Tuning | p. 50 |
MIMD Interpretation on a GPU | p. 65 |
TL-DAE: Thread-Level Decoupled Access/Execution for OpenMP on Cyclops-64 Many-Core Processor | p. 80 |
Mapping Streaming Languages to General Purpose Processors through Vectorization | p. 95 |
A Balanced Approach to Application Performance Tuning | p. 111 |
Automatically Tuning Parallel and Parallelized Programs | p. 126 |
DFT Performance Prediction in FFTW | p. 140 |
Safe and Familiar Multi-core Programming by Means of a Hybrid Functional and Imperative Language | p. 157 |
Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement | p. 172 |
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers | p. 183 |
Programming with Intervals | p. 203 |
Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories | p. 218 |
Synchronization-Free Automatic Parallelization: Beyond Affine Iteration-Space Slicing | p. 233 |
Automatic Data Distribution for Improving Data Locality on the Cell BE Architecture | p. 247 |
Automatic Restructuring of Linked Data Structures | p. 263 |
Using the Meeting Graph Framework to Minimise Kernel Loop Unrolling for Scheduled Loops | p. 278 |
Efficient Tiled Loop Generation: D-Tiling | p. 293 |
Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization | p. 308 |
Speculative Optimizations for Parallel Programs on Multicores | p. 323 |
Fastpath Speculative Parallelization | p. 338 |
PSnAp: Accurate Synthetic Address Streams through Memory Profiles | p. 353 |
Enforcing Textual Alignment of Collectives Using Dynamic Checks | p. 368 |
A Code Generation Approach for Auto-Vectorization in the SPADE Compiler | p. 383 |
Portable Just-in-Time Specialization of Dynamically Typed Scripting Languages | p. 391 |
Reducing Training Time in a One-Shot Machine Learning-Based Compiler | p. 399 |
Optimizing Local Memory Allocation and Assignment through a Decoupled Approach | p. 408 |
Unrolling Loops Containing Task Parallelism | p. 416 |
Author Index | p. 425 |
Table of Contents provided by Ingram. All Rights Reserved. |