Introduction | p. 1 |
Basic goals of the book | p. 1 |
What do I get for one Watt today? | p. 1 |
Main memory bottleneck | p. 3 |
Optimize resource usage | p. 3 |
Application design | p. 4 |
Organization of the book | p. 4 |
Historical aspects | p. 4 |
Parameterization | p. 5 |
Models | p. 5 |
Core optimization | p. 6 |
Node optimization | p. 6 |
Cluster optimization | p. 6 |
Grid-brokering to save energy | p. 7 |
Historical highlights | p. 9 |
Evolution of computing | p. 9 |
The first computer companies | p. 14 |
ERA, EMCC and Univac | p. 14 |
Control Data Corporation, CDC | p. 14 |
Cray Research | p. 15 |
Thinking Machines Corporation | p. 16 |
International Business Machines (IBM) | p. 17 |
The ASCI effort | p. 18 |
The Japanese efforts | p. 19 |
The computer generations | p. 20 |
The evolution in computing performance | p. 20 |
Performance/price evolution | p. 22 |
Evolution of basic software | p. 22 |
Evolution of algorithmic complexity | p. 23 |
The TOP500 list | p. 25 |
Outlook with the TOP500 curves | p. 27 |
The GREEN500 List | p. 28 |
Proposal for a REAL500 list | p. 30 |
Parameterization | p. 31 |
Definitions | p. 31 |
Parameterization of applications | p. 35 |
Application parameter set | p. 35 |
Parameterization of BLAS library routines | p. 36 |
SMXV: Parameterization of sparse matrix*vector operation | p. 38 |
Parameterization of a computational node Pi ∈ ri | p. 39 |
Parameterization of the interconnection networks | p. 41 |
Types of networks | p. 41 |
Parameterization of clusters and networks | p. 42 |
Parameters related to running applications | p. 44 |
Conclusion | p. 47 |
Models | p. 49 |
The performance prediction model | p. 49 |
The execution time evaluation model (ETEM) | p. 53 |
A network performance model | p. 53 |
The extended ¿ - ¿ model | p. 55 |
Validation of the models | p. 56 |
Methodology | p. 56 |
Example: The full matrix*matrix multiplication DGEMM | p. 57 |
Example: Sparse matrix*vector multiplication SMXV | p. 59 |
Core optimization | p. 63 |
Some useful notions | p. 63 |
Data hierarchy | p. 63 |
Data representation | p. 64 |
Floating point operations | p. 67 |
Pipelining | p. 68 |
Single core optimization | p. 70 |
Single core architectures | p. 70 |
Memory conflicts | p. 70 |
Indirect addressing | p. 74 |
Unrolling | p. 75 |
Dependency | p. 76 |
Inlining | p. 78 |
If statement in a loop | p. 78 |
Code porting aspects | p. 79 |
How to develop application software | p. 83 |
Application to plasma physics codes | p. 84 |
Tokamaks and Stellarators | p. 84 |
Optimization of VMEC | p. 88 |
Optimization of TERPSICHORE | p. 91 |
Conclusions for single core optimization | p. 94 |
Node optimization | p. 95 |
Shared memory computer architectures | p. 95 |
SMP/NUMA architectures | p. 95 |
The Cell | p. 99 |
GPGPU for HPC | p. 100 |
Node comparison and OpenMP | p. 105 |
Race condition with OpenMP | p. 109 |
Application optimization with OpenMP: the 3D Helmholtz solver | p. 110 |
Fast Helmholtz solver for parallelepipedic geometries | p. 111 |
NEC SX-5 reference benchmark | p. 113 |
Single processor benchmarks | p. 114 |
Parallelization with OpenMP | p. 115 |
Parallelization with MPI | p. 115 |
Conclusion | p. 119 |
Application optimization with OpenMP: TERPSICHORE | p. 119 |
Cluster optimization | p. 121 |
Introduction on parallelization | p. 121 |
Internode communication networks | p. 121 |
Network architectures | p. 121 |
Comparison between network architectures | p. 129 |
Distributed memory parallel computer architectures | p. 131 |
Integrated parallel computer architectures | p. 131 |
Commodity cluster architectures | p. 134 |
Energy consumption issues | p. 136 |
The issue of resilience | p. 137 |
Type of parallel applications | p. 138 |
Embarrassingly parallel applications | p. 138 |
Applications with point-to-point communications | p. 138 |
Applications with multicast communication needs | p. 139 |
Shared memory applications (OpenMP) | p. 139 |
Component-based applications | p. 139 |
Domain decomposition techniques | p. 139 |
Test example: The Gyrotron | p. 140 |
The geometry and the mesh | p. 142 |
Connectivity conditions | p. 142 |
Parallel matrix solver | p. 143 |
The electrostatic precipitator | p. 145 |
Scheduling of parallel applications | p. 146 |
Static scheduling | p. 146 |
Dynamic scheduling | p. 146 |
SpecuLOOS | p. 147 |
Introduction | p. 147 |
Test case description | p. 147 |
Complexity on one node | p. 149 |
Wrong complexity on the Blue Gene/L | p. 150 |
Fine results on the Blue Gene/L | p. 151 |
Conclusions | p. 151 |
TERPSICHORE | p. 153 |
Parallelization of the LEMan code with MPI and OpenMP | p. 154 |
Introduction | p. 154 |
Parallelization | p. 154 |
CPU time results | p. 156 |
Conclusions | p. 159 |
Grid-level Brokering to save energy | p. 161 |
About Grid resource brokering | p. 161 |
An Introduction to ïanos | p. 162 |
Job Submission Scenario | p. 164 |
The cost model | p. 165 |
Mathematical formulation | p. 165 |
CPU costs Ke | p. 167 |
License fees Kl | p. 169 |
Costs due to waiting time Kw | p. 169 |
Energy costs Keco | p. 169 |
Data transfer costs Kd | p. 171 |
Example: The Pleiades clusters' CPU cost per hour | p. 171 |
Different currencies in a Grid environment | p. 173 |
The implementation | p. 173 |
Architecture & Design | p. 174 |
The Grid Adapter | p. 174 |
The Meta Scheduling Service (MSS) | p. 175 |
The Resource Broker | p. 176 |
The System Information | p. 177 |
The Data Warehouse | p. 177 |
The Monitoring Service | p. 177 |
The Monitoring Module VAMOS | p. 178 |
Integration with UNICORE Grid System | p. 179 |
Scheduling algorithm | p. 179 |
User Interfaces to the ïanos framework | p. 181 |
DVS-able processors | p. 182 |
Power consumption of a CPU | p. 183 |
An algorithm to save energy | p. 184 |
First results with SMXV | p. 185 |
A first implementation | p. 186 |
Conclusions | p. 188 |
Recommendations | p. 189 |
Application oriented recommendations | p. 189 |
Code development | p. 189 |
Code validation | p. 189 |
Porting codes | p. 190 |
Optimizing parallelized applications | p. 190 |
Race condition | p. 190 |
Hardware and basic software aspects | p. 191 |
Basic software | p. 191 |
Choice of system software | p. 192 |
Energy reduction | p. 192 |
Processor frequency adaptation | p. 192 |
Improved cooling | p. 193 |
Choice of optimal resources | p. 193 |
Best choice of new computer | p. 193 |
Last but not least | p. 194 |
Miscellaneous | p. 194 |
Course material | p. 194 |
A new REAL500 List | p. 194 |
Glossary | p. 197 |
References | p. 205 |
About the authors | p. 213 |
Index | p. 215 |
Table of Contents provided by Ingram. All Rights Reserved. |