Preface.

Contributors.

PART 1. PROGRAMMING MODEL.

1. ClusterGOP: A High-Level Programming Environment for Clusters (Fan Chan, Jiannong Cao and Minyi Guo).

1.1 Introduction.

1.2 GOP Model and ClusterGOP Architecture.

1.3 VisualGOP.

1.4 The ClusterGOP Library.

1.5 MPMD Programming Support.

1.6 Programming Using ClusterGOP.

1.7 Summary.

2. The Challenge of Providing a High-Level Programming Model for High-Performance Computing (Barbara Chapman).

2.1 Introduction.

2.2 HPC Architectures.

2.3 HPC Programming Models: The First Generation.

2.4 The Second Generation of HPC Programming Models.

2.5 OpenMP for DMPs.

2.6 Experiments with OpenMP on DMPs.

2.7 Conclusions.

3. SAT: Toward Structured Parallelism Using Skeletons (Sergei Gorlatch).

3.1 Introduction.

3.2 SAT: A Methodology Outline.

3.3 Skeletons and Collective Operations.

3.4 Case Study: Maximum Segment Sum (MSS).

3.5 Performance Aspect in SAT.

3.6 Conclusions and Related Work.

4. Bulk-Synchronous Parallelism: An Emerging Paradigm of High-Performance Computing (Alexander Tiskin).

4.1 The BSP Model.

4.2 BSP Programming.

4.3 Conclusions.

5. Cilk Versus MPI: Comparing Two Parallel Programming Styles on Heterogeneous Systems (John Morris, KyuHo Lee and JunSeong Kim).

5.1 Introduction.

5.2 Experiments.

5.3 Results.

5.4 Conclusion.

6. Nested Parallelism and Pipelining in OpenMP (Marc Gonzalez, E. Ayguade, X. Martorell and J. Labarta).

6.1 Introduction.

6.2 OpenMP Extensions for Nested Parallelism.

6.3 OpenMP Extensions for Thread Synchronization.

6.4 Summary.

7. OpenMP for Chip Multiprocessors (Feng Liu and Vipin Chaudhary).

7.1 Introduction.

7.2 3SoC Architecture Overview.

7.3 The OpenMP Compiler/Translator.

7.4 Extensions to OpenMP for DSEs.

7.5 Optimization for OpenMP.

7.6 Implementation.

7.7 Performance Evaluation.

7.8 Conclusions.

PART 2. ARCHITECTURAL AND SYSTEM SUPPORT.

8. Compiler and Run-Time Parallelization Techniques for Scientific Computations on Distributed-Memory Parallel Computers (PeiZong Lee, Chien-Min Wang and Jan-Jan Wu).

8.1 Introduction.

8.2 Background Material.

8.3 Compiling Regular Programs on DMPCs.

8.4 Compiler and Run-Time Support for Irregular Programs.

8.5 Library Support for Irregular Applications.

8.6 Related Works.

8.7 Concluding Remarks.

9. Enabling Partial-Cache Line Prefetching Through Data Compression (Youtao Zhang and Rajiv Gupta).

9.1 Introduction.

9.2 Motivation of Partial Cache-Line Prefetching.

9.3 Cache Design Details.

9.4 Experimental Results.

9.5 Related Work.

9.6 Conclusion.

10. MPI Atomicity and Concurrent Overlapping I/O (Wei-Keng Liao, Alok Choudhary, Kenin Coloma, Lee Ward, Eric Russell and Neil Pundit).

10.1 Introduction.

10.2 Concurrent Overlapping I/O.

10.3 Implementation Strategies.

10.4 Experiment Results.

10.5 Summary.

11. Code Tiling: One Size Fits All (Jingling Xue and Qingguang Huang).

11.1 Introduction.

11.2 Cache Model.

11.3 Code Tiling.

11.4 Data Tiling.

11.5 Finding Optimal Tile Sizes.

11.6 Experimental Results.

11.7 Related Work.

11.8 Conclusion.

12. Data Conversion for Heterogeneous Migration/Checkpointing (Hai Jiang, Vipin Chaudhary and John Paul Walters).

12.1 Introduction.

12.2 Migration and Checkpointing.

12.3 Data Conversion.

12.4 Coarse-Grain Tagged RMR in MigThread.

12.5 Microbenchmarks and Experiments.

12.6 Related Work.

12.7 Conclusions and Future Work.

13. Receiving-Message Prediction and Its Speculative Execution (Takanobu Baba, Takashi Yokota, Kanemitsu Ootsu, Fumihito Furukawa and Yoshiyuki Iwamoto).

13.1 Background.

13.2 Receiving-Message Prediction Method.

13.3 Implementation of the Method in the MPI Libraries.

13.4 Experimental Results.

13.5 Concluding Remarks.

14. An Investigation of the Applicability of Distributed FPGAs to High-Performance Computing (John P. Morrison, Padraig O’Dowd and Philip D. Healy).

14.1 Introduction.

14.2 High Performance Computing with Cluster Computing.

14.3 Reconfigurable Computing with FPGAs.

14.4 DRMC: A Distributed Reconfigurable Metacomputer.

14.5 Algorithms Suited to the Implementation on FPGAs/DRMC.

14.6 Algorithms Not Suited to the Implementation on FPGAs/DRMC.

14.7 Summary.

PART 3. SCHEDULING AND RESOURCE MANAGEMENT.

15. Bandwidth-Aware Resource Allocation for Heterogeneous Computing Systems to Maximize Throughput (Bo Hong and Viktor K. Prasanna).

15.1 Introduction.

15.2 Related Work.

15.3 Systems Model and Problem Statement.

15.4 Resource Allocation to Maximize System Throughput.

15.5 Experimental Results.

15.6 Conclusion.

16. Scheduling Algorithms with Bus Bandwidth Considerations for SMPs (Christos D. Antonopoulos, Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou).

16.1 Introduction.

16.2 Related Work.

16.3 The Implications of Bus Bandwidth for Application Performance.

16.4 Scheduling Policies for Preserving Bus Bandwidth.

16.5 Experimental Evaluation.

16.6 Conclusions.

17. Toward Performance Guarantee of Dynamic Task Scheduling of a Parameter-Sweep Application onto a Computational Grid (Noriyuki Fujimoto and Kenichi Hagihara).

17.1 Introduction.

17.2 A Grid Scheduling Model.

17.3 Related Works.

17.4 The Proposed Algorithm RR.

17.5 The Performance Guarantee of the Proposed Algorithm.

17.6 Conclusion.

18. Performance Study of Reliability Maximization and Turnaround Minimization with GA-based Task Allocation in DCS (Deo Prakash Vidyarthi, Anil Kumar Tripathi, Biplab Kumer Sarker, Kirti Rani and Laurence T. Yang).

18.1 Introduction.

18.2 GA for Task Allocation.

18.3 The Algorithm.

18.4 Illustrative Examples.

18.5 Discussions and Conclusion.

19. Toward Fast and Efficient Compile-Time Task Scheduling in Heterogeneous Computing Systems (Tarek Hagras and Jan Janecek).

19.1 Introduction.

19.2 Problem Definition.

19.3 The Suggested Algorithm.

19.4 Heterogeneous Systems Scheduling Heuristics.

19.5 Experimental Results and Discussion.

19.6 Conclusion.

20. An On-Line Approach for Classifying and Extracting Application Behavior on Linux (Luciano José Senger, Rodrigo Fernandes de Mello, Marcos José Santana, Regina Helena Carlucci Santana and Laurence Tianruo Yang).

20.1 Introduction.

20.2 Related Work.

20.3 Information Acquisition.

20.4 Linux Process Classification Model.

20.5 Results.

20.6 Evaluation of the Model Intrusion on the System Performance.

20.7 Conclusions.

PART 4. CLUSTERS AND GRID COMPUTING.

21. Peer-to-Peer Grid Computing and a .NET-Based Alchemi Framework (Akshay Luther, Rajkumar Buyya, Rajiv Ranjan and Srikumar Venugopal).

21.1 Introduction.

21.2 Background.

21.3 Desktop Grid Middleware Considerations.

21.4 Representative Desktop Grid Systems.

21.5 Alchemi Desktop Grid Framework.

21.6 Alchemi Design and Implementation.

21.7 Alchemi Performance Evaluation.

21.8 Summary and Future Work.

22. Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies (Parvin Asadzadeh, Rajkumar Buyya, Chun Ling Kei, Deepa Nayar and Srikumar Venugopal).

22.1 Introduction.

22.2 Overview of Grid Middleware Systems.

22.3 Unicore.

22.4 Globus.

22.5 Legion.

22.6 Gridbus.

22.7 Implementation of UNICORE Adaptor for Gridbus Broker.

22.8 Comparison of Middleware Systems.

22.9 Summary.

23. High-Performance Computing on Clusters: The Distributed JVM Approach (Wenzhang Zhu, Weijian Fang, Cho-Li Wang and Francis C. M. Lau).

23.1 Background.

23.2 Distributed JVM.

23.3 JESSICA2 Distributed JVM.

23.4 Performance Analysis.

23.5 Related Work.

23.6 Summary.

24. Data Grids: Supporting Data-Intensive Applications in Wide-Area Networks (Xiao Qin and Hong Jiang).

24.1 Introduction.

24.2 Data Grid Services.

24.3 High-Performance Data Grid.

24.4 Security Issues.

24.5 Open Issues.

24.6 Conclusions.

25. Application I/O on a Parallel File System for Linux Clusters (Dheeraj Bhardwaj).

25.1 Introduction.

25.2 Application I/O.

25.3 Parallel I/O System Software.

25.4 Standard Unix & Parallel I/O.

25.5 Example: Seismic Imaging.

25.6 Discussion and Conclusion.

26. One Teraflop Achieved with a Geographically Distributed Linux Cluster (Peng Wang, George Turner, Steven Simms, Dave Hart, Mary Papakhian and Craig Stewart).

26.1 Introduction.

26.2 Hardware and Software Setup.

26.3 System Tuning and Benchmark Results.

26.4 Performance Costs and Benefits.

27. A Grid-Based Distributed Simulation of Plasma Turbulence (Beniamino Di Martino, Salvatore Venticinque, Sergio Briguglio, Giuliana Fogaccia and Gregorio Vlad).

27.1 Introduction.

27.2 MPI Implementation of the Internode Domain Decomposition.

27.3 Integration of the Internode Domain Decomposition with Intranode Particle Decomposition Strategies.

27.4 The MPICH-G2 Implementation.

27.5 Conclusions.

28. Evidence-Aware Trust Model for Dynamic Services (Ali Shaikh Ali, Omer F. Rana and Rashid J. Al-Ali).

28.1 Motivation for Evaluating Trust.

28.2 Service Trust—What Is It?

28.3 Evidence-Aware Trust Model.

28.4 The System Life Cycle.

28.5 Conclusion.

PART 5. PEER-TO-PEER COMPUTING.

29. Resource Discovery in Peer-to-Peer Infrastructures (Hung-Chang Hsiao and Chung-Ta King).

29.1 Introduction.

29.2 Design Requirements.

29.3 Unstructured P2P Systems.

29.4 Structured P2P Systems.

29.5 Advanced Resource Discovery for Structured P2P Systems.

29.6 Summary.

30. Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks (Yunhao Liu, Li Xiao, Lionel M. Ni and Zhenyun Zhuang).

30.1 Introduction.

30.2 Search Mechanisms.

30.3 Hybrid Periodical Flooding.

30.4 Simulation Methodology.

30.5 Performance Evaluation.

30.6 Conclusion.

31. HIERAS: A DHT-Based Hierarchical P2P Routing Algorithm (Zhiyong Xu, Yiming Hu and Laxmi Bhuyan).

31.1 Introduction.

31.2 Hierarchical P2P Architecture.

31.3 System Design.

31.4 Performance Evaluation.

31.5 Related Works.

31.6 Summary.

32. Flexible and Scalable Group Communication Model for Peer-to-Peer Systems (Tomoya Enokido and Makoto Takizawa).

32.1 Introduction.

32.2 Group of Agents.

32.3 Functions of Group Protocol.

32.4 Autonomic Group Protocol.

32.5 Retransmission.

32.6 Conclusion.

PART 6. WIRELESS AND MOBILE COMPUTING.

33. Study of Cache-Enhanced Dynamic Movement-Based Location Management Schemes for 3G Cellular Networks (Krishna Priya Patury, Yi Pan, Xiaola Lin, Yang Xiao and Jie Li).

33.1 Introduction.

33.2 Location Management with and without Cache.

33.3 The Cache-Enhanced Location Management Scheme.

33.4 Simulation Results and Analysis.

33.5 Conclusion.

34. Maximizing Multicast Lifetime in Wireless Ad Hoc Networks (Guofeng Deng and Sandeep K. S. Gupta).

34.1 Introduction.

34.2 Energy Consumption Model in WANETs.

34.3 Definitions of Maximum Multicast Lifetime.

34.4 Maximum Multicast Lifetime of the Network Using Single Tree (MMLM).

34.5 Maximum Multicast Lifetime of the Network Using Multiple Trees (MMLM).

34.6 Summary.

35. A QoS-Aware Scheduling Algorithm for Bluetooth Scatternets (Young Man Kim, Ten H. Lai and Anish Arora).

35.1 Introduction.

35.2 Perfect Scheduling Problem for Bipartite Scatternet.

35.3 Perfect Assignment Scheduling Algorithm for Bipartite Scatternets.

35.4 Distributed, Local, and Incremental Scheduling Algorithms.

35.5 Performance and QoS Analysis.

35.6 Conclusion.

PART 7. HIGH PERFORMANCE APPLICATIONS.

36. A Workload Partitioner for Heterogeneous Grids (Daniel J. Harvey, Sajal K. Das and Rupak Biswas).

36.1 Introduction.

36.2 Preliminaries.

36.3 The MinEX Partitioner.

36.4 N-Body Application.

36.5 Experimental Study.

36.6 Conclusion.

37. Building a User-Level Grid for Bag-of-Tasks Applications (Walfredo Cirne, Francisco Brasileiro, Daniel Paranhos, Lauro Costa, Elizeu Santos-Neto and Carla Osthoff).

37.1 Introduction.

37.2 Design Goals.

37.3 Architecture.

37.4 Working Environment.

37.5 Scheduling.

37.6 Implementation.

37.7 Performance Evaluation.

37.8 Conclusions and Future Work.

38. An Efficient Parallel Method for Calculating the Smarandache Function (Sabin Tabirca, Tatiana Tabirca, Kieran Reynolds and Laurence T. Yang).

38.1 Introduction.

38.2 Computing in Parallel.

38.3 Experimental Results.

38.4 Conclusion.

39. Design, Implementation and Deployment of a Commodity Cluster for Periodic Comparison of Gene Sequences (Anita M. Orendt, Brian Haymore, David Richardson, Sofia Robb, Alejandro Sanchez Alvarado and Julio C. Facelli).

39.1 Introduction.

39.2 System Requirements and Design.

39.3 Performance.

39.4 Conclusions.

40. A Hierarchical Distributed Shared-Memory Parallel Branch & Bound Application with PVM and OpenMP on Multiprocessor Clusters (Rocco Aversa, Beniamino Di Martino, Nicola Mazzocca and Salvatore Venticinque).

40.1 Introduction.

40.2 The B&B Parallel Application.

40.3 The OpenMP Extension.

40.4 Experimental Results.

40.5 Conclusions.

41. IP Based Telecommunication Services (Anna Bonifacio and G. Spinillo).

41.1 Introduction.

Index.