

Parallel Architectures for Artificial Neural Networks: Paradigms and Implementations

by N. Sundararajan; P. Saratchandran
  • ISBN13: 9780818683992
  • ISBN10: 0818683996
  • Edition: 1st
  • Format: Hardcover
  • Copyright: 1998-12-14
  • Publisher: Wiley-IEEE Computer Society Press
  • Purchase Benefits
List Price: $153.54 Save up to $0.77
  • Buy New
    $152.77
    Free Shipping

    PRINT ON DEMAND: 2-4 WEEKS. THIS ITEM CANNOT BE CANCELLED OR RETURNED.


Summary

This excellent reference for all those involved in neural network research and application presents, in a single text, the necessary aspects of parallel implementation for all major artificial neural network models. The book details implementations on various processor architectures (ring, torus, etc.) built on different hardware platforms, ranging from large general-purpose parallel computers to custom-built MIMD machines using transputers and DSPs. The chapters are written by the experts who performed the implementations, and each chapter presents their research results. These results are divided into three parts: theoretical analysis of parallel implementation schemes on MIMD message-passing machines; details of parallel implementations of BP neural networks on a large, general-purpose parallel computer; and four chapters, each describing a special-purpose parallel neural computer configuration. The book is aimed at graduate students and researchers working in artificial neural networks and parallel computing. Graduate-level educators can use it to illustrate the methods of parallel computing for ANN simulation, and practitioners in the field will find it an ideal reference with lucid mathematical analyses.
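
The summary mentions training-set parallelism for backpropagation (the subject of Chapter 4). As a purely illustrative sketch, not taken from the book, the following NumPy code simulates that scheme on a single machine: each of several hypothetical processing nodes computes batch-BP gradients on its own shard of the training patterns, and the shard gradients are summed before a single weight update. The network sizes, node count, and even pattern split are assumptions chosen only for illustration.

# Illustrative sketch (not the book's code): training-set parallelism for
# batch backpropagation on a one-hidden-layer sigmoid network. Each "node"
# keeps a full weight copy and works on its own shard of the patterns;
# shard gradients are then summed before one weight update. In the book's
# setting the nodes are transputers on a ring exchanging messages; here the
# nodes are simply simulated in a loop.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shard_gradients(W1, W2, X, T):
    """Forward and backward pass of batch BP on one shard of patterns."""
    H = sigmoid(X @ W1)             # hidden-layer activations
    Y = sigmoid(H @ W2)             # network outputs
    dY = (Y - T) * Y * (1 - Y)      # output delta (squared-error loss)
    dH = (dY @ W2.T) * H * (1 - H)  # hidden delta
    return X.T @ dH, H.T @ dY       # gradients w.r.t. W1 and W2

rng = np.random.default_rng(0)
X = rng.random((64, 8))             # 64 training patterns, 8 inputs each
T = rng.random((64, 2))             # 2 target outputs per pattern
W1 = rng.standard_normal((8, 16)) * 0.1
W2 = rng.standard_normal((16, 2)) * 0.1

nodes, lr = 4, 0.1                  # 4 simulated processing nodes
for epoch in range(100):
    g1 = np.zeros_like(W1)
    g2 = np.zeros_like(W2)
    # Even split of the training set; the book studies optimal, uneven
    # distributions for processors of unequal speed.
    for Xs, Ts in zip(np.array_split(X, nodes), np.array_split(T, nodes)):
        d1, d2 = shard_gradients(W1, W2, Xs, Ts)
        g1 += d1                    # on real hardware these sums would be
        g2 += d2                    # accumulated via message passing
    W1 -= lr * g1
    W2 -= lr * g2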

Author Biography

N. Sundararajan and P. Saratchandran are the authors of Parallel Architectures for Artificial Neural Networks: Paradigms and Implementations, published by Wiley.

Table of Contents

1 Introduction
1(24)
N. SUNDARARAJAN
P. SARATCHANDRAN
JIM TORRESEN
1.1 Parallel Processing for Simulating ANNs
1(4)
1.1.1 Performance Metrics
2(1)
1.1.2 General Aspects of Parallel Processing
2(3)
1.2 Classification of ANN Models
5(1)
1.3 ANN Models Covered in This Book
5(20)
1.3.1 Multilayer Feed-Forward Networks with BP Learning
7(6)
1.3.2 Hopfield Network
13(2)
1.3.3 Multilayer Recurrent Networks
15(1)
1.3.4 Adaptive Resonance Theory (ART) Networks
16(1)
1.3.5 Self-Organizing Map (SOM) Networks
17(1)
1.3.6 Processor Topologies and Hardware Platforms
18(7)
2 A Review of Parallel Implementations of Backpropagation Neural Networks
25(40)
JIM TORRESEN
OLAV LANDSVERK
2.1 Introduction
25(1)
2.2 Parallelization of Feed-Forward Neural Networks
25(29)
2.2.1 Distributed Computing for Each Degree of BP Parallelism
26(3)
2.2.2 A Survey of Different Parallel Implementations
29(20)
2.2.3 Neural Network Applications
49(5)
2.3 Conclusions on Neural Applications and Parallel Hardware
54(11)
I Analysis of Parallel Implementations 65(118)
3 Network Parallelism for Backpropagation Neural Networks on a Heterogeneous Architecture
67(44)
R. ARULARASAN
P. SARATCHANDRAN
N. SUNDARARAJAN
SHOU KING FOO
3.1 Introduction
67(2)
3.2 Heterogeneous Network Topology
69(1)
3.3 Mathematical Model for the Parallelized BP Algorithm
70(6)
3.3.1 Timing Diagram for the Parallelized BP Algorithm
70(5)
3.3.2 Prediction of Iteration Time
75(1)
3.4 Experimental Validation of the Model Using Benchmark Problems
76(3)
3.4.1 Benchmark Problems Used for Validation
76(1)
3.4.2 Validation Setup and Results
76(3)
3.5 Optimal Distribution of Neurons Among the Processing Nodes
79(4)
3.5.1 Communication Constraints
79(1)
3.5.2 Temporal Dependence Constraints
80(1)
3.5.3 Memory Constraints
81(1)
3.5.4 Feasibility Constraints
82(1)
3.5.5 Optimal Mapping
82(1)
3.6 Methods of Solution to the Optimal Mapping Problem
83(5)
3.6.1 Genetic Algorithmic Solution
83(3)
3.6.2 Approximate Linear Heuristic (ALH) Solution
86(1)
3.6.3 Experimental Results
87(1)
3.7 Statistical Validation of the Optimal Mapping
88(3)
3.8 Discussion
91(9)
3.8.1 Worthwhileness of Finding Optimal Mappings
91(3)
3.8.2 Processor Location in a Ring
94(2)
3.8.3 Cost-Benefit Analysis
96(1)
3.8.4 Optimal Number of Processors for Homogeneous Processor Arrays
97(3)
3.9 Conclusion
100(1)
A3.1 Theoretical Expressions for Processes in the Parallel BP Algorithm
101(4)
A3.1.1 Computation Processes
101(3)
A3.1.2 Communication Processes
104(1)
A3.2 Memory Constraints
105(1)
A3.2.1 Storing the Training Set
105(1)
A3.2.2 Storing the Neural Network Parameters
105(1)
A3.2.3 Overall Memory Requirement
106(1)
A3.3 Elemental Timings for T805 Transputers
106(5)
4 Training-Set Parallelism for Backpropagation Neural Networks on a Heterogeneous Architecture
111(24)
SHOU KING FOO
P. SARATCHANDRAN
N. SUNDARARAJAN
4.1 Introduction
111(1)
4.2 Parallelization of BP Algorithm
112(7)
4.2.1 Process Synchronization Graph
114(3)
4.2.2 Variable Synchronization Graph
117(1)
4.2.3 Predicting the Epoch Time
117(2)
4.3 Experimental Validation of the Model Using Benchmark Problems
119(1)
4.4 Optimal Distribution of Patterns Among the Processing Nodes
120(4)
4.4.1 Communication Constraints
121(1)
4.4.2 Temporal Dependence Constraints
121(1)
4.4.3 Memory Constraints
122(1)
4.4.4 Feasibility Constraints
123(1)
4.4.5 Feasibility of Pattern Assignments
123(1)
4.4.6 Feasibility of Waiting
123(1)
4.4.7 Optimal Mapping
123(1)
4.5 Genetic Algorithmic Solution to the Optimal Mapping Problem
124(1)
4.5.1 Experimental Results
125(1)
4.6 Statistical Validation of the Optimal Mapping
125(1)
4.7 Discussion
126(2)
4.7.1 Worthwhileness of Finding Optimal Distribution
126(1)
4.7.2 Processor Location in a Ring
127(1)
4.8 Conclusion
128(1)
4.9 Process Decomposition
129(1)
4.10 Memory Requirements
130(5)
4.10.1 Storing the Network Parameters
130(1)
4.10.2 Storing the Training Set
130(1)
4.10.3 Memory Required for the Forward Pass of the Backpropagation
130(1)
4.10.4 Memory Required for the Backward Pass of the Backpropagation
131(1)
4.10.5 Temporary Memory Storage during Weight Changes Transfer
131(1)
4.10.6 Overall Memory Requirement
131(4)
5 Parallel Real-Time Recurrent Algorithm for Training Large Fully Recurrent Neural Networks
135(22)
ELIAS S. MANOLAKOS
GEORGE KECHRIOTIS
5.1 Introduction
135(1)
5.2 Background
136(4)
5.2.1 The Real-Time Recurrent Learning Algorithm
136(3)
5.2.2 Matrix Formulation of the RTRL Algorithm
139(1)
5.3 Parallel RTRL Algorithm Derivation
140(9)
5.3.1 The Retrieving Phase
140(3)
5.3.2 The Learning Phase
143(6)
5.4 Training Very Large RNNs on Fixed-Size Ring Arrays
149(5)
5.4.1 Partitioning for the Retrieving Phase
149(1)
5.4.2 Partitioning for the Learning Phase
150(1)
5.4.3 A Transputer-Based Implementation
150(4)
5.5 Conclusions
154(3)
6 Parallel Implementation of ART1 Neural Networks on Processor Ring Architectures
157(26)
ELIAS S. MANOLAKOS
STYLIANOS MARKOGIANNAKIS
6.1 Introduction
157(1)
6.2 ART1 Network Architecture
158(3)
6.3 Serial Algorithm
161(3)
6.4 Parallel Ring Algorithm
164(6)
6.4.1 Partitioning Strategy
169(1)
6.5 Experimental Results
170(3)
6.5.1 The MEIKO Computing Surface System
170(1)
6.5.2 Performance and Scalability Analysis
171(2)
6.6 Conclusions
173(10)
II Implementations on a Big General-Purpose Parallel Computer 183(48)
7 Implementation of Backpropagation Neural Networks on Large Parallel Computers
185(46)
JIM TORRESEN
SHINJI TOMITA
7.1 Introduction
185(1)
7.2 Hardware for Running Neural Networks
186(2)
7.2.1 Fujitsu AP1000
186(1)
7.2.2 Neural Network Applications Used in This Work
187(1)
7.2.3 Experimental Conditions in This Work
188(1)
7.3 General Mapping onto 2D-Torus MIMD Computers
188(13)
7.3.1 The Proposed Mapping Scheme
189(7)
7.3.2 Heuristic for Selection of the Best Mapping
196(5)
7.3.3 Summary
201(1)
7.4 Results on the General BP Mapping
201(24)
7.4.1 Nettalk
201(15)
7.4.2 Sonar Target Classification
216(5)
7.4.3 Speech Recognition Network
221(1)
7.4.4 Image Compression
221(4)
7.5 Conclusions on the Application Adaptable Mapping
225(6)
III Special Parallel Architectures and Application Case Studies 231
8 Massively Parallel Architectures for Large-Scale Neural Network Computations
233(38)
YOSHIJI FUJIMOTO
8.1 Introduction
233(2)
8.2 General Neuron Model
235(2)
8.3 Toroidal Lattice and Planar Lattice Architectures of Virtual Processors
237(1)
8.4 The Simulation of a Hopfield Neural Network
237(5)
8.4.1 The Simulation of an HNN on TLA
238(3)
8.4.2 The Simulation of an HNN on PLA
241(1)
8.5 The Simulation of a Multilayer Perceptron
242(3)
8.6 Mapping onto Physical Node Processors from Virtual Processors
245(5)
8.7 Load Balancing of Node Processors
250(1)
8.8 Estimation of the Performance
251(4)
8.9 Implementation
255(4)
8.10 Conclusions
259(2)
A8.1 Load Balancing Mapping Algorithm
261(2)
A8.2 Processing Time of the NP Array
263(8)
9 Regularly Structured Neural Networks on the DREAM Machine
271(32)
SOHEIL SHAMS
JEAN-LUC GAUDIOT
9.1 Introduction
271(1)
9.2 Mapping Method Preliminaries
272(7)
9.2.1 Neural Network Computation and Structure
272(2)
9.2.2 Implementing Neural Networks on the Ring Systolic Architecture
274(2)
9.2.3 System Utilization Characteristic of the Mapping onto the Ring Systolic Architecture
276(1)
9.2.4 Execution Rate Characteristics of the Mapping onto the Ring Systolic Architecture
277(1)
9.2.5 Mapping Multilayer Neural Networks onto the Ring Systolic Architecture
278(1)
9.2.6 Deficiencies of the Mapping onto the Ring Systolic Architecture
278(1)
9.3 DREAM Machine Architecture
279(4)
9.3.1 System Level Overview
279(1)
9.3.2 Processor-Memory Interface
280(1)
9.3.3 Implementing a Table Lookup Mechanism on the DREAM Machine
281(1)
9.3.4 Interprocessor Communication Network
282(1)
9.4 Mapping Structured Neural Networks onto the DREAM Machine
283(10)
9.4.1 General Mapping Problems
283(1)
9.4.2 The Algorithmic Mapping Method and Its Applicability
284(1)
9.4.3 Using Variable Length Rings to Implement Neural Network Processing
285(2)
9.4.4 Implementing Multilayer Networks
287(1)
9.4.5 Implementing Backpropagation Learning Algorithms
288(1)
9.4.6 Implementing Blocked Connected Networks
289(2)
9.4.7 Implementing Neural Networks Larger Than the Processor Array
291(1)
9.4.8 Batch-Mode Implementation
291(1)
9.4.9 Implementing Competitive Learning
292(1)
9.5 Implementation Examples and Performance Evaluation
293(4)
9.5.1 Performance Metric
294(1)
9.5.2 Implementing Fully Connected Multilayer Neural Networks
294(1)
9.5.3 Implementing a Block-Connected Multilayer Neural Network
295(1)
9.5.4 Implementing a Fully Connected Single Layer Network
295(2)
9.6 Conclusion
297(6)
10 High-Performance Parallel Backpropagation Simulation with On-Line Learning
303(42)
URS A. MULLER
PATRICK SPIESS
MICHAEL KOCHEISEN
BEAT FLEPP
ANTON GUNZINGER
WALTER GUGGENBUHL
10.1 Introduction
303(1)
10.2 The MUSIC Parallel Supercomputer
304(2)
10.2.1 System Hardware
304(2)
10.2.2 System Programming
306(1)
10.3 Backpropagation Implementation
306(4)
10.3.1 The Backpropagation Algorithm
306(1)
10.3.2 Parallelization
307(3)
10.4 Performance Analysis
310(2)
10.4.1 A Speedup Model
311(1)
10.4.2 Loss Factors
311(1)
10.4.3 Performance Results
312(1)
10.5 The NeuroBasic Parallel Simulation Environment
312(4)
10.5.1 Implementation
313(1)
10.5.2 An Example Program
314(2)
10.5.3 Performance versus Programming Time
316(1)
10.6 Examples of Practical Research Work
316(17)
10.6.1 Neural Networks in Photofinishing
316(11)
10.6.2 The Truck Backer-Upper
327(6)
10.7 Analysis of RISC Performance for Backpropagation
333(6)
10.7.1 Introduction
334(1)
10.7.2 Linearization of the Instruction Stream
334(1)
10.7.3 Reduction of Load/Store Operations
335(1)
10.7.4 Improvement of the Internal Instruction Stream Parallelism
336(2)
10.7.5 Results
338(1)
10.8 Conclusions
339(6)
11 Training Neural Networks with SPERT-II
345(20)
KRSTE ASANOVIC
JAMES BECK
DAVID JOHNSON
BRIAN KINGSBURY
NELSON MORGAN
JOHN WAWRZYNEK
11.1 Introduction
345(1)
11.2 Algorithm Development
346(1)
11.3 T0: A Vector Microprocessor
347(2)
11.4 The SPERT-II Workstation Accelerator
349(2)
11.5 Mapping Backpropagation to SPERT-II
351(4)
11.6 Mapping Kohonen Nets to SPERT-II
355(2)
11.7 Conclusions
357(8)
12 Concluding Remarks
365(2)
N. SUNDARARAJAN
P. SARATCHANDRAN
12.1 Future Trend
367

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
