SIMD Programming Manual for Linux and Windows

by Paul Cockshott; Ken Renfrew
  • ISBN13: 9781852337940
  • ISBN10: 185233794X

  • Format: Hardcover
  • Copyright: 2004-07-15
  • Publisher: Springer-Verlag New York Inc

Summary

This book is a programmer's introduction to the use of SIMD on PCs. It presents the underlying technology of SIMD processing on current PCs and looks at tools for exploiting it, including the Intel SIMD library and the parallel processing language Vector Pascal. It explains how to cast algorithms in parallel form to exploit the parallel processing capability of standard PCs, obtaining large performance gains relative to code from conventional sequential compilers. It assumes familiarity with imperative programming, though not specifically with Pascal, and no prior familiarity with the SIMD programming model. The language translation system will be available as a download for either Linux or Windows in association with the book. The book will be particularly useful for programmers in the rapidly growing area of games and multimedia entertainment, and it should also appeal to academics interested in parallel programming techniques or array programming languages.

Table of Contents

List of Tables xvii
List of Figures xix
List of Algorithms xxiii
Introduction xxv
I SIMD Programming 1(106)
Paul Cockshott
1 Computer Speed, Program Speed
3(8)
1.1 Clocks
3(1)
1.2 Width
4(1)
1.3 Instruction Speed
5(1)
1.4 Overhead Instructions
6(2)
1.5 Algorithm Complexity
8(3)
2 SIMD Instruction-sets
11(12)
2.1 The SIMD Model
11(1)
2.2 The MMX Register Architecture
12(1)
2.3 MMX Data-types
13(2)
2.4 3DNow!
15(4)
2.4.1 Cache Handling
17(1)
2.4.2 Cache Line Length and Prefetching
18(1)
2.5 Streaming SIMD
19(3)
2.5.1 Cache Optimisation
21(1)
2.6 The Motorola Altivec Architecture
22(1)
3 SIMD Programming in Assembler and C
23(24)
3.1 Vectorising C Compilers
23(2)
3.1.1 Dead for Loop Elimination
24(1)
3.1.2 Loop Unrolling
25(1)
3.2 Direct Use of Assembler Code
25(2)
3.2.1 The Example Program
26(1)
3.3 Use of Assembler Intrinsics
27(1)
3.4 Use of C++ Classes
27(1)
3.5 Use of the Nasm Assembler
28(10)
3.5.1 General Instruction Syntax
29(1)
3.5.2 Operand Forms
29(3)
3.5.3 Directives
32(2)
3.5.4 Linking and Object File Formats
34(1)
3.5.5 Summing a Vector
35(3)
3.6 Coordinate Transformations Using 3DNow!
38(6)
3.7 Coordinate Transformations Using SSE Instructions
44(3)
4 Intel SIMD Instructions
47(52)
4.1 Types
47(4)
4.2 shrl
51(1)
4.3 saturate
51(1)
4.4 Instructions
51(1)
4.4.1 ADDPS
52(1)
4.4.2 ADDSS
52(1)
4.4.3 ANDNPS
52(1)
4.4.4 ANDPS
52(1)
4.4.5 CMPPS
53(1)
4.4.6 CMPSS
54(1)
4.4.7 COMISS
54(1)
4.4.8 CVTPI2PS
55(1)
4.4.9 CVTPS2PI
55(1)
4.4.10 CVTTPS2PI
55(1)
4.4.11 CVTSI2SS
56(1)
4.4.12 CVTSS2SI
56(1)
4.4.13 CVTTSS2SI
56(1)
4.4.14 DIVPD
56(1)
4.4.15 DIVPS
57(1)
4.4.16 DIVSD
57(1)
4.4.17 DIVSS
57(1)
4.4.18 EMMS
57(1)
4.4.19 FXRSTOR
58(1)
4.4.20 FXSAVE
58(1)
4.4.21 MASKMOVQ
59(1)
4.4.22 MAXPD
59(1)
4.4.23 MAXPS
60(1)
4.4.24 MAXSD
60(1)
4.4.25 MAXSS
60(1)
4.4.26 MINPD
61(1)
4.4.27 MINPS
61(1)
4.4.28 MINSD
61(1)
4.4.29 MINSS
61(1)
4.4.30 MOVAPS_load
62(1)
4.4.31 MOVAPS_store
62(1)
4.4.32 MOVD_load
62(1)
4.4.33 MOVD_store
63(1)
4.4.34 MOVD_load_sse
63(1)
4.4.35 MOVD_store_sse
63(1)
4.4.36 MOVHLPS
63(1)
4.4.37 MOVHPS_load
64(1)
4.4.38 MOVHPS_store
64(1)
4.4.39 MOVLHPS
64(1)
4.4.40 MOVLPS_load
64(1)
4.4.41 MOVLPS_store
64(1)
4.4.42 MOVMSKPS
65(1)
4.4.43 MOVNTPS
65(1)
4.4.44 MOVNTQ
65(1)
4.4.45 MOVQ_load
66(1)
4.4.46 MOVQ_store
66(1)
4.4.47 MOVSS_load
66(1)
4.4.48 MOVSS_store
66(1)
4.4.49 MOVUPS_load
67(1)
4.4.50 MOVUPS_store
67(1)
4.4.51 MULPD
67(1)
4.4.52 MULPS
67(1)
4.4.53 MULSD
68(1)
4.4.54 MULSS
68(1)
4.4.55 ORPS
68(1)
4.4.56 PACKSSDW
69(1)
4.4.57 PACKSSWB
69(1)
4.4.58 PACKUSWB
69(1)
4.4.59 PADDB
70(1)
4.4.60 PADDB_sse
70(1)
4.4.61 PADDW
70(1)
4.4.62 PADDW_sse
71(1)
4.4.63 PADDD
71(1)
4.4.64 PADDD_sse
71(1)
4.4.65 PADDQ
72(1)
4.4.66 PADDQ_sse
72(1)
4.4.67 PADDSB
72(1)
4.4.68 PADDSB_sse
73(1)
4.4.69 PADDUSB
73(1)
4.4.70 PADDUSB_sse
74(1)
4.4.71 PAND
74(1)
4.4.72 PAND_sse
74(1)
4.4.73 PANDN
75(1)
4.4.74 PANDN_sse
75(1)
4.4.75 PAVGB
75(1)
4.4.76 PAVGB_sse
76(1)
4.4.77 PAVGW
76(1)
4.4.78 PAVGW_sse
76(1)
4.4.79 PCMPEQB
77(1)
4.4.80 PCMPEQB_sse
77(1)
4.4.81 PCMPEQW
77(1)
4.4.82 PCMPEQW_sse
78(1)
4.4.83 PCMPEQD
78(1)
4.4.84 PCMPEQD_sse
79(1)
4.4.85 PCMPGTB
79(1)
4.4.86 PCMPGTB_sse
79(1)
4.4.87 PCMPGTW
80(1)
4.4.88 PCMPGTW_sse
80(1)
4.4.89 PCMPGTD
80(1)
4.4.90 PCMPGTD_sse
81(1)
4.4.91 PEXTRW
81(1)
4.4.92 PEXTRW_sse
81(1)
4.4.93 PINSRW
82(1)
4.4.94 PMADDWD
82(1)
4.4.95 PMAXSW
82(1)
4.4.96 PMAXUB
83(1)
4.4.97 PMINSW
83(1)
4.4.98 PMINUB
84(1)
4.4.99 PMOVMSKB
84(1)
4.4.100 PMULHUW
84(1)
4.4.101 PMULHW
85(1)
4.4.102 PMULLW
85(1)
4.4.103 POR
86(1)
4.4.104 PREFETCHNTA
86(1)
4.4.105 PREFETCHT1
86(1)
4.4.106 PREFETCHT0
86(1)
4.4.107 PSADBW
87(1)
4.4.108 PSHUFD
87(1)
4.4.109 PSHUFW
87(1)
4.4.110 PSxxf
88(1)
4.4.111 PSUBx
89(1)
4.4.112 PSUBSx
89(1)
4.4.113 PSUBUSx
90(1)
4.4.114 PSWAPD
90(1)
4.4.115 PUNPCKHBW
90(1)
4.4.116 PUNPCKLBW
91(1)
4.4.117 PUNPCKHWD
91(1)
4.4.118 PUNPCKLWD
91(1)
4.4.119 PUNPCKHDQ
92(1)
4.4.120 PUNPCKLDQ
92(1)
4.4.121 PXOR
92(1)
4.4.122 RCPPS
93(1)
4.4.123 RCPSS
93(1)
4.4.124 RSQRTPS
93(1)
4.4.125 RSQRTSS
94(1)
4.4.126 SFENCE
94(1)
4.4.127 SQRTPS
95(1)
4.4.128 SQRTSS
95(1)
4.4.129 SUBPS
95(1)
4.4.130 SUBSS
96(1)
4.4.131 UNPCKHPS
96(1)
4.4.132 UNPCKLPS
96(1)
4.4.133 XORPS
97(2)
5 3DNOW Instructions
99(1)
5.0.1 FEMMS
99(1)
5.0.2 PF2ID
99(1)
5.0.3 PFACC
99(1)
5.0.4 PFADD
100(1)
5.0.5 PFCMPEQ
100(1)
5.0.6 PFCMPGT
100(1)
5.0.7 PFCMPGE
101(1)
5.0.8 PFMAX
101(1)
5.0.9 PFMIN
101(1)
5.0.10 PFMUL
102(1)
5.0.11 PFNACC
102(1)
5.0.12 PFPNACC
102(1)
5.0.13 PFRCP
103(1)
5.0.14 PFRCPIT
103(1)
5.0.15 PFSUB
104(1)
5.0.16 PFSUBR
104(1)
5.0.17 PI2FD
105(1)
5.0.18 PI2FW
105(1)
5.0.19 PREFETCH
105(2)
II SIMD Programming Languages 107(102)
Paul Cockshott
6 Another Approach: Data Parallel Languages
109(12)
6.1 Operations on Whole Arrays
109(7)
6.1.1 Array Slicing
111(2)
6.1.2 Conditional Operations
113(1)
6.1.3 Reduction Operations
114(1)
6.1.4 Data Reorganisation
114(2)
6.2 Design Goals
116(5)
6.2.1 Target Machines
118(1)
6.2.2 Backward Compatibility
119(1)
6.2.3 Expressive Power
119(1)
6.2.4 Run-time Efficiency
120(1)
7 Basics of Vector Pascal
121(30)
7.1 Formatting Rules
121(4)
7.1.1 Alphabet
121(1)
7.1.2 Reserved Words and Identifiers
122(2)
7.1.3 Character Case
124(1)
7.1.4 Spaces and Comments
124(1)
7.1.5 Semicolons
124(1)
7.2 Base Types
125(2)
7.2.1 Booleans
125(1)
7.2.2 Integer Numbers
125(1)
7.2.3 Real Numbers
125(1)
7.2.4 Characters and Strings
126(1)
7.3 Variables and Constants
127(3)
7.3.1 Declaration Order
127(1)
7.3.2 Constant Declarations
128(1)
7.3.3 Variable Declarations
129(1)
7.3.4 Assignment
129(1)
7.3.5 Predefined Types
129(1)
7.4 Expressions and Operators
130(5)
7.4.1 Arithmetic
130(2)
7.4.2 Operations on Boolean Values
132(1)
7.4.3 Equality Operators
133(1)
7.4.4 Ordered Comparison
133(2)
7.5 Matrix and Vector Operations
135(7)
7.5.1 Array Declarations
135(1)
7.5.2 Matrix and Vector Arithmetic
136(3)
7.5.3 Array Input/Output
139(1)
7.5.4 Array Slices
140(2)
7.6 Vector and Matrix Products
142(7)
7.6.1 Inner Product of Vectors
142(2)
7.6.2 Dot Product of Non-real Typed Vectors
144(1)
7.6.3 Matrix to Vector Product
145(1)
7.6.4 Data-flow Hazards
146(2)
7.6.5 Matrix to Matrix Multiplication
148(1)
7.7 Typography of Vector Pascal Programs
149(2)
8 Algorithmic Features of Vector Pascal
151(14)
8.1 Conditional Evaluation
151(1)
8.2 Functions
152(5)
8.2.1 User-defined Functions
152(3)
8.2.2 Procedures
155(1)
8.2.3 Procedure ReadAndValidate
156(1)
8.2.4 Function H
157(1)
8.2.5 Function Log2
157(1)
8.3 Branching
157(2)
8.3.1 Two-way Branches
157(1)
8.3.2 Multi-way Branches
158(1)
8.4 Unbounded Iteration
159(2)
8.4.1 While
159(1)
8.4.2 Repeat
160(1)
8.5 Bounded Iteration
161(2)
8.5.1 For to
161(1)
8.5.2 For Downto
162(1)
8.6 Goto
163(2)
9 User-defined Types
165(22)
9.1 Scalar Types
165(4)
9.1.1 SUCC and PRED
166(2)
9.1.2 ORD
168(1)
9.1.3 Input/Output of Scalars
168(1)
9.1.4 Representation
168(1)
9.2 Sub-range Types
169(1)
9.2.1 Representation
170(1)
9.3 Dimensioned Numbers
170(5)
9.3.1 Arithmetic on Dimensioned Numbers
173(1)
9.3.2 Handling Different Units of Measurement
174(1)
9.4 Records
175(2)
9.5 Pointers
177(5)
9.5.1 Pointer Idioms
179(2)
9.5.2 Freeing Storage
181(1)
9.6 Set Types
182(1)
9.6.1 Set Literals
182(1)
9.6.2 Operations on Sets
183(1)
9.7 String Types
183(4)
10 Input and Output
187(1)
10.1 File Types
187(3)
10.1.1 Binary Files
187(1)
10.1.2 Text Files
188(1)
10.1.3 Operating System Files
188(2)
10.2 Output
190(3)
10.2.1 Binary File Output
190(1)
10.2.2 Text File Output
190(3)
10.2.3 Generic Array Output
193(1)
10.3 Input
193(2)
10.3.1 Generic Array Input
193(1)
10.3.2 Binary File Input
194(1)
10.3.3 Text File Input
194(1)
10.4 File Predicates
195(1)
10.5 Random Access to Files
195(1)
10.5.1 Seek
195(1)
10.5.2 filepos
195(1)
10.5.3 Untyped i/o
196(1)
10.6 Error Conditions
196(1)
11 Permutations and Polymorphism
197(1)
11.1 Array Reorganisation
198(4)
11.1.1 An Example
200(1)
11.1.2 Array Shifts
200(1)
11.1.3 Element Permutation
200(2)
11.1.4 Efficiency Considerations
202(1)
11.2 Dynamic Arrays
202(2)
11.2.1 Schematic Arrays
203(1)
11.3 Polymorphic Functions
204(7)
11.3.1 Multiple Uses of Parametric Units
205(1)
11.3.2 Function dategt
206(3)
III Programming Examples 209(84)
Paul Cockshott
12 Advanced Set Programming
211(82)
12.1 Use of Sets to Find Prime Numbers
211(2)
12.1.1 Set Implementation
212(1)
12.2 Ordered Sets
213(5)
12.2.1 openfiles
215(1)
12.2.2 loadset
216(2)
12.3 Sets of Records
218(1)
12.3.1 Retrieval Operations
219(1)
12.4 Use of Sets in Text Indexing
219(3)
12.5 Constructing an Indexing Program
222(2)
12.5.1 dirlist: A Program for Traversing a Directory Tree
222(1)
12.5.2 intodir
223(1)
12.6 bloomfilter
224(2)
12.6.1 hashword
225(1)
12.6.2 setfilter
225(1)
12.6.3 testfilter
226(1)
12.7 The Main Program to Index Files
226(25)
12.7.1 processfile
227(1)
12.7.2 A Retrieval Program
227(2)
13 Parallel Image Processing
229(1)
13.1 Declaring an Image Data Type
229(1)
13.2 Brightness and Contrast Adjustment
229(1)
13.2.1 Efficiency in Image Code
230(1)
13.3 Image Filtering
231(2)
13.3.1 Blurring
233(1)
13.3.2 Sharpening
233(2)
13.3.3 Comparing Implementations
235(3)
13.4 genconv
238(2)
13.4.1 dup
240(1)
13.4.2 prev
241(1)
13.4.3 pm
241(1)
13.4.4 doedges
242(1)
13.4.5 freestore
242(1)
13.5 Digital Half-toning
242(2)
13.5.1 Parallel Half-tone
244(1)
13.5.2 errordifuse
245(2)
13.6 Image Resizing
247(2)
13.7 Horizontal Resize
249(2)
13.8 Horizontal Interpolation
251(1)
13.9 Interpolate Vertically
251(1)
13.10 Displaying Images
251(6)
13.10.1 demoimg - An Example Image Display Program
251(6)
13.11 The Unit BMP
257(8)
13.11.1 Procedure initbmpheader
260(1)
13.11.2 Procedure storebmpfile
261(1)
13.11.3 Function loadbmpfile
261(1)
13.11.4 Procedure adjustcontrast
262(1)
13.11.5 Procedure pconv
263(1)
13.11.6 Procedure convp
264(1)
14 Pattern Recognition and Image Compression
265(1)
14.1 Principles of Image Compression
265(6)
14.1.1 Data Compression in General
265(1)
14.1.2 Image Compression
266(1)
14.1.3 Vector Quantisation of Images
266(2)
14.1.4 Data Structures
268(1)
14.1.5 encode
269(2)
14.2 The K Means Algorithm
271(9)
14.2.1 Vector Quantisation of Colour Images
277(2)
15 3D Graphics
279(4)
15.1 Mesh Representation
280(2)
15.2 linedemo: An Illustration of 3D Projection
282(1)
15.3 demo3d: Main Procedure of linedemo
283(4)
15.3.1 Viewing Matrices
283(2)
15.3.2 SDL Initialisation
285(2)
15.4 Create a Rotation Matrix
287(1)
15.4.1 Calculate x mod 3
288(1)
15.5 2D Projection
288(2)
15.5.1 Entry Point to Line Drawing
289(1)
15.6 Bresenham Line Drawing Procedure
290(2)
15.7 Performance
292(1)
IV VIPER 293(22)
Ken Renfrew
16 Introduction to VIPER
295(20)
16.1 Rationale
295(1)
16.1.1 The Literate Programming Tool
295(1)
16.1.2 The Mathematical Syntax Converter
296(1)
16.2 A System Overview
296(1)
16.3 Which VIPER to Download?
297(1)
16.4 System Dependencies
297(1)
16.5 Installing Files
298(1)
16.6 Setting Up the Compiler
298(1)
16.7 Setting Up the System
298(5)
16.7.1 Setting System Dependencies
299(1)
16.7.2 Personal Set-up
300(1)
16.7.3 Dynamic Compiler Options
301(2)
16.7.4 VIPER Option Buttons
303(1)
16.8 Moving VIPER
303(1)
16.9 Programming with VIPER
303(3)
16.9.1 Single Files
303(1)
16.9.2 Projects
304(2)
16.9.3 Embedding LATEX in Vector Pascal
306(1)
16.10 Compiling Files in VIPER
306(1)
16.10.1 Compiling Single Files
306(1)
16.10.2 Compiling Projects
306(1)
16.11 Running Programs in VIPER
307(1)
16.12 Making VPTEX
307(1)
16.12.1 VPTEX Options
307(1)
16.12.2 VPMath
308(1)
16.13 LATEX in VIPER
308(1)
16.14 HTML in VIPER
309(1)
16.15 Writing Code to Generate Good VPTEX
309(6)
16.15.1 Use of Special Comments
309(1)
16.15.2 Use of Margin Comments
310(1)
16.15.3 Use of Ordinary Pascal Comments
311(1)
16.15.4 Levels of Detail Within Documentation
311(1)
16.15.5 Mathematical Translation: Motivation and Guidelines
312(1)
16.15.6 LATEX Packages
313(2)
Appendix A Compiler Porting Tools 315(20)
A.1 Dependencies
315(1)
A.2 Compiler Structure
316(1)
A.2.1 Vectorisation
317(1)
A.2.2 Porting Strategy
320(1)
A.3 ILCG
321(1)
A.4 Supported Types
321(1)
A.4.1 Data Formats
321(1)
A.4.2 Typed Formats
322(1)
A.4.3 ref Types
322(1)
A.5 Supported Operations
322(1)
A.5.1 Type Casts
322(1)
A.5.2 Arithmetic
322(1)
A.5.3 Memory
322(1)
A.5.4 Assignment
323(1)
A.5.5 Dereferencing
323(1)
A.6 Machine Description
323(1)
A.6.1 Registers
323(1)
A.6.2 Register Sets
324(1)
A.6.3 Register Arrays
324(1)
A.6.4 Register Stacks
324(1)
A.6.5 Instruction Formats
325(1)
A.7 Grammar of ILCG
325(1)
A.8 ILCG Grammar
326(1)
A.8.1 Helpers
326(1)
A.8.2 Tokens
327(1)
A.8.3 Non-terminal Symbols
329(6)
Appendix B Software Download 335(2)
Appendix C Using the Command Line Compiler 337(6)
C.1 Invoking the Compiler
337(1)
C.1.1 Environment Variable
337(1)
C.1.2 Compiler Options
337(1)
C.1.3 Dependencies
338(1)
C.2 Calling Conventions
338(3)
C.3 Array Representation
341(1)
C.3.1 Range Checking
341(2)
References 343(2)
Index 345
