OpenCL Programming Guide (English Edition)

  • Authors: Aaftab Munshi, Benedict R. Gaster, Timothy G. Mattson, James Fung, Dan Ginsburg (USA)
  • Publisher: Beijing: Science Press
  • Year of publication: 2012
  • ISBN: 9787030349637
  • Pages: 603
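Book description: The new OpenCL standard helps make full use of the rich resources of processors such as CPUs and GPUs. It has been endorsed by Apple, AMD, Intel, IBM, and other companies, and has broad application prospects in servers, embedded devices, high-performance computing, and other fields. This book is written jointly by five leading OpenCL authorities and covers the complete specification. Building on an analysis of key use cases, it shows how to express various parallel algorithms in OpenCL and provides complete reference information on the API and the OpenCL C language. Through complete case studies and code examples, it explains how to write complex parallel programs that decompose workloads across many different devices, and it also covers the essentials of OpenCL software performance optimization. This book is the first comprehensive, authoritative, practical guide to the OpenCL 1.1 specification, and is suitable for R&D engineers and software architects in information technology.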

Part Ⅰ The OpenCL 1.1 Language and API 1

1. An Introduction to OpenCL 3

What Is OpenCL, or ... Why You Need This Book 3

Our Many-Core Future: Heterogeneous Platforms 4

Software in a Many-Core World 7

Conceptual Foundations of OpenCL 11

Platform Model 12

Execution Model 13

Memory Model 21

Programming Models 24

OpenCL and Graphics 29

The Contents of OpenCL 30

Platform API 31

Runtime API 31

Kernel Programming Language 32

OpenCL Summary 34

The Embedded Profile 35

Learning OpenCL 36

2. HelloWorld: An OpenCL Example 39

Building the Examples 40

Prerequisites 40

Mac OS X and Code::Blocks 41

Microsoft Windows and Visual Studio 42

Linux and Eclipse 44

HelloWorld Example 45

Choosing an OpenCL Platform and Creating a Context 49

Choosing a Device and Creating a Command-Queue 50

Creating and Building a Program Object 52

Creating Kernel and Memory Objects 54

Executing a Kernel 55

Checking for Errors in OpenCL 57

3. Platforms, Contexts, and Devices 63

OpenCL Platforms 63

OpenCL Devices 68

OpenCL Contexts 83

4. Programming with OpenCL C 97

Writing a Data-Parallel Kernel Using OpenCL C 97

Scalar Data Types 99

The half Data Type 101

Vector Data Types 102

Vector Literals 104

Vector Components 106

Other Data Types 108

Derived Types 109

Implicit Type Conversions 110

Usual Arithmetic Conversions 114

Explicit Casts 116

Explicit Conversions 117

Reinterpreting Data as Another Type 121

Vector Operators 123

Arithmetic Operators 124

Relational and Equality Operators 127

Bitwise Operators 127

Logical Operators 128

Conditional Operator 129

Shift Operators 129

Unary Operators 131

Assignment Operator 132

Qualifiers 133

Function Qualifiers 133

Kernel Attribute Qualifiers 134

Address Space Qualifiers 135

Access Qualifiers 140

Type Qualifiers 141

Keywords 141

Preprocessor Directives and Macros 141

Pragma Directives 143

Macros 145

Restrictions 146

5. OpenCL C Built-In Functions 149

Work-Item Functions 150

Math Functions 153

Floating-Point Pragmas 162

Floating-Point Constants 162

Relative Error as ulps 163

Integer Functions 168

Common Functions 172

Geometric Functions 175

Relational Functions 175

Vector Data Load and Store Functions 181

Synchronization Functions 190

Async Copy and Prefetch Functions 191

Atomic Functions 195

Miscellaneous Vector Functions 199

Image Read and Write Functions 201

Reading from an Image 201

Samplers 206

Determining the Border Color 209

Writing to an Image 210

Querying Image Information 214

6. Programs and Kernels 217

Program and Kernel Object Overview 217

Program Objects 218

Creating and Building Programs 218

Program Build Options 222

Creating Programs from Binaries 227

Managing and Querying Programs 236

Kernel Objects 237

Creating Kernel Objects and Setting Kernel Arguments 237

Thread Safety 241

Managing and Querying Kernels 242

7. Buffers and Sub-Buffers 247

Memory Objects, Buffers, and Sub-Buffers Overview 247

Creating Buffers and Sub-Buffers 249

Querying Buffers and Sub-Buffers 257

Reading, Writing, and Copying Buffers and Sub-Buffers 259

Mapping Buffers and Sub-Buffers 276

8. Images and Samplers 281

Image and Sampler Object Overview 281

Creating Image Objects 283

Image Formats 287

Querying for Image Support 291

Creating Sampler Objects 292

OpenCL C Functions for Working with Images 295

Transferring Image Objects 299

9. Events 309

Commands, Queues, and Events Overview 309

Events and Command-Queues 311

Event Objects 317

Generating Events on the Host 321

Events Impacting Execution on the Host 322

Using Events for Profiling 327

Events Inside Kernels 332

Events from Outside OpenCL 333

10. Interoperability with OpenGL 335

OpenCL/OpenGL Sharing Overview 335

Querying for the OpenGL Sharing Extension 336

Initializing an OpenCL Context for OpenGL Interoperability 338

Creating OpenCL Buffers from OpenGL Buffers 339

Creating OpenCL Image Objects from OpenGL Textures 344

Querying Information about OpenGL Objects 347

Synchronization between OpenGL and OpenCL 348

11. Interoperability with Direct3D 353

Direct3D/OpenCL Sharing Overview 353

Initializing an OpenCL Context for Direct3D Interoperability 354

Creating OpenCL Memory Objects from Direct3D Buffers and Textures 357

Acquiring and Releasing Direct3D Objects in OpenCL 361

Processing a Direct3D Texture in OpenCL 363

Processing D3D Vertex Data in OpenCL 366

12. C++ Wrapper API 369

C++ Wrapper API Overview 369

C++ Wrapper API Exceptions 371

Vector Add Example Using the C++ Wrapper API 374

Choosing an OpenCL Platform and Creating a Context 375

Choosing a Device and Creating a Command-Queue 376

Creating and Building a Program Object 377

Creating Kernel and Memory Objects 377

Executing the Vector Add Kernel 378

13. OpenCL Embedded Profile 383

OpenCL Profile Overview 383

64-Bit Integers 385

Images 386

Built-In Atomic Functions 387

Mandated Minimum Single-Precision Floating-Point Capabilities 387

Determining the Profile Supported by a Device in an OpenCL C Program 390

Part Ⅱ OpenCL 1.1 Case Studies 391

14. Image Histogram 393

Computing an Image Histogram 393

Parallelizing the Image Histogram 395

Additional Optimizations to the Parallel Image Histogram 400

Computing Histograms with Half-Float or Float Values for Each Channel 403

15. Sobel Edge Detection Filter 407

What Is a Sobel Edge Detection Filter? 407

Implementing the Sobel Filter as an OpenCL Kernel 407

16. Parallelizing Dijkstra's Single-Source Shortest-Path Graph Algorithm 411

Graph Data Structures 412

Kernels 414

Leveraging Multiple Compute Devices 417

17. Cloth Simulation in the Bullet Physics SDK 425

An Introduction to Cloth Simulation 425

Simulating the Soft Body 429

Executing the Simulation on the CPU 431

Changes Necessary for Basic GPU Execution 432

Two-Layered Batching 438

Optimizing for SIMD Computation and Local Memory 441

Adding OpenGL Interoperation 446

18. Simulating the Ocean with Fast Fourier Transform 449

An Overview of the Ocean Application 450

Phillips Spectrum Generation 453

An OpenCL Discrete Fourier Transform 457

Determining 2D Decomposition 457

Using Local Memory 459

Determining the Sub-Transform Size 459

Determining the Work-Group Size 460

Obtaining the Twiddle Factors 461

Determining How Much Local Memory Is Needed 462

Avoiding Local Memory Bank Conflicts 463

Using Images 463

A Closer Look at the FFT Kernel 463

A Closer Look at the Transpose Kernel 467

19. Optical Flow 469

Optical Flow Problem Overview 469

Sub-Pixel Accuracy with Hardware Linear Interpolation 480

Application of the Texture Cache 480

Using Local Memory 481

Early Exit and Hardware Scheduling 483

Efficient Visualization with OpenGL Interop 483

Performance 484

20. Using OpenCL with PyOpenCL 487

Introducing PyOpenCL 487

Running the PyImageFilter2D Example 488

PyImageFilter2D Code 488

Context and Command-Queue Creation 492

Loading to an Image Object 493

Creating and Building a Program 494

Setting Kernel Arguments and Executing a Kernel 495

Reading the Results 496

21. Matrix Multiplication with OpenCL 499

The Basic Matrix Multiplication Algorithm 499

A Direct Translation into OpenCL 501

Increasing the Amount of Work per Kernel 506

Optimizing Memory Movement: Local Memory 509

Performance Results and Optimizing the Original CPU Code 511

22. Sparse Matrix-Vector Multiplication 515

Sparse Matrix-Vector Multiplication (SpMV) Algorithm 515

Description of This Implementation 518

Tiled and Packetized Sparse Matrix Representation 519

Header Structure 522

Tiled and Packetized Sparse Matrix Design Considerations 523

Optional Team Information 524

Tested Hardware Devices and Results 524

Additional Areas of Optimization 538

A. Summary of OpenCL 1.1 541

The OpenCL Platform Layer 541

Contexts 541

Querying Platform Information and Devices 542

The OpenCL Runtime 543

Command-Queues 543

Buffer Objects 544

Create Buffer Objects 544

Read, Write, and Copy Buffer Objects 544

Map Buffer Objects 545

Manage Buffer Objects 545

Query Buffer Objects 545

Program Objects 546

Create Program Objects 546

Build Program Executable 546

Build Options 546

Query Program Objects 547

Unload the OpenCL Compiler 547

Kernel and Event Objects 547

Create Kernel Objects 547

Kernel Arguments and Object Queries 548

Execute Kernels 548

Event Objects 549

Out-of-Order Execution of Kernels and Memory Object Commands 549

Profiling Operations 549

Flush and Finish 550

Supported Data Types 550

Built-In Scalar Data Types 550

Built-In Vector Data Types 551

Other Built-In Data Types 551

Reserved Data Types 551

Vector Component Addressing 552

Vector Components 552

Vector Addressing Equivalencies 553

Conversions and Type Casting Examples 554

Operators 554

Address Space Qualifiers 554

Function Qualifiers 554

Preprocessor Directives and Macros 555

Specify Type Attributes 555

Math Constants 556

Work-Item Built-In Functions 557

Integer Built-In Functions 557

Common Built-In Functions 559

Math Built-In Functions 560

Geometric Built-In Functions 563

Relational Built-In Functions 564

Vector Data Load/Store Functions 567

Atomic Functions 568

Async Copies and Prefetch Functions 570

Synchronization, Explicit Memory Fence 570

Miscellaneous Vector Built-In Functions 571

Image Read and Write Built-In Functions 572

Image Objects 573

Create Image Objects 573

Query List of Supported Image Formats 574

Copy between Image, Buffer Objects 574

Map and Unmap Image Objects 574

Read, Write, Copy Image Objects 575

Query Image Objects 575

Image Formats 576

Access Qualifiers 576

Sampler Objects 576

Sampler Declaration Fields 577

OpenCL Device Architecture Diagram 577

OpenCL/OpenGL Sharing APIs 577

CL Buffer Objects > GL Buffer Objects 578

CL Image Objects > GL Textures 578

CL Image Objects > GL Renderbuffers 578

Query Information 578

Share Objects 579

CL Event Objects > GL Sync Objects 579

CL Context > GL Context, Sharegroup 579

OpenCL/Direct3D 10 Sharing APIs 579

Index 581