大规模并行处理器程序设计 第2版 英文版PDF电子书下载
- 电子书积分:15 积分如何计算积分?
- 作 者:(美)柯克,(美)胡文美著
- 出 版 社:北京:机械工业出版社
- 出版年份:2013
- ISBN:9787111416296
- 页数:496 页
CHAPTER 11 Introduction 1
1.1 Heterogeneous Parallel Computing 2
1.2 Architecture of a Modern GPU 8
1.3 Why More Speed or Parallelism? 10
1.4 Speeding Up Real Applications 12
1.5 Parallel Programming Languages and Models 14
1.6 Overarching Goals 16
1.7 Organization of the Book 17
References 21
CHAPTER 2 History of GPU Computing 23
2.1 Evolution of Graphics Pipelines 23
The Era of Fixed-Function Graphics Pipelines 24
Evolution of Programmable Real-Time Graphics 28
Unified Graphics and Computing Processors 31
2.2 GPGPU:An Intermediate Step 33
2.3 GPU Computing 34
Scalable GPUs 35
Recent Developments 36
Future Trends 37
References and Further Reading 37
CHAPTER 3 Introduction to Data Parallelism and CUDA C 41
3.1 Data Parallelism 42
3.2 CUDA Program Structure 43
3.3 A Vector Addition Kernel 45
3.4 Device Global Memory and Data Transfer 48
3.5 Kernel Functions and Threading 53
3.6 Summary 58
Function Declarations 59
Kernel Launch 59
Predefined Variables 59
Runtime API 60
3.7 Exercises 60
References 62
CHAPTER 4 Data-Parallel Execution Model 63
4.1 Cuda Thread Organization 64
4.2 Mapping Threads to Multidimensional Data 68
4.3 Matrix-Matrix Multiplication—A More Complex Kernel 74
4.4 Synchronization and Transparent Scalabilitv 81
4.5 Assigning Resources to Blocks 83
4.6 Querying Device Properties 85
4.7 Thread Scheduling and Latency Tolerance 87
4.8 Summary 91
4.9 Exercises 91
CHAPTER 5 CUDA Memories 95
5.1 Importance of Memory Access Efficiency 96
5.2 CUDA Device Memory Types 97
5.3 A Strategy for Reducing Global Memory Traffic 105
5.4 A Tiled Matrix—Matrix Multiplication Kernel 109
5.5 Memory as a Limiting Factor to Parallelism 115
5.6 Summary 118
5.7 Exercises 119
CHAPTER 6 Performance Considerations 123
6.1 Warps and Thread Execution 124
6.2 Global Memory Bandwidth 132
6.3 Dynamic Partitioning of Execution Resources 141
6.4 Instruction Mix and Thread Granularity 143
6.5 Summary 145
6.6 Exercises 145
References 149
CHAPTER 7 Floating-Point Considerations 151
7.1 Floating-Point Format 152
Normalized Representation of M 152
Excess Encoding of E 153
7.2 Representable Numbers 155
7.3 Special Bit Patterns and Precision in IEEE Format 160
7.4 Arithmetic Accuracy and Rounding 161
7.5 Algorithm Considerations 162
7.6 Numerical Stability 164
7.7 Summary 169
7.8 Exercises 170
References 171
CHAPTER 8 Parallel Patterns:Convolution 173
8.1 Background 174
8.2 1D Parallel Convolution—A Basic Algorithm 179
8.3 Constant Memory and Caching 181
8.4 Tiled 1D Convolution with Halo Elements 185
8.5 A Simpler Tiled 1D Convolution—General Caching 192
8.6 Summary 193
8.7 Exercises 194
CHAPTER 9 Parallel Patterns:Prefix Sum 197
9.1 Background 198
9.2 A Simple Parallel Scan 200
9.3 Work Efficiency Considerations 204
9.4 A Work-Efficient Parallel Scan 205
9.5 Parallel Scan for Arbitrary-Length Inputs 210
9.6 Summary 214
9.7 Exercises 215
Reference 216
CHAPTER 10 Parallel Patterns:Sparse Matrix—Vector Multiplication 217
10.1 Background 218
10.2 Parallel SpMV Using CSR 222
10.3 Padding and Transposition 224
10.4 Using Hybrid to Control Padding 226
10.5 Sorting and Partitioning for Regularization 230
10.6 Summary 232
10.7 Exercises 233
References 234
CHAPTER 11 Application Case Study:Advanced MRI Reconstruction 235
11.1 Application Background 236
11.2 Iterative Reconstruction 239
11.3 Computing FHD 241
Step 1:Determine the Kernel Parallelism Structure 243
Step 2:Getting Around the Memory Bandwidth Limitation 249
Step 3:Using Hardware Trigonometry Functions 255
Step 4:Experimental Performance Tuning 259
11.4 Final Evaluation 260
11.5 Exercises 262
References 264
CHAPTER 12 Application Case Study:Molecular Visualization and Analysis 265
12.1 Application Background 266
12.2 A Simple Kernel Implementation 268
12.3 Thread Granularity Adiustment 272
12.4 Memory Coalescing 274
12.5 Summary 277
12.6 Exercises 279
References 279
CHAPTER 13 Parallel Programming and Computational Thinking 281
13.1 Goals of Parallel Computing 282
13.2 Problem Decomposition 283
13.3 Algorithm Selection 287
13.4 Computational Thinking 293
13.5 Summary 294
13.6 Exercises 294
References 295
CHAPTER 14 An Introduction to OpenCLTM 297
14.1 Background 297
14.2 Data Parallelism Model 299
14.3 Device Architecture 301
14.4 Kernel Functions 303
14.5 Device Management and Kernel Launch 304
14.6 Electrostatic Potential Map in OpenCL 307
14.7 Summary 311
14.8 Exercises 312
References 313
CHAPTER 15 Parallel Programming with OpenACC 315
15.1 OpenACC Versus CUDA C 315
15.2 Execution Model 318
15.3 Memory Model 319
15.4 Basic OpenACC Programs 320
Parallel Construct 320
Loop Construct 322
Kernels Construct 327
Data Management 331
Asynchronous Computation and Data Transfer 335
15.5 Future Directions of OpenACC 336
15.6 Exercises 337
CHAPTER 16 Thrust:A Productivity-Oriented Library for CUDA 339
16.1 Background 339
16.2 Motivation 342
16.3 Basic Thrust Features 343
Iterators and Memory Space 344
Interoperability 345
16.4 Generic Programming 347
16.5 Benefits of Abstraction 349
16.6 Programmer Productivity 349
Robustness 350
Real-World Performance 350
16.7 Best Practices 352
Fusion 353
Structure of Arrays 354
Implicit Ranges 356
16.8 Exercises 357
References 358
CHAPTER 17 CUDA FORTRAN 359
17.1 CUDA FORTRAN and CUDA C Differences 360
17.2 A First CUDA FORTR AN Program 361
17.3 Multidimensional Array in CUDA FORTRAN 363
17.4 Overloading Host/Device Routines With Generic Interfaces 364
17.5 Calling CUDA C Via Iso_C_Binding 367
17.6 Kernel Loop Directives and Reduction Operations 369
17.7 Dynamic Shared Memory 370
17.8 Asynchronous Data Transfers 371
17.9 Compilation and Profiling 377
17.1 0 Calling Thrust from CUDA FORTR AN 378
17.1 1 Exercises 382
CHAPTER 18 An Introduction to C++AMP 383
18.1 Core C++Amp Features 384
18.2 Details of the C++AMP Execution Model 391
Explicit and Implicit Data Copies 391
Asynchronous Operation 393
Section Summary 395
18.3 Managing Accelerators 395
18.4 Tiled Execution 398
18.5 C++AMP Graphics Features 401
18.6 Summary 405
18.7 Exercises 405
CHAPTER 19 Programming a Heterogeneous Computing Cluster 407
19.1 Background 408
19.2 A Running Example 408
19.3 MPI Basics 410
19.4 MPI Point-to-Point Communication Types 414
19.5 Overlapping Computation and Communication 421
19.6 MPI Collective Communication 431
19.7 Summary 431
19.8 Exercises 432
Reference 433
CHAPTER 20 CUDA Dynamic Parallelism 435
20.1 Background 436
20.2 Dynamic Parallelism Overview 438
20.3 Important Details 439
Launch Environment Configuration 439
API Errors and Launch Failures 439
Events 439
Streams 440
Synchronization Scope 441
20.4 Memory Visibility 442
Global Memory 442
Zero-Copy Memory 442
Constant Memory 442
Texture Memory 443
20.5 A Simple Example 444
20.6 Runtime Limitations 446
Memory Footprint 446
Nesting Depth 448
Memory Allocation and Lifetime 448
ECC Errors 449
Streams 449
Events 449
Launch Pool 449
20.7 A More Complex Example 449
Linear Bezier Curves 450
Quadratic Bezier Curves 450
Bezier Curve Calculation(Predynamic Parallelism) 450
Bezier Curve Calculation(with Dynamic Parallelism) 453
20.8 Summary 456
Reference 457
CHAPTER 21 Conclusion and Future Outlook 459
21.1 Goals Revisited 459
21.2 Memory Model Evolution 461
21.3 Kernel Execution Control Evolution 464
21.4 Core Performance 467
21.5 Programming Environment 467
21.6 Future Outlook 468
References 469
Appendix A:Matrix Multiplication Host-Only Version Source Code 471
Appendix B:GPU Compute Capabilities 481
Index 487
- 《指向核心素养 北京十一学校名师教学设计 英语 七年级 上 配人教版》周志英总主编 2019
- 《设计十六日 国内外美术院校报考攻略》沈海泯著 2018
- 《卓有成效的管理者 中英文双语版》(美)彼得·德鲁克许是祥译;那国毅审校 2019
- 《计算机辅助平面设计》吴轶博主编 2019
- 《高校转型发展系列教材 素描基础与设计》施猛责任编辑;(中国)魏伏一,徐红 2019
- 《景观艺术设计》林春水,马俊 2019
- 《程序逻辑及C语言编程》卢卫中,杨丽芳主编 2019
- 《大数据Hadoop 3.X分布式处理实战》吴章勇,杨强 2020
- 《高等教育双机械基础课程系列教材 高等学校教材 机械设计课程设计手册 第5版》吴宗泽,罗圣国,高志,李威 2018
- 《指向核心素养 北京十一学校名师教学设计 英语 九年级 上 配人教版》周志英总主编 2019
- 《SQL与关系数据库理论》(美)戴特(C.J.Date) 2019
- 《魔法销售台词》(美)埃尔默·惠勒著 2019
- 《看漫画学钢琴 技巧 3》高宁译;(日)川崎美雪 2019
- 《优势谈判 15周年经典版》(美)罗杰·道森 2018
- 《社会学与人类生活 社会问题解析 第11版》(美)James M. Henslin(詹姆斯·M. 汉斯林) 2019
- 《海明威书信集:1917-1961 下》(美)海明威(Ernest Hemingway)著;潘小松译 2019
- 《迁徙 默温自选诗集 上》(美)W.S.默温著;伽禾译 2020
- 《上帝的孤独者 下 托马斯·沃尔夫短篇小说集》(美)托马斯·沃尔夫著;刘积源译 2017
- 《巴黎永远没个完》(美)海明威著 2017
- 《剑桥国际英语写作教程 段落写作》(美)吉尔·辛格尔顿(Jill Shingleton)编著 2019
- 《指向核心素养 北京十一学校名师教学设计 英语 七年级 上 配人教版》周志英总主编 2019
- 《北京生态环境保护》《北京环境保护丛书》编委会编著 2018
- 《高等教育双机械基础课程系列教材 高等学校教材 机械设计课程设计手册 第5版》吴宗泽,罗圣国,高志,李威 2018
- 《指向核心素养 北京十一学校名师教学设计 英语 九年级 上 配人教版》周志英总主编 2019
- 《高等院校旅游专业系列教材 旅游企业岗位培训系列教材 新编北京导游英语》杨昆,鄢莉,谭明华 2019
- 《中国十大出版家》王震,贺越明著 1991
- 《近代民营出版机构的英语函授教育 以“商务、中华、开明”函授学校为个案 1915年-1946年版》丁伟 2017
- 《新工业时代 世界级工业家张毓强和他的“新石头记”》秦朔 2019
- 《智能制造高技能人才培养规划丛书 ABB工业机器人虚拟仿真教程》(中国)工控帮教研组 2019
- 《AutoCAD机械设计实例精解 2019中文版》北京兆迪科技有限公司编著 2019