Programming Massively Parallel Processors (大规模并行处理器程序设计)
- Author: Kirk, D. (USA)
- Publisher: Beijing: Tsinghua University Press
- Year of publication: 2010
- ISBN: 9787302229735
- Pages: 258
CHAPTER 1 INTRODUCTION 1
1.1 GPUs as Parallel Computers 2
1.2 Architecture of a Modern GPU 8
1.3 Why More Speed or Parallelism? 10
1.4 Parallel Programming Languages and Models 13
1.5 Overarching Goals 15
1.6 Organization of the Book 16
CHAPTER 2 HISTORY OF GPU COMPUTING 21
2.1 Evolution of Graphics Pipelines 21
2.1.1 The Era of Fixed-Function Graphics Pipelines 22
2.1.2 Evolution of Programmable Real-Time Graphics 26
2.1.3 Unified Graphics and Computing Processors 29
2.1.4 GPGPU: An Intermediate Step 31
2.2 GPU Computing 32
2.2.1 Scalable GPUs 33
2.2.2 Recent Developments 34
2.3 Future Trends 34
CHAPTER 3 INTRODUCTION TO CUDA 39
3.1 Data Parallelism 39
3.2 CUDA Program Structure 41
3.3 A Matrix-Matrix Multiplication Example 42
3.4 Device Memories and Data Transfer 46
3.5 Kernel Functions and Threading 51
3.6 Summary 56
3.6.1 Function declarations 56
3.6.2 Kernel launch 56
3.6.3 Predefined variables 56
3.6.4 Runtime API 57
CHAPTER 4 CUDA THREADS 59
4.1 CUDA Thread Organization 59
4.2 Using blockIdx and threadIdx 64
4.3 Synchronization and Transparent Scalability 68
4.4 Thread Assignment 70
4.5 Thread Scheduling and Latency Tolerance 71
4.6 Summary 74
4.7 Exercises 74
CHAPTER 5 CUDA™ MEMORIES 77
5.1 Importance of Memory Access Efficiency 78
5.2 CUDA Device Memory Types 79
5.3 A Strategy for Reducing Global Memory Traffic 83
5.4 Memory as a Limiting Factor to Parallelism 90
5.5 Summary 92
5.6 Exercises 93
CHAPTER 6 PERFORMANCE CONSIDERATIONS 95
6.1 More on Thread Execution 96
6.2 Global Memory Bandwidth 103
6.3 Dynamic Partitioning of SM Resources 111
6.4 Data Prefetching 113
6.5 Instruction Mix 115
6.6 Thread Granularity 116
6.7 Measured Performance and Summary 118
6.8 Exercises 120
CHAPTER 7 FLOATING POINT CONSIDERATIONS 125
7.1 Floating-Point Format 126
7.1.1 Normalized Representation of M 126
7.1.2 Excess Encoding of E 127
7.2 Representable Numbers 129
7.3 Special Bit Patterns and Precision 134
7.4 Arithmetic Accuracy and Rounding 135
7.5 Algorithm Considerations 136
7.6 Summary 138
7.7 Exercises 138
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION 141
8.1 Application Background 142
8.2 Iterative Reconstruction 144
8.3 Computing F^H d 148
Step 1. Determine the Kernel Parallelism Structure 149
Step 2. Getting Around the Memory Bandwidth Limitation 156
Step 3. Using Hardware Trigonometry Functions 163
Step 4. Experimental Performance Tuning 166
8.4 Final Evaluation 167
8.5 Exercises 170
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS 173
9.1 Application Background 174
9.2 A Simple Kernel Implementation 176
9.3 Instruction Execution Efficiency 180
9.4 Memory Coalescing 182
9.5 Additional Performance Comparisons 185
9.6 Using Multiple GPUs 187
9.7 Exercises 188
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING 191
10.1 Goals of Parallel Programming 192
10.2 Problem Decomposition 193
10.3 Algorithm Selection 196
10.4 Computational Thinking 202
10.5 Exercises 204
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™ 205
11.1 Background 205
11.2 Data Parallelism Model 207
11.3 Device Architecture 209
11.4 Kernel Functions 211
11.5 Device Management and Kernel Launch 212
11.6 Electrostatic Potential Map in OpenCL 214
11.7 Summary 219
11.8 Exercises 220
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK 221
12.1 Goals Revisited 221
12.2 Memory Architecture Evolution 223
12.2.1 Large Virtual and Physical Address Spaces 223
12.2.2 Unified Device Memory Space 224
12.2.3 Configurable Caching and Scratch Pad 225
12.2.4 Enhanced Atomic Operations 226
12.2.5 Enhanced Global Memory Access 226
12.3 Kernel Execution Control Evolution 227
12.3.1 Function Calls within Kernel Functions 227
12.3.2 Exception Handling in Kernel Functions 227
12.3.3 Simultaneous Execution of Multiple Kernels 228
12.3.4 Interruptible Kernels 228
12.4 Core Performance 229
12.4.1 Double-Precision Speed 229
12.4.2 Better Control Flow Efficiency 229
12.5 Programming Environment 230
12.6 A Bright Outlook 230
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE 233
A.1 matrixmul.cu 233
A.2 matrixmul_gold.cpp 237
A.3 matrixmul.h 238
A.4 assist.h 239
A.5 Expected Output 243
APPENDIX B GPU COMPUTE CAPABILITIES 245
B.1 GPU Compute Capability Tables 245
B.2 Memory Coalescing Variations 246
Index 251