Principles of Parallel Programming (English Edition): PDF Download

  • Author: Calvin Lin, Lawrence Snyder
  • Publisher: Addison-Wesley
  • Year: 2008
Book description: A textbook on the principles of parallel programming, covering the foundations of parallelism and parallel machines, reasoning about parallel performance, parallel abstractions and scalable algorithmic techniques, and programming with threads, MPI, ZPL, and other parallel languages. The table of contents follows.

PART 1 Foundations 1

Chapter 1 Introduction 1

The Power and Potential of Parallelism 2

Parallelism, a Familiar Concept 2

Parallelism in Computer Programs 3

Multi-Core Computers, an Opportunity 4

Even More Opportunities to Use Parallel Hardware 5

Parallel Computing versus Distributed Computing 6

System Level Parallelism 6

Convenience of Parallel Abstractions 8

Examining Sequential and Parallel Programs 8

Parallelizing Compilers 8

A Paradigm Shift 9

Parallel Prefix Sum 13

Parallelism Using Multiple Instruction Streams 15

The Concept of a Thread 15

A Multithreaded Solution to Counting 3s 15

The Goals: Scalability and Performance Portability 25

Scalability 25

Performance Portability 26

Principles First 27

Chapter Summary 27

Historical Perspective 28

Exercises 28

Chapter 2 Understanding Parallel Computers 30

Balancing Machine Specifics with Portability 30

A Look at Six Parallel Computers 31

Chip Multiprocessors 31

Symmetric Multiprocessor Architectures 34

Heterogeneous Chip Designs 36

Clusters 39

Supercomputers 40

Observations from Our Six Parallel Computers 43

An Abstraction of a Sequential Computer 44

Applying the RAM Model 44

Evaluating the RAM Model 45

The PRAM: A Parallel Computer Model 46

The CTA: A Practical Parallel Computer Model 47

The CTA Model 47

Communication Latency 49

Properties of the CTA 52

Memory Reference Mechanisms 53

Shared Memory 53

One-Sided Communication 54

Message Passing 54

Memory Consistency Models 55

Programming Models 56

A Closer Look at Communication 57

Applying the CTA Model 58

Chapter Summary 59

Historical Perspective 59

Exercises 59

Chapter 3 Reasoning about Performance 61

Motivation and Basic Concepts 61

Parallelism versus Performance 61

Threads and Processes 62

Latency and Throughput 62

Sources of Performance Loss 64

Overhead 64

Non-Parallelizable Code 65

Contention 67

Idle Time 67

Parallel Structure 68

Dependences 68

Dependences Limit Parallelism 70

Granularity 72

Locality 73

Performance Trade-Offs 73

Communication versus Computation 74

Memory versus Parallelism 75

Overhead versus Parallelism 75

Measuring Performance 77

Execution Time 77

Speedup 78

Superlinear Speedup 78

Efficiency 79

Concerns with Speedup 79

Scaled Speedup versus Fixed-Size Speedup 81

Scalable Performance 81

Scalable Performance Is Difficult to Achieve 81

Implications for Hardware 82

Implications for Software 83

Scaling the Problem Size 83

Chapter Summary 84

Historical Perspective 84

Exercises 85

PART 2 Parallel Abstractions 87

Chapter 4 First Steps Toward Parallel Programming 88

Data and Task Parallelism 88

Definitions 88

Illustrating Data and Task Parallelism 89

The Peril-L Notation 89

Extending C 90

Parallel Threads 90

Synchronization and Coordination 91

Memory Model 92

Synchronized Memory 94

Reduce and Scan 95

The Reduce Abstraction 96

Count 3s Example 97

Formulating Parallelism 97

Fixed Parallelism 97

Unlimited Parallelism 98

Scalable Parallelism 99

Alphabetizing Example 100

Unlimited Parallelism 101

Fixed Parallelism 102

Scalable Parallelism 104

Comparing the Three Solutions 109

Chapter Summary 110

Historical Perspective 110

Exercises 110

Chapter 5 Scalable Algorithmic Techniques 112

Blocks of Independent Computation 112

Schwartz's Algorithm 113

The Reduce and Scan Abstractions 115

Example of Generalized Reduces and Scans 116

The Basic Structure 118

Structure for Generalized Reduce 119

Example of Components of a Generalized Scan 122

Applying the Generalized Scan 124

Generalized Vector Operations 125

Assigning Work to Processes Statically 125

Block Allocations 126

Overlap Regions 128

Cyclic and Block Cyclic Allocations 129

Irregular Allocations 132

Assigning Work to Processes Dynamically 134

Work Queues 134

Variations of Work Queues 137

Case Study: Concurrent Memory Allocation 137

Trees 139

Allocation by Sub-Tree 139

Dynamic Allocations 140

Chapter Summary 141

Historical Perspective 142

Exercises 142

PART 3 Parallel Programming Languages 143

Chapter 6 Programming with Threads 145

POSIX Threads 145

Thread Creation and Destruction 146

Mutual Exclusion 150

Synchronization 153

Safety Issues 163

Performance Issues 167

Case Study: Successive Over-Relaxation 174

Case Study: Overlapping Synchronization with Computation 179

Case Study: Streaming Computations on a Multi-Core Chip 187

Java Threads 187

Synchronized Methods 189

Synchronized Statements 189

The Count 3s Example 190

Volatile Memory 192

Atomic Objects 192

Lock Objects 193

Executors 193

Concurrent Collections 193

OpenMP 193

The Count 3s Example 194

Semantic Limitations on parallel for 195

Reduction 196

Thread Behavior and Interaction 197

Sections 199

Summary of OpenMP 199

Chapter Summary 200

Historical Perspective 200

Exercises 200

Chapter 7 MPI and Other Local View Languages 202

MPI: The Message Passing Interface 202

The Count 3s Example 203

Groups and Communicators 211

Point-to-Point Communication 212

Collective Communication 214

Example: Successive Over-Relaxation 219

Performance Issues 222

Safety Issues 228

Partitioned Global Address Space Languages 229

Co-Array Fortran 230

Unified Parallel C 231

Titanium 232

Chapter Summary 233

Historical Perspective 234

Exercises 234

Chapter 8 ZPL and Other Global View Languages 236

The ZPL Programming Language 236

Basic Concepts of ZPL 237

Regions 237

Array Computation 240

Life, an Example 242

The Problem 242

The Solution 242

How It Works 243

The Philosophy of Life 245

Distinguishing Features of ZPL 245

Regions 245

Statement-Level Indexing 245

Restrictions Imposed by Regions 246

Performance Model 246

Addition by Subtraction 247

Manipulating Arrays of Different Ranks 247

Partial Reduce 248

Flooding 249

The Flooding Principle 250

Data Manipulation, an Example 251

Flood Regions 252

Matrix Multiplication 253

Reordering Data with Remap 255

Index Arrays 255

Remap 256

Ordering Example 258

Parallel Execution of ZPL Programs 260

Role of the Compiler 260

Specifying the Number of Processes 261

Assigning Regions to Processes 261

Array Allocation 262

Scalar Allocation 263

Work Assignment 263

Performance Model 264

Applying the Performance Model: Life 265

Applying the Performance Model: SUMMA 266

Summary of the Performance Model 266

NESL Parallel Language 267

Language Concepts 267

Matrix Product Using Nested Parallelism 268

NESL Complexity Model 269

Chapter Summary 269

Historical Perspective 269

Exercises 270

Chapter 9 Assessing the State of the Art 271

Four Important Properties of Parallel Languages 271

Correctness 271

Performance 273

Scalability 274

Portability 274

Evaluating Existing Approaches 275

POSIX Threads 275

Java Threads 276

OpenMP 276

MPI 276

PGAS Languages 277

ZPL 278

NESL 278

Lessons for the Future 279

Hidden Parallelism 279

Transparent Performance 280

Locality 280

Constrained Parallelism 280

Implicit versus Explicit Parallelism 281

Chapter Summary 282

Historical Perspective 282

Exercises 282

PART 4 Looking Forward 283

Chapter 10 Future Directions in Parallel Programming 284

Attached Processors 284

Graphics Processing Units 285

Cell Processors 288

Attached Processors Summary 288

Grid Computing 290

Transactional Memory 291

Comparison with Locks 292

Implementation Issues 293

Open Research Issues 295

MapReduce 296

Problem Space Promotion 298

Emerging Languages 299

Chapel 300

Fortress 300

X10 302

Chapter Summary 304

Historical Perspective 304

Exercises 304

Chapter 11 Writing Parallel Programs 305

Getting Started 305

Access and Software 305

Hello, World 306

Parallel Programming Recommendations 307

Incremental Development 307

Focus on the Parallel Structure 307

Testing the Parallel Structure 308

Sequential Programming 309

Be Willing to Write Extra Code 309

Controlling Parameters during Testing 310

Functional Debugging 310

Capstone Project Ideas 311

Implementing Existing Parallel Algorithms 311

Competing with Standard Benchmarks 312

Developing New Parallel Computations 313

Performance Measurement 314

Comparing against a Sequential Solution 315

Maintaining a Fair Experimental Setting 315

Understanding Parallel Performance 316

Performance Analysis 317

Experimental Methodology 318

Portability and Tuning 319

Chapter Summary 319

Historical Perspective 319

Exercises 320

Glossary 321

References 325

Index 328