Scalable Parallel Computing: Technology, Architecture, Programming (English Edition)

  • Authors: Kai Hwang, Zhiwei Xu
  • Publisher: China Machine Press, Beijing
  • Year: 1999
  • ISBN: 711107176X
  • Pages: 802
Contents:

Part I Scalability and Clustering 1

Chapter 1 Scalable Computer Platforms and Models 3

1.1 Evolution of Computer Architecture 5

1.1.1 Computer Generations 5

1.1.2 Scalable Computer Architectures 6

1.1.3 Converging System Architectures 8

1.2 Dimensions of Scalability 9

1.2.1 Resource Scalability 9

1.2.2 Application Scalability 11

1.2.3 Technology Scalability 12

1.3 Parallel Computer Models 13

1.3.1 Semantic Attributes 14

1.3.2 Performance Attributes 17

1.3.3 Abstract Machine Models 18

1.3.4 Physical Machine Models 26

1.4 Basic Concepts of Clustering 30

1.4.1 Cluster Characteristics 30

1.4.2 Architectural Comparisons 31

1.4.3 Benefits and Difficulties of Clusters 32

1.5 Scalable Design Principles 37

1.5.1 Principle of Independence 37

1.5.2 Principle of Balanced Design 39

1.5.3 Design for Scalability 44

1.6 Bibliographic Notes and Problems 47

Chapter 2 Basics of Parallel Programming 51

2.1 Parallel Programming Overview 51

2.1.1 Why Is Parallel Programming Difficult? 52

2.1.2 Parallel Programming Environments 55

2.1.3 Parallel Programming Approaches 56

2.2 Processes, Tasks, and Threads 59

2.2.1 Definitions of an Abstract Process 59

2.2.2 Execution Mode 62

2.2.3 Address Space 63

2.2.4 Process Context 65

2.2.5 Process Descriptor 66

2.2.6 Process Control 67

2.2.7 Variations of Process 70

2.3 Parallelism Issues 71

2.3.1 Homogeneity in Processes 72

2.3.2 Static versus Dynamic Parallelism 74

2.3.3 Process Grouping 75

2.3.4 Allocation Issues 76

2.4 Interaction/Communication Issues 77

2.4.1 Interaction Operations 77

2.4.2 Interaction Modes 80

2.4.3 Interaction Patterns 82

2.4.4 Cooperative versus Competitive Interactions 84

2.5 Semantic Issues in Parallel Programs 85

2.5.1 Program Termination 85

2.5.2 Determinacy of Programs 86

2.6 Bibliographic Notes and Problems 87

Chapter 3 Performance Metrics and Benchmarks 91

3.1 System and Application Benchmarks 91

3.1.1 Micro Benchmarks 92

3.1.2 Parallel Computing Benchmarks 96

3.1.3 Business and TPC Benchmarks 98

3.1.4 SPEC Benchmark Family 100

3.2 Performance versus Cost 102

3.2.1 Execution Time and Throughput 103

3.2.2 Utilization and Cost-Effectiveness 104

3.3 Basic Performance Metrics 108

3.3.1 Workload and Speed Metrics 108

3.3.2 Caveats in Sequential Performance 111

3.4 Performance of Parallel Computers 113

3.4.1 Computational Characteristics 113

3.4.2 Parallelism and Interaction Overheads 115

3.4.3 Overhead Quantification 118

3.5 Performance of Parallel Programs 126

3.5.1 Performance Metrics 126

3.5.2 Available Parallelism in Benchmarks 131

3.6 Scalability and Speedup Analysis 134

3.6.1 Amdahl's Law: Fixed Problem Size 134

3.6.2 Gustafson's Law: Fixed Time 136

3.6.3 Sun and Ni's Law: Memory Bounding 139

3.6.4 Isoperformance Models 144

3.7 Bibliographic Notes and Problems 148

Part II Enabling Technologies 153

Chapter 4 Microprocessors as Building Blocks 155

4.1 System Development Trends 155

4.1.1 Advances in Hardware 156

4.1.2 Advances in Software 159

4.1.3 Advances in Applications 160

4.2 Principles of Processor Design 164

4.2.1 Basics of Instruction Pipeline 164

4.2.2 From CISC to RISC and Beyond 169

4.2.3 Architectural Enhancement Approaches 172

4.3 Microprocessor Architecture Families 174

4.3.1 Major Architecture Families 174

4.3.2 Superscalar versus Superpipelined Processors 175

4.3.3 Embedded Microprocessors 180

4.4 Case Studies of Microprocessors 182

4.4.1 Digital's Alpha 21164 Microprocessor 182

4.4.2 Intel Pentium Pro Processor 186

4.5 Post-RISC, Multimedia, and VLIW 191

4.5.1 Post-RISC Processor Features 191

4.5.2 Multimedia Extensions 195

4.5.3 The VLIW Architecture 199

4.6 The Future of Microprocessors 201

4.6.1 Hardware Trends and Physical Limits 201

4.6.2 Future Workloads and Challenges 203

4.6.3 Future Microprocessor Architectures 204

4.7 Bibliographic Notes and Problems 206

Chapter 5 Distributed Memory and Latency Tolerance 211

5.1 Hierarchical Memory Technology 211

5.1.1 Characteristics of Storage Devices 211

5.1.2 Memory Hierarchy Properties 214

5.1.3 Memory Capacity Planning 217

5.2 Cache Coherence Protocols 220

5.2.1 Cache Coherency Problem 220

5.2.2 Snoopy Coherency Protocols 222

5.2.3 The MESI Snoopy Protocol 224

5.3 Shared-Memory Consistency 228

5.3.1 Memory Event Ordering 228

5.3.2 Memory Consistency Models 231

5.3.3 Relaxed Memory Models 234

5.4 Distributed Cache/Memory Architecture 237

5.4.1 NORMA, NUMA, COMA, and DSM Models 237

5.4.2 Directory-Based Coherency Protocol 243

5.4.3 The Stanford Dash Multiprocessor 245

5.4.4 Directory-Based Protocol in Dash 248

5.5 Latency Tolerance Techniques 250

5.5.1 Latency Avoidance, Reduction, and Hiding 250

5.5.2 Distributed Coherent Caches 253

5.5.3 Data Prefetching Strategies 255

5.5.4 Effects of Relaxed Memory Consistency 257

5.6 Multithreaded Latency Hiding 257

5.6.1 Multithreaded Processor Model 258

5.6.2 Context-Switching Policies 260

5.6.3 Combining Latency Hiding Mechanisms 265

5.7 Bibliographic Notes and Problems 266

Chapter 6 System Interconnects and Gigabit Networks 273

6.1 Basics of Interconnection Networks 273

6.1.1 Interconnection Environments 273

6.1.2 Network Components 276

6.1.3 Network Characteristics 277

6.1.4 Network Performance Metrics 280

6.2 Network Topologies and Properties 281

6.2.1 Topological and Functional Properties 281

6.2.2 Routing Schemes and Functions 283

6.2.3 Networking Topologies 286

6.3 Buses, Crossbar, and Multistage Switches 294

6.3.1 Multiprocessor Buses 294

6.3.2 Crossbar Switches 298

6.3.3 Multistage Interconnection Networks 301

6.3.4 Comparison of Switched Interconnects 305

6.4 Gigabit Network Technologies 307

6.4.1 Fibre Channel and FDDI Rings 307

6.4.2 Fast Ethernet and Gigabit Ethernet 310

6.4.3 Myrinet for SAN/LAN Construction 313

6.4.4 HiPPI and SuperHiPPI 314

6.5 ATM Switches and Networks 318

6.5.1 ATM Technology 318

6.5.2 ATM Network Interfaces 320

6.5.3 Four Layers of ATM Architecture 321

6.5.4 ATM Internetwork Connectivity 324

6.6 Scalable Coherence Interface 326

6.6.1 SCI Interconnects 327

6.6.2 Implementation Issues 329

6.6.3 SCI Coherence Protocol 332

6.7 Comparison of Network Technologies 334

6.7.1 Standard Networks and Perspectives 334

6.7.2 Network Performance and Applications 335

6.8 Bibliographic Notes and Problems 337

Chapter 7 Threading, Synchronization, and Communication 343

7.1 Software Multithreading 343

7.1.1 The Thread Concept 344

7.1.2 Threads Management 346

7.1.3 Thread Synchronization 348

7.2 Synchronization Mechanisms 349

7.2.1 Atomicity versus Mutual Exclusion 349

7.2.2 High-Level Synchronization Constructs 355

7.2.3 Low-Level Synchronization Primitives 360

7.2.4 Fast Locking Mechanisms 364

7.3 The TCP/IP Communication Protocol Suite 366

7.3.1 Features of the TCP/IP Suite 367

7.3.2 UDP, TCP, and IP 371

7.3.3 The Sockets Interface 375

7.4 Fast and Efficient Communication 376

7.4.1 Key Problems in Communication 377

7.4.2 The LogP Communication Model 384

7.4.3 Low-Level Communications Support 386

7.4.4 Communication Algorithms 396

7.5 Bibliographic Notes and Problems 398

Part III Systems Architecture 403

Chapter 8 Symmetric and CC-NUMA Multiprocessors 407

8.1 SMP and CC-NUMA Technology 407

8.1.1 Multiprocessor Architecture 407

8.1.2 Commercial SMP Servers 412

8.1.3 The Intel SHV Server Board 413

8.2 Sun Ultra Enterprise 10000 System 416

8.2.1 The Ultra E-10000 Architecture 416

8.2.2 System Board Architecture 418

8.2.3 Scalability and Availability Support 418

8.2.4 Dynamic Domains and Performance 420

8.3 HP/Convex Exemplar X-Class 421

8.3.1 The Exemplar X System Architecture 421

8.3.2 Exemplar Software Environment 424

8.4 The Sequent NUMA-Q 2000 425

8.4.1 The NUMA-Q 2000 Architecture 426

8.4.2 Software Environment of NUMA-Q 430

8.4.3 Performance of the NUMA-Q 431

8.5 The SGI/Cray Origin 2000 Superserver 434

8.5.1 Design Goals of Origin 2000 Series 434

8.5.2 The Origin 2000 Architecture 435

8.5.3 The Cellular IRIX Environment 443

8.5.4 Performance of the Origin 2000 447

8.6 Comparison of CC-NUMA Architectures 447

8.7 Bibliographic Notes and Problems 451

Chapter 9 Support of Clustering and Availability 453

9.1 Challenges in Clustering 453

9.1.1 Classification of Clusters 453

9.1.2 Cluster Architectures 456

9.1.3 Cluster Design Issues 457

9.2 Availability Support for Clustering 459

9.2.1 The Availability Concept 460

9.2.2 Availability Techniques 463

9.2.3 Checkpointing and Failure Recovery 468

9.3 Support for Single System Image 473

9.3.1 Single System Image Layers 473

9.3.2 Single Entry and Single File Hierarchy 475

9.3.3 Single I/O, Networking, and Memory Space 479

9.4 Single System Image in Solaris MC 482

9.4.1 Global File System 482

9.4.2 Global Process Management 484

9.4.3 Single I/O System Image 485

9.5 Job Management in Clusters 486

9.5.1 Job Management System 486

9.5.2 Survey of Job Management Systems 492

9.5.3 Load-Sharing Facility (LSF) 494

9.6 Bibliographic Notes and Problems 501

Chapter 10 Clusters of Servers and Workstations 505

10.1 Cluster Products and Research Projects 505

10.1.1 Supporting Trend of Cluster Products 506

10.1.2 Cluster of SMP Servers 508

10.1.3 Cluster Research Projects 509

10.2 Microsoft Wolfpack for NT Clusters 511

10.2.1 Microsoft Wolfpack Configurations 512

10.2.2 Hot Standby Multiserver Clusters 513

10.2.3 Active Availability Clusters 514

10.2.4 Fault-Tolerant Multiserver Cluster 516

10.3 The IBM SP System 518

10.3.1 Design Goals and Strategies 518

10.3.2 The SP2 System Architecture 521

10.3.3 I/O and Internetworking 523

10.3.4 The SP System Software 526

10.3.5 The SP2 and Beyond 530

10.4 The Digital TruCluster 531

10.4.1 The TruCluster Architecture 531

10.4.2 The Memory Channel Interconnect 534

10.4.3 Programming the TruCluster 537

10.4.4 The TruCluster System Software 540

10.5 The Berkeley NOW Project 541

10.5.1 Active Messages for Fast Communication 541

10.5.2 GLUnix for Global Resource Management 547

10.5.3 The xFS Serverless Network File System 549

10.6 TreadMarks: A Software-Implemented DSM Cluster 556

10.6.1 Boundary Conditions 556

10.6.2 User Interface for DSM 557

10.6.3 Implementation Issues 559

10.7 Bibliographic Notes and Problems 561

Chapter 11 MPP Architecture and Performance 565

11.1 An Overview of MPP Technology 565

11.1.1 MPP Characteristics and Issues 565

11.1.2 MPP Systems: An Overview 569

11.2 The Cray T3E System 570

11.2.1 The System Architecture of T3E 571

11.2.2 The System Software in T3E 573

11.3 New Generation of ASCI/MPPs 574

11.3.1 ASCI Scalable Design Strategy 574

11.3.2 Hardware and Software Requirements 576

11.3.3 Contracted ASCI/MPP Platforms 577

11.4 Intel/Sandia ASCI Option Red 579

11.4.1 The Option Red Architecture 579

11.4.2 Option Red System Software 582

11.5 Parallel NAS Benchmark Results 584

11.5.1 The NAS Parallel Benchmarks 585

11.5.2 Superstep Structure and Granularity 586

11.5.3 Memory, I/O, and Communications 587

11.6 MPI and STAP Benchmark Results 590

11.6.1 MPI Performance Measurements 590

11.6.2 MPI Latency and Aggregate Bandwidth 592

11.6.3 STAP Benchmark Evaluation of MPPs 594

11.6.4 MPP Architectural Implications 600

11.7 Bibliographic Notes and Problems 603

Part IV Parallel Programming 607

Chapter 12 Parallel Paradigms and Programming Models 609

12.1 Paradigms and Programmability 609

12.1.1 Algorithmic Paradigms 609

12.1.2 Programmability Issues 612

12.1.3 Parallel Programming Examples 614

12.2 Parallel Programming Models 617

12.2.1 Implicit Parallelism 617

12.2.2 Explicit Parallel Models 621

12.2.3 Comparison of Four Models 624

12.2.4 Other Parallel Programming Models 627

12.3 Shared-Memory Programming 629

12.3.1 The ANSI X3H5 Shared-Memory Model 629

12.3.2 The POSIX Threads (Pthreads) Model 634

12.3.3 The OpenMP Standard 636

12.3.4 The SGI Power C Model 640

12.3.5 C//: A Structured Parallel C Language 643

12.4 Bibliographic Notes and Problems 649

Chapter 13 Message-Passing Programming 653

13.1 The Message-Passing Paradigm 653

13.1.1 Message-Passing Libraries 653

13.1.2 Message-Passing Modes 655

13.2 Message-Passing Interface (MPI) 658

13.2.1 MPI Message 661

13.2.2 Message Envelope in MPI 668

13.2.3 Point-to-Point Communications 674

13.2.4 Collective MPI Communications 678

13.2.5 The MPI-2 Extensions 682

13.3 Parallel Virtual Machine (PVM) 686

13.3.1 Virtual Machine Construction 687

13.3.2 Process Management in PVM 689

13.3.3 Communication with PVM 693

13.4 Bibliographic Notes and Problems 699

Chapter 14 Data-Parallel Programming 705

14.1 The Data-Parallel Model 705

14.2 The Fortran 90 Approach 706

14.2.1 Parallel Array Operations 706

14.2.2 Intrinsic Functions in Fortran 90 708

14.3 High-Performance Fortran 711

14.3.1 Support for Data Parallelism 712

14.3.2 Data Mapping in HPF 715

14.3.3 Summary of Fortran 90 and HPF 721

14.4 Other Data-Parallel Approaches 725

14.4.1 Fortran 95 and Fortran 2001 725

14.4.2 The pC++ and NESL Approaches 728

14.5 Bibliographic Notes and Problems 733

Bibliography 737

Web Resources List 765

Subject Index 787

Author Index 799