1 Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Below Your Program 10
1.3 Under the Covers 13
1.4 Performance 26
1.5 The Power Wall 39
1.6 The Sea Change: The Switch from Uniprocessors to Multiprocessors 41
1.7 Real Stuff: Manufacturing and Benchmarking the AMD Opteron X4 44
1.8 Fallacies and Pitfalls 51
1.9 Concluding Remarks 54
1.10 Historical Perspective and Further Reading 55
1.11 Exercises 56
2 Instructions: Language of the Computer 74
2.1 Introduction 76
2.2 Operations of the Computer Hardware 77
2.3 Operands of the Computer Hardware 80
2.4 Signed and Unsigned Numbers 86
2.5 Representing Instructions in the Computer 93
2.6 Logical Operations 100
2.7 Instructions for Making Decisions 104
2.8 Supporting Procedures in Computer Hardware 113
2.9 Communicating with People 122
2.10 ARM Addressing for 32-Bit Immediates and More Complex Addressing Modes 127
2.11 Parallelism and Instructions: Synchronization 133
2.12 Translating and Starting a Program 135
2.13 A C Sort Example to Put It All Together 143
2.14 Arrays versus Pointers 152
2.15 Advanced Material: Compiling C and Interpreting Java 156
2.16 Real Stuff: MIPS Instructions 156
2.17 Real Stuff: x86 Instructions 161
2.18 Fallacies and Pitfalls 170
2.19 Concluding Remarks 171
2.20 Historical Perspective and Further Reading 174
2.21 Exercises 174
3 Arithmetic for Computers 214
3.1 Introduction 216
3.2 Addition and Subtraction 216
3.3 Multiplication 220
3.4 Division 226
3.5 Floating Point 232
3.6 Parallelism and Computer Arithmetic: Associativity 258
3.7 Real Stuff: Floating Point in the x86 259
3.8 Fallacies and Pitfalls 262
3.9 Concluding Remarks 265
3.10 Historical Perspective and Further Reading 268
3.11 Exercises 269
4 The Processor 284
4.1 Introduction 286
4.2 Logic Design Conventions 289
4.3 Building a Datapath 293
4.4 A Simple Implementation Scheme 302
4.5 An Overview of Pipelining 316
4.6 Pipelined Datapath and Control 330
4.7 Data Hazards: Forwarding versus Stalling 349
4.8 Control Hazards 361
4.9 Exceptions 370
4.10 Parallelism and Advanced Instruction-Level Parallelism 377
4.11 Real Stuff: the AMD Opteron X4 (Barcelona) Pipeline 390
4.12 Advanced Topic: an Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 392
4.13 Fallacies and Pitfalls 393
4.14 Concluding Remarks 394
4.15 Historical Perspective and Further Reading 395
4.16 Exercises 395
5 Large and Fast: Exploiting Memory Hierarchy 436
5.1 Introduction 438
5.2 The Basics of Caches 443
5.3 Measuring and Improving Cache Performance 461
5.4 Virtual Memory 478
5.5 A Common Framework for Memory Hierarchies 504
5.6 Virtual Machines 511
5.7 Using a Finite-State Machine to Control a Simple Cache 515
5.8 Parallelism and Memory Hierarchies: Cache Coherence 520
5.9 Advanced Material: Implementing Cache Controllers 524
5.10 Real Stuff: the AMD Opteron X4 (Barcelona) and Intel Nehalem Memory Hierarchies 525
5.11 Fallacies and Pitfalls 529
5.12 Concluding Remarks 533
5.13 Historical Perspective and Further Reading 534
5.14 Exercises 534
6 Storage and Other I/O Topics 554
6.1 Introduction 556
6.2 Dependability, Reliability, and Availability 559
6.3 Disk Storage 561
6.4 Flash Storage 566
6.5 Connecting Processors, Memory, and I/O Devices 568
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System 572
6.7 I/O Performance Measures: Examples from Disk and File Systems 582
6.8 Designing an I/O System 584
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks 585
6.10 Real Stuff: Sun Fire x4150 Server 592
6.11 Advanced Topics: Networks 598
6.12 Fallacies and Pitfalls 599
6.13 Concluding Remarks 603
6.14 Historical Perspective and Further Reading 604
6.15 Exercises 605
7 Multicores, Multiprocessors, and Clusters 616
7.1 Introduction 618
7.2 The Difficulty of Creating Parallel Processing Programs 620
7.3 Shared Memory Multiprocessors 624
7.4 Clusters and Other Message-Passing Multiprocessors 627
7.5 Hardware Multithreading 631
7.6 SISD, MIMD, SIMD, SPMD, and Vector 634
7.7 Introduction to Graphics Processing Units 640
7.8 Introduction to Multiprocessor Network Topologies 646
7.9 Multiprocessor Benchmarks 650
7.10 Roofline: A Simple Performance Model 653
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model 661
7.12 Fallacies and Pitfalls 670
7.13 Concluding Remarks 672
7.14 Historical Perspective and Further Reading 674
7.15 Exercises 674