Chapter 1 Fundamentals of Quantitative Design and Analysis 2
1.1 Introduction 2
1.2 Classes of Computers 5
1.3 Defining Computer Architecture 11
1.4 Trends in Technology 17
1.5 Trends in Power and Energy in Integrated Circuits 21
1.6 Trends in Cost 27
1.7 Dependability 33
1.8 Measuring, Reporting, and Summarizing Performance 36
1.9 Quantitative Principles of Computer Design 44
1.10 Putting It All Together: Performance, Price, and Power 52
1.11 Fallacies and Pitfalls 55
1.12 Concluding Remarks 59
1.13 Historical Perspectives and References 61
Case Studies and Exercises by Diana Franklin 61
Chapter 2 Memory Hierarchy Design 72
2.1 Introduction 72
2.2 Ten Advanced Optimizations of Cache Performance 78
2.3 Memory Technology and Optimizations 96
2.4 Protection: Virtual Memory and Virtual Machines 105
2.5 Crosscutting Issues: The Design of Memory Hierarchies 112
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A8 and Intel Core i7 113
2.7 Fallacies and Pitfalls 125
2.8 Concluding Remarks: Looking Ahead 129
2.9 Historical Perspective and References 131
Case Studies and Exercises by Norman P. Jouppi, Naveen Muralimanohar, and Sheng Li 131
Chapter 3 Instruction-Level Parallelism and Its Exploitation 148
3.1 Instruction-Level Parallelism: Concepts and Challenges 148
3.2 Basic Compiler Techniques for Exposing ILP 156
3.3 Reducing Branch Costs with Advanced Branch Prediction 162
3.4 Overcoming Data Hazards with Dynamic Scheduling 167
3.5 Dynamic Scheduling: Examples and the Algorithm 176
3.6 Hardware-Based Speculation 183
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling 192
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation 197
3.9 Advanced Techniques for Instruction Delivery and Speculation 202
3.10 Studies of the Limitations of ILP 213
3.11 Cross-Cutting Issues: ILP Approaches and the Memory System 221
3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput 223
3.13 Putting It All Together: The Intel Core i7 and ARM Cortex-A8 233
3.14 Fallacies and Pitfalls 241
3.15 Concluding Remarks: What's Ahead? 245
3.16 Historical Perspective and References 247
Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell 247
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures 262
4.1 Introduction 262
4.2 Vector Architecture 264
4.3 SIMD Instruction Set Extensions for Multimedia 282
4.4 Graphics Processing Units 288
4.5 Detecting and Enhancing Loop-Level Parallelism 315
4.6 Crosscutting Issues 322
4.7 Putting It All Together: Mobile versus Server GPUs and Tesla versus Core i7 323
4.8 Fallacies and Pitfalls 330
4.9 Concluding Remarks 332
4.10 Historical Perspective and References 334
Case Study and Exercises by Jason D. Bakos 334
Chapter 5 Thread-Level Parallelism 344
5.1 Introduction 344
5.2 Centralized Shared-Memory Architectures 351
5.3 Performance of Symmetric Shared-Memory Multiprocessors 366
5.4 Distributed Shared-Memory and Directory-Based Coherence 378
5.5 Synchronization: The Basics 386
5.6 Models of Memory Consistency: An Introduction 392
5.7 Crosscutting Issues 395
5.8 Putting It All Together: Multicore Processors and Their Performance 400
5.9 Fallacies and Pitfalls 405
5.10 Concluding Remarks 409
5.11 Historical Perspectives and References 412
Case Studies and Exercises by Amr Zaky and David A. Wood 412
Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism 432
6.1 Introduction 432
6.2 Programming Models and Workloads for Warehouse-Scale Computers 436
6.3 Computer Architecture of Warehouse-Scale Computers 441
6.4 Physical Infrastructure and Costs of Warehouse-Scale Computers 446
6.5 Cloud Computing: The Return of Utility Computing 455
6.6 Crosscutting Issues 461
6.7 Putting It All Together: A Google Warehouse-Scale Computer 464
6.8 Fallacies and Pitfalls 471
6.9 Concluding Remarks 475
6.10 Historical Perspectives and References 476
Case Studies and Exercises by Parthasarathy Ranganathan 476