Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools

  • Author: Fisher, J. A., et al. (USA)
  • Publisher: Beijing: China Machine Press
  • Year of publication: 2006
  • ISBN: 7111197712
  • Pages: 671
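About the book: Combining technical depth with practical experience, this book clearly explains how general-purpose and embedded computing systems differ at the hardware, software, tools, and operating-system levels. With comprehensive coverage and abundant examples, it is well suited to practicing engineers (chip designers and embedded-system designers) and technical professionals, and can also serve as a reference for university teachers and students in related fields.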

CHAPTER 1 An Introduction to Embedded Processing 1

1.1 What Is Embedded Computing? 3

1.1.1 Attributes of Embedded Devices 4

1.1.2 Embedded Is Growing 5

1.2 Distinguishing Between Embedded and General-Purpose Computing 6

1.2.1 The "Run One Program Only" Phenomenon 8

1.2.2 Backward and Binary Compatibility 9

1.2.3 Physical Limits in the Embedded Domain 10

1.3 Characterizing Embedded Computing 11

1.3.1 Categorization by Type of Processing Engine 12

Digital Signal Processors 13

Network Processors 16

1.3.2 Categorization by Application Area 17

The Image Processing and Consumer Market 18

The Communications Market 20

The Automotive Market 22

1.3.3 Categorization by Workload Differences 22

1.4 Embedded Market Structure 23

1.4.1 The Market for Embedded Processor Cores 24

1.4.2 Business Model of Embedded Processors 25

1.4.3 Costs and Product Volume 26

1.4.4 Software and the Embedded Software Market 28

1.4.5 Industry Standards 28

1.4.6 Product Life Cycle 30

1.4.7 The Transition to SoC Design 31

Effects of SoC on the Business Model 34

Centers of Embedded Design 35

1.4.8 The Future of Embedded Systems 36

Connectivity: Always-on Infrastructure 36

State: Personal Storage 36

Administration 37

Security 37

The Next Generation 37

1.5 Further Reading 38

1.6 Exercises 40

CHAPTER 2 An Overview of VLIW and ILP 45

2.1 Semantics and Parallelism 46

2.1.1 Baseline: Sequential Program Semantics 46

2.1.2 Pipelined Execution, Overlapped Execution, and Multiple Execution Units 47

2.1.3 Dependence and Program Rearrangement 51

2.1.4 ILP and Other Forms of Parallelism 52

2.2 Design Philosophies 54

2.2.1 An Illustration of Design Philosophies: RISC Versus CISC 56

2.2.2 First Definition of VLIW 57

2.2.3 A Design Philosophy: VLIW 59

VLIW Versus Superscalar 59

VLIW Versus DSP 62

2.3 Role of the Compiler 63

2.3.1 The Phases of a High-Performance Compiler 63

2.3.2 Compiling for ILP and VLIW 65

2.4 VLIW in the Embedded and DSP Domains 69

2.5 Historical Perspective and Further Reading 71

2.5.1 ILP Hardware in the 1960s and 1970s 71

Early Supercomputer Arithmetic Units 71

Attached Signal Processors 72

Horizontal Microcode 72

2.5.2 The Development of ILP Code Generation in the 1980s 73

Acyclic Microcode Compaction Techniques 73

Cyclic Techniques: Software Pipelining 75

2.5.3 VLIW Development in the 1980s 76

2.5.4 ILP in the 1990s and 2000s 77

2.6 Exercises 78

CHAPTER 3 An Overview of ISA Design 83

3.1 Overview: What to Hide 84

3.1.1 Architectural State: Memory and Registers 84

3.1.2 Pipelining and Operational Latency 85

3.1.3 Multiple Issue and Hazards 86

Exposing Dependence and Independence 86

Structural Hazards 87

Resource Hazards 89

3.1.4 Exception and Interrupt Handling 89

3.1.5 Discussion 90

3.2 Basic VLIW Design Principles 91

3.2.1 Implications for Compilers and Implementations 92

3.2.2 Execution Model Subtleties 93

3.3 Designing a VLIW ISA for Embedded Systems 95

3.3.1 Application Domain 96

3.3.2 ILP Style 98

3.3.3 Hardware/Software Tradeoffs 100

3.4 Instruction-set Encoding 101

3.4.1 A Larger Definition of Architecture 101

3.4.2 Encoding and Architectural Style 105

RISC Encodings 107

CISC Encodings 108

VLIW Encodings 109

Why Not Superscalar Encodings? 109

DSP Encodings 110

Vector Encodings 111

3.5 VLIW Encoding 112

3.5.1 Operation Encoding 113

3.5.2 Instruction Encoding 113

Fixed-overhead Encoding 115

Distributed Encoding 115

Template-based Encoding 116

3.5.3 Dispatching and Opcode Subspaces 117

3.6 Encoding and Instruction-set Extensions 119

3.7 Further Reading 121

3.8 Exercises 121

CHAPTER 4 Architectural Structures in ISA Design 125

4.1 The Datapath 127

4.1.1 Location of Operands and Results 127

4.1.2 Datapath Width 127

4.1.3 Operation Repertoire 129

Simple Integer and Compare Operations 131

Carry, Overflow, and Other Flags 131

Common Bitwise Utilities 132

Integer Multiplication 132

Fixed-point Multiplication 133

Integer Division 135

Floating-point Operations 136

Saturated Arithmetic 137

4.1.4 Micro-SIMD Operations 139

Alignment Issues 141

Precision Issues 141

Dealing with Control Flow 142

Pack, Unpack, and Mix 143

Reductions 143

4.1.5 Constants 144

4.2 Registers and Clusters 144

4.2.1 Clustering 145

Architecturally Invisible Clustering 147

Architecturally Visible Clustering 147

4.2.2 Heterogeneous Register Files 149

4.2.3 Address and Data Registers 149

4.2.4 Special Register File Features 150

Indexed Register Files 150

Rotating Register Files 151

4.3 Memory Architecture 151

4.3.1 Addressing Modes 152

4.3.2 Access Sizes 153

4.3.3 Alignment Issues 153

4.3.4 Caches and Local Memories 154

Prefetching 154

Local Memories and Lockable Caches 156

4.3.5 Exotic Addressing Modes for Embedded Processing 156

4.4 Branch Architecture 156

4.4.1 Unbundling Branches 158

Two-step Branching 159

Three-step Branching 159

4.4.2 Multiway Branches 160

4.4.3 Multicluster Branches 161

4.4.4 Branches and Loops 162

4.5 Speculation and Predication 163

4.5.1 Speculation 163

Control Speculation 164

Data Speculation 167

4.5.2 Predication 168

Full Predication 169

Partial Predication 170

Cost and Benefits of Predication 171

Predication in the Embedded Domain 172

4.6 System Operations 173

4.7 Further Reading 174

4.8 Exercises 175

CHAPTER 5 Microarchitecture Design 179

5.1 Register File Design 182

5.1.1 Register File Structure 182

5.1.2 Register Files, Technology, and Clustering 183

5.1.3 Separate Address and Data Register Files 184

5.1.4 Special Registers and Register File Features 186

5.2 Pipeline Design 186

5.2.1 Balancing a Pipeline 187

5.3 VLIW Fetch, Sequencing, and Decoding 191

5.3.1 Instruction Fetch 191

5.3.2 Alignment and Instruction Length 192

5.3.3 Decoding and Dispersal 194

5.3.4 Decoding and ISA Extensions 195

5.4 The Datapath 195

5.4.1 Execution Units 197

5.4.2 Bypassing and Forwarding Logic 200

5.4.3 Exposing Latencies 202

5.4.4 Predication and Selects 204

5.5 Memory Architecture 206

5.5.1 Local Memory and Caches 206

5.5.2 Byte Manipulation 209

5.5.3 Addressing, Protection, and Virtual Memory 210

5.5.4 Memories in Multiprocessor Systems 211

5.5.5 Memory Speculation 213

5.6 The Control Unit 214

5.6.1 Branch Architecture 214

5.6.2 Predication and Selects 215

5.6.3 Interrupts and Exceptions 216

5.6.4 Exceptions and Pipelining 218

Drain and Flush Pipeline Models 218

Early Commit 219

Delayed Commit 220

5.7 Control Registers 221

5.8 Power Considerations 221

5.8.1 Energy Efficiency and ILP 222

System-level Power Considerations 224

5.9 Further Reading 225

5.10 Exercises 227

CHAPTER 6 System Design and Simulation 231

6.1 System-on-a-Chip (SoC) 231

6.1.1 IP Blocks and Design Reuse 232

A Concrete SoC Example 233

Virtual Components and the VSIA Alliance 235

6.1.2 Design Flows 236

Creation Flow 236

Verification Flow 238

6.1.3 SoC Buses 239

Data Widths 240

Masters, Slaves, and Arbiters 241

Bus Transactions 242

Test Modes 244

6.2 Processor Cores and SoC 245

6.2.1 Nonprogrammable Accelerators 246

Reconfigurable Logic 248

6.2.2 Multiprocessing on a Chip 250

Symmetric Multiprocessing 250

Heterogeneous Multiprocessing 251

Example: A Multicore Platform for Mobile Multimedia 252

6.3 Overview of Simulation 254

6.3.1 Using Simulators 256

6.4 Simulating a VLIW Architecture 257

6.4.1 Interpretation 258

6.4.2 Compiled Simulation 259

Memory 262

Registers 263

Control Flow 263

Exceptions 266

Analysis of Compiled Simulation 267

Performance Measurement and Compiled Simulation 268

6.4.3 Dynamic Binary Translation 268

6.4.4 Trace-driven Simulation 270

6.5 System Simulation 271

6.5.1 I/O and Concurrent Activities 272

6.5.2 Hardware Simulation 272

Discrete Event Simulation 274

6.5.3 Accelerating Simulation 275

In-Circuit Emulation 275

Hardware Accelerators for Simulation 276

6.6 Validation and Verification 276

6.6.1 Co-simulation 278

6.6.2 Simulation, Verification, and Test 279

Formal Verification 280

Design for Testability 280

Debugging Support for SoC 281

6.7 Further Reading 282

6.8 Exercises 284

CHAPTER 7 Embedded Compiling and Toolchains 287

7.1 What Is Important in an ILP Compiler? 287

7.2 Embedded Cross-Development Toolchains 290

7.2.1 Compiler 291

7.2.2 Assembler 292

7.2.3 Libraries 294

7.2.4 Linker 296

7.2.5 Post-link Optimizer 297

7.2.6 Run-time Program Loader 297

7.2.7 Simulator 299

7.2.8 Debuggers and Monitor ROMs 300

7.2.9 Automated Test Systems 301

7.2.10 Profiling Tools 302

7.2.11 Binary Utilities 302

7.3 Structure of an ILP Compiler 302

7.3.1 Front End 304

7.3.2 Machine-independent Optimizer 304

7.3.3 Back End: Machine-specific Optimizations 306

7.4 Code Layout 306

7.4.1 Code Layout Techniques 306

DAG-based Placement 308

The "Pettis-Hansen" Technique 310

Procedure Inlining 310

Cache Line Coloring 311

Temporal-order Placement 311

7.5 Embedded-Specific Tradeoffs for Compilers 311

7.5.1 Space, Time, and Energy Tradeoffs 312

7.5.2 Power-specific Optimizations 315

Fundamentals of Power Dissipation 316

Power-aware Software Techniques 317

7.6 DSP-Specific Compiler Optimizations 320

7.6.1 Compiler-visible Features of DSPs 322

Heterogeneous Registers 322

Addressing Modes 322

Limited Connectivity 323

Local Memories 323

Harvard Architecture 324

7.6.2 Instruction Selection and Scheduling 325

7.6.3 Address Computation and Offset Assignment 327

7.6.4 Local Memories 327

7.6.5 Register Assignment Techniques 328

7.6.6 Retargetable DSP and ASIP Compilers 329

7.7 Further Reading 332

7.8 Exercises 333

CHAPTER 8 Compiling for VLIWs and ILP 337

8.1 Profiling 338

8.1.1 Types of Profiles 338

8.1.2 Profile Collection 341

8.1.3 Synthetic Profiles (Heuristics in Lieu of Profiles) 341

8.1.4 Profile Bookkeeping and Methodology 342

8.1.5 Profiles and Embedded Applications 342

8.2 Scheduling 343

8.2.1 Acyclic Region Types and Shapes 345

Basic Blocks 345

Traces 345

Superblocks 345

Hyperblocks 347

Treegions 347

Percolation Scheduling 348

8.2.2 Region Formation 350

Region Selection 351

Enlargement Techniques 353

Phase-ordering Considerations 356

8.2.3 Schedule Construction 357

Analyzing Programs for Schedule Construction 359

Compaction Techniques 362

Compensation Code 365

Another View of Scheduling Problems 367

8.2.4 Resource Management During Scheduling 368

Resource Vectors 368

Finite-state Automata 369

8.2.5 Loop Scheduling 371

Modulo Scheduling 373

8.2.6 Clustering 380

8.3 Register Allocation 382

8.3.1 Phase-ordering Issues 383

Register Allocation and Scheduling 383

8.4 Speculation and Predication 385

8.4.1 Control and Data Speculation 385

8.4.2 Predicated Execution 386

8.4.3 Prefetching 389

8.4.4 Data Layout Methods 390

8.4.5 Static and Hybrid Branch Prediction 390

8.5 Instruction Selection 390

8.6 Further Reading 391

8.7 Exercises 395

CHAPTER 9 The Run-time System 399

9.1 Exceptions, Interrupts, and Traps 400

9.1.1 Exception Handling 400

9.2 Application Binary Interface Considerations 402

9.2.1 Loading Programs 404

9.2.2 Data Layout 406

9.2.3 Accessing Global Data 407

9.2.4 Calling Conventions 409

Registers 409

Call Instructions 409

Call Sites 410

Function Prologues and Epilogues 412

9.2.5 Advanced ABI Topics 412

Variable-length Argument Lists 412

Dynamic Stack Allocation 413

Garbage Collection 414

Linguistic Exceptions 414

9.3 Code Compression 415

9.3.1 Motivations 416

9.3.2 Compression and Information Theory 417

9.3.3 Architectural Compression Options 417

Decompression on Fetch 420

Decompression on Refill 420

Load-time Decompression 420

9.3.4 Compression Methods 420

Hand-tuned ISAs 421

Ad Hoc Compression Schemes 421

RAM Decompression 422

Dictionary-based Software Compression 422

Cache-based Compression 422

Quantifying Compression Benefits 424

9.4 Embedded Operating Systems 427

9.4.1 "Traditional" OS Issues Revisited 427

9.4.2 Real-time Systems 428

Real-time Scheduling 429

9.4.3 Multiple Flows of Control 431

Threads, Processes, and Microkernels 432

9.4.4 Market Considerations 433

Embedded Linux 435

9.4.5 Downloadable Code and Virtual Machines 436

9.5 Multiprocessing and Multithreading 438

9.5.1 Multiprocessing in the Embedded World 438

9.5.2 Multiprocessing and VLIW 439

9.6 Further Reading 440

9.7 Exercises 441

CHAPTER 10 Application Design and Customization 443

10.1 Programming Language Choices 443

10.1.1 Overview of Embedded Programming Languages 444

10.1.2 Traditional C and ANSI C 445

10.1.3 C++ and Embedded C++ 447

Embedded C++ 449

10.1.4 Matlab 450

10.1.5 Embedded Java 452

The Allure of Embedded Java 452

Embedded Java: The Dark Side 455

10.1.6 C Extensions for Digital Signal Processing 456

Restricted Pointers 456

Fixed-point Data Types 459

Circular Arrays 461

Matrix Referencing and Operators 462

10.1.7 Pragmas, Intrinsics, and Inline Assembly Language Code 462

Compiler Pragmas and Type Annotations 462

Assembler Inserts and Intrinsics 463

10.2 Performance, Benchmarking, and Tuning 465

10.2.1 Importance and Methodology 465

10.2.2 Tuning an Application for Performance 466

Profiling 466

Performance Tuning and Compilers 467

Developing for ILP Targets 468

10.2.3 Benchmarking 473

10.3 Scalability and Customizability 475

10.3.1 Scalability and Architecture Families 476

10.3.2 Exploration and Scalability 477

10.3.3 Customization 478

Customized Implementations 479

10.3.4 Reconfigurable Hardware 480

Using Programmable Logic 480

10.3.5 Customizable Processors and Tools 481

Describing Processors 481

10.3.6 Tools for Customization 483

Customizable Compilers 485

10.3.7 Architecture Exploration 487

Dealing with the Complexity 488

Other Barriers to Customization 488

Wrapping Up 489

10.4 Further Reading 489

10.5 Exercises 490

CHAPTER 11 Application Areas 493

11.1 Digital Printing and Imaging 493

11.1.1 Photo Printing Pipeline 495

JPEG Decompression 495

Scaling 496

Color Space Conversion 497

Dithering 499

11.1.2 Implementation and Performance 501

Summary 505

11.2 Telecom Applications 505

11.2.1 Voice Coding 506

Waveform Codecs 506

Vocoders 507

Hybrid Coders 508

11.2.2 Multiplexing 509

11.2.3 The GSM Enhanced Full-rate Codec 510

Implementation and Performance 510

11.3 Other Application Areas 514

11.3.1 Digital Video 515

MPEG-1 and MPEG-2 516

MPEG-4 518

11.3.2 Automotive 518

Fail-safety and Fault Tolerance 519

Engine Control Units 520

In-vehicle Networking 520

11.3.3 Hard Disk Drives 522

Motor Control 524

Data Decoding 525

Disk Scheduling and On-disk Management Tasks 526

Disk Scheduling and Off-disk Management Tasks 527

11.3.4 Networking and Network Processors 528

Network Processors 531

11.4 Further Reading 535

11.5 Exercises 537

APPENDIX A The VEX System 539

A.1 The VEX Instruction-set Architecture 540

A.1.1 VEX Assembly Language Notation 541

A.1.2 Clusters 542

A.1.3 Execution Model 544

A.1.4 Architecture State 545

A.1.5 Arithmetic and Logic Operations 545

Examples 547

A.1.6 Intercluster Communication 549

A.1.7 Memory Operations 550

A.1.8 Control Operations 552

Examples 553

A.1.9 Structure of the Default VEX Cluster 554

Register Files and Immediates 555

A.1.10 VEX Semantics 556

A.2 The VEX Run-time Architecture 558

A.2.1 Data Allocation and Layout 559

A.2.2 Register Usage 560

A.2.3 Stack Layout and Procedure Linkage 560

Procedure Linkage 563

A.3 The VEX C Compiler 566

A.3.1 Command Line Options 568

Output Files 569

Preprocessing 570

Optimization 570

Profiling 572

Language Definition 573

Libraries 574

Passing Options to Compile Phases 574

Terminal Output and Process Control 575

Other Options 575

A.3.2 Compiler Pragmas 576

Unrolling and Profiling 576

Assertions 578

Memory Disambiguation 578

Cache Control 581

A.3.3 Inline Expansion 583

Multiflow-style Inlining 583

C99-style Inlining 584

A.3.4 Machine Model Parameters 585

A.3.5 Custom Instructions 586

A.4 Visualization Tools 588

A.5 The VEX Simulation System 589

A.5.1 gprof Support 591

A.5.2 Simulating Custom Instructions 594

A.5.3 Simulating the Memory Hierarchy 595

A.6 Customizing the VEX Toolchain 596

A.6.1 Clusters 596

A.6.2 Machine Model Resources 597

A.6.3 Memory Hierarchy Parameters 599

A.7 Examples of Tool Usage 599

A.7.1 Compile and Run 599

A.7.2 Profiling 602

A.7.3 Custom Architectures 603

A.8 Exercises 605

APPENDIX B Glossary 607

APPENDIX C Bibliography 631

Index 661