1 Introduction to Digital Signal Processing Systems 1
1.1 Introduction 1
1.2 Typical DSP Algorithms 2
1.3 DSP Application Demands and Scaled CMOS Technologies 27
1.4 Representations of DSP Algorithms 31
1.5 Book Outline 40
References 41
2 Iteration Bound 43
2.1 Introduction 43
2.2 Data-Flow Graph Representations 43
2.3 Loop Bound and Iteration Bound 45
2.4 Algorithms for Computing Iteration Bound 47
2.5 Iteration Bound of Multirate Data-Flow Graphs 55
2.6 Conclusions 57
2.7 Problems 58
References 61
3 Pipelining and Parallel Processing 63
3.1 Introduction 63
3.2 Pipelining of FIR Digital Filters 64
3.3 Parallel Processing 69
3.4 Pipelining and Parallel Processing for Low Power 74
3.5 Conclusions 82
3.6 Problems 83
References 88
4.1 Introduction 91
4 Retiming 91
4.2 Definitions and Properties 93
4.3 Solving Systems of Inequalities 95
4.4 Retiming Techniques 97
4.5 Conclusions 112
4.6 Problems 112
References 118
5 Unfolding 119
5.1 Introduction 119
5.2 An Algorithm for Unfolding 121
5.3 Properties of Unfolding 124
5.4 Critical Path, Unfolding, and Retiming 127
5.5 Applications of Unfolding 128
5.7 Problems 140
5.6 Conclusions 140
References 147
6 Folding 149
6.1 Introduction 149
6.2 Folding Transformation 151
6.3 Register Minimization Techniques 157
6.4 Register Minimization in Folded Architectures 163
6.5 Folding of Multirate Systems 170
6.6 Conclusions 174
6.7 Problems 174
References 186
7 Systolic Architecture Design 189
7.1 Introduction 189
7.2 Systolic Array Design Methodology 190
7.3 FIR Systolic Arrays 192
7.4 Selection of Scheduling Vector 201
7.5 Matrix-Matrix Multiplication and 2D Systolic Array Design 205
7.6 Systolic Design for Space Representations Containing Delays 210
7.7 Conclusions 213
7.8 Problems 213
References 223
8 Fast Convolution 227
8.1 Introduction 227
8.2 Cook-Toom Algorithm 228
8.3 Winograd Algorithm 237
8.4 Iterated Convolution 244
8.5 Cyclic Convolution 246
8.6 Design of Fast Convolution Algorithm by Inspection 250
8.7 Conclusions 251
8.8 Problems 251
References 253
9 Algorithmic Strength Reduction in Filters and Transforms 255
9.1 Introduction 255
9.2 Parallel FIR Filters 256
9.3 Discrete Cosine Transform and Inverse DCT 275
9.4 Parallel Architectures for Rank-Order Filters 285
9.5 Conclusions 297
9.6 Problems 297
References 310
10.1 Introduction 313
10 Pipelined and Parallel Recursive and Adaptive Filters 313
10.2 Pipeline Interleaving in Digital Filters 314
10.3 Pipelining in 1st-Order IIR Digital Filters 320
10.4 Pipelining in Higher-Order IIR Digital Filters 325
10.5 Parallel Processing for IIR filters 339
10.6 Combined Pipelining and Parallel Processing for IIR Filters 345
10.7 Low-Power IIR Filter Design Using Pipelining and Parallel Processing 348
10.8 Pipelined Adaptive Digital Filters 351
10.9 Conclusions 367
10.10 Problems 367
References 374
11 Scaling and Roundoff Noise 377
11.1 Introduction 377
11.2 Scaling and Roundoff Noise 378
11.3 State Variable Description of Digital Filters 382
11.4 Scaling and Roundoff Noise Computation 386
11.5 Roundoff Noise in Pipelined IIR Filters 391
11.6 Roundoff Noise Computation Using State Variable Description 403
11.7 Slow-Down, Retiming, and Pipelining 405
11.8 Conclusions 410
11.9 Problems 410
References 419
12 Digital Lattice Filter Structures 421
12.1 Introduction 421
12.2 Schur Algorithm 422
12.3 Digital Basic Lattice Filters 429
12.4 Derivation of One-Multiplier Lattice Filter 437
12.5 Derivation of Normalized Lattice Filter 444
12.6 Derivation of Scaled-Normalized Lattice Filter 447
12.7 Roundoff Noise Calculation in Lattice Filters 454
12.8 Pipelining of Lattice IIR Digital Filters 458
12.9 Design Examples of Pipelined Lattice Filters 464
12.10 Low-Power CMOS Lattice IIR Filters 469
12.11 Conclusions 470
12.12 Problems 470
References 474
13 Bit-Level Arithmetic Architectures 477
13.1 Introduction 477
13.2 Parallel Multipliers 478
13.3 Interleaved Floor-plan and Bit-Plane-Based Digital Filters 489
13.4 Bit-Serial Multipliers 490
13.5 Bit-Serial Filter Design and Implementation 499
13.6 Canonic Signed Digit Arithmetic 505
13.7 Distributed Arithmetic 511
13.8 Conclusions 518
13.9 Problems 518
References 527
14 Redundant Arithmetic 529
14.1 Introduction 529
14.2 Redundant Number Representations 530
14.3 Carry-Free Radix-2 Addition and Subtraction 531
14.4 Hybrid Radix-4 Addition 536
14.5 Radix-2 Hybrid Redundant Multiplication Architectures 540
14.6 Data Format Conversion 545
14.7 Redundant to Nonredundant Converter 547
14.8 Conclusions 551
14.9 Problems 552
References 555
15 Numerical Strength Reduction 559
15.1 Introduction 559
15.2 Subexpression Elimination 560
15.3 Multiple Constant Multiplication 560
15.4 Subexpression Sharing in Digital Filters 566
15.5 Additive and Multiplicative Number Splitting 574
15.6 Conclusions 583
15.7 Problems 583
References 589
16 Synchronous, Wave, and Asynchronous Pipelines 591
16.1 Introduction 591
16.2 Synchronous Pipelining and Clocking Styles 593
16.3 Clock Skew and Clock Distribution in Bit-Level Pipelined VLSI Designs 601
16.4 Wave Pipelining 606
16.5 Constraint Space Diagram and Degree of Wave Pipelining 612
16.6 Implementation of Wave-Pipelined Systems 614
16.7 Asynchronous Pipelining 619
16.8 Signal Transition Graphs 622
16.9 Use of STG to Design Interconnection Circuits 626
16.10 Implementation of Computational Units 631
16.11 Conclusions 640
16.12 Problems 640
References 643
17 Low-Power Design 645
17.1 Introduction 645
17.2 Theoretical Background 648
17.3 Scaling Versus Power Consumption 650
17.4 Power Analysis 652
17.5 Power Reduction Techniques 662
17.6 Power Estimation Approaches 671
17.7 Conclusions 688
17.8 Problems 688
References 692
18 Programmable Digital Signal Processors 695
18.1 Introduction 695
18.2 Evolution of Programmable Digital Signal Processors 696
18.3 Important Features of DSP Processors 697
18.4 DSP Processors for Mobile and Wireless Communications 703
18.5 Processors for Multimedia Signal Processing 704
18.6 Conclusions 714
References 714
Appendix A: Shortest Path Algorithms 717
A.1 Introduction 717
A.2 The Bellman-Ford Algorithm 718
A.3 The Floyd-Warshall Algorithm 720
A.4 Computational Complexities 721
References 722
Appendix B: Scheduling and Allocation Techniques 723
B.1 Introduction 723
B.2 Iterative/Constructive Scheduling Algorithms 725
B.3 Transformational Scheduling Algorithms 729
B.4 Integer Linear Programming Models 738
References 741
Appendix C: Euclidean GCD Algorithm 743
C.1 Introduction 743
C.2 Euclidean GCD Algorithm for Integers 743
C.3 Euclidean GCD Algorithm for Polynomials 745
Appendix D: Orthonormality of Schur Polynomials 747
D.1 Orthogonality of Schur Polynomials 747
D.2 Orthonormality of Schur Polynomials 749
E.1 Introduction 753
E.2 Multiplexer-Based Fast Binary Adders 753
Appendix E: Fast Binary Adders and Multipliers 753
E.3 Wallace Tree and Dadda Multiplier 758
References 761
Appendix F: Scheduling in Bit-Serial Systems 763
F.1 Introduction 763
F.2 Outline of the Scheduling Algorithm 764
F.3 Minimum Cost Solution 766
F.4 Scheduling of Edges with Delays 768
References 769
Appendix G: Coefficient Quantization in FIR Filters 771
G.1 Introduction 771
G.2 NUS Quantization Algorithm 771
References 774
Index 775