1 Introduction 1
1.1 2.5-D Integration 5
1.2 Enabling Technologies 8
1.2.1 Fabrication Technology 8
1.2.2 Testing Methodology and Fault Tolerance Technique 9
1.2.3 Design Technology 10
1.3 Objectives and Book Organization 13
References 16
2 A Cost Comparison of VLSI Integration Schemes 21
2.1 Non-Monolithic Integration Schemes 22
2.1.1 Multiple-Reticle Wafer 23
2.1.2 Multiple Chip Module (MCM) 23
2.1.3 Three-Dimensional (3-D) Integration 24
2.2 Yield Analysis of Different VLSI Integration Approaches 26
2.2.1 Monolithic SoC 28
2.2.2 Multiple-Reticle Wafer (MRW) 28
2.2.3 Three-Dimensional (3-D) Integration 30
2.2.4 2.5-D System Integration 31
2.2.5 Multi-Chip Module 34
2.2.6 Summing Up 35
2.3 Observations 37
References 38
3 Design Case Studies 42
3.1 Crossbar 43
3.2 A 2.5-D Rambus DRAM Architecture 46
3.2.1 Tackle the Long Bus Wire 46
3.2.2 Serialized Channel in the 3rd Dimension 48
3.3 A 2.5-D Redesign of PipeRench 50
3.3.1 The 2.5-D Implementation 52
3.3.2 Simulation Results 54
3.4 A 2.5-D Integrated Microprocessor System 56
3.4.1 A 2.5-D Integrated Microprocessor System 57
3.4.2 An Analytical Performance Model 62
3.4.3 Detailed Performance Simulation for Reduced Memory Latency 66
3.5 Observations 69
References 71
4 An Automatic 2.5-D Layout Design Flow 74
4.1 A 2.5-D Layout Design Framework 75
4.1.1 2.5-D Floorplanning 77
4.1.2 2.5-D Placement 78
4.1.3 2.5-D Global Routing 78
4.2 Observations 81
References 81
5 Floorplanning for 2.5-D Integration 83
5.1 Floorplan Level Evaluation—Category 2 Circuits 87
5.1.1 Technique 87
5.1.2 Results 89
5.2 Floorplan Level Evaluation—Category 3 Circuits 91
5.2.1 Technique 91
5.2.2 Results 92
5.3 Thermal Driven Floorplanning 93
5.3.1 Chip Level Thermal Modeling and Analysis for 2.5-D Floorplanning 95
5.3.2 Coupled Temperature and Leakage Estimation 99
5.3.3 2.5-D Thermal Driven Floorplanning Techniques 105
5.3.4 Experimental Results 107
5.4 Observations 111
References 113
6 Placement for 2.5-D Integration 117
6.1 Pure Standard Cell Designs 119
6.1.1 Placement Techniques 120
6.1.2 Benchmarks and Layout Model 123
6.1.3 Evaluation of Vertical Partitioning Strategies 125
6.1.4 Wire Length Scaling 126
6.1.5 Wire Length Reduction 129
6.1.6 Wire Length vs. Inter-Chip Contact Pitch 133
6.2 Mixed Macro and Standard Cell Designs 134
6.2.1 Placement Techniques 136
6.2.2 Results and Analysis 138
6.3 Observations 140
References 142
7 A Road Map of 2.5-D Integration 144
7.1 Stacked Memory 145
7.2 DRAM Integration for Bandwidth-Demanding Applications 147
7.3 Hybrid System Integration 151
7.4 Extremely High Performance Systems 155
7.4.1 Highly Integrated Image Sensor System 155
7.4.2 Radar-in-Cube 158
References 160
8 Conclusion and Future Work 164
8.1 Main Contributions and Conclusions 165
8.2 Future Work 168
8.2.1 Fabrication Technology for 2.5-D Systems 169
8.2.2 Testing Techniques for 2.5-D Integration 171
8.2.3 Design Technology for 2.5-D Integration 173
References 186
Index 188
Figure 1.1 Actual chip complexity increases faster than Moore's law 2
Figure 1.2 An imaginary 2.5-D system (see colour plate) 5
Figure 2.1 Total consumed silicon area of multiple-reticle wafer 30
Figure 2.2 Silicon area of the 2.5-D implementation with 4 slices of chips 33
Figure 2.3 Silicon area of the 2.5-D implementation 34
Figure 2.4 Silicon area of the MCM implementation 35
Figure 2.5 Silicon area comparison of different integration schemes 36
Figure 2.6 System planning for future VLSI systems 38
Figure 3.1 Stick diagram of a monolithic crossbar (see colour plate) 44
Figure 3.2 Stick diagram of a 2.5-D crossbar (see colour plate) 45
Figure 3.3 Rambus DRAM 46
Figure 3.4 2.5-D Rambus DRAM 48
Figure 3.5 RDRAM memory system 49
Figure 3.6 3-D Rambus DRAM: 4-channel configuration 50
Figure 3.7 Original monolithic implementation of PipeRench 51
Figure 3.8 Critical path of PipeRench system 52
Figure 3.9 The 2.5-D re-design of PipeRench (see colour plate) 53
Figure 3.10 Alpha 21364 floorplan and memory bus placement 58
Figure 3.11 A 2.5-D stacked microprocessor and DRAM 60
Figure 3.12 A diagram of computer system 60
Figure 3.13 CPI calculation 63
Figure 3.14 CPI with regard to main memory latency and L2 cache miss rate (see colour plate) 65
Figure 3.15 IPC Speedup by reduced memory latency 68
Figure 4.1 A 2.5-D layout synthesis framework 76
Figure 4.2 2.5-D routing graph 79
Figure 5.1 2.5-D floorplanning 87
Figure 5.2 A floorplan example 89
Figure 5.3 Insert a 0-weight cell 91
Figure 5.4 2.5-D thermal-driven floorplanning flow 95
Figure 5.5 A 3-D IC with two stacked chip layers in a package 96
Figure 5.6 Thermal interactions between a region of the top transistor layer to all other regions on both transistor layers (not all interactions are drawn) 98
Figure 5.7 Thermal simulation of a set of floorplans with varying total area and aspect ratio (only one stacked layer is shown for each case) 99
Figure 5.8 Modeling the temperature dependency of the leakage power using a linear model 101
Figure 5.9 Leakage power distribution is confined within the placed circuit blocks 103
Figure 5.10 The distribution of wire length and temperature gradient 109
Figure 5.11 Temperature snapshots of the thermal driven floorplanning with Benchmark AMI49. Both the maximum temperature and the temperature gradient are reduced during the optimization (see colour plate) 111
Figure 6.1 2.5-D placement problem (see colour plate) 119
Figure 6.2 2.5-D placement process 121
Figure 6.3 Wire length reductions vs. vertical partitioning 126
Figure 6.4 Monolithic and 2.5-D placements for the same design 127
Figure 6.5 A profile of wire length reduction 128
Figure 6.6 Wire length reductions of standard cell placement 130
Figure 6.7 Wire length distribution of one design 132
Figure 6.8 Interconnect power comparison—2-D and 2.5-D solutions 133
Figure 6.9 Wire length vs. inter-chip contact pitch 134
Figure 6.10 Block splitting during mixed placement 138
Figure 6.11 Wire length reductions of mixed placement 140
Figure 7.1 Road map for the development of 2.5-D ICs 145
Figure 7.2 Flash memory capacity in cellular phones (adapted from) 146
Figure 7.3 Peak memory bandwidths of major NVidia GPUs 148
Figure 7.4 Intel's wire-bonded stacked Chip Scale Packaged flash memory (courtesy of Intel Corporation) 148
Figure 7.5 Normalized clock rate vs. peak memory bandwidth of NVidia 149
Figure 7.6 Tile-based multiprocessor architecture 151
Figure 7.7 A multi-chip wireless handset solution (courtesy of Texas Instruments) 152
Figure 7.8 Passive components in package 155
Figure 7.9 An image sensor system diagram 156
Figure 7.10 A 2.5-D camera/IR sensor system 158
Figure 7.11 Computational demands for military radar systems (adapted from) 159
Figure 7.12 Block diagram of a radar system 159
Figure 7.13 2.5-D implementation of a radar system 160
Figure 8.1 Area power I/O for 2.5-D integration (see colour plate) 168
Figure 8.2 MEMS based inter-chip contact (see colour plate) 170
Figure 8.3 Design flow for 2.5-D ICs 184
Table 1.1 Design variables involved in designing a 2.5-D system 11
Table 2.1 Wafer bonding based 3-D integration technologies 25
Table 2.2 Values for the major parameters of our cost model 28
Table 3.1 SPICE simulation on the critical path 55
Table 3.2 Configuration of target microprocessor 58
Table 3.3 SPEC2000 benchmark programs under study 67
Table 3.4 IPC improvement by reduced memory latency 68
Table 5.1 2-D and 2.5-D floorplans for Category 2 designs 90
Table 5.2 2-D and 2.5-D floorplans for Category 3 designs 93
Table 5.3 2.5-D thermal-driven floorplans with different weighting factors for thermal cost 108
Table 5.4 3-D floorplans with and without thermal concern 110
Table 6.1 Placement benchmarks 123
Table 6.2 Worst-case wire length reduction for nets with large fan-out 129
Table 6.3 Wire length comparison of standard cell placements 131
Table 6.4 Mixed Layout Benchmarks 135
Table 6.5 Wire length characteristics of mixed placement 139