Performance Optimization of Software Distributed Shared Memory Systems (《软件分布式共享存储系统的性能优化》)

  • Author: Weisong Shi (施巍松)
  • Publisher: Higher Education Press, Beijing
  • Year of publication: 2004
  • ISBN: 7040146150
  • Pages: 251
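About the book: Software distributed shared memory (DSM) systems, also called shared virtual memory systems, combine the ease of programming of shared-memory systems with the scalability of distributed-memory systems, and have therefore been an important research direction for more than a decade. The main design goal of a software DSM system is to let applications run on it unmodified, or with only minor modifications, while still achieving satisfactory performance. However, the system overhead introduced to maintain the coherence of shared data and the transparency of communication keeps many existing systems from reaching this goal. This work focuses on improving the performance of software DSM systems and proposes optimization techniques in six areas: cache coherence protocols, memory organization, system overhead, loop scheduling, task migration, and communication optimization.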

Chapter 1 Introduction 1

1.1 Basic Idea of Software DSM 2

1.1 Basic idea of software DSM 3

1.2 Illustration of simple software DSM system 5

1.2 Memory Consistency Model 7

1.3 Cache Coherence Protocol 9

1.4 Application Programming Interface 10

1.5 Memory Organization 11

1.6 Implementation Method 13

1.6.1 Implementation Levels 13

1.6.2 Granularity of the System 15

1.7 Some Representative software DSMs 16

1.1 Some Representative Software DSM Systems 17

1.8 Recent Progress on Software DSM and Open Questions 20

1.8.1 Software DSM-oriented Application Research 20

1.8.2 Fine-grain vs. Coarse-grain Software DSM Systems 22

1.8.3 Hardware Support for Software DSM System 24

1.8.4 More Relaxed Memory Consistency Model 25

1.8.5 SMP-Based Hierarchical Software DSM System 27

1.9 Summary of Dissertation Contributions 29

1.10 Organization of the Dissertation 33

Chapter 2 Lock-Based Cache Coherence Protocol 35

2.1 Cache Coherence Protocol 35

2.1.1 Write-Invalidate vs. Write-Update 36

2.1.2 Multiple Writer Protocol 37

2.1.3 Delayed Propagation Protocol 38

2.1 Write merging in eager release consistency 39

2.2 Comparison of communication amount in eager and lazy RC 40

2.2 Snoopy Protocols 40

2.3 Directory-Based Protocols 42

2.3.1 Full Bit Vector Directory 43

2.3.2 Limited Pointer Directory 43

2.3.3 Linked List Directory 44

2.3.4 Probowner Directory 45

2.4 Lock-Based Cache Coherence Protocol 46

2.4.1 Design Consideration 46

2.4.2 Supporting Scope Consistency 48

2.4.3 The Basic Protocol 50

2.4.4 Correctness of the Protocol 51

2.1 Some Notations 52

2.3 State transition diagram of the lock-based cache protocol 54

2.4.5 Advantages and Disadvantages 54

2.2 Message Costs of Shared Memory Operations 55

2.3 Comparison of Different Coherence Protocols 56

2.5 Summary 57

Chapter 3 JIAJIA Software DSM System 58

3.1 Introduction 58

3.2 Memory Organization 59

3.1 Memory organization of CC-NUMA 60

3.2 Memory organization of COMA 61

3.3 Memory organization of JIAJIA 63

3.4 Memory Allocation Example 65

3.3 Lock-Based Cache Coherence Protocol 66

3.4 Programming Interface 66

3.5 Implementation 68

3.5 Flow chart of threads creating procedure jiacreat() 69

3.5.1 Starting Multiple Processes 69

3.5.2 Shared Memory Management 70

3.6 Flow chart of memory allocation jia_alloc(size) 72

3.5.3 Synchronization 75

3.7 Examples of nested critical sections 78

3.5.4 Communication 82

3.8 Communication between two processors 85

3.5.5 Deadlock Free of Communication Scheme 88

3.6 Performance Evaluation and Analysis 88

3.6.1 Applications 89

3.6.2 Performance of JIAJIA and CVM 91

3.1 Characteristics of Benchmarks and Execution Results 92

3.2 Eight-Processor Execution Statistics 93

3.6.3 Confidence-Interval Based Summarizing Technique 96

3.9 (x̄-μ)/(s/√n) follows a t(n-1) distribution 98

3.6.4 Paired Confidence Interval Method 99

3.6.5 Real World Application: Em3d 102

3.3 Execution Time, Fixed Speedup (Sf) and Memory Requirement for Different Scales 103

3.4 Execution Time, Scaled Speedup (S8) for Problem Scale 120×60×208 103

3.6.6 Scalability of JIAJIA 105

3.10 Speedups of 8 applications under 2, 4, 8, 16 processors 106

3.7 Summary 107

Chapter 4 System Overhead Analysis and Reducing 108

4.1 Introduction 108

4.2 Analysis of Software DSM System Overhead 109

4.1 (a) General prototype of software DSM system. (b) Basic communication framework of JIAJIA 110

4.2 Time partition of SIGSEGV handler and synchronization operation 112

4.3 Performance Measurement and Analysis 114

4.3.1 Experiment Platform 114

4.1 Description of Time Statistical Variables 114

4.3.2 Overview of Applications 115

4.2 Characteristics of Applications 116

4.3.3 Analysis 116

4.3 (a) Speedups of applications on 8 processors. (b) Time statistics of applications 117

4.3 Breakdown of Execution Time of These Applications 118

4.3.4 The CPU Effect 121

4.4 (a) Comparison of speedups of fast CPU and slow CPU. (b) Effects of CPU speed to system overhead 123

4.4 Reducing System Overhead 124

4.4.1 Reducing False Sharing 124

4.4.2 Reducing Write Detection Overhead 126

4.4.3 Tree Structured Propagation of Barrier Messages 127

4.4.4 Performance Evaluation and Analysis 128

4.4 Characteristics of the Benchmarks 129

4.5 Eight-Way Parallel Execution Results 131

4.5 Breakdown of execution time 134

4.5 Summary 140

5.1 Background 142

Chapter 5 Affinity-Based Self Scheduling 142

5.2 Related Work 145

5.2.1 Static Scheduling (Static) 146

5.2.2 Self Scheduling (SS) 146

5.2.3 Block Self Scheduling (BSS) 147

5.2.4 Guided Self Scheduling (GSS) 147

5.2.5 Factoring Scheduling (FS) 148

5.2.6 Trapezoid Self Scheduling (TSS) 149

5.2.7 Affinity Scheduling (AFS) 149

5.2.8 Safe Self Scheduling (SSS) 150

5.1 Chunk Sizes of Different Scheduling Schemes 151

5.2.9 Adaptive Affinity Scheduling (AAFS) 151

5.1 Basic framework of application 155

5.3 Design and Implementation of ABS 155

5.3.1 Target System 155

5.3.2 Affinity-Based Self Scheduling Algorithm 155

5.2 The Number of Messages and Synchronization Operations Associated with Loop Allocation 157

5.4 Analytic Evaluation 158

5.3 Description of the Symbols 159

5.5 Experiment Platform and Performance Evaluation 163

5.5.1 Experiment Platform 163

5.5.2 Application Description 163

5.5.3 Performance Evaluation and Analysis 166

5.4 The Effects of Locality and Load Imbalance (unit: second) 167

5.2 Execution time of different scheduling schemes in dedicated environment: (a) SOR. (b) JI. (c) TC. (d) MAT. (e) AC 169

5.5 The Number of Synchronization Operations of Different Scheduling Algorithms in Dedicated Environment 170

5.6 The Number of Getpages of Different Scheduling Algorithms in Dedicated Environment 170

5.7 System Overhead of Different Scheduling Algorithms in Dedicated Environment (Ⅰ) (second) 171

5.8 System Overhead of Different Scheduling Algorithms in Dedicated Environment (Ⅱ) 172

5.3 Execution time of different scheduling schemes in metacomputing environment: (a) SOR. (b) JI. (c) TC. (d) MAT. (e) AC 176

5.9 The Number of Synchronization Operations of Different Scheduling Algorithms in Metacomputing Environment 177

5.10 The Number of Getpages of Different Scheduling Algorithms in Metacomputing Environment 177

5.11 System Overhead of Different Scheduling Algorithms in Metacomputing Environment (Ⅰ) 178

5.12 System Overhead of Different Scheduling Algorithms in Metacomputing Environment (Ⅱ) 179

5.4 Execution time with different chunk size under ABS scheduling scheme in metacomputing environment: (a) SOR. (b) JI. (c) TC. (d) MAT. (e) AC 184

5.13 The Number of Synchronization Operations with Different Chunk Sizes in Metacomputing Environment 185

5.14 The Number of Getpages with Different Chunk Sizes in Metacomputing Environment 185

5.15 System Overhead of Different Chunk Sizes in Metacomputing Environment 186

5.6 Summary 188

Chapter 6 Dynamic Task Migration Scheme 189

6.1 Introduction 189

6.2 Rationale of Dynamic Task Migration 191

6.1 Basic framework of dynamic task migration scheme 192

6.3 Implementation 193

6.3.1 Computation Migration 193

6.1 Definition of the Symbols 194

6.3.2 Data Migration 195

6.4 Home Migration 196

6.5 Experimental Results and Analysis 199

6.5.1 Experiment Platform 199

6.5.2 Applications 199

6.5.3 Performance Evaluation and Analysis 200

6.2 Performance comparison: (a) execution time. (b) system overhead 201

6.2 System Overheads in Unbalanced Environment 202

6.3 System Overheads in Unbalanced Environment with Task Migration 202

6.6 Related Work 203

6.7 Summary 205

Chapter 7 Communication Optimization for Home-Based Software DSMs 207

7.1 Introduction 208

7.2 Key Issues of ULN 209

7.2.1 Communication Model 210

7.2.2 Data Transfer 211

7.2.3 Protection 212

7.2.4 Address Translation 212

7.2.5 Message Pipelining 213

7.2.6 Arrival Notification 213

7.2.7 Reliability 214

7.1 Comparison of different communication substrate: (a) unreliable. (b) reliable 214

7.2.8 Multicast 215

7.3 Communication Requirements of Software DSMs 215

7.4 Design of JMCL 218

7.2 Interface description of JMCL 218

7.1 Descriptions of JMCL Applications Programming Interface 219

7.4.1 JMCL API 219

7.4.2 Message Flow of JMCL 221

7.3 Message transfer flow in JMCL 222

7.4 Message transfer flow in UDP/IP 222

7.5 Current State and Future Work 224

7.6 Conclusion 225

Chapter 8 Conclusions and Future Directions 226

8.1 Conclusions 226

8.2 Future of Software DSM 230

Bibliography 232