《数据库系统实现 英文》PDF下载

  • 购买积分:29 如何计算积分?
  • 作  者:(美)HectorGarcia-Molina,JenniferWidom,JeffreyD.Ullman编著
  • 出 版 社:北京:机械工业出版社
  • 出版年份:2010
  • ISBN:9787111288602
  • 页数:1184 页
图书介绍:本书中对数据库系统实现原理进行了深入阐述,并具体讨论了数据库管理系统的三个主要成分—存储管理器、查询处理器和事务管理器的实现技术。书中还对信息集成的最新技术,例如数据仓库、OLAP、数据挖掘、Mediator、数据立方体系统等进行了介绍。

1 The Worlds of Database Systems 1

1.1 The Evolution of Database Systems 1

1.1.1 Early Database Management Systems 2

1.1.2 Relational Database Systems 3

1.1.3 Smaller and Smaller Systems 3

1.1.4 Bigger and Bigger Systems 4

1.1.5 Information Integration 4

1.2 Overview of a Database Management System 5

1.2.1 Data-Definition Language Commands 5

1.2.2 Overview of Query Processing 5

1.2.3 Storage and Buffer Management 7

1.2.4 Transaction Processing 8

1.2.5 The Query Processor 9

1.3 Outline of Database-System Studies 10

1.4 References for Chapter 1 12

Ⅰ Relational Database Modeling 15

2 The Relational Model of Data 17

2.1 An Overview of Data Models 17

2.1.1 What is a Data Model? 17

2.1.2 Important Data Models 18

2.1.3 The Relational Model in Brief 18

2.1.4 The Semistructured Model in Brief 19

2.1.5 Other Data Models 20

2.1.6 Comparison of Modeling Approaches 21

2.2 Basics of the Relational Model 21

2.2.1 Attributes 22

2.2.2 Schemas 22

2.2.3 Tuples 22

2.2.4 Domains 23

2.2.5 Equivalent Representations of a Relation 23

2.2.6 Relation Instances 24

2.2.7 Keys of Relations 25

2.2.8 An Example Database Schema 26

2.2.9 Exercises for Section 2.2 28

2.3 Defining a Relation Schema in SQL 29

2.3.1 Relations in SQL 29

2.3.2 Data Types 30

2.3.3 Simple Table Declarations 31

2.3.4 Modifying Relation Schemas 33

2.3.5 Default Values 34

2.3.6 Declaring Keys 34

2.3.7 Exercises for Section 2.3 36

2.4 An Algebraic Query Language 38

2.4.1 Why Do We Need a Special Query Language? 38

2.4.2 What is an Algebra? 38

2.4.3 Overview of Relational Algebra 39

2.4.4 Set Operations on Relations 39

2.4.5 Projection 41

2.4.6 Selection 42

2.4.7 Cartesian Product 43

2.4.8 Natural Joins 43

2.4.9 Theta-Joins 45

2.4.10 Combining Operations to Form Queries 47

2.4.11 Naming and Renaming 49

2.4.12 Relationships Among Operations 50

2.4.13 A Linear Notation for Algebraic Expressions 51

2.4.14 Exercises for Section 2.4 52

2.5 Constraints on Relations 58

2.5.1 Relational Algebra as a Constraint Language 59

2.5.2 Referential Integrity Constraints 59

2.5.3 Key Constraints 60

2.5.4 Additional Constraint Examples 61

2.5.5 Exercises for Section 2.5 62

2.6 Summary of Chapter 2 63

2.7 References for Chapter 2 65

3 Design Theory for Relational Databases 67

3.1 Functional Dependencies 67

3.1.1 Definition of Functional Dependency 68

3.1.2 Keys of Relations 70

3.1.3 Superkeys 71

3.1.4 Exercises for Section 3.1 71

3.2 Rules About Functional Dependencies 72

3.2.1 Reasoning About Functional Dependencies 72

3.2.2 The Splitting/Combining Rule 73

3.2.3 Trivial Functional Dependencies 74

3.2.4 Computing the Closure of Attributes 75

3.2.5 Why the Closure Algorithm Works 77

3.2.6 The Transitive Rule 79

3.2.7 Closing Sets of Functional Dependencies 80

3.2.8 Projecting Functional Dependencies 81

3.2.9 Exercises for Section 3.2 83

3.3 Design of Relational Database Schemas 85

3.3.1 Anomalies 86

3.3.2 Decomposing Relations 86

3.3.3 Boyce-Codd Normal Form 88

3.3.4 Decomposition into BCNF 89

3.3.5 Exercises for Section 3.3 92

3.4 Decomposition:The Good,Bad,and Ugly 93

3.4.1 Recovering Information from a Decomposition 94

3.4.2 The Chase Test for Lossless Join 96

3.4.3 Why the Chase Works 99

3.4.4 Dependency Preservation 100

3.4.5 Exercises for Section 3.4 102

3.5 Third Normal Form 102

3.5.1 Definition of Third Normal Form 102

3.5.2 The Synthesis Algorithm for 3NF Schemas 103

3.5.3 Why the 3NF Synthesis Algorithm Works 104

3.5.4 Exercises for Section 3.5 105

3.6 Multivalued Dependencies 105

3.6.1 Attribute Independence and Its Consequent Redundancy 106

3.6.2 Definition of Multivalued Dependencies 107

3.6.3 Reasoning About Multivalued Dependencies 108

3.6.4 Fourth Normal Form 110

3.6.5 Decomposition into Fourth Normal Form 111

3.6.6 Relationships Among Normal Forms 113

3.6.7 Exercises for Section 3.6 113

3.7 An Algorithm for Discovering MVD's 115

3.7.1 The Closure and the Chase 115

3.7.2 Extending the Chase to MVD's 116

3.7.3 Why the Chase Works for MVD's 118

3.7.4 Projecting MVD's 119

3.7.5 Exercises for Section 3.7 120

3.8 Summary of Chapter 3 121

3.9 References for Chapter 3 122

4 High-Level Database Models 125

4.1 The Entity/Relationship Model 126

4.1.1 Entity Sets 126

4.1.2 Attributes 126

4.1.3 Relationships 127

4.1.4 Entity-Relationship Diagrams 127

4.1.5 Instances of an E/R Diagram 128

4.1.6 Multiplicity of Binary E/R Relationships 129

4.1.7 Multiway Relationships 130

4.1.8 Roles in Relationships 131

4.1.9 Attributes on Relationships 134

4.1.10 Converting Multiway Relationships to Binary 134

4.1.11 Subclasses in the E/R Model 135

4.1.12 Exercises for Section 4.1 138

4.2 Design Principles 140

4.2.1 Faithfulness 140

4.2.2 Avoiding Redundancy 141

4.2.3 Simplicity Counts 142

4.2.4 Choosing the Right Relationships 142

4.2.5 Picking the Right Kind of Element 144

4.2.6 Exercises for Section 4.2 145

4.3 Constraints in the E/R Model 148

4.3.1 Keys in the E/R Model 148

4.3.2 Representing Keys in the E/R Model 149

4.3.3 Referential Integrity 150

4.3.4 Degree Constraints 151

4.3.5 Exercises for Section 4.3 151

4.4 weak Entity Sets 152

4.4.1 Causes of Weak Entity Sets 152

4.4.2 Requirements for Weak Entity Sets 153

4.4.3 Weak Entity Set Notation 155

4.4.4 Exercises for Section 4.4 156

4.5 From E/R Diagrams to Relational Designs 157

4.5.1 From Entity Sets to Relations 157

4.5.2 From E/R Relationships to Relations 158

4.5.3 Combining Relations 160

4.5.4 Handling Weak Entity Sets 161

4.5.5 Exercises for Section 4.5 163

4.6 Converting Subclass Structures to Relations 165

4.6.1 E/R-Style Conversion 166

4.6.2 An Object-Oriented Approach 167

4.6.3 Using Null Values to Combine Relations 168

4.6.4 Comparison of Approaches 169

4.6.5 Exercises for Section 4.6 171

4.7 Unified Modeling Language 171

4.7.1 UML Classes 172

4.7.2 Keys for UML classes 173

4.7.3 Associations 173

4.7.4 Self-Associations 175

4.7.5 Association Classes 175

4.7.6 Subclasses in UML 176

4.7.7 Aggregations and Compositions 177

4.7.8 Exercises for Section 4.7 179

4.8 From UML Diagrams to Relations 179

4.8.1 UML-to-Relations Basics 179

4.8.2 From UML Subclasses to Relations 180

4.8.3 From Aggregations and Compositions to Relations 181

4.8.4 The UML Analog of Weak Entity Sets 181

4.8.5 Exercises for Section 4.8 183

4.9 Object Definition Language 183

4.9.1 Class Declarations 184

4.9.2 Attributes in ODL 184

4.9.3 Relationships in ODL 185

4.9.4 Inverse Relationships 186

4.9.5 Multiplicity of Relationships 186

4.9.6 Types in ODL 188

4.9.7 Subclasses in ODL 190

4.9.8 Declaring Keys in ODL 191

4.9.9 Exercises for Section 4.9 192

4.10 From ODL Designs to Relational Designs 193

4.10.1 From ODL Classes to Relations 193

4.10.2 Complex Attributes in Classes 194

4.10.3 Representing Set-Valued Attributes 195

4.10.4 Representing Other Type Constructors 196

4.10.5 Representing ODL Relationships 198

4.10.6 Exercises for Section 4.10 198

4.11 Summary of Chapter 4 200

4.12 References for Chapter 4 202

Ⅱ Relational Database Programming 203

5 Algebraic and Logical Query Languages 205

5.1 Relational Operations on Bags 205

5.1.1 Why Bags? 206

5.1.2 Union,Intersection,and Difference of Bags 207

5.1.3 Projection of Bags 208

5.1.4 Selection on Bags 209

5.1.5 Product of Bags 210

5.1.6 Joins of Bags 210

5.1.7 Exercises for Section 5.1 212

5.2 Extended Operators of Relational Algebra 213

5.2.1 Duplicate Elimination 214

5.2.2 Aggregation Operators 214

5.2.3 Grouping 215

5.2.4 The Grouping Operator 216

5.2.5 Extending the Projection Operator 217

5.2.6 The Sorting Operator 219

5.2.7 Outerjoins 219

5.2.8 Exercises for Section 5.2 222

5.3 A Logic for Relations 222

5.3.1 Predicates and Atoms 223

5.3.2 Arithmetic Atoms 223

5.3.3 Datalog Rules and Queries 224

5.3.4 Meaning of Datalog Rules 225

5.3.5 Extensional and Intensional Predicates 228

5.3.6 Datalog Rules Applied to Bags 228

5.3.7 Exercises for Section 5.3 230

5.4 Relational Algebra and Datalog 230

5.4.1 Boolean Operations 231

5.4.2 Projection 232

5.4.3 Selection 232

5.4.4 Product 235

5.4.5 Joins 235

5.4.6 Simulating Multiple Operations with Datalog 236

5.4.7 Comparison Between Datalog and Relational Algebra 238

5.4.8 Exercises for Section 5.4 238

5.5 Summary of Chapter 5 240

5.6 References for Chapter 5 241

6 The Database Language SQL 243

6.1 Simple Queries in SQL 244

6.1.1 Projection in SQL 246

6.1.2 Selection in SQL 248

6.1.3 Comparison of Strings 250

6.1.4 Pattern Matching in SQL 250

6.1.5 Dates and Times 251

6.1.6 Null Values and Comparisons Involving NULL 252

6.1.7 The Truth-Value UNKNOWN 253

6.1.8 Ordering the Output 255

6.1.9 Exercises for Section 6.1 256

6.2 Queries Involving More Than One Relation 258

6.2.1 Products and Joins in SQL 259

6.2.2 Disambiguating Attributes 260

6.2.3 Tuple Variables 261

6.2.4 Interpreting Multirelation Queries 262

6.2.5 Union,Intersection,and Difference of Queries 265

6.2.6 Exercises for Section 6.2 267

6.3 Subqueries 268

6.3.1 Subqueries that Produce Scalar Values 269

6.3.2 Conditions Involving Relations 270

6.3.3 Conditions Involving Tuples 271

6.3.4 Correlated Subqueries 273

6.3.5 Subqueries in FROM Clauses 274

6.3.6 SQL Join Expressions 275

6.3.7 Natural Joins 276

6.3.8 Outerjoins 277

6.3.9 Exercises for Section 6.3 279

6.4 Full-Relation Operations 281

6.4.1 Eliminating Duplicates 281

6.4.2 Duplicates in Unions,Intersections,and Differences 282

6.4.3 Grouping and Aggregation in SQL 283

6.4.4 Aggregation Operators 284

6.4.5 Grouping 285

6.4.6 Grouping,Aggregation,and Nulls 287

6.4.7 HAVING Clauses 288

6.4.8 Exercises for Section 6.4 289

6.5 Database Modifications 291

6.5.1 Insertion 291

6.5.2 Deletion 292

6.5.3 Updates 294

6.5.4 Exercises for Section 6.5 295

6.6 Transactions in SQL 296

6.6.1 Serializability 296

6.6.2 Atomicity 298

6.6.3 Transactions 299

6.6.4 Read-Only Transactions 300

6.6.5 Dirty Reads 302

6.6.6 Other Isolation Levels 304

6.6.7 Exercises for Section 6.6 306

6.7 Summary of Chapter 6 307

6.8 References for Chapter 6 308

7 Constraints and Triggers 311

7.1 Keys and Foreign Keys 311

7.1.1 Declaring Foreign-Key Constraints 312

7.1.2 Maintaining Referential Integrity 313

7.1.3 Deferred Checking of Constraints 315

7.1.4 Exercises for Section 7.1 318

7.2 Constraints on Attributes and Tuples 319

7.2.1 Not-Null Constraints 319

7.2.2 Attribute-Based CHECK Constraints 320

7.2.3 Tuple-Based CHECK Constraints 321

7.2.4 Comparison of Tuple-and Attribute-Based Constraints 323

7.2.5 Exercises for Section 7.2 323

7.3 Modification of Constraints 325

7.3.1 Giving Names to Constraints 325

7.3.2 Altering Constraints on Tables 326

7.3.3 Exercises for Section 7.3 327

7.4 Assertions 328

7.4.1 Creating Assertions 328

7.4.2 Using Assertions 329

7.4.3 Exercises for Section 7.4 330

7.5 Triggers 332

7.5.1 Triggers in SQL 332

7.5.2 The Options for Trigger Design 334

7.5.3 Exercises for Section 7.5 337

7.6 Summary of Chapter 7 339

7.7 References for Chapter 7 339

8 Views and Indexes 341

8.1 Virtual Views 341

8.1.1 Declaring Views 341

8.1.2 Querying Views 343

8.1.3 Renaming Attributes 343

8.1.4 Exercises for Section 8.1 344

8.2 Modifying Views 344

8.2.1 View Removal 345

8.2.2 Updatable Views 345

8.2.3 Instead-Of Triggers on Views 347

8.2.4 Exercises for Section 8.2 349

8.3 Indexes in SQL 350

8.3.1 Motivation for Indexes 350

8.3.2 Declaring Indexes 351

8.3.3 Exercises for Section 8.3 352

8.4 Selection of Indexes 352

8.4.1 A Simple Cost Model 352

8.4.2 Some Useful Indexes 353

8.4.3 Calculating the Best Indexes to Create 355

8.4.4 Automatic Selection of Indexes to Create 357

8.4.5 Exercises for Section 8.4 359

8.5 Materialized Views 359

8.5.1 Maintaining a Materialized View 360

8.5.2 Periodic Maintenance of Materialized Views 362

8.5.3 Rewriting Queries to Use Materialized Views 362

8.5.4 Automatic Creation of Materialized Views 364

8.5.5 Exercises for Section 8.5 365

8.6 Summary of Chapter 8 366

8.7 References for Chapter 8 367

9 SQL in a Server Environment 369

9.1 The Three-Tier Architecture 369

9.1.1 The Web-Server Tier 370

9.1.2 The Application Tier 371

9.1.3 The Database Tier 372

9.2 The SQL Environment 372

9.2.1 Environments 373

9.2.2 Schemas 374

9.2.3 Catalogs 375

9.2.4 Clients and Servers in the SQL Environment 375

9.2.5 Connections 376

9.2.6 Sessions 377

9.2.7 Modules 378

9.3 The SQL/Host-Language Interface 378

9.3.1 The Impedance Mismatch Problem 380

9.3.2 Connecting SQL to the Host Language 380

9.3.3 The DECLARE Section 381

9.3.4 Using Shared Variables 382

9.3.5 Single-Row Select Statements 383

9.3.6 Cursors 383

9.3.7 Modifications by Cursor 386

9.3.8 Protecting Against Concurrent Updates 387

9.3.9 Dynamic SQL 388

9.3.10 Exercises for Section 9.3 390

9.4 Stored Procedures 391

9.4.1 Creating PSM Functions and Procedures 391

9.4.2 Some Simple Statement Forms in PSM 392

9.4.3 Branching Statements 394

9.4.4 Queries in PSM 395

9.4.5 Loops in PSM 396

9.4.6 For-Loops 398

9.4.7 Exceptions in PSM 400

9.4.8 Using PSM Functions and Procedures 402

9.4.9 Exercises for Section 9.4 402

9.5 Using a call-Level Interface 404

9.5.1 Introduction to SQL/CLI 405

9.5.2 Processing Statements 407

9.5.3 Fetching Data From a Query Result 408

9.5.4 Passing Parameters to Queries 410

9.5.5 Exercises for Section 9.5 412

9.6 JDBC 412

9.6.1 Introduction to JDBC 412

9.6.2 Creating Statements in JDBC 413

9.6.3 Cursor Operations in JDBC 415

9.6.4 Parameter Passing 416

9.6.5 Exercises for Section 9.6 416

9.7 PHP 416

9.7.1 PHP Basics 417

9.7.2 Arrays 418

9.7.3 The PEAR DB Library 419

9.7.4 Creating a Database Connection Using DB 419

9.7.5 Executing SQL Statements 419

9.7.6 Cursor Operations in PHP 420

9.7.7 Dynamic SQL in PHP 421

9.7.8 Exercises for Section 9.7 422

9.8 Summary of Chapter 9 422

9.9 References for Chapter 9 423

10 Advanced Topics in Relational Databases 425

10.1 Security and User Authorization in SQL 425

10.1.1 Privileges 426

10.1.2 Creating Privileges 427

10.1.3 The Privilege-Checking Process 428

10.1.4 Granting Privileges 430

10.1.5 Grant Diagrams 431

10.1.6 Revoking Privileges 433

10.1.7 Exercises for Section 10.1 436

10.2 Recursion in SQL 437

10.2.1 Defining Recursive Relations in SQL 437

10.2.2 Problematic Expressions in Recursive SQL 440

10.2.3 Exercises for Section 10.2 443

10.3 The Object-Relational Model 445

10.3.1 From Relations to Object-Relations 445

10.3.2 Nested Relations 446

10.3.3 References 447

10.3.4 Object-Oriented Versus Object-Relational 449

10.3.5 Exercises for Section 10.3 450

10.4 User-Defined Types in SQL 451

10.4.1 Defining Types in SQL 451

10.4.2 Method Declarations in UDT's 452

10.4.3 Method Definitions 453

10.4.4 Declaring Relations with a UDT 454

10.4.5 References 454

10.4.6 Creating Object ID's for Tables 455

10.4.7 Exercises for Section 10.4 457

10.5 Operations on Object-Relational Data 457

10.5.1 Following References 457

10.5.2 Accessing Components of Tuples with a UDT 458

10.5.3 Generator and Mutator Functions 460

10.5.4 Ordering Relationships on UDT's 461

10.5.5 Exercises for Section 10.5 463

10.6 On-Line Analytic Processing 464

10.6.1 OLAP and Data Warehouses 465

10.6.2 OLAP Applications 465

10.6.3 A Multidimensional View of OLAP Data 466

10.6.4 Star Schemas 467

10.6.5 Slicing and Dicing 469

10.6.6 Exercises for Section 10.6 472

10.7 Data Cubes 473

10.7.1 The Cube Operator 473

10.7.2 The Cube Operator in SQL 475

10.7.3 Exercises for Section 10.7 477

10.8 Summary of Chapter 10 478

10.9 References for Chapter 10 480

Ⅲ Modeling and Programming for semistructured Data 481

11 The Semistructured-Data Model 483

11.1 Semistructured Data 483

11.1.1 Motivation for the Semistructured-Data Model 483

11.1.2 Semistructured Data Representation 484

11.1.3 Information Integration Via Semistructured Data 486

11.1.4 Exercises for Section 11.1 487

11.2 XML 488

11.2.1 Semantic Tags 488

11.2.2 XML With and Without a Schema 489

11.2.3 Well-Formed XML 489

11.2.4 Attributes 490

11.2.5 Attributes That Connect Elements 491

11.2.6 Namespaces 493

11.2.7 XML and Databases 493

11.2.8 Exercises for Section 11.2 495

11.3 Document Type Definitions 495

11.3.1 The Form of a DTD 495

11.3.2 Using a DTD 499

11.3.3 Attribute Lists 499

11.3.4 Identifiers and References 500

11.3.5 Exercises for Section 11.3 502

11.4 XML Schema 502

11.4.1 The Form of an XML Schema 502

11.4.2 Elements 503

11.4.3 Complex Types 504

11.4.4 Attributes 506

11.4.5 Restricted Simple Types 507

11.4.6 Keys in XML Schema 509

11.4.7 Foreign Keys in XML Schema 510

11.4.8 Exercises for Section 11.4 512

11.5 Summary of Chapter 11 514

11.6 References for Chapter 11 515

12 Programming Languages for XML 517

12.1 XPath 517

12.1.1 The XPath Data Model 518

12.1.2 Document Nodes 519

12.1.3 Path Expressions 519

12.1.4 Relative Path Expressions 521

12.1.5 Attributes in Path Expressions 521

12.1.6 Axes 521

12.1.7 Context of Expressions 522

12.1.8 Wildcards 523

12.1.9 Conditions in Path Expressions 523

12.1.10 Exercises for Section 12.1 526

12.2 XQuery 528

12.2.1 XQuery Basics 530

12.2.2 FLWR Expressions 530

12.2.3 Replacement of Variables by Their Values 534

12.2.4 Joins in XQuery 536

12.2.5 XQuery Comparison Operators 537

12.2.6 Elimination of Duplicates 538

12.2.7 Quantification in XQuery 539

12.2.8 Aggregations 540

12.2.9 Branching in XQuery Expressions 540

12.2.10 Ordering the Result of a Query 541

12.2.11 Exercises for Section 12.2 543

12.3 Extensible Stylesheet Language 544

12.3.1 XSLT Basics 544

12.3.2 Templates 544

12.3.3 Obtaining Values From XML Data 545

12.3.4 Recursive Use of Templates 546

12.3.5 Iteration in XSLT 549

12.3.6 Conditionals in XSLT 551

12.3.7 Exercises for Section 12.3 551

12.4 Summary of Chapter 12 553

12.5 References for Chapter 12 554

Ⅳ Database System Implementation 555

13 Secondary Storage Management 557

13.1 The Memory Hierarchy 557

13.1.1 The Memory Hierarchy 557

13.1.2 Transfer of Data Between Levels 560

13.1.3 Volatile and Nonvolatile Storage 560

13.1.4 Virtual Memory 560

13.1.5 Exercises for Section 13.1 561

13.2 Disks 562

13.2.1 Mechanics of Disks 562

13.2.2 The Disk Controller 564

13.2.3 Disk Access Characteristics 564

13.2.4 Exercises for Section 13.2 567

13.3 Accelerating Access to Secondary Storage 568

13.3.1 The I/O Model of Computation 568

13.3.2 Organizing Data by Cylinders 569

13.3.3 Using Multiple Disks 570

13.3.4 Mirroring Disks 571

13.3.5 Disk Scheduling and the Elevator Algorithm 571

13.3.6 Prefetching and Large-Scale Buffering 573

13.3.7 Exercises for Section 13.3 573

13.4 Disk Failures 575

13.4.1 Intermittent Failures 576

13.4.2 Checksums 576

13.4.3 Stable Storage 577

13.4.4 Error-Handling Capabilities of Stable Storage 578

13.4.5 Recovery from Disk Crashes 578

13.4.6 Mirroring as a Redundancy Technique 579

13.4.7 Parity Blocks 580

13.4.8 An Improvement:RAID 5 583

13.4.9 Coping With Multiple Disk Crashes 584

13.4.10 Exercises for Section 13.4 587

13.5 Arranging Data on Disk 590

13.5.1 Fixed-Length Records 590

13.5.2 Packing Fixed-Length Records into Blocks 592

13.5.3 Exercises for Section 13.5 593

13.6 Representing Block and Record Addresses 593

13.6.1 Addresses in Client-Server Systems 593

13.6.2 Logical and Structured Addresses 595

13.6.3 Pointer Swizzling 596

13.6.4 Returning Blocks to Disk 600

13.6.5 Pinned Records and Blocks 600

13.6.6 Exercises for Section 13.6 602

13.7 Variable-Length Data and Records 603

13.7.1 Records With Variable-Length Fields 604

13.7.2 Records With Repeating Fields 605

13.7.3 Variable-Format Records 607

13.7.4 Records That Do Not Fit in a Block 608

13.7.5 BLOBs 608

13.7.6 Column Stores 609

13.7.7 Exercises for Section 13.7 610

13.8 Record Modifications 612

13.8.1 Insertion 612

13.8.2 Deletion 614

13.8.3 Update 615

13.8.4 Exercises for Section 13.8 615

13.9 Summary of Chapter 13 615

13.10 References for Chapter 13 617

14 Index Structures 619

14.1 Index-Structure Basics 620

14.1.1 Sequential Files 621

14.1.2 Dense Indexes 621

14.1.3 Sparse Indexes 622

14.1.4 Multiple Levels of Index 623

14.1.5 Secondary Indexes 624

14.1.6 Applications of Secondary Indexes 625

14.1.7 Indirection in Secondary Indexes 626

14.1.8 Document Retrieval and Inverted Indexes 628

14.1.9 Exercises for Section 14.1 631

14.2 B-Trees 633

14.2.1 The Structure of B-trees 634

14.2.2 Applications of B-trees 637

14.2.3 Lookup in B-Trees 639

14.2.4 Range Queries 639

14.2.5 Insertion Into B-Trees 640

14.2.6 Deletion From B-Trees 642

14.2.7 Efficiency of B-Trees 645

14.2.8 Exercises for Section 14.2 646

14.3 Hash Tables 648

14.3.1 Secondary-Storage Hash Tables 649

14.3.2 Insertion Into a Hash Table 649

14.3.3 Hash-Table Deletion 650

14.3.4 Efficiency of Hash Table Indexes 651

14.3.5 Extensible Hash Tables 652

14.3.6 Insertion Into Extensible Hash Tables 653

14.3.7 Linear Hash Tables 655

14.3.8 Insertion Into Linear Hash Tables 657

14.3.9 Exercises for Section 14.3 659

14.4 Multidimensional Indexes 661

14.4.1 Applications of Multidimensional Indexes 661

14.4.2 Executing Range Queries Using Conventional Indexes 663

14.4.3 Executing Nearest-Neighbor Queries Using Conventional Indexes 664

14.4.4 Overview of Multidimensional Index Structures 664

14.5 Hash Structures for Multidimensional Data 665

14.5.1 Grid Files 665

14.5.2 Lookup in a Grid File 666

14.5.3 Insertion Into Grid Files 667

14.5.4 Performance of Grid Files 669

14.5.5 Partitioned Hash Functions 671

14.5.6 Comparison of Grid Files and Partitioned Hashing 673

14.5.7 Exercises for Section 14.5 673

14.6 Tree Structures for Multidimensional Data 675

14.6.1 Multiple-Key Indexes 675

14.6.2 Performance of Multiple-Key Indexes 676

14.6.3 kd-Trees 677

14.6.4 Operations on kd-Trees 679

14.6.5 Adapting kd-Trees to Secondary Storage 681

14.6.6 Quad Trees 681

14.6.7 R-Trees 683

14.6.8 Operations on R-Trees 684

14.6.9 Exercises for Section 14.6 686

14.7 Bitmap Indexes 688

14.7.1 Motivation for Bitmap Indexes 689

14.7.2 Compressed Bitmaps 691

14.7.3 Operating on Run-Length-Encoded Bit-Vectors 693

14.7.4 Managing Bitmap Indexes 693

14.7.5 Exercises for Section 14.7 695

14.8 Summary of Chapter 14 695

14.9 References for Chapter 14 697

15 Query Execution 701

15.1 Introduction to Physical-Query-Plan Operators 703

15.1.1 Scanning Tables 703

15.1.2 Sorting While Scanning Tables 704

15.1.3 The Computation Model for Physical Operators 704

15.1.4 Parameters for Measuring Costs 705

15.1.5 I/O Cost for Scan Operators 706

15.1.6 Iterators for Implementation of Physical Operators 707

15.2 One-Pass Algorithms 709

15.2.1 Ohe-Pass Algorithms for Tuple-at-a-Time Operations 711

15.2.2 One-Pass Algorithms for Unary,Full-Relation Operations 712

15.2.3 One-Pass Algorithms for Binary Operations 715

15.2.4 Exercises for Section 15.2 718

15.3 Nested-Loop Joins 718

15.3.1 Tuple-Based Nested-Loop Join 719

15.3.2 An Iterator for Tuple-Based Nested-Loop Join 719

15.3.3 Block-Based Nested-Loop Join Algorithm 719

15.3.4 Analysis of Nested-Loop Join 721

15.3.5 Summary of Algorithms so Far 722

15.3.6 Exercises for Section 15.3 722

15.4 Two-Pass Algorithms Based on Sorting 723

15.4.1 Two-Phase,Multiway Merge-Sort 723

15.4.2 Duplicate Elimination Using Sorting 725

15.4.3 Grouping and Aggregation Using Sorting 726

15.4.4 A Sort-Based Union Algorithm 726

15.4.5 Sort-Based Intersection and Diffefence 727

15.4.6 A Simple Sort-Based Join Algorithm 728

15.4.7 Analysis of Simple Sort-Join 729

15.4.8 A More Efficient Sort-Based Join 729

15.4.9 Summary of Sort-Based Algorithms 730

15.4.10 Exercises for Section 15.4 730

15.5 Two-Pass Algorithms Based on Hashing 732

15.5.1 Partitioning Relations by Hashing 732

15.5.2 A Hash-Based Algorithm for Duplicate Elimination 732

15.5.3 Hash-Based Grouping and Aggregation 733

15.5.4 Hash-Based Union,Intersection,and Difference 734

15.5.5 The Hash-Join Algorithm 734

15.5.6 Saving Some Disk I/O's 735

15.5.7 Summary of Hash-Based Algorithms 737

15.5.8 Exercises for Section 15.5 738

15.6 Index-Based Algorithms 739

15.6.1 Clustering and Nonclustering Indexes 739

15.6.2 Index-Based Selection 740

15.6.3 Joining by Using an Index 742

15.6.4 Joins Using a Sorted Index 743

15.6.5 Exercises for Section 15.6 745

15.7 Buffer Management 746

15.7.1 Buffer Management Architecture 746

15.7.2 Buffer Management Strategies 747

15.7.3 The Relationship Between Physical Operator Selection and Buffer Management 750

15.7.4 Exercises for Section 15.7 751

15.8 Algorithms Using More Than Two Passes 752

15.8.1 Multipass Sort-Based Algorithms 752

15.8.2 Performance of Multipass,Sort-Based Algorithms 753

15.8.3 Multipass Hash-Based Algorithms 754

15.8.4 Performance of Multipass Hash-Based Algorithms 754

15.8.5 Exercises for Section 15.8 755

15.9 Summary of Chapter 15 756

15.10 References for Chapter 15 757

16 The Query Compiler 759

16.1 Parsing and Preprocessing 760

16.1.1 Syntax Analysis and Parse Trees 760

16.1.2 A Grammar for a Simple Subset of SQL 761

16.1.3 The Preprocessor 764

16.1.4 Preprocessing Queries Involving Views 765

16.1.5 Exercises for Section 16.1 767

16.2 Algebraic Laws for Improving Query Plans 768

16.2.1 Commutative and Associative Laws 768

16.2.2 Laws Involving Selection 770

16.2.3 Pushing Selections 772

16.2.4 Laws Involving Projection 774

16.2.5 Laws About Joins and Products 776

16.2.6 Laws Involving Duplicate Elimination 777

16.2.7 Laws Involving Grouping and Aggregation 777

16.2.8 Exercises for Section 16.2 780

16.3 From Parse Trees to Logical Query Plans 781

16.3.1 Conversion to Relational Algebra 782

16.3.2 Removing Subqueries From Conditions 783

16.3.3 Improving the Logical Query Plan 788

16.3.4 Grouping Associative/Commutative Operators 790

16.3.5 Exercises for Section 16.3 791

16.4 Estimating the Cost of Operations 792

16.4.1 Estimating Sizes of Intermediate Relations 793

16.4.2 Estimating the Size of a Projection 794

16.4.3 Estimating the Size of a Selection 794

16.4.4 Estimating the Size of a Join 797

16.4.5 Natural Joins With Multiple Join Attributes 799

16.4.6 Joins of Many Relations 800

16.4.7 Estimating Sizes for Other Operations 801

16.4.8 Exercises for Section 16.4 802

16.5 Introduction to Cost-Based Plan Selection 803

16.5.1 Obtaining Estimates for Size Parameters 804

16.5.2 Computation of Statistics 807

16.5.3 Heuristics for Reducing the Cost of Logical Query Plans 808

16.5.4 Approaches to Enumerating Physical Plans 810

16.5.5 Exercises for Section 16.5 813

16.6 Choosing an Order for Joins 814

16.6.1 Significance of Left and Right Join Arguments 815

16.6.2 Join Trees 815

16.6.3 Left-Deep Join Trees 816

16.6.4 Dynamic Programming to Select a Join Order and Grouping 819

16.6.5 Dynamic Programming With More Detailed Cost Functions 823

16.6.6 A Greedy Algorithm for Selecting a Join Order 824

16.6.7 Exercises for Section 16.6 825

16.7 Completing the Physical-Query-Plan 826

16.7.1 Choosing a Selection Method 827

16.7.2 Choosing a Join Method 829

16.7.3 Pipelining Versus Materialization 830

16.7 4 Pipelining Unary Operations 830

16.7.5 Pipelining Binary Operations 830

16.7.6 Notation for Physical Query Plans 834

16.7.7 Ordering of Physical Operations 837

16.7.8 Exercises for Section 16.7 838

16.8 Summary of Chapter 16 839

16.9 References for Chapter 16 841

17 Coping With System Failures 843

17.1 Issues and Models for Resilient Operation 843

17.1.1 Failure Modes 844

17.1.2 More About Transactions 845

17.1.3 Correct Execution of Transactions 846

17.1.4 The Primitive Operations of Transactions 848

17.1.5 Exercises for Section 17.1 851

17.2 Undo Logging 851

17.2.1 Log Records 851

17.2.2 The Undo-Logging Rules 853

17.2.3 Recovery Using Undo Logging 855

17.2.4 Checkpointing 857

17.2.5 Nonquiescent Checkpointing 858

17.2.6 Exercises for Section 17.2 862

17.3 Redo Logging 863

17.3.1 The Redo-Logging Rule 863

17.3.2 Recovery With Redo Logging 864

17.3.3 Checkpointing a Redo Log 866

17.3.4 Recovery With a Checkpointed Redo Log 867

17.3.5 Exercises for Section 17.3 868

17.4 Undo/Redo Logging 869

17.4.1 The Undo/Redo Rules 870

17.4.2 Recovery With Undo/Redo Logging 870

17.4.3 Checkpointing an Undo/Redo Log 872

17.4.4 Exercises for Section 17.4 874

17.5 Protecting Against Media Failures 875

17.5.1 The Archive 875

17.5.2 Nonquiescent Archiving 875

17.5.3 Recovery Using an Archive and Log 878

17.5.4 Exercises for Section 17.5 879

17.6 Summary of Chapter 17 879

17.7 References for Chapter 17 881

18 Concurrency Control 883

18.1 Serial and Serializable Schedules 884

18.1.1 Schedules 884

18.1.2 Serial Schedules 885

18.1.3 Serializable Schedules 886

18.1.4 The Effect of Transaction Semantics 887

18.1.5 A Notation for Transactions and Schedules 889

18.1.6 Exercises for Section 18.1 889

18.2 Conflict-Serializability 890

18.2.1 Conflicts 890

18.2.2 Precedence Graphs and a Test for Conflict-Serializability 892

18.2.3 Why the Precedence-Graph Test Works 894

18.2.4 Exercises for Section 18.2 895

18.3 Enforcing Serializability by Locks 897

18.3.1 Locks 898

18.3.2 The Locking Scheduler 900

18.3.3 Two-Phase Locking 900

18.3.4 Why Two-Phase Locking Works 901

18.3.5 Exercises for Section 18.3 903

18.4 Locking Systems With Several Lock Modes 905

18.4.1 Shared and Exclusive Locks 905

18.4.2 Compatibility Matrices 907

18.4.3 Upgrading Locks 908

18.4.4 Update Locks 909

18.4.5 Increment Locks 911

18.4.6 Exercises for Section 18.4 913

18.5 An Architecture for a Locking Scheduler 915

18.5.1 A Scheduler That Inserts Lock Actions 915

18.5.2 The Lock Table 918

18.5.3 Exercises for Section 18.5 921

18.6 Hierarchies of Database Elements 921

18.6.1 Locks With Multiple Granularity 921

18.6.2 Warning Locks 922

18.6.3 Phantoms and Handling Insertions Correctly 926

18.6.4 Exercises for Section 18.6 927

18.7 The Tree Protocol 927

18.7.1 Motivation for Tree-Based Locking 927

18.7.2 Rules for Access to Tree-Structured Data 928

18.7.3 Why the Tree Protocol Works 929

18.7.4 Exercises for Section 18.7 932

18.8 Concurrency Control by Timestamps 933

18.8.1 Timestamps 934

18.8.2 Physically Unrealizable Behaviors 934

18.8.3 Problems With Dirty Data 935

18.8.4 The Rules for Timestamp-Based Scheduling 937

18.8.5 Multiversion Timestamps 939

18.8.6 Timestamps Versus Locking 941

18.8.7 Exercises for Section 18.8 942

18.9 Concurrency Control by Validation 942

18.9.1 Architecture of a Validation-Based Scheduler 942

18.9.2 The Validation Rules 943

18.9.3 Comparison of Three Concurrency-Control Mechanisms 946

18.9.4 Exercises for Section 18.9 948

18.10 Summary of Chapter 18 948

18.11 References for Chapter 18 950

19 More About Transaction Management 953

19.1 Serializability and Recoverability 953

19.1.1 The Dirty-Data Problem 954

19.1.2 Cascading Rollback 955

19.1.3 Recoverable Schedules 956

19.1.4 Schedules That Avoid Cascading Rollback 957

19.1.5 Managing Rollbacks Using Locking 957

19.1.6 Group Commit 959

19.1.7 Logical Logging 960

19.1.8 Recovery From Logical Logs 963

19.1.9 Exercises for Section 19.1 965

19.2 Deadlocks 966

19.2.1 Deadlock Detection by Timeout 967

19.2.2 The Waits-For Graph 967

19.2.3 Deadlock Prevention by Ordering Elements 970

19.2.4 Detecting Deadlocks by Timestamps 970

19.2.5 Comparison of Deadlock-Management Methods 972

19.2.6 Exercises for Section 19.2 974

19.3 Long-Duration Transactions 975

19.3.1 Problems of Long Transactions 976

19.3.2 Sagas 978

19.3.3 Compensating Transactions 979

19.3.4 Why Compensating Transactions Work 980

19.3.5 Exercises for Section 19.3 981

19.4 Summary of Chapter 19 982

19.5 References for Chapter 19 983

20 Parallel and Distributed Databases 985

20.1 Parallel Algorithms on Relations 985

20.1.1 Models of Parallelism 986

20.1.2 Tuple-at-a-Time Operations in Parallel 989

20.1.3 Parallel Algorithms for Full-Relation Operations 989

20.1.4 Performance of Parallel Algorithms 990

20.1.5 Exercises for Section 20.1 993

20.2 The Map-Reduce Parallelism Framework 993

20.2.1 The Storage Model 993

20.2.2 The Map Function 994

20.2.3 The Reduce Function 995

20.2.4 Exercises for Section 20.2 996

20.3 Distributed Databases 997

20.3.1 Distribution of Data 997

20.3.2 Distributed Transactions 998

20.3.3 Data Replication 999

20.3.4 Exercises for Section 20.3 1000

20.4 Distributed Query Processing 1000

20.4.1 The Distributed Join Problem 1000

20.4.2 Semijoin Reductions 1001

20.4.3 Joins of Many Relations 1002

20.4.4 Acyclic Hypergraphs 1003

20.4.5 Full Reducers for Acyclic Hypergraphs 1005

20.4.6 Why the Full-Reducer Algorithm Works 1006

20.4.7 Exercises for Section 20.4 1007

20.5 Distributed Commit 1008

20.5.1 Supporting Distributed Atomicity 1008

20.5.2 Two-Phase Commit 1009

20.5.3 Recovery of Distributed Transactions 1011

20.5.4 Exercises for Section 20.5 1013

20.6 Distributed Locking 1014

20.6.1 Centralized Lock Systems 1015

20.6.2 A Cost Model for Distributed Locking Algorithms 1015

20.6.3 Locking Replicated Elements 1016

20.6.4 Primary-Copy Locking 1017

20.6.5 Global Locks From Local Locks 1017

20.6.6 Exercises for Section 20.6 1019

20.7 Peer-to-Peer Distributed Search 1020

20.7.1 Peer-to-Peer Networks 1020

20.7.2 The Distributed-Hashing Problem 1021

20.7.3 Centralized Solutions for Distributed Hashing 1022

20.7.4 Chord Circles 1022

20.7.5 Links in Chord Circles 1024

20.7.6 Search Using Finger Tables 1024

20.7.7 Adding New Nodes 1027

20.7.8 When a Peer Leaves the Network 1030

20.7.9 When a Peer Fails 1030

20.7.10 Exercises for Section 20.7 1031

20.8 Summary of Chapter 20 1031

20.9 References for Chapter 20 1033

Ⅴ Other Issues in Management of Massive Data 1035

21 Information Integration 1037

21.1 Introduction to Information Integration 1037

21.1.1 Why Information Integration? 1038

21.1.2 The Heterogeneity Problem 1040

21.2 Modes of Information Integration 1041

21.2.1 Federated Database Systems 1042

21.2.2 Data Warehouses 1043

21.2.3 Mediators 1046

21.2.4 Exercises for Section 21.2 1048

21.3 Wrappers in Mediator-Based Systems 1049

21.3.1 Templates for Query Patterns 1050

21.3.2 Wrapper Generators 1051

21.3.3 Filters 1052

21.3.4 Other Operations at the Wrapper 1053

21.3.5 Exercises for Section 21.3 1054

21.4 Capability-Based Optimization 1056

21.4.1 The Problem of Limited Source Capabilities 1056

21.4.2 A Notation for Describing Source Capabilities 1057

21.4.3 Capability-Based Query-Plan Selection 1058

21.4.4 Adding Cost-Based Optimization 1060

21.4.5 Exercises for Section 21.4 1060

21.5 Optimizing Mediator Queries 1061

21.5.1 Simplified Adornment Notation 1061

21.5.2 Obtaining Answers for Subgoals 1062

21.5.3 The Chain Algorithm 1063

21.5.4 Incorporating Union Views at the Mediator 1067

21.5.5 Exercises for Section 21.5 1068

21.6 Local-as-View Mediators 1069

21.6.1 Motivation for LAV Mediators 1069

21.6.2 Terminology for LAV Mediation 1070

21.6.3 Expanding Solutions 1071

21.6.4 Containment of Conjunctive Queries 1073

21.6.5 Why the Containment-Mapping Test Works 1075

21.6.6 Finding Solutions to a Mediator Query 1076

21.6.7 Why the LMSS Theorem Holds 1077

21.6.8 Exercises for Section 21.6 1078

21.7 Entity Resolution 1078

21.7.1 Deciding Whether Records Represent a Common Entity 1079

21.7.2 Merging Similar Records 1081

21.7.3 Useful Properties of Similarity and Merge Functions 1082

21.7.4 The R-Swoosh Algorithm for ICAR Records 1083

21.7 5 Why R-Swoosh Works 1086

21.7 6 Other Approaches to Entity Resolution 1086

21.7.7 Exercises for Section 21.7 1087

21.8 Summary of Chapter 21 1089

21.9 References for Chapter 21 1091

22 Data Mining 1093

22.1 Frequent-Itemset Mining 1093

22.1.1 The Market-Basket Model 1094

22.1.2 Basic Definitions 1095

22.1.3 Association Rules 1097

22.1.4 The Computation Model for Frequent Itemsets 1098

22.1.5 Exercises for Section 22.1 1099

22.2 Algorithms for Finding Frequent Itemsets 1100

22.2.1 The Distribution of Frequent Itemsets 1100

22.2.2 The Naive Algorithm for Finding Frequent Itemsets 1101

22.2.3 The A-Priori Algorithm 1102

22.2.4 Implementation of the A-Priori Algorithm 1104

22.2.5 Making Better Use of Main Memory 1105

22.2.6 When to Use the PCY Algorithm 1106

22.2.7 The Multistage Algorithm 1107

22.2.8 Exercises for Section 22.2 1109

22.3 Finding Similar Items 1110

22.3.1 The Jaccard Measure of Similarity 1110

22.3.2 Applications of Jaccard Similarity 1110

22.3.3 Minhashing 1112

22.3.4 Minhashing and Jaccard Distance 1113

22.3.5 Why Minhashing Works 1113

22.3.6 Implementing Minhashing 1114

22.3.7 Exercises for Section 22.3 1115

22.4 Locality-Sensitive Hashing 1116

22.4.1 Entity Resolution as an Example of LSH 1117

22.4.2 Locality-Sensitive Hashing of Signatures 1118

22.4.3 Combining Minhashing and Locality-Sensitive Hashing 1121

22.4.4 Exercises for Section 22.4 1122

22.5 Clustering of Large-Scale Data 1123

22.5.1 Applications of Clustering 1123

22.5.2 Distance Measures 1125

22.5.3 Agglomerative Clustering 1128

22.5.4 k-Means Algorithms 1130

22.5.5 k-Means for Large-Scale Data 1132

22.5.6 Processing a Memory Load of Points 1133

22.5.7 Exercises for Section 22.5 1136

22.6 Summary of Chapter 22 1137

22.7 References for Chapter 22 1139

23 Database Systems and the Internet 1141

23.1 The Architecture of a Search Engine 1141

23.1.1 Components of a Search Engine 1142

23.1.2 Web Crawlers 1143

23.1.3 Query Processing in Search Engines 1146

23.1.4 Ranking Pages 1146

23.2 PageRank for Identifying Important Pages 1147

23.2.1 The Intuition Behind PageRank 1147

23.2.2 Recursive Formulation of PageRank—First Try 1148

23.2.3 Spider Traps and Dead Ends 1150

23.2.4 PageRank Accounting for Spider Traps and Dead Ends 1153

23.2.5 Exercises for Section 23.2 1154

23.3 Topic-Specific PageRank 1156

23.3.1 Teleport Sets 1156

23.3.2 Calculating A Topic-Specific PageRank 1158

23.3.3 Link Spam 1159

23.3.4 Topic-Specific PageRank and Link Spam 1160

23.3.5 Exercises for Section 23.3 1161

23.4 Data Streams 1161

23.4.1 Data-Stream-Management Systems 1162

23.4.2 Stream Applications 1163

23.4.3 A Data-Stream Data Model 1164

23.4.4 Converting Streams Into Relations 1165

23.4.5 Converting Relations Into Streams 1166

23.4.6 Exercises for Section 23.4 1168

23.5 Data Mining of Streams 1169

23.5.1 Motivation 1169

23.5.2 Counting Bits 1171

23.5.3 Counting the Number of Distinct Elements 1175

23.5.4 Exercises for Section 23.5 1176

23.6 Summary of Chapter 23 1177

23.7 References for Chapter 23 1179