HBase: The Definitive Guide (English Edition)

  • Author: George (Belgium)
  • Publisher: Southeast University Press, Nanjing
  • Publication year: 2012
  • ISBN: 9787564133924
  • Pages: 524
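Book description: If you are looking for a scalable storage solution that can accommodate a virtually endless amount of data, this book shows you how Apache HBase can meet your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns while keeping read and write performance constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you are evaluating this nonrelational database or planning to put it into practice right away.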

1. Introduction 1

The Dawn of Big Data 1

The Problem with Relational Database Systems 5

Nonrelational Database Systems, Not-Only SQL or NoSQL? 8

Dimensions 10

Scalability 12

Database (De-)Normalization 13

Building Blocks 16

Backdrop 16

Tables, Rows, Columns, and Cells 17

Auto-Sharding 22

Storage API 23

Implementation 24

Summary 27

HBase: The Hadoop Database 28

History 28

Nomenclature 29

Summary 30

2. Installation 31

Quick-Start Guide 31

Requirements 34

Hardware 34

Software 40

Filesystems for HBase 52

Local 54

HDFS 54

S3 54

Other Filesystems 55

Installation Choices 55

Apache Binary Release 55

Building from Source 58

Run Modes 58

Standalone Mode 59

Distributed Mode 59

Configuration 63

hbase-site.xml and hbase-default.xml 64

hbase-env.sh 65

regionserver 65

log4j.properties 65

Example Configuration 65

Client Configuration 67

Deployment 68

Script-Based 68

Apache Whirr 69

Puppet and Chef 70

Operating a Cluster 71

Running and Confirming Your Installation 71

Web-based UI Introduction 71

Shell Introduction 73

Stopping the Cluster 73

3. Client API: The Basics 75

General Notes 75

CRUD Operations 76

Put Method 76

Get Method 95

Delete Method 105

Batch Operations 114

Row Locks 118

Scans 122

Introduction 122

The ResultScanner Class 124

Caching Versus Batching 127

Miscellaneous Features 133

The HTable Utility Methods 133

The Bytes Class 134

4. Client API: Advanced Features 137

Filters 137

Introduction to Filters 137

Comparison Filters 140

Dedicated Filters 147

Decorating Filters 155

FilterList 159

Custom Filters 160

Filters Summary 167

Counters 168

Introduction to Counters 168

Single Counters 171

Multiple Counters 172

Coprocessors 175

Introduction to Coprocessors 175

The Coprocessor Class 176

Coprocessor Loading 179

The RegionObserver Class 182

The MasterObserver Class 190

Endpoints 193

HTablePool 199

Connection Handling 203

5. Client API: Administrative Features 207

Schema Definition 207

Tables 207

Table Properties 210

Column Families 212

HBaseAdmin 218

Basic Operations 219

Table Operations 220

Schema Operations 228

Cluster Operations 230

Cluster Status Information 233

6. Available Clients 241

Introduction to REST, Thrift, and Avro 241

Interactive Clients 244

Native Java 244

REST 244

Thrift 251

Avro 255

Other Clients 256

Batch Clients 257

MapReduce 257

Hive 258

Pig 263

Cascading 267

Shell 268

Basics 269

Commands 271

Scripting 274

Web-based UI 277

Master UI 277

Region Server UI 283

Shared Pages 283

7. MapReduce Integration 289

Framework 289

MapReduce Introduction 289

Classes 290

Supporting Classes 293

MapReduce Locality 293

Table Splits 294

MapReduce over HBase 295

Preparation 295

Data Sink 301

Data Source 306

Data Source and Sink 308

Custom Processing 311

8. Architecture 315

Seek Versus Transfer 315

B+ Trees 315

Log-Structured Merge-Trees 316

Storage 319

Overview 319

Write Path 320

Files 321

HFile Format 329

KeyValue Format 333

Write-Ahead Log 333

Overview 334

HLog Class 335

HLogKey Class 336

WALEdit Class 336

LogSyncer Class 337

LogRoller Class 338

Replay 338

Durability 341

Read Path 342

Region Lookups 345

The Region Life Cycle 348

ZooKeeper 348

Replication 351

Life of a Log Edit 352

Internals 353

9. Advanced Usage 357

Key Design 357

Concepts 357

Tall-Narrow Versus Flat-Wide Tables 359

Partial Key Scans 360

Pagination 362

Time Series Data 363

Time-Ordered Relations 367

Advanced Schemas 369

Secondary Indexes 370

Search Integration 374

Transactions 377

Bloom Filters 377

Versioning 381

Implicit Versioning 381

Custom Versioning 384

10. Cluster Monitoring 387

Introduction 387

The Metrics Framework 388

Contexts, Records, and Metrics 389

Master Metrics 394

Region Server Metrics 394

RPC Metrics 396

JVM Metrics 397

Info Metrics 399

Ganglia 400

Installation 401

Usage 405

JMX 408

JConsole 410

JMX Remote API 413

Nagios 417

11. Performance Tuning 419

Garbage Collection Tuning 419

Memstore-Local Allocation Buffer 422

Compression 424

Available Codecs 424

Verifying Installation 426

Enabling Compression 427

Optimizing Splits and Compactions 429

Managed Splitting 429

Region Hotspotting 430

Presplitting Regions 430

Load Balancing 432

Merging Regions 433

Client API: Best Practices 434

Configuration 436

Load Tests 439

Performance Evaluation 439

YCSB 440

12. Cluster Administration 445

Operational Tasks 445

Node Decommissioning 445

Rolling Restarts 447

Adding Servers 447

Data Tasks 452

Import and Export Tools 452

CopyTable Tool 457

Bulk Import 459

Replication 462

Additional Tasks 464

Coexisting Clusters 464

Required Ports 466

Changing Logging Levels 466

Troubleshooting 467

HBase Fsck 467

Analyzing the Logs 469

Common Issues 471

A. HBase Configuration Properties 475

B. Road Map 489

C. Upgrade from Previous Releases 491

D. Distributions 493

E. Hush SQL Schema 495

F. HBase Versus Bigtable 497

Index 501