Hadoop Operations (English edition) PDF download

  • Author: Eric Sammer
  • Publisher: Southeast University Press, Nanjing
  • Year of publication: 2013
  • ISBN: 9787564142582
  • Pages: 283
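Book description: If you need to maintain a large and complex Hadoop cluster, this book is essential. As Hadoop has become the industry standard for large-scale data processing in the data center, demand for operational guidance has grown sharply. Eric Sammer, Principal Solution Architect at Cloudera, shows you the details of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.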

1.Introduction 1

2.HDFS 7

Goals and Motivation 7

Design 8

Daemons 9

Reading and Writing Data 11

The Read Path 12

The Write Path 13

Managing Filesystem Metadata 14

Namenode High Availability 16

Namenode Federation 18

Access and Integration 20

Command-Line Tools 20

FUSE 23

REST Support 23

3.MapReduce 25

The Stages of MapReduce 26

Introducing Hadoop MapReduce 33

Daemons 34

When It All Goes Wrong 36

YARN 37

4.Planning a Hadoop Cluster 41

Picking a Distribution and Version of Hadoop 41

Apache Hadoop 41

Cloudera's Distribution Including Apache Hadoop 42

Versions and Features 42

What Should I Use? 44

Hardware Selection 45

Master Hardware Selection 46

Worker Hardware Selection 48

Cluster Sizing 50

Blades, SANs, and Virtualization 52

Operating System Selection and Preparation 54

Deployment Layout 54

Software 56

Hostnames, DNS, and Identification 57

Users, Groups, and Privileges 60

Kernel Tuning 62

vm.swappiness 62

vm.overcommit_memory 62

Disk Configuration 63

Choosing a Filesystem 64

Mount Options 66

Network Design 66

Network Usage in Hadoop: A Review 67

1 Gb versus 10 Gb Networks 69

Typical Network Topologies 69

5.Installation and Configuration 75

Installing Hadoop 75

Apache Hadoop 76

CDH 80

Configuration: An Overview 84

The Hadoop XML Configuration Files 87

Environment Variables and Shell Scripts 88

Logging Configuration 90

HDFS 93

Identification and Location 93

Optimization and Tuning 95

Formatting the Namenode 99

Creating a /tmp Directory 100

Namenode High Availability 100

Fencing Options 102

Basic Configuration 104

Automatic Failover Configuration 105

Format and Bootstrap the Namenodes 108

Namenode Federation 113

MapReduce 120

Identification and Location 120

Optimization and Tuning 122

Rack Topology 130

Security 133

6.Identity, Authentication, and Authorization 135

Identity 137

Kerberos and Hadoop 137

Kerberos: A Refresher 138

Kerberos Support in Hadoop 140

Authorization 153

HDFS 153

MapReduce 155

Other Tools and Systems 159

Tying It Together 164

7.Resource Management 167

What Is Resource Management? 167

HDFS Quotas 168

MapReduce Schedulers 170

The FIFO Scheduler 171

The Fair Scheduler 173

The Capacity Scheduler 185

The Future 193

8.Cluster Maintenance 195

Managing Hadoop Processes 195

Starting and Stopping Processes with Init Scripts 195

Starting and Stopping Processes Manually 196

HDFS Maintenance Tasks 196

Adding a Datanode 196

Decommissioning a Datanode 197

Checking Filesystem Integrity with fsck 198

Balancing HDFS Block Data 202

Dealing with a Failed Disk 204

MapReduce Maintenance Tasks 205

Adding a Tasktracker 205

Decommissioning a Tasktracker 206

Killing a MapReduce Job 206

Killing a MapReduce Task 207

Dealing with a Blacklisted Tasktracker 207

9.Troubleshooting 209

Differential Diagnosis Applied to Systems 209

Common Failures and Problems 211

Humans (You) 211

Misconfiguration 212

Hardware Failure 213

Resource Exhaustion 213

Host Identification and Naming 214

Network Partitions 214

"Is the Computer Plugged In?" 215

E-SPORE 215

Treatment and Care 217

War Stories 220

A Mystery Bottleneck 221

There's No Place Like 127.0.0.1 224

10.Monitoring 229

An Overview 229

Hadoop Metrics 230

Apache Hadoop 0.20.0 and CDH3 (metrics 1) 231

Apache Hadoop 0.20.203 and Later, and CDH4 (metrics 2) 237

What about SNMP? 239

Health Monitoring 239

Host-Level Checks 240

All Hadoop Processes 242

HDFS Checks 244

MapReduce Checks 246

11.Backup and Recovery 249

Data Backup 249

Distributed Copy (distcp) 250

Parallel Data Ingestion 252

Namenode Metadata 254

Appendix: Deprecated Configuration Properties 257

Index 267