1.Beyond Relational Databases 1
What’s Wrong with Relational Databases? 1
A Quick Review of Relational Databases 5
RDBMSs:The Awesome and the Not-So-Much 5
Web Scale 12
The Rise of NoSQL 13
Summary 15
2.Introducing Cassandra 17
The Cassandra Elevator Pitch 17
Cassandra in 50 Words or Less 17
Distributed and Decentralized 18
Elastic Scalability 19
High Availability and Fault Tolerance 19
Tuneable Consistency 20
Brewer’s CAP Theorem 23
Row-Oriented 27
High Performance 28
Where Did Cassandra Come From? 28
Release History 30
Is Cassandra a Good Fit for My Project? 35
Large Deployments 35
Lots of Writes,Statistics,and Analysis 36
Geographical Distribution 36
Evolving Applications 36
Getting Involved 36
Summary 38
3.Installing Cassandra 39
Installing the Apache Distribution 39
Extracting the Download 39
What’s In There? 40
Building from Source 41
Additional Build Targets 43
Running Cassandra 43
On Windows 44
On Linux 45
Starting the Server 45
Stopping Cassandra 47
Other Cassandra Distributions 48
Running the CQL Shell 49
Basic cqlsh Commands 50
cqlsh Help 50
Describing the Environment in cqlsh 51
Creating a Keyspace and Table in cqlsh 52
Writing and Reading Data in cqlsh 55
Summary 56
4.The Cassandra Query Language 57
The Relational Data Model 57
Cassandra’s Data Model 58
Clusters 61
Keyspaces 61
Tables 61
Columns 63
CQL Types 65
Numeric Data Types 66
Textual Data Types 67
Time and Identity Data Types 67
Other Simple Data Types 69
Collections 70
User-Defined Types 73
Secondary Indexes 76
Summary 78
5.Data Modeling 79
Conceptual Data Modeling 79
RDBMS Design 80
Design Differences Between RDBMS and Cassandra 81
Defining Application Queries 84
Logical Data Modeling 85
Hotel Logical Data Model 87
Reservation Logical Data Model 89
Physical Data Modeling 91
Hotel Physical Data Model 92
Reservation Physical Data Model 93
Materialized Views 94
Evaluating and Refining 96
Calculating Partition Size 96
Calculating Size on Disk 98
Breaking Up Large Partitions 99
Defining Database Schema 100
DataStax DevCenter 102
Summary 103
6.The Cassandra Architecture 105
Data Centers and Racks 105
Gossip and Failure Detection 106
Snitches 108
Rings and Tokens 109
Virtual Nodes 110
Partitioners 111
Replication Strategies 112
Consistency Levels 113
Queries and Coordinator Nodes 114
Memtables,SSTables,and Commit Logs 115
Caching 117
Hinted Handoff 117
Lightweight Transactions and Paxos 118
Tombstones 120
Bloom Filters 120
Compaction 121
Anti-Entropy,Repair,and Merkle Trees 122
Staged Event-Driven Architecture(SEDA) 124
Managers and Services 125
Cassandra Daemon 125
Storage Engine 126
Storage Service 126
Storage Proxy 126
Messaging Service 127
Stream Manager 127
CQL Native Transport Server 127
System Keyspaces 128
Summary 130
7.Configuring Cassandra 131
Cassandra Cluster Manager 131
Creating a Cluster 132
Seed Nodes 135
Partitioners 136
Murmur3 Partitioner 136
Random Partitioner 137
Order-Preserving Partitioner 137
ByteOrderedPartitioner 137
Snitches 138
Simple Snitch 138
Property File Snitch 138
Gossiping Property File Snitch 139
Rack Inferring Snitch 139
Cloud Snitches 140
Dynamic Snitch 140
Node Configuration 140
Tokens and Virtual Nodes 141
Network Interfaces 142
Data Storage 143
Startup and JVM Settings 144
Adding Nodes to a Cluster 144
Dynamic Ring Participation 146
Replication Strategies 147
SimpleStrategy 147
NetworkTopologyStrategy 148
Changing the Replication Factor 150
Summary 150
8.Clients 151
Hector,Astyanax,and Other Legacy Clients 151
DataStax Java Driver 152
Development Environment Configuration 152
Clusters and Contact Points 153
Sessions and Connection Pooling 155
Statements 156
Policies 164
Metadata 167
Debugging and Monitoring 171
DataStax Python Driver 172
DataStax Node.js Driver 173
DataStax Ruby Driver 174
DataStax C#Driver 175
DataStax C/C++Driver 176
DataStax PHP Driver 177
Summary 178
9.Reading and Writing Data 179
Writing 179
Write Consistency Levels 180
The Cassandra Write Path 181
Writing Files to Disk 183
Lightweight Transactions 185
Batches 188
Reading 190
Read Consistency Levels 191
The Cassandra Read Path 192
Read Repair 195
Range Queries,Ordering and Filtering 195
Functions and Aggregates 198
Paging 202
Speculative Retry 205
Deleting 205
Summary 206
10.Monitoring 207
Logging 207
Tailing 209
Examining Log Files 210
Monitoring Cassandra with JMX 211
Connecting to Cassandra via JConsole 213
Overview of MBeans 216
Cassandra’s MBeans 219
Database MBeans 222
Networking MBeans 226
Metrics MBeans 227
Threading MBeans 228
Service MBeans 228
Security MBeans 228
Monitoring with nodetool 229
Getting Cluster Information 230
Getting Statistics 232
Summary 234
11.Maintenance 235
Health Check 235
Basic Maintenance 236
Flush 236
Cleanup 237
Repair 238
Rebuilding Indexes 242
Moving Tokens 243
Adding Nodes 243
Adding Nodes to an Existing Data Center 243
Adding a Data Center to a Cluster 244
Handling Node Failure 246
Repairing Nodes 246
Replacing Nodes 247
Removing Nodes 248
Upgrading Cassandra 251
Backup and Recovery 252
Taking a Snapshot 253
Clearing a Snapshot 255
Enabling Incremental Backup 255
Restoring from Snapshot 255
SSTable Utilities 256
Maintenance Tools 257
DataStax OpsCenter 257
Netflix Priam 260
Summary 260
12.Performance Tuning 261
Managing Performance 261
Setting Performance Goals 261
Monitoring Performance 262
Analyzing Performance Issues 264
Tracing 265
Tuning Methodology 268
Caching 268
Key Cache 269
Row Cache 269
Counter Cache 270
Saved Cache Settings 270
Memtables 271
Commit Logs 272
SSTables 273
Hinted Handoff 274
Compaction 275
Concurrency and Threading 278
Networking and Timeouts 279
JVM Settings 280
Memory 281
Garbage Collection 281
Using cassandra-stress 283
Summary 286
13.Security 287
Authentication and Authorization 289
Password Authenticator 289
Using CassandraAuthorizer 292
Role-Based Access Control 293
Encryption 294
SSL,TLS,and Certificates 295
Node-to-Node Encryption 296
Client-to-Node Encryption 298
JMX Security 299
Securing JMX Access 299
Security MBeans 301
Summary 301
14.Deploying and Integrating 303
Planning a Cluster Deployment 303
Sizing Your Cluster 303
Selecting Instances 305
Storage 306
Network 307
Cloud Deployment 308
Amazon Web Services 308
Microsoft Azure 310
Google Cloud Platform 311
Integrations 312
Apache Lucene,SOLR,and Elasticsearch 312
Apache Hadoop 312
Apache Spark 313
Summary 319
Index 321