1 SEEING YOUR LIFE IN DATA by Nathan Yau 1
Personal Environmental Impact Report (PEIR) 2
your.flowingdata (YFD) 3
Personal Data Collection 3
Data Storage 5
Data Processing 6
Data Visualization 7
The Point 14
How to Participate 15
2 THE BEAUTIFUL PEOPLE: KEEPING USERS IN MIND WHEN DESIGNING DATA COLLECTION METHODS by Jonathan Follett and Matthew Holm 17
Introduction: User Empathy Is the New Black 17
The Project: Surveying Customers About a New Luxury Product 19
Specific Challenges to Data Collection 19
Designing Our Solution 21
Results and Reflection 31
3 EMBEDDED IMAGE DATA PROCESSING ON MARS by J.M. Hughes 35
Abstract 35
Introduction 35
Some Background 37
To Pack or Not to Pack 40
The Three Tasks 42
Slotting the Images 43
Passing the Image: Communication Among the Three Tasks 46
Getting the Picture: Image Download and Processing 48
Image Compression 50
Downlink, or, It's All Downhill from Here 52
Conclusion 52
4 CLOUD STORAGE DESIGN IN A PNUTSHELL by Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava 55
Introduction 55
Updating Data 57
Complex Queries 64
Comparison with Other Systems 68
Conclusion 71
5 INFORMATION PLATFORMS AND THE RISE OF THE DATA SCIENTIST by Jeff Hammerbacher 73
Libraries and Brains 73
Facebook Becomes Self-Aware 74
A Business Intelligence System 75
The Death and Rebirth of a Data Warehouse 77
Beyond the Data Warehouse 78
The Cheetah and the Elephant 79
The Unreasonable Effectiveness of Data 80
New Tools and Applied Research 81
MAD Skills and Cosmos 82
Information Platforms As Dataspaces 83
The Data Scientist 83
Conclusion 84
6 THE GEOGRAPHIC BEAUTY OF A PHOTOGRAPHIC ARCHIVE by Jason Dykes and Jo Wood 85
Beauty in Data: Geograph 86
Visualization, Beauty, and Treemaps 89
A Geographic Perspective on Geograph Term Use 91
Beauty in Discovery 98
Reflection and Conclusion 101
7 DATA FINDS DATA by Jeff Jonas and Lisa Sokol 105
Introduction 105
The Benefits of Just-in-Time Discovery 106
Corruption at the Roulette Wheel 107
Enterprise Discoverability 111
Federated Search Ain't All That 111
Directories: Priceless 113
Relevance: What Matters and to Whom? 115
Components and Special Considerations 115
Privacy Considerations 118
Conclusion 118
8 PORTABLE DATA IN REAL TIME by Jud Valeski 119
Introduction 119
The State of the Art 120
Social Data Normalization 128
Conclusion: Mediation via Gnip 131
9 SURFACING THE DEEP WEB by Alon Halevy and Jayant Madhavan 133
What Is the Deep Web? 133
Alternatives to Offering Deep-Web Access 135
Conclusion and Future Work 147
10 BUILDING RADIOHEAD'S HOUSE OF CARDS by Aaron Koblin with Valdean Klump 149
How It All Started 149
The Data Capture Equipment 150
The Advantages of Two Data Capture Systems 154
The Data 154
Capturing the Data, aka "The Shoot" 155
Processing the Data 160
Post-Processing the Data 160
Launching the Video 161
Conclusion 164
11 VISUALIZING URBAN DATA by Michal Migurski 167
Introduction 167
Background 168
Cracking the Nut 169
Making It Public 174
Revisiting 178
Conclusion 181
12 THE DESIGN OF SENSE.US by Jeffrey Heer 183
Visualization and Social Data Analysis 184
Data 186
Visualization 188
Collaboration 194
Voyagers and Voyeurs 199
Conclusion 203
13 WHAT DATA DOESN'T DO by Coco Krumme 205
When Doesn't Data Drive? 208
Conclusion 217
14 NATURAL LANGUAGE CORPUS DATA by Peter Norvig 219
Word Segmentation 221
Secret Codes 228
Spelling Correction 234
Other Tasks 239
Discussion and Conclusion 240
15 LIFE IN DATA: THE STORY OF DNA by Matt Wood and Ben Blackburne 243
DNA As a Data Store 243
DNA As a Data Source 250
Fighting the Data Deluge 253
The Future of DNA 257
16 BEAUTIFYING DATA IN THE REAL WORLD by Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen 259
The Problem with Real Data 259
Providing the Raw Data Back to the Notebook 260
Validating Crowdsourced Data 262
Representing the Data Online 263
Closing the Loop: Visualizations to Suggest New Experiments 271
Building a Data Web from Open Data and Free Services 274
17 SUPERFICIAL DATA ANALYSIS: EXPLORING MILLIONS OF SOCIAL STEREOTYPES by Brendan O'Connor and Lukas Biewald 279
Introduction 279
Preprocessing the Data 280
Exploring the Data 282
Age, Attractiveness, and Gender 285
Looking at Tags 290
Which Words Are Gendered? 294
Clustering 295
Conclusion 300
18 BAY AREA BLUES: THE EFFECT OF THE HOUSING CRISIS by Hadley Wickham, Deborah F. Swayne, and David Poole 303
Introduction 303
How Did We Get the Data? 304
Geocoding 305
Data Checking 305
Analysis 306
The Influence of Inflation 307
The Rich Get Richer and the Poor Get Poorer 308
Geographic Differences 311
Census Information 314
Exploring San Francisco 318
Conclusion 319
19 BEAUTIFUL POLITICAL DATA by Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza 323
Example 1: Redistricting and Partisan Bias 324
Example 2: Time Series of Estimates 326
Example 3: Age and Voting 328
Example 4: Public Opinion and Senate Voting on Supreme Court Nominees 328
Example 5: Localized Partisanship in Pennsylvania 330
Conclusion 332
20 CONNECTING DATA by Toby Segaran 335
What Public Data Is There, Really? 336
The Possibilities of Connected Data 337
Within Companies 338
Impediments to Connecting Data 339
Possible Solutions 343
Conclusion 348
CONTRIBUTORS 349
INDEX 357