Mining of Massive Datasets: PDF ebook download
- Authors: Anand Rajaraman; Jeffrey D. Ullman
- Publisher: Cambridge University Press
- Year of publication: 2012
- ISBN: 978-1-107-01535-7
- Pages: 316
1 Data Mining 1
1.1 What is Data Mining? 1
1.2 Statistical Limits on Data Mining 4
1.3 Things Useful to Know 7
1.4 Outline of the Book 15
1.5 Summary of Chapter 1 16
1.6 References for Chapter 1 17
2 Large-Scale File Systems and Map-Reduce 18
2.1 Distributed File Systems 18
2.2 Map-Reduce 21
2.3 Algorithms Using Map-Reduce 26
2.4 Extensions to Map-Reduce 37
2.5 Efficiency of Cluster-Computing Algorithms 42
2.6 Summary of Chapter 2 49
2.7 References for Chapter 2 51
3 Finding Similar Items 53
3.1 Applications of Near-Neighbor Search 53
3.2 Shingling of Documents 57
3.3 Similarity-Preserving Summaries of Sets 60
3.4 Locality-Sensitive Hashing for Documents 67
3.5 Distance Measures 71
3.6 The Theory of Locality-Sensitive Functions 77
3.7 LSH Families for Other Distance Measures 83
3.8 Applications of Locality-Sensitive Hashing 88
3.9 Methods for High Degrees of Similarity 96
3.10 Summary of Chapter 3 104
3.11 References for Chapter 3 106
4 Mining Data Streams 108
4.1 The Stream Data Model 108
4.2 Sampling Data in a Stream 112
4.3 Filtering Streams 115
4.4 Counting Distinct Elements in a Stream 118
4.5 Estimating Moments 122
4.6 Counting Ones in a Window 127
4.7 Decaying Windows 133
4.8 Summary of Chapter 4 136
4.9 References for Chapter 4 137
5 Link Analysis 139
5.1 PageRank 139
5.2 Efficient Computation of PageRank 153
5.3 Topic-Sensitive PageRank 159
5.4 Link Spam 163
5.5 Hubs and Authorities 167
5.6 Summary of Chapter 5 172
5.7 References for Chapter 5 175
6 Frequent Itemsets 176
6.1 The Market-Basket Model 176
6.2 Market Baskets and the A-Priori Algorithm 183
6.3 Handling Larger Datasets in Main Memory 192
6.4 Limited-Pass Algorithms 199
6.5 Counting Frequent Items in a Stream 205
6.6 Summary of Chapter 6 209
6.7 References for Chapter 6 211
7 Clustering 213
7.1 Introduction to Clustering Techniques 213
7.2 Hierarchical Clustering 217
7.3 K-means Algorithms 226
7.4 The CURE Algorithm 234
7.5 Clustering in Non-Euclidean Spaces 237
7.6 Clustering for Streams and Parallelism 241
7.7 Summary of Chapter 7 247
7.8 References for Chapter 7 250
8 Advertising on the Web 252
8.1 Issues in On-Line Advertising 252
8.2 On-Line Algorithms 255
8.3 The Matching Problem 258
8.4 The Adwords Problem 261
8.5 Adwords Implementation 270
8.6 Summary of Chapter 8 273
8.7 References for Chapter 8 275
9 Recommendation Systems 277
9.1 A Model for Recommendation Systems 277
9.2 Content-Based Recommendations 281
9.3 Collaborative Filtering 291
9.4 Dimensionality Reduction 297
9.5 The NetFlix Challenge 305
9.6 Summary of Chapter 9 306
9.7 References for Chapter 9 308
Index 310
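- "Studies in Sixteenth- and Seventeenth-Century Italian Sacred Music" by Jeffrey Kurtzman, 2014
- "Seeds of Deception: Exposing the Species-Destroying Secrets of Genetic Modification That Governments Won't Face and Corporations Don't Want You to Know" by Jeffrey M. Smith; translated by Zhang Mutun, 2012
- "Shoulder, Elbow and Hand Surgery (Orthopaedic Core Knowledge)" by Thomas E. Trumble, Jeffrey E. Budoff and Roger Cornwall, 2009
- "Futures Delivery" by Anne E. Peck and Jeffrey C. Williams; translated by Zhao Wenguang et al., 1998
- "Visual Basic 6 Database Access Techniques" by Jeffrey P. McManus; translated by Zhao Junsuo et al., 1999
- "Understanding Wall Street, 3rd Edition" by Jeffrey B. Little and Lucien Rhodes; translated by Kuang Xiaoming et al., 1999
- "Windows Core Programming" by Jeffrey Richter; translated by Wang Jianhua et al., 2000
- "Marketing" by Gloria Green and Jeffrey Williams; translated by Li Jin, 1999
- "Modern Industrial Organization" by Dennis W. Carlton and Jeffrey M. Perloff; translated by Huang Yajun et al., 1998
- "Programming with Visual Basic 5.0" by Jeffrey P. McManus; translated by Gong Jie et al., 1998
- "Research on the Competency Evaluation of China's Post-80s Generation University Teachers = RESEARCH ON THE EVALUATION OF CHINA'S POST 80s GENERATION UNIVERSITY TEACHERS' CO" by Huang Yan, 2013
- "Reading Hollywood: Space and Meaning in Film" by Deborah Thomas; translated by Li Dayi and Cao Yuling, 2004
- "The Talking Star Atlas: Constellations" by Xu Litao, 2014
- "Reliability Engineering and Risk Management, Vol. 3 (English edition)", edited by Zhao Yangang, 2012
- "Competitive Strategy (complete translation, collector's edition)" by Michael E. Porter, 2012
- "China Materials Masters Lecture Series, Vol. 1", edited by Xie Jianxin, 2012
- "Developing Translation Competence", edited by Schäffner and Adab, 2012
- "Foreign Language Speaking Anxiety of University Students: A Self-Schema Perspective" by Wu Wensheng, 2014
- "Educational Philosophy and Practice at the University of Dublin: Exploring the Development of World-Class Universities", compiled by Li Quanhong, 2013
- "Physics, Vol. 1: Mechanics and Heat (for medical, biological and related majors), adapted English edition of the 4th edition" by Alan Giambattista and Betty McCarthy Richardson, 2013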