Big Data Analytics with R
- Author: Simon Walkowiak (UK)
- Publisher: Southeast University Press, Nanjing
- Publication year: 2017
- ISBN: 9787564173616
- Pages: 490
Preface 1
Chapter 1: The Era of Big Data 7
Big Data - The monster re-defined 7
Big Data toolbox - dealing with the giant 11
Hadoop - the elephant in the room 12
Databases 15
Hadoop Spark-ed up 16
R - The unsung Big Data hero 17
Summary 24
Chapter 2: Introduction to R Programming Language and Statistical Environment 25
Learning R 25
Revisiting R basics 28
Getting R and RStudio ready 28
Setting the URLs to R repositories 30
R data structures 32
Vectors 32
Scalars 35
Matrices 35
Arrays 37
Data frames 38
Lists 41
Exporting R data objects 42
Applied data science with R 47
Importing data from different formats 48
Exploratory Data Analysis 50
Data aggregations and contingency tables 53
Hypothesis testing and statistical inference 56
Tests of differences 57
Independent t-test example (with power and effect size estimates) 57
ANOVA example 60
Tests of relationships 63
An example of Pearson's r correlations 63
Multiple regression example 65
Data visualization packages 70
Summary 71
Chapter 3: Unleashing the Power of R from Within 73
Traditional limitations of R 74
Out-of-memory data 74
Processing speed 75
To the memory limits and beyond 76
Data transformations and aggregations with the ff and ffbase packages 76
Generalized linear models with the ff and ffbase packages 87
Logistic regression example with ffbase and biglm 89
Expanding memory with the bigmemory package 97
Parallel R 106
From bigmemory to faster computations 107
An apply() example with the big.matrix object 108
A for() loop example with the ffdf object 108
Using apply() and for() loop examples on a data.frame 109
A parallel package example 110
A foreach package example 113
The future of parallel processing in R 115
Utilizing Graphics Processing Units with R 115
Multi-threading with Microsoft R Open distribution 117
Parallel machine learning with H2O and R 118
Boosting R performance with the data.table package and other tools 118
Fast data import and manipulation with the data.table package 118
Data import with data.table 119
Lightning-fast subsets and aggregations on data.table 120
Chaining, more complex aggregations, and pivot tables with data.table 123
Writing better R code 126
Summary 127
Chapter 4: Hadoop and MapReduce Framework for R 129
Hadoop architecture 130
Hadoop Distributed File System 130
MapReduce framework 131
A simple MapReduce word count example 132
Other Hadoop native tools 134
Learning Hadoop 136
A single-node Hadoop in Cloud 137
Deploying Hortonworks Sandbox on Azure 138
A word count example in Hadoop using Java 159
A word count example in Hadoop using the R language 169
RStudio Server on a Linux RedHat/CentOS virtual machine 169
Installing and configuring RHadoop packages 177
HDFS management and MapReduce in R - a word count example 179
HDInsight - a multi-node Hadoop cluster on Azure 194
Creating your first HDInsight cluster 194
Creating a new Resource Group 195
Deploying a Virtual Network 197
Creating a Network Security Group 200
Setting up and configuring an HDInsight cluster 203
Starting the cluster and exploring Ambari 211
Connecting to the HDInsight cluster and installing RStudio Server 215
Adding a new inbound security rule for port 8787 218
Editing the Virtual Network's public IP address for the head node 221
Smart energy meter readings analysis example - using R on HDInsight cluster 229
Summary 241
Chapter 5: R with Relational Database Management Systems (RDBMSs) 243
Relational Database Management Systems (RDBMSs) 244
A short overview of used RDBMSs 244
Structured Query Language (SQL) 245
SQLite with R 247
Preparing and importing data into a local SQLite database 248
Connecting to SQLite from RStudio 250
MariaDB with R on an Amazon EC2 instance 255
Preparing the EC2 instance and RStudio Server for use 255
Preparing MariaDB and data for use 257
Working with MariaDB from RStudio 266
PostgreSQL with R on Amazon RDS 281
Launching an Amazon RDS database instance 281
Preparing and uploading data to Amazon RDS 290
Remotely querying PostgreSQL on Amazon RDS from RStudio 304
Summary 314
Chapter 6: R with Non-Relational (NoSQL) Databases 315
Introduction to NoSQL databases 315
Review of leading non-relational databases 316
MongoDB with R 319
Introduction to MongoDB 319
MongoDB data models 319
Installing MongoDB with R on Amazon EC2 322
Processing Big Data using MongoDB with R 325
Importing data into MongoDB and basic MongoDB commands 326
MongoDB with R using the rmongodb package 333
MongoDB with R using the RMongo package 346
MongoDB with R using the mongolite package 350
HBase with R 355
Azure HDInsight with HBase and RStudio Server 355
Importing the data to HDFS and HBase 363
Reading and querying HBase using the rhbase package 367
Summary 372
Chapter 7: Faster than Hadoop - Spark with R 373
Spark for Big Data analytics 374
Spark with R on a multi-node HDInsight cluster 375
Launching HDInsight with Spark and R/RStudio 375
Reading the data into HDFS and Hive 383
Getting the data into HDFS 385
Importing data from HDFS to Hive 386
Bay Area Bike Share analysis using SparkR 393
Summary 411
Chapter 8: Machine Learning Methods for Big Data in R 413
What is machine learning? 414
Supervised and unsupervised machine learning methods 415
Classification and clustering algorithms 416
Machine learning methods with R 417
Big Data machine learning tools 418
GLM example with Spark and R on the HDInsight cluster 419
Preparing the Spark cluster and reading the data from HDFS 419
Logistic regression in Spark with R 425
Naive Bayes with H2O on Hadoop with R 437
Running an H2O instance on Hadoop with R 437
Reading and exploring the data in H2O 441
Naive Bayes on H2O with R 446
Neural Networks with H2O on Hadoop with R 458
How do Neural Networks work? 458
Running Deep Learning models on H2O 461
Summary 469
Chapter 9: The Future of R - Big, Fast, and Smart Data 471
The current state of Big Data analytics with R 471
Out-of-memory data on a single machine 471
Faster data processing with R 473
Hadoop with R 475
Spark with R 476
R with databases 477
Machine learning with R 478
The future of R 478
Big Data 479
Fast data 480
Smart data 481
Where to go next 482
Summary 482
Index 483