R语言机器学习 第2版 影印版 (Machine Learning with R, 2nd Edition, Reprint Edition)
- Author: Brett Lantz
- Publisher: Southeast University Press, Nanjing
- Year of publication: 2017
- ISBN: 9787564170714
- Pages: 427
Chapter 1: Introducing Machine Learning 1
The origins of machine learning 2
Uses and abuses of machine learning 4
Machine learning successes 5
The limits of machine learning 5
Machine learning ethics 7
How machines learn 9
Data storage 10
Abstraction 11
Generalization 13
Evaluation 14
Machine learning in practice 16
Types of input data 17
Types of machine learning algorithms 19
Matching input data to algorithms 21
Machine learning with R 22
Installing R packages 23
Loading and unloading R packages 24
Summary 25
Chapter 2: Managing and Understanding Data 27
R data structures 28
Vectors 28
Factors 30
Lists 32
Data frames 35
Matrixes and arrays 37
Managing data with R 39
Saving, loading, and removing R data structures 39
Importing and saving data from CSV files 41
Exploring and understanding data 42
Exploring the structure of data 43
Exploring numeric variables 44
Measuring the central tendency - mean and median 45
Measuring spread - quartiles and the five-number summary 47
Visualizing numeric variables - boxplots 49
Visualizing numeric variables - histograms 51
Understanding numeric data - uniform and normal distributions 53
Measuring spread - variance and standard deviation 54
Exploring categorical variables 56
Measuring the central tendency - the mode 58
Exploring relationships between variables 59
Visualizing relationships - scatterplots 59
Examining relationships - two-way cross-tabulations 61
Summary 64
Chapter 3: Lazy Learning - Classification Using Nearest Neighbors 65
Understanding nearest neighbor classification 66
The k-NN algorithm 66
Measuring similarity with distance 69
Choosing an appropriate k 70
Preparing data for use with k-NN 72
Why is the k-NN algorithm lazy? 74
Example - diagnosing breast cancer with the k-NN algorithm 75
Step 1 - collecting data 76
Step 2 - exploring and preparing the data 77
Transformation - normalizing numeric data 79
Data preparation - creating training and test datasets 80
Step 3 - training a model on the data 81
Step 4 - evaluating model performance 83
Step 5 - improving model performance 84
Transformation - z-score standardization 85
Testing alternative values of k 86
Summary 87
Chapter 4: Probabilistic Learning - Classification Using Naive Bayes 89
Understanding Naive Bayes 90
Basic concepts of Bayesian methods 90
Understanding probability 91
Understanding joint probability 92
Computing conditional probability with Bayes' theorem 94
The Naive Bayes algorithm 97
Classification with Naive Bayes 98
The Laplace estimator 100
Using numeric features with Naive Bayes 102
Example - filtering mobile phone spam with the Naive Bayes algorithm 103
Step 1 - collecting data 104
Step 2 - exploring and preparing the data 105
Data preparation - cleaning and standardizing text data 106
Data preparation - splitting text documents into words 112
Data preparation - creating training and test datasets 115
Visualizing text data - word clouds 116
Data preparation - creating indicator features for frequent words 119
Step 3 - training a model on the data 121
Step 4 - evaluating model performance 122
Step 5 - improving model performance 123
Summary 124
Chapter 5: Divide and Conquer - Classification Using Decision Trees and Rules 125
Understanding decision trees 126
Divide and conquer 127
The C5.0 decision tree algorithm 131
Choosing the best split 133
Pruning the decision tree 135
Example - identifying risky bank loans using C5.0 decision trees 136
Step 1 - collecting data 136
Step 2 - exploring and preparing the data 137
Data preparation - creating random training and test datasets 138
Step 3 - training a model on the data 140
Step 4 - evaluating model performance 144
Step 5 - improving model performance 145
Boosting the accuracy of decision trees 145
Making mistakes more costlier than others 147
Understanding classification rules 149
Separate and conquer 150
The 1R algorithm 153
The RIPPER algorithm 155
Rules from decision trees 157
What makes trees and rules greedy? 158
Example - identifying poisonous mushrooms with rule learners 160
Step 1 - collecting data 160
Step 2 - exploring and preparing the data 161
Step 3 - training a model on the data 162
Step 4 - evaluating model performance 165
Step 5 - improving model performance 166
Summary 169
Chapter 6: Forecasting Numeric Data - Regression Methods 171
Understanding regression 172
Simple linear regression 174
Ordinary least squares estimation 177
Correlations 179
Multiple linear regression 181
Example - predicting medical expenses using linear regression 186
Step 1 - collecting data 186
Step 2 - exploring and preparing the data 187
Exploring relationships among features - the correlation matrix 189
Visualizing relationships among features - the scatterplot matrix 190
Step 3 - training a model on the data 193
Step 4 - evaluating model performance 196
Step 5 - improving model performance 197
Model specification - adding non-linear relationships 198
Transformation - converting a numeric variable to a binary indicator 198
Model specification - adding interaction effects 199
Putting it all together - an improved regression model 200
Understanding regression trees and model trees 201
Adding regression to trees 202
Example - estimating the quality of wines with regression trees and model trees 205
Step 1 - collecting data 205
Step 2 - exploring and preparing the data 206
Step 3 - training a model on the data 208
Visualizing decision trees 210
Step 4 - evaluating model performance 212
Measuring performance with the mean absolute error 213
Step 5 - improving model performance 214
Summary 218
Chapter 7: Black Box Methods - Neural Networks and Support Vector Machines 219
Understanding neural networks 220
From biological to artificial neurons 221
Activation functions 223
Network topology 225
The number of layers 226
The direction of information travel 227
The number of nodes in each layer 228
Training neural networks with backpropagation 229
Example - modeling the strength of concrete with ANNs 231
Step 1 - collecting data 232
Step 2 - exploring and preparing the data 232
Step 3 - training a model on the data 234
Step 4 - evaluating model performance 237
Step 5 - improving model performance 238
Understanding Support Vector Machines 239
Classification with hyperplanes 240
The case of linearly separable data 242
The case of nonlinearly separable data 244
Using kernels for non-linear spaces 245
Example - performing OCR with SVMs 248
Step 1 - collecting data 249
Step 2 - exploring and preparing the data 250
Step 3 - training a model on the data 252
Step 4 - evaluating model performance 254
Step 5 - improving model performance 256
Chapter 8: Finding Patterns - Market Basket Analysis Using Association Rules 259
Understanding association rules 260
The Apriori algorithm for association rule learning 261
Measuring rule interest - support and confidence 263
Building a set of rules with the Apriori principle 265
Example - identifying frequently purchased groceries with association rules 266
Step 1 - collecting data 266
Step 2 - exploring and preparing the data 267
Data preparation - creating a sparse matrix for transaction data 268
Visualizing item support - item frequency plots 272
Visualizing the transaction data - plotting the sparse matrix 273
Step 3 - training a model on the data 274
Step 4 - evaluating model performance 277
Step 5 - improving model performance 280
Sorting the set of association rules 280
Taking subsets of association rules 281
Saving association rules to a file or data frame 283
Summary 284
Chapter 9: Finding Groups of Data - Clustering with k-means 285
Understanding clustering 286
Clustering as a machine learning task 286
The k-means clustering algorithm 289
Using distance to assign and update clusters 290
Choosing the appropriate number of clusters 294
Example - finding teen market segments using k-means clustering 296
Step 1 - collecting data 297
Step 2 - exploring and preparing the data 297
Data preparation - dummy coding missing values 299
Data preparation - imputing the missing values 300
Step 3 - training a model on the data 302
Step 4 - evaluating model performance 304
Step 5 - improving model performance 308
Summary 310
Chapter 10: Evaluating Model Performance 311
Measuring performance for classification 312
Working with classification prediction data in R 313
A closer look at confusion matrices 317
Using confusion matrices to measure performance 319
Beyond accuracy - other measures of performance 321
The kappa statistic 323
Sensitivity and specificity 326
Precision and recall 328
The F-measure 330
Visualizing performance trade-offs 331
ROC curves 332
Estimating future performance 336
The holdout method 336
Cross-validation 340
Bootstrap sampling 343
Summary 344
Chapter 11: Improving Model Performance 347
Tuning stock models for better performance 348
Using caret for automated parameter tuning 349
Creating a simple tuned model 352
Customizing the tuning process 355
Improving model performance with meta-learning 359
Understanding ensembles 359
Bagging 362
Boosting 366
Random forests 369
Training random forests 370
Evaluating random forest performance 373
Summary 375
Chapter 12: Specialized Machine Learning Topics 377
Working with proprietary files and databases 378
Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files 378
Querying data in SQL databases 379
Working with online data and services 381
Downloading the complete text of web pages 382
Scraping data from web pages 383
Parsing XML documents 387
Parsing JSON from web APIs 388
Working with domain-specific data 392
Analyzing bioinformatics data 393
Analyzing and visualizing network data 393
Improving the performance of R 398
Managing very large datasets 398
Generalizing tabular data structures with dplyr 399
Making data frames faster with data.table 401
Creating disk-based data frames with ff 402
Using massive matrices with bigmemory 404
Learning faster with parallel computing 404
Measuring execution time 406
Working in parallel with multicore and snow 406
Taking advantage of parallel with foreach and doParallel 410
Parallel cloud computing with MapReduce and Hadoop 411
GPU computing 412
Deploying optimized learning algorithms 413
Building bigger regression models with biglm 414
Growing bigger and faster random forests with bigrf 414
Training and evaluating models in parallel with caret 414
Summary 416
Index 417