Machine Learning with R, Second Edition (Reprint Edition) PDF Download

  • Author: Brett Lantz
  • Publisher: Nanjing: Southeast University Press
  • Year of publication: 2017
  • ISBN: 9787564170714
  • Pages: 427
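Book description: Updated with the latest libraries and the most modern programming thinking, this book walks you step by step through the skills essential to professional data science. There is no need to fear the theory: the book provides the most essential practical knowledge needed to write algorithms and process data, and only basic experience is required. You will find all the analytical tools you need to gain insight into complex data, and you will learn how to choose the right algorithm for a specific problem. Through hands-on work with a variety of real-world problems, you will learn how to apply machine learning methods to common tasks, including classification, prediction, market analysis, and clustering.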

Chapter 1: Introducing Machine Learning 1

The origins of machine learning 2

Uses and abuses of machine learning 4

Machine learning successes 5

The limits of machine learning 5

Machine learning ethics 7

How machines learn 9

Data storage 10

Abstraction 11

Generalization 13

Evaluation 14

Machine learning in practice 16

Types of input data 17

Types of machine learning algorithms 19

Matching input data to algorithms 21

Machine learning with R 22

Installing R packages 23

Loading and unloading R packages 24

Summary 25

Chapter 2: Managing and Understanding Data 27

R data structures 28

Vectors 28

Factors 30

Lists 32

Data frames 35

Matrixes and arrays 37

Managing data with R 39

Saving, loading, and removing R data structures 39

Importing and saving data from CSV files 41

Exploring and understanding data 42

Exploring the structure of data 43

Exploring numeric variables 44

Measuring the central tendency-mean and median 45

Measuring spread-quartiles and the five-number summary 47

Visualizing numeric variables-boxplots 49

Visualizing numeric variables-histograms 51

Understanding numeric data-uniform and normal distributions 53

Measuring spread-variance and standard deviation 54

Exploring categorical variables 56

Measuring the central tendency-the mode 58

Exploring relationships between variables 59

Visualizing relationships-scatterplots 59

Examining relationships-two-way cross-tabulations 61

Summary 64

Chapter 3: Lazy Learning-Classification Using Nearest Neighbors 65

Understanding nearest neighbor classification 66

The k-NN algorithm 66

Measuring similarity with distance 69

Choosing an appropriate k 70

Preparing data for use with k-NN 72

Why is the k-NN algorithm lazy? 74

Example-diagnosing breast cancer with the k-NN algorithm 75

Step 1-collecting data 76

Step 2-exploring and preparing the data 77

Transformation-normalizing numeric data 79

Data preparation-creating training and test datasets 80

Step 3-training a model on the data 81

Step 4-evaluating model performance 83

Step 5-improving model performance 84

Transformation-z-score standardization 85

Testing alternative values of k 86

Summary 87

Chapter 4: Probabilistic Learning-Classification Using Naive Bayes 89

Understanding Naive Bayes 90

Basic concepts of Bayesian methods 90

Understanding probability 91

Understanding joint probability 92

Computing conditional probability with Bayes' theorem 94

The Naive Bayes algorithm 97

Classification with Naive Bayes 98

The Laplace estimator 100

Using numeric features with Naive Bayes 102

Example-filtering mobile phone spam with the Naive Bayes algorithm 103

Step 1-collecting data 104

Step 2-exploring and preparing the data 105

Data preparation-cleaning and standardizing text data 106

Data preparation-splitting text documents into words 112

Data preparation-creating training and test datasets 115

Visualizing text data-word clouds 116

Data preparation-creating indicator features for frequent words 119

Step 3-training a model on the data 121

Step 4-evaluating model performance 122

Step 5-improving model performance 123

Summary 124

Chapter 5: Divide and Conquer-Classification Using Decision Trees and Rules 125

Understanding decision trees 126

Divide and conquer 127

The C5.0 decision tree algorithm 131

Choosing the best split 133

Pruning the decision tree 135

Example-identifying risky bank loans using C5.0 decision trees 136

Step 1-collecting data 136

Step 2-exploring and preparing the data 137

Data preparation-creating random training and test datasets 138

Step 3-training a model on the data 140

Step 4-evaluating model performance 144

Step 5-improving model performance 145

Boosting the accuracy of decision trees 145

Making some mistakes costlier than others 147

Understanding classification rules 149

Separate and conquer 150

The 1R algorithm 153

The RIPPER algorithm 155

Rules from decision trees 157

What makes trees and rules greedy? 158

Example-identifying poisonous mushrooms with rule learners 160

Step 1-collecting data 160

Step 2-exploring and preparing the data 161

Step 3-training a model on the data 162

Step 4-evaluating model performance 165

Step 5-improving model performance 166

Summary 169

Chapter 6: Forecasting Numeric Data-Regression Methods 171

Understanding regression 172

Simple linear regression 174

Ordinary least squares estimation 177

Correlations 179

Multiple linear regression 181

Example-predicting medical expenses using linear regression 186

Step 1-collecting data 186

Step 2-exploring and preparing the data 187

Exploring relationships among features-the correlation matrix 189

Visualizing relationships among features-the scatterplot matrix 190

Step 3-training a model on the data 193

Step 4-evaluating model performance 196

Step 5-improving model performance 197

Model specification-adding non-linear relationships 198

Transformation-converting a numeric variable to a binary indicator 198

Model specification-adding interaction effects 199

Putting it all together-an improved regression model 200

Understanding regression trees and model trees 201

Adding regression to trees 202

Example-estimating the quality of wines with regression trees and model trees 205

Step 1-collecting data 205

Step 2-exploring and preparing the data 206

Step 3-training a model on the data 208

Visualizing decision trees 210

Step 4-evaluating model performance 212

Measuring performance with the mean absolute error 213

Step 5-improving model performance 214

Summary 218

Chapter 7: Black Box Methods-Neural Networks and Support Vector Machines 219

Understanding neural networks 220

From biological to artificial neurons 221

Activation functions 223

Network topology 225

The number of layers 226

The direction of information travel 227

The number of nodes in each layer 228

Training neural networks with backpropagation 229

Example-modeling the strength of concrete with ANNs 231

Step 1-collecting data 232

Step 2-exploring and preparing the data 232

Step 3-training a model on the data 234

Step 4-evaluating model performance 237

Step 5-improving model performance 238

Understanding Support Vector Machines 239

Classification with hyperplanes 240

The case of linearly separable data 242

The case of nonlinearly separable data 244

Using kernels for non-linear spaces 245

Example-performing OCR with SVMs 248

Step 1-collecting data 249

Step 2-exploring and preparing the data 250

Step 3-training a model on the data 252

Step 4-evaluating model performance 254

Step 5-improving model performance 256

Chapter 8: Finding Patterns-Market Basket Analysis Using Association Rules 259

Understanding association rules 260

The Apriori algorithm for association rule learning 261

Measuring rule interest-support and confidence 263

Building a set of rules with the Apriori principle 265

Example-identifying frequently purchased groceries with association rules 266

Step 1-collecting data 266

Step 2-exploring and preparing the data 267

Data preparation-creating a sparse matrix for transaction data 268

Visualizing item support-item frequency plots 272

Visualizing the transaction data-plotting the sparse matrix 273

Step 3-training a model on the data 274

Step 4-evaluating model performance 277

Step 5-improving model performance 280

Sorting the set of association rules 280

Taking subsets of association rules 281

Saving association rules to a file or data frame 283

Summary 284

Chapter 9: Finding Groups of Data-Clustering with k-means 285

Understanding clustering 286

Clustering as a machine learning task 286

The k-means clustering algorithm 289

Using distance to assign and update clusters 290

Choosing the appropriate number of clusters 294

Example-finding teen market segments using k-means clustering 296

Step 1-collecting data 297

Step 2-exploring and preparing the data 297

Data preparation-dummy coding missing values 299

Data preparation-imputing the missing values 300

Step 3-training a model on the data 302

Step 4-evaluating model performance 304

Step 5-improving model performance 308

Summary 310

Chapter 10: Evaluating Model Performance 311

Measuring performance for classification 312

Working with classification prediction data in R 313

A closer look at confusion matrices 317

Using confusion matrices to measure performance 319

Beyond accuracy-other measures of performance 321

The kappa statistic 323

Sensitivity and specificity 326

Precision and recall 328

The F-measure 330

Visualizing performance trade-offs 331

ROC curves 332

Estimating future performance 336

The holdout method 336

Cross-validation 340

Bootstrap sampling 343

Summary 344

Chapter 11: Improving Model Performance 347

Tuning stock models for better performance 348

Using caret for automated parameter tuning 349

Creating a simple tuned model 352

Customizing the tuning process 355

Improving model performance with meta-learning 359

Understanding ensembles 359

Bagging 362

Boosting 366

Random forests 369

Training random forests 370

Evaluating random forest performance 373

Summary 375

Chapter 12: Specialized Machine Learning Topics 377

Working with proprietary files and databases 378

Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files 378

Querying data in SQL databases 379

Working with online data and services 381

Downloading the complete text of web pages 382

Scraping data from web pages 383

Parsing XML documents 387

Parsing JSON from web APIs 388

Working with domain-specific data 392

Analyzing bioinformatics data 393

Analyzing and visualizing network data 393

Improving the performance of R 398

Managing very large datasets 398

Generalizing tabular data structures with dplyr 399

Making data frames faster with data.table 401

Creating disk-based data frames with ff 402

Using massive matrices with bigmemory 404

Learning faster with parallel computing 404

Measuring execution time 406

Working in parallel with multicore and snow 406

Taking advantage of parallel with foreach and doParallel 410

Parallel cloud computing with MapReduce and Hadoop 411

GPU computing 412

Deploying optimized learning algorithms 413

Building bigger regression models with biglm 414

Growing bigger and faster random forests with bigrf 414

Training and evaluating models in parallel with caret 414

Summary 416

Index 417