1. A Review of Machine Learning 1
The Learning Machines 1
How Can Machines Learn? 2
Biological Inspiration 4
What Is Deep Learning? 6
Going Down the Rabbit Hole 7
Framing the Questions 8
The Math Behind Machine Learning: Linear Algebra 8
Scalars 9
Vectors 9
Matrices 10
Tensors 10
Hyperplanes 10
Relevant Mathematical Operations 11
Converting Data Into Vectors 11
Solving Systems of Equations 13
The Math Behind Machine Learning: Statistics 15
Probability 16
Conditional Probabilities 18
Posterior Probability 19
Distributions 19
Samples Versus Population 22
Resampling Methods 22
Selection Bias 22
Likelihood 23
How Does Machine Learning Work? 23
Regression 23
Classification 25
Clustering 26
Underfitting and Overfitting 26
Optimization 27
Convex Optimization 29
Gradient Descent 30
Stochastic Gradient Descent 32
Quasi-Newton Optimization Methods 33
Generative Versus Discriminative Models 33
Logistic Regression 34
The Logistic Function 35
Understanding Logistic Regression Output 35
Evaluating Models 36
The Confusion Matrix 36
Building an Understanding of Machine Learning 40
2. Foundations of Neural Networks and Deep Learning 41
Neural Networks 41
The Biological Neuron 43
The Perceptron 45
Multilayer Feed-Forward Networks 50
Training Neural Networks 56
Backpropagation Learning 57
Activation Functions 65
Linear 66
Sigmoid 66
Tanh 67
Hard Tanh 68
Softmax 68
Rectified Linear 69
Loss Functions 71
Loss Function Notation 71
Loss Functions for Regression 72
Loss Functions for Classification 75
Loss Functions for Reconstruction 77
Hyperparameters 78
Learning Rate 78
Regularization 79
Momentum 79
Sparsity 80
3. Fundamentals of Deep Networks 81
Defining Deep Learning 81
What Is Deep Learning? 81
Organization of This Chapter 91
Common Architectural Principles of Deep Networks 92
Parameters 92
Layers 93
Activation Functions 93
Loss Functions 95
Optimization Algorithms 96
Hyperparameters 100
Summary 105
Building Blocks of Deep Networks 105
RBMs 106
Autoencoders 112
Variational Autoencoders 114
4. Major Architectures of Deep Networks 117
Unsupervised Pretrained Networks 118
Deep Belief Networks 118
Generative Adversarial Networks 121
Convolutional Neural Networks (CNNs) 125
Biological Inspiration 126
Intuition 126
CNN Architecture Overview 128
Input Layers 130
Convolutional Layers 130
Pooling Layers 140
Fully Connected Layers 140
Other Applications of CNNs 141
CNNs of Note 141
Summary 142
Recurrent Neural Networks 143
Modeling the Time Dimension 143
3D Volumetric Input 146
Why Not Markov Models? 148
General Recurrent Neural Network Architecture 149
LSTM Networks 150
Domain-Specific Applications and Blended Networks 159
Recursive Neural Networks 160
Network Architecture 160
Varieties of Recursive Neural Networks 161
Applications of Recursive Neural Networks 161
Summary and Discussion 162
Will Deep Learning Make Other Algorithms Obsolete? 162
Different Problems Have Different Best Methods 162
When Do I Need Deep Learning? 163
5. Building Deep Networks 165
Matching Deep Networks to the Right Problem 165
Columnar Data and Multilayer Perceptrons 166
Images and Convolutional Neural Networks 166
Time-series Sequences and Recurrent Neural Networks 167
Using Hybrid Networks 169
The DL4J Suite of Tools 169
Vectorization and DataVec 170
Runtimes and ND4J 170
Basic Concepts of the DL4J API 172
Loading and Saving Models 172
Getting Input for the Model 173
Setting Up Model Architecture 173
Training and Evaluation 174
Modeling CSV Data with Multilayer Perceptron Networks 175
Setting Up Input Data 178
Determining Network Architecture 178
Training the Model 181
Evaluating the Model 181
Modeling Handwritten Images Using CNNs 182
Java Code Listing for the LeNet CNN 183
Loading and Vectorizing the Input Images 185
Network Architecture for LeNet in DL4J 186
Training the CNN 190
Modeling Sequence Data by Using Recurrent Neural Networks 191
Generating Shakespeare via LSTMs 191
Classifying Sensor Time-series Sequences Using LSTMs 200
Using Autoencoders for Anomaly Detection 207
Java Code Listing for Autoencoder Example 207
Setting Up Input Data 211
Autoencoder Network Architecture and Training 211
Evaluating the Model 213
Using Variational Autoencoders to Reconstruct MNIST Digits 214
Code Listing to Reconstruct MNIST Digits 214
Examining the VAE Model 217
Applications of Deep Learning in Natural Language Processing 221
Learning Word Embedding Using Word2Vec 221
Distributed Representations of Sentences with Paragraph Vectors 227
Using Paragraph Vectors for Document Classification 231
6. Tuning Deep Networks 237
Basic Concepts in Tuning Deep Networks 237
An Intuition for Building Deep Networks 238
Building the Intuition as a Step-by-Step Process 239
Matching Input Data and Network Architectures 240
Summary 241
Relating Model Goal and Output Layers 242
Regression Model Output Layer 242
Classification Model Output Layer 243
Working with Layer Count, Parameter Count, and Memory 246
Feed-Forward Multilayer Neural Networks 246
Controlling Layer and Parameter Counts 247
Estimating Network Memory Requirements 250
Weight Initialization Strategies 251
Using Activation Functions 253
Summary Table for Activation Functions 255
Applying Loss Functions 256
Understanding Learning Rates 258
Using the Ratio of Updates-to-Parameters 259
Specific Recommendations for Learning Rates 260
How Sparsity Affects Learning 263
Applying Methods of Optimization 263
SGD Best Practices 265
Using Parallelization and GPUs for Faster Training 265
Online Learning and Parallel Iterative Algorithms 266
Parallelizing SGD in DL4J 269
GPUs 272
Controlling Epochs and Mini-Batch Size 273
Understanding Mini-Batch Size Trade-Offs 274
How to Use Regularization 275
Priors as Regularizers 275
Max-Norm Regularization 276
Dropout 277
Other Regularization Topics 279
Working with Class Imbalance 280
Methods for Sampling Classes 282
Weighted Loss Functions 282
Dealing with Overfitting 283
Using Network Statistics from the Tuning UI 284
Detecting Poor Weight Initialization 287
Detecting Nonshuffled Data 288
Detecting Issues with Regularization 290
7. Tuning Specific Deep Network Architectures 293
Convolutional Neural Networks (CNNs) 293
Common Convolutional Architectural Patterns 294
Configuring Convolutional Layers 297
Configuring Pooling Layers 303
Transfer Learning 304
Recurrent Neural Networks 306
Network Input Data and Input Layers 307
Output Layers and RnnOutputLayer 308
Training the Network 309
Debugging Common Issues with LSTMs 311
Padding and Masking 312
Evaluation and Scoring with Masking 313
Variants of Recurrent Network Architectures 314
Restricted Boltzmann Machines 314
Hidden Units and Modeling Available Information 315
Using Different Units 316
Using Regularization with RBMs 317
DBNs 317
Using Momentum 318
Using Regularization 319
Determining Hidden Unit Count 320
8. Vectorization 321
Introduction to Vectorization in Machine Learning 321
Why Do We Need to Vectorize Data? 322
Strategies for Dealing with Columnar Raw Data Attributes 325
Feature Engineering and Normalization Techniques 327
Using DataVec for ETL and Vectorization 334
Vectorizing Image Data 336
Image Data Representation in DL4J 337
Image Data and Vector Normalization with DataVec 339
Working with Sequential Data in Vectorization 340
Major Variations of Sequential Data Sources 340
Vectorizing Sequential Data with DataVec 341
Working with Text in Vectorization 347
Bag of Words 348
TF-IDF 349
Comparing Word2Vec and VSM 353
Working with Graphs 354
9. Using Deep Learning and DL4J on Spark 357
Introduction to Using DL4J with Spark and Hadoop 357
Operating Spark from the Command Line 360
Configuring and Tuning Spark Execution 362
Running Spark on Mesos 363
Running Spark on YARN 364
General Spark Tuning Guide 367
Tuning DL4J Jobs on Spark 371
Setting Up a Maven Project Object Model for Spark and DL4J 372
A pom.xml File Dependency Template 374
Setting Up a POM File for CDH 5.X 378
Setting Up a POM File for HDP 2.4 378
Troubleshooting Spark and Hadoop 379
Common Issues with ND4J 380
DL4J Parallel Execution on Spark 381
A Minimal Spark Training Example 383
DL4J API Best Practices for Spark 385
Multilayer Perceptron Spark Example 387
Setting Up MLP Network Architecture for Spark 390
Distributed Training and Model Evaluation 390
Building and Executing a DL4J Spark Job 392
Generating Shakespeare Text with Spark and Long Short-Term Memory 392
Setting Up the LSTM Network Architecture 395
Training, Tracking Progress, and Understanding Results 396
Modeling MNIST with a Convolutional Neural Network on Spark 397
Configuring the Spark Job and Loading MNIST Data 400
Setting Up the LeNet CNN Architecture and Training 401
A. What Is Artificial Intelligence? 405
B. RL4J and Reinforcement Learning 417
C. Numbers Everyone Should Know 441
D. Neural Networks and Backpropagation: A Mathematical Approach 443
E. Using the ND4J API 449
F. Using DataVec 463
G. Working with DL4J from Source 475
H. Setting Up DL4J Projects 477
I. Setting Up GPUs for DL4J Projects 483
J. Troubleshooting DL4J Installations 487
Index 495