1. APPLIED MULTIVARIATE METHODS 1
1.1 An Overview of Multivariate Methods 1
Variable- and Individual-Directed Techniques 2
Creating New Variables 2
Principal Components Analysis 3
Factor Analysis 3
Discriminant Analysis 4
Cluster Analysis 5
Canonical Discriminant Analysis 5
Logistic Regression 5
Multivariate Analysis of Variance 6
Canonical Variates Analysis 7
Canonical Correlation Analysis 7
Where to Find the Preceding Topics 7
1.2 Two Examples 8
1.3 Types of Variables 11
Independence of Experimental Units 11
1.4 Data Matrices and Vectors 12
Variable Notation 13
Data Matrix 13
Data Vectors 13
Data Subscripts 14
1.5 The Multivariate Normal Distribution 15
Some Definitions 15
Summarizing Multivariate Distributions 16
Mean Vectors and Variance-Covariance Matrices 16
Correlations and Correlation Matrices 17
The Multivariate Normal Probability Density Function 19
Bivariate Normal Distributions 19
1.6 Statistical Computing 22
Cautions About Computer Usage 22
Missing Values 22
Removing Rows of the Data Matrix 23
Replacing Missing Values by Averages 23
Replacing Missing Values by Zeros 23
Sampling Strategies 24
Data Entry Errors and Data Verification 24
1.7 Multivariate Outliers 25
Locating Outliers 25
Dealing with Outliers 25
Outliers May Be Influential 26
1.8 Multivariate Summary Statistics 26
1.9 Standardized Data and/or Z Scores 27
Exercises 28
2. SAMPLE CORRELATIONS 35
2.1 Statistical Tests and Confidence Intervals 35
Are the Correlations Large Enough to Be Useful? 36
Confidence Intervals by the Chart Method 36
Confidence Intervals by Fisher's Approximation 38
Confidence Intervals by Ruben's Approximation 39
Variable Groupings Based on Correlations 40
Relationship to Factor Analysis 46
2.2 Summary 46
Exercises 47
3. MULTIVARIATE DATA PLOTS 55
3.1 Three-Dimensional Data Plots 55
3.2 Plots of Higher Dimensional Data 59
Chernoff Faces 61
Star Plots and Sun-Ray Plots 63
Andrews' Plots 65
Side-by-Side Scatter Plots 66
3.3 Plotting to Check for Multivariate Normality 67
Summary 73
Exercises 73
4. EIGENVALUES AND EIGENVECTORS 77
4.1 Trace and Determinant 77
Examples 78
4.2 Eigenvalues 78
4.3 Eigenvectors 79
Positive Definite and Positive Semidefinite Matrices 80
4.4 Geometric Descriptions (p = 2) 82
Vectors 82
Bivariate Normal Distributions 83
4.5 Geometric Descriptions (p = 3) 87
Vectors 87
Trivariate Normal Distributions 87
4.6 Geometric Descriptions (p > 3) 90
Summary 91
Exercises 91
5. PRINCIPAL COMPONENTS ANALYSIS 93
5.1 Reasons for Using Principal Components Analysis 93
Data Screening 93
Clustering 95
Discriminant Analysis 95
Regression 95
5.2 Objectives of Principal Components Analysis 96
5.3 Principal Components Analysis on the Variance-Covariance Matrix Σ 96
Principal Component Scores 98
Component Loading Vectors 98
5.4 Estimation of Principal Components 99
Estimation of Principal Component Scores 99
5.5 Determining the Number of Principal Components 99
Method 1 100
Method 2 100
5.6 Caveats 107
5.7 PCA on the Correlation Matrix P 109
Principal Component Scores 110
Component Correlation Vectors 110
Sample Correlation Matrix 110
Determining the Number of Principal Components 110
5.8 Testing for Independence of the Original Variables 111
5.9 Structural Relationships 111
5.10 Statistical Computing Packages 112
SAS® PRINCOMP Procedure 112
Principal Components Analysis Using Factor Analysis Programs 118
PCA with SPSS's FACTOR Procedure 124
Summary 142
Exercises 142
6. FACTOR ANALYSIS 147
6.1 Objectives of Factor Analysis 147
6.2 Caveats 148
6.3 Some History of Factor Analysis 148
6.4 The Factor Analysis Model 150
Assumptions 150
Matrix Form of the Factor Analysis Model 151
Definitions of Factor Analysis Terminology 151
6.5 Factor Analysis Equations 151
Nonuniqueness of the Factors 152
6.6 Solving the Factor Analysis Equations 153
6.7 Choosing the Appropriate Number of Factors 155
Objective Criteria 156
Subjective Criteria 156
6.8 Computer Solutions of the Factor Analysis Equations 157
Principal Factor Method on R 158
Principal Factor Method with Iteration 159
6.9 Rotating Factors 170
Examples (m = 2) 171
Rotation Methods 172
The Varimax Rotation Method 173
6.10 Oblique Rotation Methods 174
6.11 Factor Scores 180
Bartlett's Method or the Weighted Least-Squares Method 181
Thompson's Method or the Regression Method 181
Ad Hoc Methods 181
Summary 212
Exercises 213
7. DISCRIMINANT ANALYSIS 217
7.1 Discrimination for Two Multivariate Normal Populations 217
A Posterior Probability Rule 218
A Mahalanobis Distance Rule 218
The Linear Discriminant Function Rule 218
A Likelihood Rule 218
Sample Discriminant Rules 219
Estimating Probabilities of Misclassification 220
Resubstitution Estimates 220
Estimates from Holdout Data 220
Cross-Validation Estimates 221
7.2 Cost Functions and Prior Probabilities (Two Populations) 229
7.3 A General Discriminant Rule (Two Populations) 231
A Cost Function 232
Prior Probabilities 232
Average Cost of Misclassification 232
A Bayes Rule 233
Classification Functions 233
Unequal Covariance Matrices 233
Tricking Computing Packages 234
7.4 Discriminant Rules (More Than Two Populations) 235
Basic Discrimination 238
7.5 Variable Selection Procedures 245
Forward Selection Procedure 245
Backward Elimination Procedure 246
Stepwise Selection Procedure 246
Recommendations 247
Caveats 247
7.6 Canonical Discriminant Functions 255
The First Canonical Function 256
A Second Canonical Function 257
Determining the Dimensionality of the Canonical Space 260
Discriminant Analysis with Categorical Predictor Variables 273
7.7 Nearest Neighbor Discriminant Analysis 275
7.8 Classification Trees 283
Summary 283
Exercises 283
8. LOGISTIC REGRESSION METHODS 287
8.1 Logistic Regression Model 287
8.2 The Logit Transformation 287
Model Fitting 288
8.3 Variable Selection Methods 296
8.4 Logistic Discriminant Analysis (More Than Two Populations) 301
Logistic Regression Models 301
Model Fitting 302
Another SAS LOGISTIC Analysis 314
Exercises 316
9. CLUSTER ANALYSIS 319
9.1 Measures of Similarity and Dissimilarity 319
Ruler Distance 319
Standardized Ruler Distance 320
A Mahalanobis Distance 320
Dissimilarity Measures 320
9.2 Graphical Aids in Clustering 321
Scatter Plots 321
Using Principal Components 322
Andrews' Plots 322
Other Methods 322
9.3 Clustering Methods 322
Nonhierarchical Clustering Methods 323
Hierarchical Clustering 323
Nearest Neighbor Method 323
A Hierarchical Tree Diagram 325
Other Hierarchical Clustering Methods 326
Verification of Clustering Methods 327
How Many Clusters? 327
Comparisons of Clustering Methods 327
Beale's F-Type Statistic 328
A Pseudo Hotelling's T² Test 329
The Cubic Clustering Criterion 329
Clustering Order 334
Estimating the Number of Clusters 339
Principal Components Plots 348
Clustering with SPSS 355
SAS's FASTCLUS Procedure 369
9.4 Multidimensional Scaling 385
Exercises 395
10. MEAN VECTORS AND VARIANCE-COVARIANCE MATRICES 397
10.1 Inference Procedures for Variance-Covariance Matrices 397
A Test for a Specific Variance-Covariance Matrix 398
A Test for Sphericity 400
A Test for Compound Symmetry 403
A Test for the Huynh-Feldt Conditions 405
A Test for Independence 406
A Test for Independence of Subsets of Variables 407
A Test for the Equality of Several Variance-Covariance Matrices 408
10.2 Inference Procedures for a Mean Vector 408
Hotelling's T² Statistic 409
Hypothesis Test for μ 409
Confidence Region for μ 409
A More General Result 411
Special Case—A Test of Symmetry 412
Fitting a Line to Repeated Measures 418
A Test for Linear Trend 418
Multivariate Quality Control 419
10.3 Two-Sample Procedures 420
Repeated Measures Experiments 420
10.4 Profile Analyses 431
10.5 Additional Two-Group Analyses 432
Paired Samples 432
Small Sample Sizes 433
Large Sample Sizes 433
Unequal Variance-Covariance Matrices 433
Summary 434
Exercises 434
11. MULTIVARIATE ANALYSIS OF VARIANCE 439
11.1 MANOVA 439
MANOVA Assumptions 440
Test Statistics 440
Test Comparisons 441
Why Do We Use MANOVAs? 441
A Conservative Approach to Multiple Comparisons 442
11.2 Dimensionality of the Alternative Hypothesis 455
11.3 Canonical Variates Analysis 456
The First Canonical Variate 456
The Second Canonical Variate 457
Other Canonical Variates 457
11.4 Confidence Regions for Canonical Variates 458
Summary 485
Exercises 485
12. PREDICTION MODELS AND MULTIVARIATE REGRESSION 489
12.1 Multiple Regression 489
12.2 Canonical Correlation Analysis 494
Two Sets of Variables 494
The First Canonical Correlation 495
The Second Canonical Correlation 495
Number of Canonical Correlations 496
Estimates 496
Hypothesis Tests on the Canonical Correlations 497
Interpreting Canonical Functions 508
Canonical Correlation Analysis with SPSS 511
12.3 Factor Analysis and Regression 515
Summary 522
Exercises 522
APPENDIX A: MATRIX RESULTS 525
A.1 Basic Definitions and Rules of Matrix Algebra 525
A.2 Quadratic Forms 527
A.3 Eigenvalues and Eigenvectors 528
A.4 Distances and Angles 529
A.5 Miscellaneous Results 529
APPENDIX B: WORK ATTITUDES SURVEY 531
B.1 Data File Structure 536
B.2 SPSS Data Entry Commands 538
B.3 SAS Data Entry Commands 543
APPENDIX C: FAMILY CONTROL STUDY 547
REFERENCES 555
Index 563