Part One Introduction 1
Introducing the Study 2
0.1 Introductory remarks 2
0.2 Need for this study 3
0.2.1 Theoretical considerations 3
0.2.2 Practical considerations 7
0.3 Description of the study 10
0.4 Organization of the study 11
0.5 Summary 12
Part Two Literature Review 13
Chapter 1 A Review of Existing Computer-Assisted Essay Scoring Systems 14
1.1 Introduction 14
1.2 Key concepts 14
1.2.1 Computer-assisted essay scoring 14
1.2.2 EFL writing assessment 16
1.3 Existing computer-assisted essay scoring systems 17
1.3.1 Project Essay Grade (PEG): A form-focused system 17
1.3.2 Intelligent Essay Assessor (IEA): A content-focused system 20
1.3.3 E-rater: A hybrid system with a modular structure 22
1.3.4 An appraisal of the three existing systems 25
1.4 Lessons from existing essay scoring systems 28
1.5 Summary 31
Chapter 2 Studies on Measures of Writing Quality 33
2.1 Introduction 33
2.2 Measures of writing quality in the literature 33
2.2.1 Measures of the quality of language 34
2.2.2 Measures of the quality of content and organization 51
2.3 An overview of the measures in the literature 57
2.4 A conceptual model for the computer-assisted scoring of EFL essays 61
2.5 Proposed measures of EFL writing quality 65
2.5.1 Proposed measures of the quality of language in EFL writing 65
2.5.2 Proposed measures of the quality of content in EFL writing 69
2.5.3 Proposed measures of the quality of organization in EFL writing 71
2.6 Summary 75
Part Three Methodology 77
Chapter 3 Research Questions and Data Preparation 78
3.1 Introduction 78
3.2 Research questions 78
3.3 The corpus 80
3.4 The rating scheme 82
3.4.1 Selecting a rating scale 82
3.4.2 The revised rating scale 84
3.4.3 The evaluation of content 87
3.4.4 The weighting scheme 90
3.5 Rating 91
3.5.1 Rater selection 92
3.5.2 Rater training 92
3.5.3 The rating sessions 93
3.6 Score reliability 94
3.7 Summary 96
Chapter 4 Text Analysis and Statistical Analysis 97
4.1 Introduction 97
4.2 Tools 97
4.3 Essay feature extraction 99
4.3.1 Language features 100
4.3.2 Content features 103
4.3.3 Organizational features 110
4.4 Data analysis 111
4.4.1 Correlation analysis 111
4.4.2 Multiple regression analysis 112
4.4.3 Stages of data analysis 113
4.5 Summary 117
Part Four Results and Discussion 119
Chapter 5 Identifying Predictors of EFL Writing Quality 120
5.1 Introduction 120
5.2 Linguistic features and writing quality 120
5.2.1 Fluency and writing quality 123
5.2.2 Complexity of language and writing quality 126
5.2.3 Measures of linguistic idiomaticity and appropriateness 138
5.3 Results of content analysis 144
5.3.1 Results of Latent Semantic Analysis 145
5.3.2 Procedural vocabulary and essay score 149
5.4 Essay organization and writing quality 151
5.4.1 Paragraphing and writing quality 152
5.4.2 Discourse conjuncts and writing quality 159
5.4.3 Demonstratives, pronouns, connectives and writing quality 159
5.5 Power of the predictors proposed in this study 159
5.6 Summary 161
Chapter 6 A Statistical Model for Computer-Assisted Essay Scoring 164
6.1 Introduction 164
6.2 Diagnosing the preliminary model 165
6.3 The refined model 168
6.4 Predictors and aspects of writing quality measured 172
6.4.1 Predictors in the language module 173
6.4.2 Predictors in the content module 178
6.4.3 Predictors in the organization module 181
6.4.4 Interdependence of the modules 183
6.5 Implementing the model 185
6.6 Summary 187
Chapter 7 Validating the Model 188
7.1 Introduction 188
7.2 Cross-validating the model 188
7.3 Reliability of computer scores in cross-validation 191
7.3.1 Aspects of reliability 191
7.3.2 Consistency estimates 193
7.3.3 Consensus estimates 195
7.4 Double cross-validation 198
7.4.1 Constructing the model 198
7.4.2 Model statistics and estimated equation 199
7.5 Reliability of computer scores in double cross-validation 201
7.6 Comparison with existing essay scoring systems 204
7.6.1 Comparison with PEG 205
7.6.2 Comparison with IEA 208
7.6.3 Comparison with E-rater 212
7.7 Summary 214
Part Five Conclusion 215
Chapter 8 Conclusion 216
8.1 Major findings 216
8.1.1 A model for the computer-assisted scoring of EFL essays 216
8.1.2 Predictors of EFL writing quality 220
8.2 Limitations of the study 223
8.3 Future work 224
References 226
Appendices 249
Appendix Ⅰ PEG's proxes and their beta values (Page 1968) 249
Appendix Ⅱ Page's (1995) model and variables 251
Appendix Ⅲ Argument weight 253
Appendix Ⅳ Examples of good openings and endings 255
Appendix Ⅴ Scoring table (Organization & Content) 256
Appendix Ⅵ Scoring table (Language) 257
Appendix Ⅶ List of stopwords 258
Appendix Ⅷ Lemma list (excerpt) 262
Appendix Ⅸ List of content words 266
Appendix Ⅹ Sample essays 283
Appendix Ⅺ POS-tagged samples 286
List of Tables

Chapter 1
Table 1.1 Comparison of strengths and weaknesses of existing essay scoring systems 26
Table 1.2 Approaches and measured constructs 28
Chapter 2
Table 2.1 Measures of writing quality in previous studies 58
Chapter 3
Table 3.1 Comparison of holistic and analytic scales (from Weigle 2002) 83
Table 3.2 Jacobs et al.'s (1981) scale: Aspects of quality and their emphasis 85
Table 3.3 Modified scheme: Aspects of writing quality 86
Table 3.4 Aspects of writing quality and their emphasis in the revised scale 91
Table 3.5 Inter-rater correlations (Training set) 95
Table 3.6 Mean and standard deviation of scores (Training set) 95
Table 3.7 Inter-rater correlations (Validation set) 95
Table 3.8 Mean and standard deviation of scores (Validation set) 95
Chapter 4
Table 4.1 Directly extracted language features 100
Table 4.2 Computed language features 100
Chapter 5
Table 5.1 Measures of the quality of language explored 122
Table 5.2 Correlations between fluency measures and essay scores 123
Table 5.3 Correlations between general lexical features and essay scores 127
Table 5.4 Correlations between TTR, Index of Guiraud and essay scores 130
Table 5.5 Correlations between the number of words in VFP lists and essay scores 131
Table 5.6 Correlation between uncommon-common word ratio and essay scores 134
Table 5.7 Correlations between measures of syntactic complexity and essay scores 135
Table 5.8 Examples of recurrent word combinations 140
Table 5.9 Correlation between the number of RWCs and essay scores 140
Table 5.10 Correlations between the use of prepositions, the use of the definite article and essay scores 143
Table 5.11 Correlations between standard SVD measures, revised SVD measures and essay scores 146
Table 5.12 Correlation between the number of PV items and essay scores 149
Table 5.13 Correlation between paragraphing and essay scores 154
Table 5.14 Categories of discourse conjuncts 156
Table 5.15 Correlation between discourse conjuncts and essay scores 157
Table 5.16 Power of the predictors proposed in this study 160
Table 5.17 Variables and aspects of writing quality they measure 161
Chapter 6
Table 6.1 Summary for the preliminary model 165
Table 6.2 Problematic variables in the model 167
Table 6.3 Predictive power of the model 168
Table 6.4 Predictors and their beta weights 170
Table 6.5 Predictors in the language module 173
Table 6.6 Predictive power of the language module 176
Table 6.7 Coefficients of predictors in the language module 177
Table 6.8 Predictors in the content module 178
Table 6.9 Predictive power of the content module 180
Table 6.10 Coefficients of predictors in the content module 180
Table 6.11 Predictors in the organization module 181
Table 6.12 Predictive power of the organization module 182
Table 6.13 Coefficients of predictors in the organization module 183
Table 6.14 The unique power of the content module 183
Table 6.15 The unique power of the organization module 184
Table 6.16 The unique power of the language module 185
Chapter 7
Table 7.1 Pearson correlations between human raters and the computer 194
Table 7.2 Cronbach's alpha coefficients 195
Table 7.3 Exact agreement between human raters and the computer 196
Table 7.4 Exact-plus-adjacent agreement between human raters and the computer 197
Table 7.5 A summary of reliability estimates 198
Table 7.6 Model summary (double cross-validation) 199
Table 7.7 Regression coefficients (double cross-validation) 200
Table 7.8 Consistency coefficients of reliability (double cross-validation) 202
Table 7.9 Consensus estimates of reliability (double cross-validation) 203
Table 7.10 Pearson correlations and exact-plus-adjacent agreement 205
Table 7.11 Reliability of PEG's first experiment (from Page 2003) 205
Table 7.12 Reliability of PEG's NAEP experiment (Page 1994) 206
Table 7.13 Reliability of PEG's Praxis experiment (from Page 2003) 207
Table 7.14 Major experiments with PEG 207
Table 7.15 Representative experiments with LSA 210
Table 7.16 E-rater's mean agreement with human raters (Burstein et al. 1998a) 212
Table 7.17 E-rater's reliability reported in Burstein et al. (2001) 213
Chapter 8
Table 8.1 A list of reconfirmed predictors 221
Table 8.2 Revised predictors and their correlations with essay quality 222
List of Figures

Chapter 1
Figure 1.1 Modularity in the computer-assisted essay scoring model 30
Chapter 2
Figure 2.1 Relationship between the number of types and the number of tokens in a text 44
Figure 2.2 A conceptual model for the computer-assisted scoring of EFL essays 62
Chapter 4
Figure 4.1 Term-by-document matrix 105
Figure 4.2 Weighted term-by-document matrix 106
Figure 4.3 Singular Value Decomposition (SVD) 107
Figure 4.4 Matrix reconstruction 107
Figure 4.5 Reconstructed matrix 108
Figure 4.6 The reference in the revised approach to LSA 109
Figure 4.7 Flow chart of the model-training stage 113
Figure 4.8 Flow chart of the cross-validation phase 114
Figure 4.9 Flow chart of the double cross-validation phase 116
Chapter 5
Figure 5.1 Relationship between the number of paragraphs and essay scores 152
Chapter 6
Figure 6.1 Relationship between the standardized predicted value and the dependent variable 169
Figure 6.2 Estimated equation 1 172
Figure 6.3 Implementing the model 187
Chapter 7
Figure 7.1 Computing essay scores 190
Figure 7.2 Variables and computer-generated scores 191
Figure 7.3 Estimated equation 2 (double cross-validation) 201