1 Introduction 1
1.1 What This Book Is About 1
1.1.1 Why Do Spoken Language Translation? 2
1.1.2 What Are the Basic Problems? 2
1.1.3 What Is It Realistic to Attempt Today? 4
1.1.4 What Have We Achieved? 5
1.2 Overall System Architecture 6
1.3 An Illustrative Example 9
1.4 In Defence of Hand-Coded Grammars 12
1.5 Hybrid Transfer 16
1.5.1 The Need for Grammatical Knowledge 16
1.5.2 The Need for Preferences 17
1.6 Speech Processing 20
1.7 Corpora 21
Part Ⅰ Language Processing and Corpora 25
2 Translation Using the Core Language Engine 25
2.1 Introduction: Multi-Engine Translation 25
2.2 Word-to-Word Translation 26
2.3 Quasi Logical Form 27
2.3.1 Introduction 27
2.3.2 Structure of QLF 28
2.3.3 QLF as a Transfer Formalism: Examples 32
2.3.4 Head-Head Relations in QLF 33
2.4 Unification Grammar and QLFs 35
2.4.1 The CLE Unification Grammar Formalism 35
2.4.2 Unification Grammar Example: French Noun Phrases 37
2.4.3 Example 2a: Clauses in Swedish 41
2.4.4 Example 2b: Relative Clauses in Swedish 42
2.5 Orthographic Analysis and the Lexicon 45
2.6 Transfer Rules 48
2.6.1 Pre- and Posttransfer 50
2.7 The QLF-Based Processing Path 51
2.7.1 Linguistic Analysis 51
2.7.2 Transfer and Transfer Preferences 53
2.7.3 Generation 55
2.8 Summary 55
3 Grammar Specialisation 57
3.1 Introduction 57
3.2 Explanation-Based Learning for Grammar Specialisation 58
3.2.1 A Definition of Explanation-Based Learning 58
3.2.2 Explanation-Based Learning on Unification Grammars 61
3.2.3 Category Specialisation 62
3.2.4 Elaborate Cutting-Up Criteria 65
3.3 An LR Parsing Method for Specialised Grammars 67
3.3.1 Basic LR Parsing 67
3.3.2 Prefix Merging 67
3.3.3 Abstraction 68
3.4 Empirical Results 69
3.4.1 Experimental Setup 69
3.4.2 Discussion of Results 71
3.5 Conclusions 77
4 Choosing among Interpretations 78
4.1 Properties and Discriminants 78
4.2 Constituent Pruning 82
4.2.1 Discriminants for Pruning 82
4.2.2 Deciding Which Edges to Prune 85
4.2.3 Probability Estimates for Discriminants 85
4.2.4 Relation to Other Pruning Methods 89
4.3 Choosing among QLF Analyses 90
4.3.1 Analysis Choice: An Example 90
4.3.2 Further Advantages of a Discriminant Scheme 91
4.3.3 Numerical Metrics 92
4.4 Choosing among Transferred QLFs 94
4.5 Choosing Paths in the Chart 95
5 The TreeBanker 98
5.1 Motivation 98
5.2 Representational Issues 99
5.3 Overview of the TreeBanker 100
5.4 The Supervised Training Process 100
5.4.1 Properties and Discriminants in Training 101
5.4.2 Additional Functionality 105
5.5 Training for Transfer Choice 106
5.6 Evaluation and Conclusions 108
6 Acquisition of Lexical Entries 110
6.1 The Lexical Acquisition Tool, LexMake 110
6.2 Acquiring Word-to-Word Transfer Rules 114
6.3 Evaluation and Conclusions 115
7 Spelling and Morphology 116
7.1 Introduction 116
7.2 The Description Language 118
7.2.1 Morphophonology 118
7.2.2 Word Formation and Interfacing to Syntax 120
7.3 Compilation 121
7.3.1 Compiling Spelling Patterns 121
7.3.2 Representing Lexical Roots 122
7.3.3 Applying Obligatory Rules 123
7.3.4 Interword Rules 124
7.3.5 Timings 124
7.4 Some Examples 125
7.4.1 Multiple-Letter Spelling Changes 125
7.4.2 Using Features to Control Rule Application 126
7.4.3 Interword Spelling Changes 127
7.5 Debugging the Rules 128
7.6 Conclusions and Further Work 130
8 Corpora and Data Collection 131
8.1 Rationale and Requirements 131
8.2 Simulation Methodology 133
8.2.1 Wizard-of-Oz Simulations 133
8.2.2 American ATIS Simulations 133
8.2.3 Swedish ATIS Simulations 134
8.3 Translations of American WOZ Material 135
8.3.1 Translations: A First Step 135
8.3.2 Email Corpus 136
8.4 A Comparison of the Corpora 137
8.5 Concluding Remarks on Corpus Collection 139
8.6 Representative Corpora and Rational Development 141
Part Ⅱ Linguistic Coverage 147
9 English Coverage 147
9.1 Overview of English Linguistic Coverage 147
9.2 Lexical Items 148
9.3 The English Grammar 148
9.3.1 Noun Phrases 148
9.3.2 Nonrecursive NPs 149
9.3.3 Recursive NPs 153
9.3.4 Prepositional Phrases 156
9.3.5 Numbers 157
9.3.6 Verb Phrases 157
9.3.7 Clauses and Top-Level Utterances 161
9.4 Coverage Failures 164
9.5 Comparison with French and Swedish Grammars 166
10 French Coverage 168
10.1 Introduction 168
10.2 Question Formation 169
10.2.1 Constraints on Question Formation 169
10.2.2 Implementation of the Rules for Question Formation 174
10.2.3 Empirical Evaluation Using a Multidimensional Test Suite 175
10.3 Clitics 176
10.4 Agreement 178
10.5 Conclusions 179
11 Swedish Coverage 180
11.1 Introduction 180
11.2 Clausal Constructions 181
11.2.1 Inverted Word Order and Verb-Second Phenomena 181
11.2.2 Adverbs and Negation 183
11.2.3 Other Clausal Constructions 185
11.3 Verbs and Verbal Constructions 186
11.4 NP Constructions 188
11.4.1 Compound Nominals 190
12 Transfer Coverage 192
12.1 Introduction 192
12.2 Statistical Breakdown of Rule Types 193
12.3 Overview of the Rules 194
12.3.1 Identity Rules 194
12.3.2 Lexical Rules Translating Atoms into Atoms 195
12.3.3 Lexical Rules Translating Nonatomic Fixed Structures 196
12.3.4 Date, Time, and Code Expressions 197
12.3.5 Nominals 198
12.3.6 Verbs 199
12.3.7 Adjectives 201
12.3.8 Prepositional Phrases 201
12.3.9 Tense, Aspect, Mood, and Voice 202
12.3.10 Determiners and Pronouns 203
12.3.11 Conjunction 205
12.4 Adequacy of the Transfer Formalism 206
12.4.1 Expressiveness of the Rule Formalism 207
12.4.2 Formal Properties 210
12.5 Summary 211
13 Rational Reuse of Linguistic Data 212
13.1 Introduction 212
13.2 Porting Grammars and Lexica among Closely Related Languages 213
13.3 Transfer Composition 216
13.3.1 Introduction 216
13.3.2 Transfer Composition as a Program Transformation 217
13.3.3 Procedural Realisation of Transfer Rule Composition 220
13.3.4 Composing Transfer Preferences 222
13.3.5 Improving Automatically Composed Rule Sets 222
13.4 Experiments 223
13.4.1 Swedish→English→French 224
13.4.2 English→Swedish→Danish 226
13.5 Evaluation 227
13.6 Conclusions 228
Part Ⅲ Speech Processing 231
14 Speech Recognition 231
14.1 Speech Recognition Based on Statistical Methods 231
14.2 Hidden Markov Models 232
14.2.1 Definition 232
14.2.2 Observation-Probability Computation 234
14.2.3 Estimation of the Hidden-State Sequence 236
14.2.4 Estimation of Model Parameters 237
14.3 The Speech Part of the Book 239
15 Acoustic Modelling 240
15.1 Introduction: Discrete or Continuous? That's the Question 240
15.2 Continuous-Density HMMs and Genones 241
15.3 Efficiency Issues 244
15.3.1 Baseline Experiments 244
15.3.2 Speed Optimisation 245
15.4 Discrete-Mixture HMMs 247
15.5 Conclusions 249
16 Language Modelling for Multilingual Speech Translation 250
16.1 Introduction 250
16.2 Fabricating Domain-Specific Data 251
16.3 Better Use of Domain-General Data 253
16.4 Unsupervised Language-Model Adaptation 255
16.5 Phrase-Based Language Models 256
16.6 Multilingual Language Modelling 262
16.7 Conclusions 263
17 Porting a Recogniser to a New Language 265
17.1 The Swedish Speech Corpus 265
17.1.1 Read-Text Corpora 265
17.1.2 WOZ Corpora 266
17.2 The Swedish Lexicon 267
17.2.1 Phone Set 267
17.2.2 Phonetic Transcription 268
17.2.3 Lexicon Statistics 270
17.3 Acoustic Models 270
17.3.1 SLT-2 Models 270
17.3.2 SLT-3 Models 272
17.4 Conclusions 273
18 Multiple Dialects and Languages 274
18.1 Introduction 274
18.2 Dialect Adaptation 274
18.2.1 Dialect Adaptation Methods 275
18.2.2 Experimental Results 277
18.3 The Multilingual Speech-Recognition System 280
18.3.1 Multilingual Recognition Experiments 281
18.3.2 Language Identification 282
18.4 Conclusions 283
19 Common Speech/Language Issues 284
19.1 The Speech/Language Interface 284
19.2 Split versus Unsplit Compounds in Speech Understanding 285
19.2.1 Introduction 285
19.2.2 Speech Recognition Experiments 286
19.2.3 Conclusions 289
19.3 Prosody Translation 290
19.3.1 Detection of Focal Accent 290
19.3.2 Prosody Transfer 293
Part Ⅳ Evaluation and Conclusions 297
20 Evaluation 297
20.1 Methodological Issues 297
20.2 Evaluation of Speech-to-Text Translation 298
20.3 Evaluation of Speech-to-Speech Translation 299
20.4 Speech-to-Text Evaluation Results 302
20.5 Pipeline Synergy 309
21 Conclusions 313
A Appendix: The Mathematics of Discriminant Scores 315
B Appendix: Notation for QLF-Based Processing 318
B.1 QLFs 318
B.2 Grammar Rules 320
B.3 Lexicon 322
References 323
Index 333