Spoken Language Translation (《口语机器翻译》)

  • Authors: Manny Rayner, David Carter, et al.
  • Publisher: Beijing: Peking University Press
  • Year of publication: 2010
  • ISBN: 9787301171561
  • Pages: 337
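Book description: Spoken Language Translation (SLT) is the process by which a computer automatically translates speech in one language into speech in another language. The ideal goal is for the computer to act as an interpreter between speakers of different languages, just as a human would. Important application areas for SLT include translating speech at conference presentations, in conversations (by telephone, over a network, or face to face), and in broadcasts. Because speakers mostly talk in a colloquial style, there is particular demand for translation systems that can accept arbitrary colloquial, freely conversational speech and translate it directly. This book gives a comprehensive, systematic account of the main results of the SLT project, covering language processing and corpus collection, linguistic coverage, speech processing, and system evaluation. Language processing makes up the bulk of the book.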

1 Introduction 1

1.1 What This Book Is About 1

1.1.1 Why Do Spoken Language Translation? 2

1.1.2 What Are the Basic Problems? 2

1.1.3 What Is It Realistic to Attempt Today? 4

1.1.4 What Have We Achieved? 5

1.2 Overall System Architecture 6

1.3 An Illustrative Example 9

1.4 In Defence of Hand-Coded Grammars 12

1.5 Hybrid Transfer 16

1.5.1 The Need for Grammatical Knowledge 16

1.5.2 The Need for Preferences 17

1.6 Speech Processing 20

1.7 Corpora 21

Part Ⅰ Language Processing and Corpora 25

2 Translation Using the Core Language Engine 25

2.1 Introduction: Multi-Engine Translation 25

2.2 Word-to-Word Translation 26

2.3 Quasi Logical Form 27

2.3.1 Introduction 27

2.3.2 Structure of QLF 28

2.3.3 QLF as a Transfer Formalism: Examples 32

2.3.4 Head-Head Relations in QLF 33

2.4 Unification Grammar and QLFs 35

2.4.1 The CLE Unification Grammar Formalism 35

2.4.2 Unification Grammar Example: French Noun Phrases 37

2.4.3 Example 2a: Clauses in Swedish 41

2.4.4 Example 2b: Relative Clauses in Swedish 42

2.5 Orthographic Analysis and the Lexicon 45

2.6 Transfer Rules 48

2.6.1 Pre- and Posttransfer 50

2.7 The QLF-Based Processing Path 51

2.7.1 Linguistic Analysis 51

2.7.2 Transfer and Transfer Preferences 53

2.7.3 Generation 55

2.8 Summary 55

3 Grammar Specialisation 57

3.1 Introduction 57

3.2 Explanation-Based Learning for Grammar Specialisation 58

3.2.1 A Definition of Explanation-Based Learning 58

3.2.2 Explanation-Based Learning on Unification Grammars 61

3.2.3 Category Specialisation 62

3.2.4 Elaborate Cutting-Up Criteria 65

3.3 An LR Parsing Method for Specialised Grammars 67

3.3.1 Basic LR Parsing 67

3.3.2 Prefix Merging 67

3.3.3 Abstraction 68

3.4 Empirical Results 69

3.4.1 Experimental Setup 69

3.4.2 Discussion of Results 71

3.5 Conclusions 77

4 Choosing among Interpretations 78

4.1 Properties and Discriminants 78

4.2 Constituent Pruning 82

4.2.1 Discriminants for Pruning 82

4.2.2 Deciding Which Edges to Prune 85

4.2.3 Probability Estimates for Discriminants 85

4.2.4 Relation to Other Pruning Methods 89

4.3 Choosing among QLF Analyses 90

4.3.1 Analysis Choice: An Example 90

4.3.2 Further Advantages of a Discriminant Scheme 91

4.3.3 Numerical Metrics 92

4.4 Choosing among Transferred QLFs 94

4.5 Choosing Paths in the Chart 95

5 The TreeBanker 98

5.1 Motivation 98

5.2 Representational Issues 99

5.3 Overview of the TreeBanker 100

5.4 The Supervised Training Process 100

5.4.1 Properties and Discriminants in Training 101

5.4.2 Additional Functionality 105

5.5 Training for Transfer Choice 106

5.6 Evaluation and Conclusions 108

6 Acquisition of Lexical Entries 110

6.1 The Lexical Acquisition Tool, LexMake 110

6.2 Acquiring Word-to-Word Transfer Rules 114

6.3 Evaluation and Conclusions 115

7 Spelling and Morphology 116

7.1 Introduction 116

7.2 The Description Language 118

7.2.1 Morphophonology 118

7.2.2 Word Formation and Interfacing to Syntax 120

7.3 Compilation 121

7.3.1 Compiling Spelling Patterns 121

7.3.2 Representing Lexical Roots 122

7.3.3 Applying Obligatory Rules 123

7.3.4 Interword Rules 124

7.3.5 Timings 124

7.4 Some Examples 125

7.4.1 Multiple-Letter Spelling Changes 125

7.4.2 Using Features to Control Rule Application 126

7.4.3 Interword Spelling Changes 127

7.5 Debugging the Rules 128

7.6 Conclusions and Further Work 130

8 Corpora and Data Collection 131

8.1 Rationale and Requirements 131

8.2 Simulation Methodology 133

8.2.1 Wizard-of-Oz Simulations 133

8.2.2 American ATIS Simulations 133

8.2.3 Swedish ATIS Simulations 134

8.3 Translations of American WOZ Material 135

8.3.1 Translations: A First Step 135

8.3.2 Email Corpus 136

8.4 A Comparison of the Corpora 137

8.5 Concluding Remarks on Corpus Collection 139

8.6 Representative Corpora and Rational Development 141

Part Ⅱ Linguistic Coverage 147

9 English Coverage 147

9.1 Overview of English Linguistic Coverage 147

9.2 Lexical Items 148

9.3 The English Grammar 148

9.3.1 Noun Phrases 148

9.3.2 Nonrecursive NPs 149

9.3.3 Recursive NPs 153

9.3.4 Prepositional Phrases 156

9.3.5 Numbers 157

9.3.6 Verb Phrases 157

9.3.7 Clauses and Top-Level Utterances 161

9.4 Coverage Failures 164

9.5 Comparison with French and Swedish Grammars 166

10 French Coverage 168

10.1 Introduction 168

10.2 Question Formation 169

10.2.1 Constraints on Question Formation 169

10.2.2 Implementation of the Rules for Question Formation 174

10.2.3 Empirical Evaluation Using a Multidimensional Test Suite 175

10.3 Clitics 176

10.4 Agreement 178

10.5 Conclusions 179

11 Swedish Coverage 180

11.1 Introduction 180

11.2 Clausal Constructions 181

11.2.1 Inverted Word Order and Verb-Second Phenomena 181

11.2.2 Adverbs and Negation 183

11.2.3 Other Clausal Constructions 185

11.3 Verbs and Verbal Constructions 186

11.4 NP Constructions 188

11.4.1 Compound Nominals 190

12 Transfer Coverage 192

12.1 Introduction 192

12.2 Statistical Breakdown of Rule Types 193

12.3 Overview of the Rules 194

12.3.1 Identity Rules 194

12.3.2 Lexical Rules Translating Atoms into Atoms 195

12.3.3 Lexical Rules Translating Nonatomic Fixed Structures 196

12.3.4 Date, Time, and Code Expressions 197

12.3.5 Nominals 198

12.3.6 Verbs 199

12.3.7 Adjectives 201

12.3.8 Prepositional Phrases 201

12.3.9 Tense, Aspect, Mood, and Voice 202

12.3.10 Determiners and Pronouns 203

12.3.11 Conjunction 205

12.4 Adequacy of the Transfer Formalism 206

12.4.1 Expressiveness of the Rule Formalism 207

12.4.2 Formal Properties 210

12.5 Summary 211

13 Rational Reuse of Linguistic Data 212

13.1 Introduction 212

13.2 Porting Grammars and Lexica among Closely Related Languages 213

13.3 Transfer Composition 216

13.3.1 Introduction 216

13.3.2 Transfer Composition as a Program Transformation 217

13.3.3 Procedural Realisation of Transfer Rule Composition 220

13.3.4 Composing Transfer Preferences 222

13.3.5 Improving Automatically Composed Rule Sets 222

13.4 Experiments 223

13.4.1 Swedish→English→French 224

13.4.2 English→Swedish→Danish 226

13.5 Evaluation 227

13.6 Conclusions 228

Part Ⅲ Speech Processing 231

14 Speech Recognition 231

14.1 Speech Recognition Based on Statistical Methods 231

14.2 Hidden Markov Models 232

14.2.1 Definition 232

14.2.2 Observation-Probability Computation 234

14.2.3 Estimation of the Hidden-State Sequence 236

14.2.4 Estimation of Model Parameters 237

14.3 The Speech Part of the Book 239

15 Acoustic Modelling 240

15.1 Introduction: Discrete or Continuous? That's the Question 240

15.2 Continuous-Density HMMs and Genones 241

15.3 Efficiency Issues 244

15.3.1 Baseline Experiments 244

15.3.2 Speed Optimisation 245

15.4 Discrete-Mixture HMMs 247

15.5 Conclusions 249

16 Language Modelling for Multilingual Speech Translation 250

16.1 Introduction 250

16.2 Fabricating Domain-Specific Data 251

16.3 Better Use of Domain-General Data 253

16.4 Unsupervised Language-Model Adaptation 255

16.5 Phrase-Based Language Models 256

16.6 Multilingual Language Modelling 262

16.7 Conclusions 263

17 Porting a Recogniser to a New Language 265

17.1 The Swedish Speech Corpus 265

17.1.1 Read-Text Corpora 265

17.1.2 WOZ Corpora 266

17.2 The Swedish Lexicon 267

17.2.1 Phone Set 267

17.2.2 Phonetic Transcription 268

17.2.3 Lexicon Statistics 270

17.3 Acoustic Models 270

17.3.1 SLT-2 Models 270

17.3.2 SLT-3 Models 272

17.4 Conclusions 273

18 Multiple Dialects and Languages 274

18.1 Introduction 274

18.2 Dialect Adaptation 274

18.2.1 Dialect Adaptation Methods 275

18.2.2 Experimental Results 277

18.3 The Multilingual Speech-Recognition System 280

18.3.1 Multilingual Recognition Experiments 281

18.3.2 Language Identification 282

18.4 Conclusions 283

19 Common Speech/Language Issues 284

19.1 The Speech/Language Interface 284

19.2 Split versus Unsplit Compounds in Speech Understanding 285

19.2.1 Introduction 285

19.2.2 Speech Recognition Experiments 286

19.2.3 Conclusions 289

19.3 Prosody Translation 290

19.3.1 Detection of Focal Accent 290

19.3.2 Prosody Transfer 293

Part Ⅳ Evaluation and Conclusions 297

20 Evaluation 297

20.1 Methodological Issues 297

20.2 Evaluation of Speech-to-Text Translation 298

20.3 Evaluation of Speech-to-Speech Translation 299

20.4 Speech-to-Text Evaluation Results 302

20.5 Pipeline Synergy 309

21 Conclusions 313

A Appendix: The Mathematics of Discriminant Scores 315

B Appendix: Notation for QLF-Based Processing 318

B.1 QLFs 318

B.2 Grammar Rules 320

B.3 Lexicon 322

References 323

Index 333