Ⅰ. Introduction to Text Mining 1
Ⅰ.1 Defining Text Mining 1
Ⅰ.2 General Architecture of Text Mining Systems 13
Ⅱ. Core Text Mining Operations 19
Ⅱ.1 Core Text Mining Operations 19
Ⅱ.2 Using Background Knowledge for Text Mining 41
Ⅱ.3 Text Mining Query Languages 51
Ⅲ. Text Mining Preprocessing Techniques 57
Ⅲ.1 Task-Oriented Approaches 58
Ⅲ.2 Further Reading 62
Ⅳ. Categorization 64
Ⅳ.1 Applications of Text Categorization 65
Ⅳ.2 Definition of the Problem 66
Ⅳ.3 Document Representation 68
Ⅳ.4 Knowledge Engineering Approach to TC 70
Ⅳ.5 Machine Learning Approach to TC 70
Ⅳ.6 Using Unlabeled Data to Improve Classification 78
Ⅳ.7 Evaluation of Text Classifiers 79
Ⅳ.8 Citations and Notes 80
Ⅴ. Clustering 82
Ⅴ.1 Clustering Tasks in Text Analysis 82
Ⅴ.2 The General Clustering Problem 84
Ⅴ.3 Clustering Algorithms 85
Ⅴ.4 Clustering of Textual Data 88
Ⅴ.5 Citations and Notes 92
Ⅵ. Information Extraction 94
Ⅵ.1 Introduction to Information Extraction 94
Ⅵ.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster 96
Ⅵ.3 IE Examples 101
Ⅵ.4 Architecture of IE Systems 104
Ⅵ.5 Anaphora Resolution 109
Ⅵ.6 Inductive Algorithms for IE 119
Ⅵ.7 Structural IE 122
Ⅵ.8 Further Reading 129
Ⅶ. Probabilistic Models for Information Extraction 131
Ⅶ.1 Hidden Markov Models 131
Ⅶ.2 Stochastic Context-Free Grammars 137
Ⅶ.3 Maximal Entropy Modeling 138
Ⅶ.4 Maximal Entropy Markov Models 140
Ⅶ.5 Conditional Random Fields 142
Ⅶ.6 Further Reading 145
Ⅷ. Preprocessing Applications Using Probabilistic and Hybrid Approaches 146
Ⅷ.1 Applications of HMM to Textual Analysis 146
Ⅷ.2 Using MEMM for Information Extraction 152
Ⅷ.3 Applications of CRFs to Textual Analysis 153
Ⅷ.4 TEG: Using SCFG Rules for Hybrid Statistical-Knowledge-Based IE 155
Ⅷ.5 Bootstrapping 166
Ⅷ.6 Further Reading 175
Ⅸ. Presentation-Layer Considerations for Browsing and Query Refinement 177
Ⅸ.1 Browsing 177
Ⅸ.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer 185
Ⅸ.3 Accessing the Underlying Query Language 186
Ⅸ.4 Citations and Notes 187
Ⅹ. Visualization Approaches 189
Ⅹ.1 Introduction 189
Ⅹ.2 Architectural Considerations 192
Ⅹ.3 Common Visualization Approaches for Text Mining 194
Ⅹ.4 Visualization Techniques in Link Analysis 225
Ⅹ.5 Real-World Example: The Document Explorer System 235
Ⅺ. Link Analysis 242
Ⅺ.1 Preliminaries 242
Ⅺ.2 Automatic Layout of Networks 244
Ⅺ.3 Paths and Cycles in Graphs 248
Ⅺ.4 Centrality 249
Ⅺ.5 Partitioning of Networks 257
Ⅺ.6 Pattern Matching in Networks 270
Ⅺ.7 Software Packages for Link Analysis 271
Ⅺ.8 Citations and Notes 272
Ⅻ. Text Mining Applications 273
Ⅻ.1 General Considerations 274
Ⅻ.2 Corporate Finance: Mining Industry Literature for Business Intelligence 279
Ⅻ.3 A "Horizontal" Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 295
Ⅻ.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays 307
Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 315
A.1 What Is the DIAL Language? 315
A.2 Information Extraction in the DIAL Environment 316
A.3 Text Tokenization 318
A.4 Concept and Rule Structure 318
A.5 Pattern Matching 320
A.6 Pattern Elements 321
A.7 Rule Constraints 325
A.8 Concept Guards 326
A.9 Complete DIAL Examples 327
Bibliography 335
Index 389