Text Mining (English edition) PDF download

  • Authors: Ronen Feldman (Israel), James Sanger (USA)
  • Publisher: Beijing: Posts & Telecom Press
  • Year of publication: 2009
  • ISBN: 9787115205353
  • Pages: 410
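Book description: This book is an introduction to text mining. It covers core text mining operations, text mining preprocessing techniques, categorization, clustering, information extraction, probabilistic models for information extraction, preprocessing applications, visualization approaches, link analysis, and text mining applications, and it combines the theory and practice of text mining well.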

Ⅰ.Introduction to Text Mining 1

Ⅰ.1 Defining Text Mining 1

Ⅰ.2 General Architecture of Text Mining Systems 13

Ⅱ.Core Text Mining Operations 19

Ⅱ.1 Core Text Mining Operations 19

Ⅱ.2 Using Background Knowledge for Text Mining 41

Ⅱ.3 Text Mining Query Languages 51

Ⅲ.Text Mining Preprocessing Techniques 57

Ⅲ.1 Task-Oriented Approaches 58

Ⅲ.2 Further Reading 62

Ⅳ.Categorization 64

Ⅳ.1 Applications of Text Categorization 65

Ⅳ.2 Definition of the Problem 66

Ⅳ.3 Document Representation 68

Ⅳ.4 Knowledge Engineering Approach to TC 70

Ⅳ.5 Machine Learning Approach to TC 70

Ⅳ.6 Using Unlabeled Data to Improve Classification 78

Ⅳ.7 Evaluation of Text Classifiers 79

Ⅳ.8 Citations and Notes 80

Ⅴ.Clustering 82

Ⅴ.1 Clustering Tasks in Text Analysis 82

Ⅴ.2 The General Clustering Problem 84

Ⅴ.3 Clustering Algorithms 85

Ⅴ.4 Clustering of Textual Data 88

Ⅴ.5 Citations and Notes 92

Ⅵ.Information Extraction 94

Ⅵ.1 Introduction to Information Extraction 94

Ⅵ.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster 96

Ⅵ.3 IE Examples 101

Ⅵ.4 Architecture of IE Systems 104

Ⅵ.5 Anaphora Resolution 109

Ⅵ.6 Inductive Algorithms for IE 119

Ⅵ.7 Structural IE 122

Ⅵ.8 Further Reading 129

Ⅶ.Probabilistic Models for Information Extraction 131

Ⅶ.1 Hidden Markov Models 131

Ⅶ.2 Stochastic Context-Free Grammars 137

Ⅶ.3 Maximal Entropy Modeling 138

Ⅶ.4 Maximal Entropy Markov Models 140

Ⅶ.5 Conditional Random Fields 142

Ⅶ.6 Further Reading 145

Ⅷ.Preprocessing Applications Using Probabilistic and Hybrid Approaches 146

Ⅷ.1 Applications of HMM to Textual Analysis 146

Ⅷ.2 Using MEMM for Information Extraction 152

Ⅷ.3 Applications of CRFs to Textual Analysis 153

Ⅷ.4 TEG: Using SCFG Rules for Hybrid Statistical-Knowledge-Based IE 155

Ⅷ.5 Bootstrapping 166

Ⅷ.6 Further Reading 175

Ⅸ.Presentation-Layer Considerations for Browsing and Query Refinement 177

Ⅸ.1 Browsing 177

Ⅸ.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer 185

Ⅸ.3 Accessing the Underlying Query Language 186

Ⅸ.4 Citations and Notes 187

Ⅹ.Visualization Approaches 189

Ⅹ.1 Introduction 189

Ⅹ.2 Architectural Considerations 192

Ⅹ.3 Common Visualization Approaches for Text Mining 194

Ⅹ.4 Visualization Techniques in Link Analysis 225

Ⅹ.5 Real-World Example: The Document Explorer System 235

Ⅺ.Link Analysis 242

Ⅺ.1 Preliminaries 242

Ⅺ.2 Automatic Layout of Networks 244

Ⅺ.3 Paths and Cycles in Graphs 248

Ⅺ.4 Centrality 249

Ⅺ.5 Partitioning of Networks 257

Ⅺ.6 Pattern Matching in Networks 270

Ⅺ.7 Software Packages for Link Analysis 271

Ⅺ.8 Citations and Notes 272

Ⅻ.Text Mining Applications 273

Ⅻ.1 General Considerations 274

Ⅻ.2 Corporate Finance: Mining Industry Literature for Business Intelligence 279

Ⅻ.3 A "Horizontal" Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 295

Ⅻ.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays 307

Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 315

A.1 What Is the DIAL Language? 315

A.2 Information Extraction in the DIAL Environment 316

A.3 Text Tokenization 318

A.4 Concept and Rule Structure 318

A.5 Pattern Matching 320

A.6 Pattern Elements 321

A.7 Rule Constraints 325

A.8 Concept Guards 326

A.9 Complete DIAL Examples 327

Bibliography 335

Index 389