《信息检索导论 英文版》PDF下载

  • 购买积分:15 如何计算积分?
  • 作  者:(美)ChristopherD.Manning,PrabhakarRaghavan,(德)HinrichSchütze著
  • 出 版 社:北京:人民邮电出版社
  • 出版年份:2010
  • ISBN:9787115218247
  • 页数:482 页
图书介绍:本书是信息检索的教材,旨在从计算机科学的视角提供一种现代的信息检索方法。书中从基本概念讲解网络搜索以及文本分类和文本聚类等,对收集、索引和搜索文档系统的设计和实现的方方面面、评估系统的方法、机器学习方法在文本收集中的应用等给出了最新的讲解。

Boolean retrieval 1 1

1 An example information retrieval problem 3 1

2 A first take at building an inverted index 6 1

3 Processing Boolean queries 9 1

4 The extended Boolean model versus ranked retrieval 13 1

5 References and further reading 16 1

The term vocabulary and postings lists 18 2

1 Document delineation and character sequence decoding 18 2

2 Determining the vocabulary of terms 21 2

3 Faster postings list intersection via skip pointers 33 2

4 Positional postings and phrase queries 36 2

5 References and further reading 43 2

Dictionaries and tolerant retrieval 45 3

1 Search structures for dictionaries 45 3

2 Wildcard queries 48 3

3 Spelling correction 52 3

4 Phonetic correction 58 3

5 References and further reading 59 3

Index construction 61 4

1 Hardware basics 62 4

2 Blocked sort-based indexing 63 4

3 Single-pass in-memory indexing 66 4

4 Distributed indexing 68 4

5 Dynamic indexing 71 4

6 Other types ofindexes 73 4

7 References and further reading 76 4

Index compression 78 5

1 Statistical properties of terms in information retrieval 79 5

2 Dictionary compression 82 5

3 Postings file compression 87 5

4 References and further reading 97 5

Scoring,term weighting,and the vector space model 100 6

1 Parametric and zone indexes 101 6

2 Term frequency and weighting 107 6

3 The vector space model for scoring 110 6

4 Variant tf-idf functions 116 6

5 References and further reading 122 6

Computing scores in a complete search system 124 7

1 Efficient scoring and ranking 124 7

2 Components of an information retrieval system 132 7

3 Vector space scoring and query operator interaction 136 7

4 References and further reading 137 7

Evaluation in information retrieval 139 8

1 Information retrieval system evaluation 140 8

2 Standard test collections 141 8

3 Evaluation of unranked retrieval sets 142 8

4 Evaluation of ranked retrieval results 145 8

5 Assessing relevance 151 8

6 A broader perspective:System quality and user utility 154 8

7 Results snippets 157 8

8 References and further reading 159 8

Relevance feedback and query expansion 162 9

1 Relevance feedback and pseudo relevance feedback 163 9

2 Global methods for query reformulation 173 9

3 References and further reading 177 9

XML retrieval 178 10

1 Basic XML concepts 180 10

2 Challenges in XML retrieval 183 10

3 A vector space model for XML retrieval 188 10

4 Evaluation of XML retrieval 192 10

5 Text-centric versus data-centric XML retrieval 196 10

6 References and further reading 198 10

Probabilistic information retrieval 201 11

1 Review of basic probability theory 202 11

2 The probability ranking principle 203 11

3 The binary independence model 204 11

4 An appraisal and some extensions 212 11

5 References and further reading 216 11

Language models for information retrieval 218 12

1 Language models 218 12

2 The query likelihood model 223 12

3 Language modeling versus other approaches in information retrieval 229 12

4 Extended language modeling approaches 230 12

5 References and further reading 232 12

Text classification and Naive Bayes 234 13

1 The text classification problem 237 13

2 Naive Bayes text classification 238 13

3 The Bernoulli model 243 13

4 Properties of Naive Bayes 245 13

5 Feature selection 251 13

6 Evaluation of text classification 258 13

7 References and further reading 264 13

Vector space classification 266 14

1 Document representations and measures of relatedness in vector spaces 267 14

2 Rocchio classification 269 14

3 k nearest neighbor 273 14

4 Linear versus nonlinear classifiers 277 14

5 Classification with more than two classes 281 14

6 The bias-variance tradeoff 284 14

7 References and further reading 291 14

Support vector machines and machine learning on documents 293 15

1 Support vector machines:The linearly separable case 294 15

2 Extensions to the support vector machine model 300 15

3 Issues in the classification of text documents 307 15

4 Machine-learning methods in ad hoc information retrieval 314 15

5 References and further reading 318 15

Flat clustering 321 16

1 Clustering in information retrieval 322 16

2 Problem statement 326 16

3 Evaluation of clustering 327 16

4 K-means 331 16

5 Model-based clustering 338 16

6 References and further reading 343 16

Hierarchical clustering 346 17

1 Hierarchical agglomerative clustering 347 17

2 Single-link and complete-link clustering 350 17

3 Group-average agglomerative clustering 356 17

4 Centroid clustering 358 17

5 Optimality of hierarchical agglomerative clustering 360 17

6 Divisive clustering 362 17

7 Cluster labeling 363 17

8 Implementation notes 365 17

9 References and further reading 367 17

Matrix decompositions and latent semantic indexing 369 18

1 Linear algebra review 369 18

2 Term-document matrices and singular value decompositions 373 18

3 Low-rank approximations 376 18

4 Latent semantic indexing 378 18

5 References and further reading 383 18

Web search basics 385 19

1 Background and history 385 19

2 Web characteristics 387 19

3 Advertising as the economic model 392 19

4 The search user experience 395 19

5 Index size and estimation 396 19

6 Near-duplicates and shingling 400 19

7 References and further reading 404 19

Web crawling and indexes 405 20

1 Overview 405 20

2 Crawling 406 20

3 Distributing indexes 415 20

4 Connectivity servers 416 20

5 References and further reading 419 20

Link analysis 421 21

1 TheWeb as agraph 422 21

2 PageRank 424 21

3 Hubs and authorities 433 21

4 References and further reading 439 21