1 Introduction 1
1.1 Sequence similarity,homology,and alignment 2
1.2 Overview of the book 2
1.3 Probabilities and probabilistic models 4
Contents 9
Preface 9
1.4 Further reading 10
2 Pairwise alignment 12
2.1 Introduction 12
2.2 The scoring model 13
2.3 Alignment algorithms 17
2.4 Dynamic programming with more complexmodels 28
2.5 Heuristic alignment algorithms 32
2.6 Linear space alignments 34
2.7 Significance of scores 36
2.8 Deriving score parameters from alignment data 41
2.9 Further reading 45
3 Markov chains and hidden Markov models 46
3.1 Markov chains 48
3.2 Hidden Markov models 51
3.3 Parameter estimation for HMMs 62
3.4 HMM model structure 68
3.5 More complex Markov chains 72
3.6 Numerical stability of HMM algorithms 77
3.7 Further reading 79
4 Pairwise alignment using HMMs 80
4.1 Pair HMMs 81
4.2 The full probability of x and y,summing over all paths 87
4.3 Suboptimal alignment 89
4.4 The posterior probability that xi is aligned to yj 91
4.5 Pair HMMs versus FSAs for searching 95
4.6 Further reading 98
5 Profile HMMs for sequence families 100
5.1 Ungapped score matrices 102
5.2 Adding insert and delete states to obtain profile HMMs 102
5.3 Deriving profile HMMs from multiple alignments 105
5.4 Searching with profile HMMs 108
5.5 Profile HMM variants for non-global alignments 113
5.6 More on estimation of probabilities 115
5.7 Optimal model construction 122
5.8 Weighting training sequences 124
5.9 Further reading 132
6 Multiple sequence alignment methods 134
6.1 What a multiple alignment means 135
6.2 Scoring amultiple alignment 137
6.3 Multidimensional dynamic programming 141
6.4 Progressive alignment methods 143
6.5 Multiple alignment byprofile HMM training 149
6.6 Further reading 159
7 Building phylogenetic trees 160
7.1 The tree of life 160
7.2 Background on trees 161
7.3 Making a tree from pairwise distances 165
7.4 Parsimony 173
7.5 Assessing the trees:the bootstrap 179
7.6 Simultaneous alignment and phylogeny 180
7.7 Further reading 188
7.8 Appendix:proof of neighbour-joining theorem 190
8 Probabilistic approaches to phylogeny 192
8.1 Introduction 192
8.2 Probabilistic models of evolution 193
8.3 Calculating the likelihood for ungapped alignments 197
8.4 Using the likelihood for inference 205
8.5 Towards more realistic evolutionary models 215
8.6 Comparison of probabilistic and non-probabilistic methods 224
8.7 Further reading 231
9 Transformational grammars 233
9.1 Transformational grammars 234
9.2 Regular grammars 237
9.3 Context-free grammars 242
9.4 Context-sensitive grammars 247
9.5 Stochastic grammars 250
9.6 Stochastic context-free grammars for sequence modelling 252
9.7 Further reading 259
10 RNA structure analysis 260
10.1 RNA 261
10.2 RNA secondary structure prediction 267
10.3 Covariance models:SCFG-based RNA profiles 277
10.4 Further reading 297
11 Background on probability 299
11.1 Probability distributions 299
11.2 Entropy 305
11.3 Inference 311
11.4 Sampling 314
11.5 Estimation of probabilities from counts 319
11.6 The EM algorithm 323
Bibliography 326
Author index 345
Subiect index 350