Introduction | p. 1 |
The Genesis of Bioinformatics | p. 1 |
Bioinformatics Versus Other Disciplines | p. 2 |
Further Developments: from Linear Information to Multidimensional Structure Organization | p. 4 |
Mathematical and Computational Methods | p. 5 |
Why Mathematical Modeling? | p. 6 |
Fitting Models to Data | p. 7 |
Computer Software | p. 7 |
Applications | p. 8 |
Mathematical and Computational Methods | |
Probability and Statistics | p. 13 |
The Rules of Probability Calculus | p. 13 |
Independence, Conditional Probabilities and Bayes' Rules | p. 14 |
Random Variables | p. 15 |
Vector Random Variables | p. 16 |
Marginal Distributions | p. 17 |
Operations on Random Variables | p. 17 |
Notation | p. 19 |
Expectation and Moments of Random Variables | p. 19 |
Probability-Generating Functions and Characteristic Functions | p. 20 |
A Collection of Discrete and Continuous Distributions | p. 22 |
Bernoulli Trials and the Binomial Distribution | p. 22 |
The Geometric Distribution | p. 23 |
The Negative Binomial Distribution | p. 23 |
The Poisson Distribution | p. 24 |
The Multinomial Distribution | p. 25 |
The Hypergeometric Distribution | p. 25 |
The Normal (Gaussian) Distribution | p. 26 |
The Exponential Distribution | p. 26 |
The Gamma Distribution | p. 27 |
The Beta Distribution | p. 27 |
Likelihood maximization | p. 28 |
Binomial Distribution | p. 29 |
Multinomial distribution | p. 29 |
Poisson Distribution | p. 29 |
Geometric Distribution | p. 30 |
Normal Distribution | p. 30 |
Exponential Distribution | p. 31 |
Other Methods of Estimating Parameters: a Comparison | p. 31 |
Example 1. Uniform Distribution | p. 31 |
Example 2. Cauchy Distribution | p. 33 |
Minimum Variance Parameter Estimation | p. 35 |
The Expectation Maximization Method | p. 37 |
The Derivations of the Algorithm | p. 38 |
Examples of Recursive Estimation of Parameters by Using the EM Algorithm | p. 41 |
Statistical Tests | p. 45 |
The Idea | p. 45 |
Parametric Tests | p. 47 |
Nonparametric Tests | p. 48 |
Type I and II statistical errors | p. 49 |
Markov Chains | p. 49 |
Transition Probability Matrix and State Transition Graph | p. 50 |
Time Evolution of Probability Distributions of States | p. 51 |
Classification of States | p. 52 |
Ergodicity | p. 54 |
Stationary Distribution | p. 54 |
Reversible Markov Chains | p. 55 |
Time-Continuous Markov Chains | p. 56 |
Markov Chain Monte Carlo (MCMC) Methods | p. 57 |
Acceptance-Rejection Rule | p. 59 |
Applications of the Metropolis-Hastings Algorithm | p. 59 |
Simulated Annealing and MC3 | p. 59 |
Hidden Markov Models | p. 60 |
Probability of Occurrence of a Sequence of Symbols | p. 60 |
Backward Algorithm | p. 61 |
Forward Algorithm | p. 61 |
Viterbi Algorithm | p. 62 |
The Baum-Welch algorithm | p. 63 |
Exercises | p. 63 |
Computer Science Algorithms | p. 67 |
Algorithms | p. 67 |
Sorting and Quicksort | p. 68 |
Simple Sort | p. 69 |
Quicksort | p. 69 |
String Searches. Fast Search | p. 70 |
Easy Search | p. 71 |
Fast Search | p. 71 |
Index Structures for Strings. Search Tries. Suffix Trees | p. 73 |
A Treelike Structure in Computer Memory | p. 74 |
Search Tries | p. 75 |
Compact Search Tries | p. 76 |
Suffix Tries and Suffix Trees | p. 77 |
Suffix Arrays | p. 80 |
Algorithms for Searching Tries | p. 80 |
Building Tries | p. 83 |
Remarks on the Efficiency of the Algorithms | p. 85 |
The Burrows-Wheeler Transform | p. 85 |
Inverse transform | p. 86 |
BW Transform as a Compression Tool | p. 88 |
BW Transform as a Search Tool for Patterns | p. 89 |
BW Transform as an Associative, Compressed Memory | p. 90 |
Computational Complexity of BW Transform | p. 91 |
Hashing | p. 91 |
Hashing functions for addressing variables | p. 91 |
Collisions | p. 92 |
Statistics of Memory Access Time with Hashing | p. 93 |
Inquiring About Repetitive Structure of Sequences, Comparing Sequences and Detecting Sequence Overlap by Hashing | p. 94 |
Exercises | p. 95 |
Pattern Analysis | p. 97 |
Feature Extraction | p. 97 |
Classification | p. 98 |
Linear Classifiers | p. 98 |
Linear Classifier Functions and Artificial Neurons | p. 100 |
Artificial Neural Networks | p. 100 |
Support Vector Machines | p. 102 |
Clustering | p. 103 |
K-means Clustering | p. 104 |
Hierarchical Clustering | p. 105 |
Dimensionality Reduction, Principal Component Analysis | p. 107 |
Singular-Value Decomposition (SVD) | p. 108 |
Geometric Interpretation of SVD | p. 109 |
Partial-Least-Squares (PLS) Method | p. 115 |
Parametric Transformations | p. 116 |
Hough Transform | p. 117 |
Generalized Hough Transforms | p. 118 |
Geometric Hashing | p. 119 |
Exercises | p. 119 |
Optimization | p. 123 |
Static Optimization | p. 124 |
Convexity and Concavity | p. 126 |
Constrained Optimization with Equality Constraints | p. 128 |
Constrained Optimization with Inequality Constraints | p. 131 |
Sufficiency of Optimality Conditions for Constrained Problems | p. 133 |
Computing Solutions to Optimization Problems | p. 133 |
Linear Programming | p. 136 |
Quadratic Programming | p. 137 |
Recursive Optimization Algorithms | p. 137 |
Dynamic Programming | p. 140 |
Dynamic Programming Algorithm for a Discrete-Time System | p. 141 |
Tracing a Path in a Plane | p. 143 |
Shortest Paths in Arrays and Graphs | p. 145 |
Combinatorial Optimization | p. 147 |
Examples of Combinatorial Optimization Problems | p. 148 |
Time Complexity | p. 148 |
Decision and Optimization Problems | p. 149 |
Classes of Problems and Algorithms | p. 149 |
Suboptimal Algorithms | p. 150 |
Unsolved Problems | p. 150 |
Exercises | p. 151 |
Applications | |
Sequence Alignment | p. 155 |
Number of Possible Alignments | p. 157 |
Dot Matrices | p. 159 |
Scoring Correspondences and Mismatches | p. 160 |
Developing Scoring Functions | p. 162 |
Estimating Probabilities of Nucleotide Substitution | p. 162 |
Parametric Models of Nucleotide Substitution | p. 163 |
Computing Transition Probabilities | p. 165 |
Fitting Nucleotide Substitution Models to Data | p. 168 |
Breaking the Loop of Dependencies | p. 173 |
Scaling Substitution Probabilities | p. 173 |
Amino Acid Substitution Matrices | p. 173 |
Gaps | p. 177 |
Sequence Alignment by Dynamic Programming | p. 178 |
The Needleman-Wunsch Alignment Algorithm | p. 178 |
The Smith-Waterman Algorithm | p. 181 |
Aligning Sequences Against Databases | p. 182 |
Methods of Multiple Alignment | p. 183 |
Exercises | p. 184 |
Molecular Phylogenetics | p. 187 |
Trees: Vocabulary and Methods | p. 187 |
The Vocabulary of Trees | p. 188 |
Overview of Tree-Building Methodologies | p. 189 |
Distance-Based Trees | p. 190 |
Tree-Derived Distance | p. 191 |
Ultrametric Distances and Molecular-Clock Trees | p. 191 |
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) Algorithm | p. 193 |
Neighbor-Joining Trees | p. 193 |
Maximum Likelihood (Felsenstein) Trees | p. 194 |
Hypotheses and Steps | p. 196 |
The Pulley Principle | p. 197 |
Estimating Branch Lengths | p. 197 |
Estimating the Tree Topology | p. 198 |
Maximum-Parsimony Trees | p. 198 |
Minimal Number of Evolutionary Events for a Given Tree | p. 199 |
Searching for the Optimal Tree Topology | p. 199 |
Miscellaneous Topics in Phylogenetic Tree Models | p. 200 |
The Nonparametric Bootstrap Method | p. 200 |
Variable Substitution Rates, the Felsenstein-Churchill Algorithm and Related Methods | p. 201 |
The Evolutionary Trace Method and Functional Sites in Proteins | p. 201 |
Coalescence Theory | p. 202 |
Neutral Evolution: Interaction of Genetic Drift and Mutation | p. 202 |
Modeling Genetic Drift | p. 203 |
Modeling Mutation | p. 204 |
Coalescence Under Different Demographic Scenarios | p. 204 |
Statistical Inference on Demographic Hypotheses and Parameters | p. 207 |
Markov Chain Monte Carlo (MCMC) Methods | p. 207 |
Approximate Approaches | p. 208 |
Exercises | p. 212 |
Genomics | p. 213 |
The DNA Molecule and the Central Dogma of Molecular Biology | p. 214 |
Genome Structure | p. 220 |
Genome Sequencing | p. 223 |
Restriction Enzymes | p. 224 |
Electrophoresis | p. 224 |
Southern Blot | p. 224 |
The Polymerase Chain Reaction | p. 225 |
DNA Cloning | p. 226 |
Chain Termination DNA Sequencing | p. 226 |
Genome Shotgun Sequencing | p. 228 |
Genome Assembly Algorithms | p. 230 |
Growing Contigs from Fragments | p. 230 |
Detection of Overlaps Between Reads | p. 230 |
Repetitive Structure of DNA | p. 232 |
The Shortest Superstring Problem | p. 233 |
Overlap Graphs and the Hamiltonian Path Problem | p. 234 |
Sequencing by Hybridization | p. 235 |
De Bruijn Graphs | p. 238 |
All l-mers in the Reads | p. 238 |
The Euler Superpath Problem | p. 239 |
Further Aspects of DNA Assembly Algorithms | p. 240 |
Statistics of the Genome Coverage | p. 243 |
Contigs, Gaps and Anchored Contigs | p. 244 |
Statistics with Minimum Overlaps Between Fragments, Anchored Contigs | p. 246 |
Genome Length and Structure Estimation by Sampling l-mers | p. 247 |
Polymorphisms | p. 252 |
Genome Annotation | p. 252 |
Research Tools for Genome Annotation | p. 254 |
Gene Identification | p. 254 |
DNA Motifs | p. 257 |
Annotation by Words and Comparisons of Genome Assemblies | p. 258 |
Human Chromosome 14 | p. 258 |
Exercises | p. 259 |
Proteomics | p. 261 |
Protein Structure | p. 262 |
Amino Acids | p. 262 |
Peptide Bonds | p. 265 |
Primary Structure | p. 266 |
Secondary Structure | p. 266 |
Tertiary Structure | p. 268 |
Quaternary Structure | p. 271 |
Experimental Determination of Amino Acid Sequences and Protein Structures | p. 271 |
Electrophoresis | p. 272 |
Protein 2D Gels | p. 272 |
Protein Western Blots | p. 273 |
Mass Spectrometry | p. 273 |
Chemical Identification of Amino Acids in Peptides | p. 274 |
Analysis of Protein 3D Structure by X Ray Diffraction and NMR | p. 275 |
Other Assays for Protein Compositions and Interactions | p. 275 |
Computational Methods for Modeling Molecular Structures | p. 275 |
Molecular-Force-Field Model | p. 276 |
Molecular Dynamics | p. 281 |
Hydrogen Bonds | p. 281 |
Computation and Minimization of RMSD | p. 282 |
Solutions to the Problem of Minimization of RMSD over Rotations | p. 284 |
Solutions to the Problem of Minimization of RMSD over Rotations and Translations | p. 290 |
Solvent-Accessible Surface of a Protein | p. 290 |
Computational Prediction of Protein Structure and Function | p. 290 |
Inferring Structures of Proteins | p. 291 |
Protein Annotation | p. 292 |
De Novo Methods | p. 292 |
Comparative Modeling | p. 293 |
Protein-Ligand Binding Analysis | p. 295 |
Classification Based on Proteomic Assays | p. 295 |
Exercises | p. 296 |
RNA | p. 299 |
The RNA World Hypothesis | p. 300 |
The Functions of RNA | p. 300 |
Reverse Transcription, Sequencing RNA Chains | p. 301 |
The Northern Blot | p. 302 |
RNA Primary Structure | p. 302 |
RNA Secondary Structure | p. 302 |
RNA Tertiary Structure | p. 302 |
Computational Prediction of RNA Secondary Structure | p. 303 |
Nested Structure | p. 304 |
Maximizing the Number of Pairings Between Bases | p. 304 |
Minimizing the Energy of RNA Secondary Structure | p. 306 |
Pseudoknots | p. 310 |
Prediction of RNA Structure by Comparative Sequence Analysis | p. 311 |
Exercises | p. 311 |
DNA Microarrays | p. 313 |
Design of DNA Microarrays | p. 315 |
Kinetics of the Binding Process | p. 318 |
Data Preprocessing and Normalization | p. 320 |
Normalization Procedures for Single Microarrays | p. 321 |
Normalization Based on Spiked-in Control RNA | p. 323 |
RMA Normalization Procedure | p. 326 |
Correction of Ratio-Intensity Plots for cDNA | p. 328 |
Statistics of Gene Expression Profiles | p. 328 |
Modeling Probability Distributions of Gene Expressions | p. 331 |
Class Prediction and Class Discovery | p. 336 |
Dimensionality Reduction | p. 337 |
Example of Application of PCA to Microarray Data | p. 338 |
Class Discovery | p. 338 |
Hierarchical Clustering | p. 339 |
Class Prediction. Differentially Expressed Genes | p. 340 |
Multiple Testing, and Analysis of False Discovery Rate (FDR) | p. 341 |
FDR analysis in ALL versus AML gene expression data | p. 344 |
The Gene Ontology Database | p. 344 |
Structure of GO | p. 345 |
Other Vocabularies of Terms | p. 346 |
Supporting Results of DNA Microarray Analyses with GO and other Vocabulary Terms | p. 347 |
Exercises | p. 347 |
Bioinformatic Databases and Bioinformatic Internet Resources | p. 349 |
Genomic Databases | p. 350 |
Proteomic Databases | p. 350 |
RNA Databases | p. 350 |
Gene Expression Databases | p. 351 |
Ontology Databases | p. 351 |
Databases of Genetic and Proteomic Pathways | p. 351 |
Programs and Services | p. 352 |
Clinical Databases | p. 352 |
References | p. 355 |
Index | p. 371 |
Table of Contents provided by Ingram. All Rights Reserved. |
The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.
The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.