To study living organisms, scientists examine them not only at the macroscopic scale but also at a more detailed microscopic scale. Pushed to its limits, this approach means investigating the very center of each cell, where we find the molecules that determine how it functions: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).

In an organism, DNA carries the genetic information, which is called the genome. It is represented as sequences over the four letters A, C, G and T; working from these sequences, the computer methods described in this book can answer fundamental questions in bioinformatics.

This book explores how to quickly find sequences of a few hundred nucleotides within a genome that may be made up of several billion nucleotides, how to compare those sequences and how to reconstruct the complete sequence of a genome. It also discusses the problems of identifying bacteria in a given environment and predicting the structure of RNA based on its sequence.

Annie Chateau is a lecturer at the University of Montpellier, France. Her research interests include algorithms and combinatorial structures.

Mikaël Salson is a lecturer at the University of Lille, France. His work focuses mainly on indexing and sequence comparison.

Preface

Author Biographies

Chapter 1 Methodological Concepts: Algorithmic Solutions of Bioinformatics Problems
Annie CHATEAU and Tom DAVOT-GRANGÉ

1.1 Data, models, problem formalism in bioinformatics

1.1.1 Data

1.1.2 Genome modeling

1.1.3 Problems in bioinformatics

1.2 Mathematical preliminaries

1.2.1 Propositional logic preliminaries

1.2.2 Preliminaries on sets

1.3 Vocabulary in text algorithmics

1.4 Graph theory

1.4.1 Subgraphs

1.4.2 Path in a graph

1.4.3 Matching

1.4.4 Planarity

1.4.5 Tree decomposition

1.5 Algorithmic problems

1.5.1 Definition

1.5.2 Graph problem

1.5.3 Satisfiability problems

1.6 Problem solutions

1.6.1 Algorithm

1.6.2 Complexity

1.6.3 Runtime

1.7 Complexity classes

1.7.1 Generality

1.7.2 Exact algorithms

1.7.3 Approximation algorithms

1.7.4 Solvers

1.8 Some algorithmic techniques

1.8.1 Dynamic programming

1.8.2 Tree traversal

1.9 Validation

1.9.1 The different types of errors

1.9.2 Quality measures

1.9.3 And in the non-binary case?

1.10 Conclusion

1.11 References

Chapter 2 Sequence Indexing
Thierry LECROQ and Mikaël SALSON

2.1 Introduction

2.1.1 What is indexing?

2.1.2 When to index?

2.1.3 What to index?

2.1.4 Indexing structures and queries considered

2.1.5 Basic notions and vocabulary

2.2 Word indexing

2.2.1 Bloom filters

2.2.2 Inverted list

2.2.3 De Bruijn graphs

2.2.4 Efficient structures for targeted queries

2.3 Full-text indexing

2.3.1 Suffix tree

2.3.2 (Extended) suffix array

2.3.3 Burrows–Wheeler transform

2.4 Indexing choice criteria

2.4.1 Based on the type of the necessary query

2.4.2 Based on the space-time and data quantity trade-off

2.4.3 Based on the need to add or modify indexed data

2.4.4 Indexing choices according to applications

2.5 Conclusion and perspectives

2.5.1 Efficient methods for indexing a few genomes or sequencing sets

2.5.2 Methods that struggle to take advantage of data redundancy

2.6 References

Chapter 3 Sequence Alignment
Laurent NOÉ

3.1 Introduction

3.1.1 What is pairwise alignment?

3.1.2 How to evaluate an alignment?

3.2 Exact alignment

3.2.1 Representation in edit graph form

3.2.2 Global alignment and Needleman–Wunsch algorithm

3.2.3 Local alignment and Smith–Waterman algorithm

3.2.4 Alignment with affine indel function and the Gotoh algorithm

3.3 Heuristic alignment

3.3.1 Seeds

3.3.2 Min-hash and global sampling

3.3.3 Minimizing and local sampling

3.4 References

Chapter 4 Genome Assembly
Dominique LAVENIER

4.1 Introduction

4.2 Sequencing technologies

4.2.1 Short reads

4.2.2 Long reads

4.2.3 Linked reads

4.2.4 Hi-C reads

4.2.5 Optical mapping

4.3 Assembly strategies

4.3.1 The main steps

4.3.2 Cleaning and correction of reads

4.3.3 Scaffold construction

4.3.4 Scaffold ordering

4.4 Scaffold construction methods

4.4.1 Greedy assembly

4.4.2 OLC assembly

4.4.3 DBG assembly

4.4.4 Constrained assembly

4.5 Scaffold-ordering methods

4.5.1 Hi-C data-based methods

4.5.2 Optical mapping-based methods

4.6 Assembly validation

4.6.1 Metrics

4.6.2 Read realignment

4.6.3 Gene prediction

4.6.4 Competitions

4.7 Conclusion

4.8 References

Chapter 5 Metagenomics and Metatranscriptomics
Cervin GUYOMAR and Claire LEMAITRE

5.1 What is metagenomics?

5.1.1 Motivations and historical context

5.1.2 The metagenomic data

5.1.3 Bioinformatics challenges for metagenomics

5.2 “Who are they”: taxonomic characterization of microbial communities

5.2.1 Methods for targeted metagenomics

5.2.2 Whole-genome methods with reference

5.2.3 Reference-free methods

5.3 “What are they able to do?”: functional metagenomics

5.3.1 Gene prediction and annotation

5.3.2 Metatranscriptomics

5.3.3 Reconstruction of metabolic networks

5.4 Comparative metagenomics

5.4.1 Comparative metagenomics with diversity estimation

5.4.2 De novo comparative metagenomics

5.5 Conclusion

5.6 References

Chapter 6 RNA Folding
Yann PONTY and Vladimir REINHARZ

6.1 Introduction

6.1.1 RNA folding

6.1.2 Secondary structure

6.2 Optimization for structure prediction

6.2.1 Computing the minimum free-energy (MFE) structure

6.2.2 Listing (sub)optimal structures

6.2.3 Comparative prediction: simultaneous alignment/folding of RNAs

6.2.4 Joint alignment/folding model

6.3 Analyzing the Boltzmann ensemble

6.3.1 Computing the partition function

6.3.2 Statistical sampling

6.3.3 Boltzmann probability of structural patterns

6.4 Studying RNA structure in practice

6.4.1 The Turner model

6.4.2 Tools

6.5 References

Conclusion

List of Authors

Index


From Sequences to Graphs: Discrete Methods and Structures for Bioinformatics

9781789450668

1789450667
