Introduction | p. 1 |
An example of a Burrows-Wheeler Transform | p. 3 |
Genesis of the Burrows-Wheeler Transform | p. 5 |
Transformation | p. 8 |
Permutation | p. 11 |
Recency | p. 12 |
Pattern matching | p. 13 |
Organization of this book | p. 14 |
Further reading | p. 16 |
How the Burrows-Wheeler Transform works | p. 19 |
The forward Burrows-Wheeler Transform | p. 19 |
The reverse Burrows-Wheeler Transform | p. 23 |
Special cases | p. 29 |
Further reading | p. 31 |
Coders for the Burrows-Wheeler Transform | p. 33 |
Entropy coding | p. 33 |
Run-length and arithmetic coder | p. 38 |
Move-to-front lists | p. 39 |
Frequency counting methods | p. 42 |
Inversion Frequencies (IF) | p. 43 |
Distance coding | p. 44 |
Wavelet trees | p. 45 |
Other permutations | p. 46 |
Block size | p. 47 |
Further reading | p. 48 |
Suffix trees and suffix arrays | p. 51 |
Suffix Trees | p. 51 |
Basic notations and definitions | p. 52 |
Construction of a suffix tree | p. 54 |
Ukkonen's suffix tree algorithm | p. 57 |
From implicit suffix tree to true suffix tree | p. 64 |
Farach's recursive construction | p. 66 |
Generalized suffix trees | p. 73 |
Implementation issues | p. 74 |
Suffix arrays | p. 75 |
Traditional string sorting | p. 76 |
Suffix arrays via suffix trees | p. 78 |
Manber-Myers suffix sorting algorithm | p. 78 |
Linear-time direct suffix sorting | p. 81 |
Space issues in suffix trees and suffix arrays | p. 85 |
Further reading | p. 88 |
Analysis of the Burrows-Wheeler Transform | p. 91 |
The BWT, suffix trees and suffix arrays | p. 93 |
Computational complexity | p. 95 |
BWT first stage - the transform | p. 95 |
BWT second stage - coding the transformed text | p. 95 |
BWT context clustering property | p. 97 |
Context trees | p. 97 |
Estimation using context trees | p. 100 |
BWT and context trees | p. 103 |
Analysis of BWT output | p. 104 |
Theoretical distribution of BWT output | p. 104 |
Empirical distribution of BWT output | p. 105 |
Analysis of BWT compression performance | p. 119 |
Definitions and notation | p. 120 |
Performance using recency ranking | p. 123 |
Performance without LGT | p. 129 |
Performance using piecewise constant parameters | p. 132 |
Performance on general sources via empirical entropy | p. 133 |
Relationship with other compression schemes | p. 135 |
Context-based schemes | p. 135 |
Symbol ranking schemes | p. 148 |
Further reading | p. 149 |
Variants of the Burrows-Wheeler Transform | p. 153 |
The sort transform | p. 154 |
Forward sort transform | p. 154 |
Inverse sort transform | p. 155 |
Performance of the sort transform | p. 159 |
Lexical permutation sorting | p. 163 |
Sorting permutations | p. 164 |
Lexical permutation sorting algorithm | p. 167 |
The extended BWT | p. 168 |
Sort order between strings | p. 168 |
Performing the extended BWT | p. 169 |
Inverting the transform | p. 170 |
Sort-based context similarity measurement | p. 173 |
Context similarity measurement and ranking | p. 173 |
The prefix list data structure | p. 175 |
Relationship with the Burrows-Wheeler Transform | p. 178 |
Performance of the prefix list | p. 180 |
Word-based compression | p. 180 |
General word-based compression | p. 181 |
Word-based Burrows-Wheeler Transform | p. 183 |
Further reading | p. 185 |
Exact and approximate pattern matching | p. 187 |
Exact pattern matching algorithms | p. 188 |
Brute force matching | p. 189 |
The Knuth-Morris-Pratt Algorithm | p. 190 |
The Boyer-Moore algorithm | p. 195 |
The Karp-Rabin algorithm | p. 197 |
The shift-and method | p. 199 |
Multiple pattern matching | p. 200 |
Pattern matching with don't-care characters | p. 204 |
Pattern matching using the Burrows-Wheeler Transform | p. 207 |
Boyer-Moore pattern matching using the BWT | p. 209 |
BWT-based exact pattern matching with binary search | p. 209 |
BWT-based exact pattern matching with suffix arrays | p. 214 |
Pattern matching using the FM-index | p. 215 |
Algorithm improvements with overwritten arrays | p. 220 |
Performance of BWT-based exact pattern matching | p. 221 |
Compression performance | p. 222 |
Search performance | p. 224 |
Array construction speeds | p. 231 |
Comparison with LZ-based compressed-domain pattern matching | p. 232 |
Approximate pattern matching | p. 233 |
Edit distance: dynamic programming formulation | p. 234 |
Edit graphs | p. 236 |
Local similarity | p. 237 |
The longest common subsequence problem | p. 239 |
String matching with k differences | p. 244 |
The k-mismatch problem using the BWT | p. 247 |
k-approximate matching using the BWT | p. 253 |
Hardware algorithms for pattern matching | p. 255 |
An equivalent hardware algorithm | p. 256 |
A brief review of other hardware algorithms | p. 258 |
Conclusion | p. 259 |
Further reading | p. 260 |
Other applications of the Burrows-Wheeler Transform | p. 265 |
Compressed suffix trees and compressed suffix arrays | p. 266 |
Compressed suffix trees | p. 267 |
Compressed suffix arrays | p. 270 |
Compressed full-text indexing | p. 275 |
Full-text indexing using CSTs and CSAs | p. 276 |
Searching on compressed suffix trees | p. 277 |
Searching on compressed suffix arrays | p. 278 |
Bioinformatics and computational biology | p. 278 |
DNA sequence compression | p. 279 |
Analysis of repetition structures | p. 280 |
Whole-genome comparisons | p. 281 |
Genome annotation | p. 282 |
Distance measure between sequences and phylogeny | p. 283 |
Test data compression | p. 284 |
Nature of test data | p. 285 |
BWT-based test data compression | p. 286 |
Image compression, computer vision and machine translation | p. 287 |
Image compression | p. 287 |
Shape matching | p. 292 |
Machine translation | p. 294 |
Joint source-channel coding | p. 296 |
General source coding via channel coding | p. 297 |
BWT-based joint source-channel coding | p. 298 |
Prediction and entropy estimation | p. 299 |
Further reading | p. 301 |
Conclusion | p. 305 |
Notation | p. 309 |
Ongoing work on the Burrows-Wheeler Transform | p. 313 |
BWT-related web sites | p. 313 |
Ph.D. theses relating to the Burrows-Wheeler Transform | p. 314 |
References | p. 317 |
Index | p. 341 |
Table of Contents provided by Ingram. All Rights Reserved. |
The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.
The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.