A visualization tool to explore alphabet orderings for the Burrows-Wheeler Transform (2402.17005v1)
Abstract: The Burrows-Wheeler Transform (BWT) is an efficient, invertible text transformation that tends to group identical characters into runs and enables search of the text. It is used extensively in lossless compression, in text indexing, and in bioinformatics for sequence alignment. There has been recent interest in minimizing the number of runs of identical characters ($r$) in a transform, and in finding useful alphabet orderings for the sorting step of the matrix associated with BWT construction. Developing such algorithms motivates inspecting many transforms. However, the full Burrows-Wheeler matrix requires $O(n^2)$ space and is therefore very difficult to display and inspect for large inputs. In this paper we present a graphical user interface (GUI) for working with BWTs. It includes features for searching for matrix row prefixes, skipping over sections of the right-most column (the transform), and displaying BWTs while exploring alphabet orderings with the goal of minimizing the number of runs.
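To make the quantities in the abstract concrete, the following is a minimal Python sketch of how an alphabet ordering changes the number of runs $r$. It is not the implementation used by the tool described here. It assumes a naive construction that materializes the full rotation matrix (and so has exactly the $O(n^2)$ space cost mentioned above), an appended `$` end marker that sorts before every other character, and hypothetical helper names `bwt`, `count_runs`, and `best_order_bruteforce` chosen only for illustration.

```python
from itertools import groupby, permutations


def bwt(text, order, sentinel="$"):
    """Return the last column of the sorted rotation matrix of text + sentinel.

    `order` lists the alphabet from smallest to largest; the sentinel is
    assumed to sort before every character. Every character of `text`
    must appear in `order`, otherwise a KeyError is raised.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    # Map each character to its rank under the chosen alphabet ordering.
    rank = {sentinel: 0}
    for i, ch in enumerate(order, start=1):
        rank[ch] = i
    s = text + sentinel
    # Naive construction: build all n rotations (O(n^2) space).
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    rotations.sort(key=lambda rot: [rank[c] for c in rot])
    return "".join(rot[-1] for rot in rotations)


def count_runs(transform):
    """Number of maximal runs of identical characters (the quantity r)."""
    return sum(1 for _ in groupby(transform))


def best_order_bruteforce(text):
    """Try every ordering of the alphabet; only feasible for tiny alphabets."""
    alphabet = sorted(set(text))
    candidates = ("".join(p) for p in permutations(alphabet))
    return min(candidates, key=lambda o: count_runs(bwt(text, o)))


if __name__ == "__main__":
    for order in ("abn", "nba"):
        t = bwt("banana", order)
        print(order, t, count_runs(t))
```

For the input `banana`, the ordering `a < b < n` gives the transform `annb$aa` with 5 runs, while `n < b < a` gives `aaa$nnb` with 4 runs. The brute-force search tries every permutation of the alphabet, so it is only practical for very small alphabets.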