String Partition for Building Long BWTs

Bibliographic Details
Main Authors: Enno Adler, Stefan Böttcher, Rita Hartel
Format: Journal Article
Language: English
Published: 15.06.2024

Summary: Constructing the Burrows-Wheeler transform (BWT) for long strings poses significant challenges in construction time and memory usage. We use a prefix of the suffix array to partition a long string into shorter substrings, which enables multi-string BWT construction algorithms to process these partitions quickly. We provide an implementation, partDNA, for DNA sequences. Through comparison with state-of-the-art BWT construction algorithms, we show that partDNA with ropebwt2 offers a novel trade-off between construction time and memory usage for BWT construction on real genome datasets. Moreover, the proposed partitioning strategy is applicable to strings over any alphabet.
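The summary does not spell out the partitioning procedure, so the following Python sketch only illustrates the general principle that suffix sorting can be split into independent pieces and still yield the correct BWT. It is a plain bucket-sort idea (group suffixes by their first k characters, sort each bucket separately, concatenate buckets in key order), not the authors' partDNA algorithm and not ropebwt2. The function names, the `$` sentinel, and the parameter `k` are assumptions for illustration, and the naive slicing is quadratic, so it is meant for toy inputs only.

```python
from collections import defaultdict

# Assumed end marker: must occur exactly once, at the end, and sort before A, C, G, T.
SENTINEL = "$"


def check_text(text: str) -> None:
    """Hypothetical helper: enforce the single-trailing-sentinel convention."""
    if not text.endswith(SENTINEL) or text.count(SENTINEL) != 1:
        raise ValueError("text must end with exactly one sentinel")


def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions by the suffix text."""
    check_text(text)
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: list[int]) -> str:
    """BWT[j] is the character preceding the j-th smallest suffix.

    For i == 0, text[i - 1] is text[-1], i.e. the trailing sentinel.
    """
    return "".join(text[i - 1] for i in sa)


def partitioned_bwt(text: str, k: int) -> str:
    """Illustrative bucket-sort BWT (not partDNA): bucket suffixes by their
    first k characters, sort each bucket on its own, concatenate in key order."""
    check_text(text)
    buckets: dict[str, list[int]] = defaultdict(list)
    for i in range(len(text)):
        buckets[text[i:i + k]].append(i)
    pieces = []
    for key in sorted(buckets):
        # Suffixes in different buckets are already ordered by their first k
        # characters, so each bucket is an independent, smaller sorting problem.
        positions = sorted(buckets[key], key=lambda i: text[i:])
        pieces.append("".join(text[i - 1] for i in positions))
    return "".join(pieces)


if __name__ == "__main__":
    dna = "ACGTTGCAACGTAGCT" + SENTINEL
    full = bwt_from_sa(dna, suffix_array(dna))
    assert partitioned_bwt(dna, k=2) == full
    print(full)
```

Bucket keys are safe to compare as plain strings here: a key shorter than k must end in the unique sentinel, so two different keys always differ at some position before either ends, and that position already decides the order of the full suffixes.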
DOI: 10.48550/arXiv.2406.10610