String Partition for Building Long BWTs
Constructing the Burrows-Wheeler transform (BWT) for long strings poses significant challenges regarding construction time and memory usage. We use a prefix of the suffix array to partition a long string into shorter substrings, thereby enabling the use of multi-string BWT construction algorithms to...
Saved in:
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published |
15.06.2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Constructing the Burrows-Wheeler transform (BWT) for long strings poses
significant challenges regarding construction time and memory usage. We use a
prefix of the suffix array to partition a long string into shorter substrings,
thereby enabling the use of multi-string BWT construction algorithms to process
these partitions fast. We provide an implementation partDNA for DNA sequences.
Through comparison with state-of-the-art BWT construction algorithms, we show
that partDNA with ropebwt2 offers a novel trade-off for construction time and
memory usage for BWT construction on real genome datasets. Beyond this, the
proposed partitioning strategy is applicable to strings of any alphabet. |
---|---|
DOI: | 10.48550/arxiv.2406.10610 |