BWA-MEME: BWA-MEM emulated with a machine learning approach


Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 38; no. 9; pp. 2404-2413
Main Authors Jung, Youngmok; Han, Dongsu
Format Journal Article
Language English
Published England Oxford University Press 28.04.2022
Summary:

Motivation: The growing use of next-generation sequencing and enlarged sequencing throughput require efficient short-read alignment, where seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is searching for exact matches of substrings of short reads in the reference DNA sequence. Existing algorithms, however, present limitations in performance due to their frequent memory accesses.

Results: This article presents BWA-MEME, the first full-fledged short-read alignment software that leverages learned indices for solving the exact match search problem for efficient seeding. BWA-MEME is a practical and efficient seeding algorithm based on a suffix array search algorithm that solves the challenges in utilizing learned indices for SMEM search, which is extensively used in the seeding phase. Our evaluation shows that BWA-MEME achieves up to 3.45× speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60×, memory accesses by 8.77× and LLC misses by 2.21×, while ensuring the identical SAM output to BWA-MEM2.

Availability and implementation: The source code and test scripts are available for academic use at https://github.com/kaist-ina/BWA-MEME/.

Supplementary information: Supplementary data are available at Bioinformatics online.
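Illustrative note: the summary describes exact-match search over a suffix array guided by a learned index. The Python sketch below shows only that general idea on a toy scale and is not BWA-MEME's algorithm or code. Everything in it is an assumption made for illustration: the function names, the 8-base numeric key, the single least-squares line used as the "model", and the galloping search that corrects the model's guess.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
K = 8  # leading bases packed into a numeric key (arbitrary illustrative choice)


def build_suffix_array(text):
    # Naive construction by sorting suffixes; acceptable only for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def encode(s, k=K):
    # Pack the first k bases into an integer, padding short strings with the
    # smallest symbol so that keys never decrease along the sorted suffixes.
    code = 0
    for i in range(k):
        code = code * 4 + (CODE[s[i]] if i < len(s) else 0)
    return code


def fit_line(keys):
    # Least-squares fit of position ~ slope * key + intercept.
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in keys)
    if var == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(keys))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def first_true(pred, n, guess):
    # Smallest i in [0, n] where the monotone predicate holds (index n counts
    # as true). Gallops outward from the guess, then bisects the bracket.
    def holds(i):
        return i >= n or pred(i)

    guess = min(max(guess, 0), n)
    step = 1
    if holds(guess):
        hi, lo = guess, guess - 1
        while lo >= 0 and holds(lo):
            hi, step = lo, step * 2
            lo = hi - step
        lo = max(lo, -1)
    else:
        lo, hi = guess, guess + 1
        while hi < n and not holds(hi):
            lo, step = hi, step * 2
            hi = lo + step
        hi = min(hi, n)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi


def find_exact(text, sa, model, pattern):
    # The model proposes a suffix-array position; the search then corrects it.
    slope, intercept = model
    guess = int(slope * encode(pattern) + intercept)
    m, n = len(pattern), len(sa)
    lo = first_true(lambda i: text[sa[i]:sa[i] + m] >= pattern, n, guess)
    hi = first_true(lambda i: text[sa[i]:sa[i] + m] > pattern, n, lo)
    return sorted(sa[lo:hi])  # start positions of every exact occurrence


reference = "ACGTACGTTAGC"
suffix_array = build_suffix_array(reference)
model = fit_line([encode(reference[i:]) for i in suffix_array])
hits = find_exact(reference, suffix_array, model, "ACG")
```

In this sketch the model only supplies a starting point, so the result stays correct however poor the fit is; a better fit simply shortens the correction search, which is the intuition behind replacing repeated index lookups with a prediction.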
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/btac137