PA-Star: A disk-assisted parallel A-Star strategy with locality-sensitive hash for multiple sequence alignment

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 112, pp. 154-165
Main Authors: Sundfeld, Daniel; Razzolini, Caina; Teodoro, George; Boukerche, Azzedine; de Melo, Alba Cristina Magalhaes Alves
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.02.2018
Summary: Multiple Sequence Alignment (MSA) is a basic operation in Bioinformatics, used to highlight the similarities among a set of sequences. The MSA problem has been proven NP-Hard, so computing exact solutions requires large amounts of memory and computing power. The problem can be modeled as a search for the minimum-cost path in a graph, and the A-Star algorithm has been adapted to solve it both sequentially and in parallel. Designing a parallel version of A-Star for MSA poses challenges such as an irregular dependency pattern and substantial memory requirements. In this paper, we propose PA-Star, a locality-sensitive multithreaded strategy based on A-Star, which computes optimal MSAs using both RAM and disk to store nodes. Experimental results obtained on 3 different machines show that the optimizations used in PA-Star can achieve a speedup of 1.88× in serial execution, and that parallel execution can attain a speedup of 5.52× with 8 cores. We also show that PA-Star outperforms a state-of-the-art MSA tool based on A-Star, executing up to 4.77× faster. Finally, we show that our disk-assisted strategy is able to retrieve the optimal alignment when other tools fail.

Highlights:
• An A-Star based algorithm that retrieves optimal multiple sequence alignments.
• Locality-sensitive hash functions to assign work to cores.
• Disk-assisted strategy which augments the amount of available memory.
• Better performance than a state-of-the-art A-Star based MSA tool.
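
To illustrate the modeling step mentioned in the summary (MSA as a minimum-cost path search solved with A-Star), the following is a minimal sequential sketch, not the PA-Star implementation. Everything in it is an assumption made for illustration: the sum-of-pairs cost with `MATCH`, `MISMATCH` and `GAP` values, the pairwise-alignment heuristic, and the function names `pair_cost`, `suffix_table` and `astar_msa_cost`. It keeps all nodes in RAM and does not model the hash-based work assignment or the disk storage.

```python
import heapq
from itertools import combinations, product

# Illustrative costs only (assumption); the paper's scoring scheme is not given here.
MATCH, MISMATCH, GAP = 0, 1, 2


def pair_cost(a, b):
    """Cost of one aligned column for a pair of symbols ('-' is a gap)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def suffix_table(s, t):
    """table[i][j] = optimal pairwise cost of aligning s[i:] with t[j:]."""
    n, m = len(s), len(t)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue  # both suffixes empty: cost 0
            options = []
            if i < n and j < m:
                options.append(pair_cost(s[i], t[j]) + table[i + 1][j + 1])
            if i < n:
                options.append(GAP + table[i + 1][j])
            if j < m:
                options.append(GAP + table[i][j + 1])
            table[i][j] = min(options)
    return table


def astar_msa_cost(seqs):
    """Optimal sum-of-pairs alignment cost, found by A* on the alignment lattice."""
    k = len(seqs)
    pairs = list(combinations(range(k), 2))
    tables = {(a, b): suffix_table(seqs[a], seqs[b]) for a, b in pairs}

    def h(node):
        # Lower bound: sum of optimal pairwise costs of the remaining suffixes.
        return sum(tables[a, b][node[a]][node[b]] for a, b in pairs)

    start = (0,) * k
    goal = tuple(len(s) for s in seqs)
    # Each move consumes one symbol from a non-empty subset of the sequences.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best_g = {start: 0}
    open_heap = [(h(start), 0, start)]
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g[node]:
            continue  # stale heap entry; a cheaper path to this node was found
        for move in moves:
            nxt = tuple(p + d for p, d in zip(node, move))
            if any(p > len(s) for p, s in zip(nxt, seqs)):
                continue  # move would run past the end of a sequence
            column = [seqs[i][node[i]] if move[i] else "-" for i in range(k)]
            new_g = g + sum(pair_cost(column[a], column[b]) for a, b in pairs)
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    raise ValueError("goal not reachable")
```

Calling `astar_msa_cost(["ACGT", "AGT"])` searches a 2-dimensional lattice; with k sequences the lattice has k dimensions and each node has up to 2^k - 1 successors, which gives a sense of why memory becomes the limiting resource as the number of sequences grows.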
ISSN: 0743-7315; 1096-0848
DOI: 10.1016/j.jpdc.2017.04.014