Fast Top-k Similar Sequence Search on DNA Databases
Published in | Information Integration and Web Intelligence pp. 145 - 150 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published | Cham: Springer Nature Switzerland |
Series | Lecture Notes in Computer Science |
Subjects | |
Summary: | Top-k similar sequence search is an essential tool for DNA data management. Given a DNA database, the problem is to extract the k pairs of DNA sequences in the database that have the highest similarity among all possible pairs. Although this is a fundamental problem in bioinformatics, it is computationally expensive. To overcome this limitation, we propose a novel, fast top-k similarity search algorithm for DNA databases. Experiments on real-world DNA sequence datasets confirmed that the proposed method performs top-k search faster than baseline algorithms while maintaining high accuracy. |
---|---|
ISBN: | 9783031210464, 3031210468 |
ISSN: | 0302-9743, 1611-3349 |
DOI: | 10.1007/978-3-031-21047-1_14 |
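The summary defines the top-k similar pair problem but does not say which similarity measure the chapter uses or how the proposed algorithm works. As a minimal sketch of the problem itself (not the authors' method), the exhaustive baseline below assumes Jaccard similarity over sets of length-q substrings (q-grams, with q = 3 chosen arbitrarily); the function names `qgrams`, `jaccard`, and `top_k_pairs` are hypothetical. Because it scores all n(n-1)/2 pairs, it shows why exhaustive search becomes expensive as the database grows.

```python
import heapq
from itertools import combinations


def qgrams(seq: str, q: int = 3) -> set[str]:
    """Return the set of length-q substrings (q-grams) of a DNA sequence."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_pairs(db: list[str], k: int, q: int = 3) -> list[tuple[float, int, int]]:
    """Brute-force baseline (assumed similarity: q-gram Jaccard).

    Scores every pair (i, j) with i < j and keeps the k most similar,
    returned as (similarity, i, j) tuples in descending order of similarity.
    """
    # Precompute each sequence's q-gram set once, so every pair comparison reuses it.
    profiles = [qgrams(s, q) for s in db]
    # Lazily generate scores for all n(n-1)/2 pairs.
    scored = (
        (jaccard(profiles[i], profiles[j]), i, j)
        for i, j in combinations(range(len(db)), 2)
    )
    # nlargest keeps only k candidates in a heap instead of sorting every pair.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    db = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGTTCGT"]
    for sim, i, j in top_k_pairs(db, k=2):
        print(f"{i} {j} {sim:.3f}")
    # prints "0 1 0.800" then "0 3 0.286"
```

Using `heapq.nlargest` keeps only k candidates in memory beyond the precomputed profiles, but the number of similarity evaluations still grows quadratically with the number of sequences. That quadratic growth is the cost any faster top-k method has to avoid.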