Fast Top-k Similar Sequence Search on DNA Databases

Bibliographic Details
Published in	Information Integration and Web Intelligence, pp. 145-150
Main Authors	Yagi, Ryuichi; Shiokawa, Hiroaki
Format	Book Chapter
Language	English
Published	Cham: Springer Nature Switzerland
Series	Lecture Notes in Computer Science
Subjects	DNA database; Edit distance; Top-k similarity search
Summary:	Top-k similar sequence search is an essential tool for DNA data management. Given a DNA database, the task is to extract the k pairs of DNA sequences that yield the highest similarity among all possible pairs in the database. Although this is a fundamental problem in bioinformatics, it incurs a high computational cost. To overcome this limitation, we propose a novel fast top-k similarity search algorithm for DNA databases. We conducted experiments on real-world DNA sequence datasets and confirmed that the proposed method achieves a faster top-k search than baseline algorithms while maintaining high accuracy.
ISBN:	9783031210464 3031210468
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-031-21047-1_14
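
To make the problem statement in the summary concrete, the following is a minimal brute-force sketch of top-k similar pair search. It is not the algorithm proposed in the chapter, which this record does not describe. It assumes that similarity is measured by edit distance (as the subject terms suggest), so that a smaller distance means a more similar pair. The function names edit_distance and top_k_similar_pairs are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def top_k_similar_pairs(sequences, k):
    """Return the k tuples (distance, i, j) with the smallest edit distance.

    Assumption: a smaller edit distance means a higher similarity.
    Every pair is compared, so the cost grows quadratically with the
    number of sequences.
    """
    scored = ((edit_distance(sequences[i], sequences[j]), i, j)
              for i, j in combinations(range(len(sequences)), 2))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = ["ACGT", "ACGA", "TTTT", "ACGT"]
    print(top_k_similar_pairs(db, 2))  # [(0, 0, 3), (1, 0, 1)]
```

Because every pair is scored with a quadratic-time distance computation, this baseline illustrates the expensive computational cost that the summary refers to.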