Fast Top-k Similar Sequence Search on DNA Databases

Bibliographic Details
Published in: Information Integration and Web Intelligence, pp. 145-150
Main Authors: Yagi, Ryuichi; Shiokawa, Hiroaki
Format: Book Chapter
Language: English
Published: Cham, Springer Nature Switzerland
Series: Lecture Notes in Computer Science
Summary: Top-k similar sequence search is an essential tool for DNA data management. Given a DNA database, the problem is to extract the k DNA sequence pairs that yield the highest similarity among all possible pairs in the database. Although this is a fundamental problem in bioinformatics, it is computationally expensive. To overcome this limitation, we propose a novel fast top-k similarity search algorithm for DNA databases. In experiments on real-world DNA sequence datasets, the proposed method achieved a faster top-k search than baseline algorithms while maintaining high accuracy.
ISBN: 9783031210464
3031210468
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-031-21047-1_14
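
The problem stated in the summary (find the k sequence pairs with the highest similarity among all pairs) can be illustrated with a minimal exhaustive baseline. This is only a sketch of the problem definition, not the authors' algorithm: the record does not say which similarity measure the chapter uses, so the Jaccard similarity over q-grams below is an assumption, and the names `kmer_set`, `jaccard`, `top_k_pairs`, and the parameter `q` are hypothetical.

```python
import heapq
from itertools import combinations


def kmer_set(seq, q=3):
    """Return the set of all length-q substrings (q-grams) of seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_pairs(sequences, k, q=3):
    """Exhaustively score every pair and keep the k most similar ones.

    ASSUMPTION: q-gram Jaccard stands in for the similarity measure,
    which the catalog record does not specify.
    Returns (similarity, i, j) tuples with i < j, most similar first.
    """
    profiles = [kmer_set(s, q) for s in sequences]
    scored = (
        (jaccard(profiles[i], profiles[j]), i, j)
        for i, j in combinations(range(len(sequences)), 2)
    )
    return heapq.nlargest(k, scored)


# Toy database: sequences 0/1 and 2/3 differ only in their last base.
db = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
result = top_k_pairs(db, k=2)
for sim, i, j in result:
    print(f"{sim:.2f}  {db[i]}  {db[j]}")
```

Because every pair is scored, the number of comparisons grows quadratically with the number of sequences, which is one way to see the computational cost the summary refers to.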