Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage

Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved...

Full description

Saved in:
Bibliographic Details
Published inBMC bioinformatics Vol. 24; no. 1; pp. 111 - 11
Main Authors Xie, Ranze, Zan, Xiangzhen, Chu, Ling, Su, Yanqing, Xu, Peng, Liu, Wenbin
Format Journal Article
LanguageEnglish
Published London BioMed Central 23.03.2023
BioMed Central Ltd
BMC
Subjects
Online AccessGet full text
ISSN1471-2105
1471-2105
DOI10.1186/s12859-023-05237-9

Cover

More Information
Summary:Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (≤ 70), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1471-2105
1471-2105
DOI:10.1186/s12859-023-05237-9