Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems
Published in | IEEE Transactions on Molecular, Biological, and Multi-Scale Communications Vol. 10; no. 2; pp. 297-316 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2024 |
Subjects | |
ISSN | 2372-2061 |
DOI | 10.1109/TMBMC.2024.3408053 |
Summary: | This study introduces a novel model for analyzing and determining the required sequencing coverage in DNA-based data storage, focusing on combinatorial DNA encoding. We seek to characterize the distribution of the number of sequencing reads required for message reconstruction, using a variant of the coupon collector distribution for this purpose. For any given number of observed reads $R \in \mathbb{N}$, we use a Markov chain representation of the process to compute the probability of error-free reconstruction. We derive theoretical bounds on the decoding probability and use empirical simulations to validate these bounds and assess their tightness. This work contributes to the understanding of sequencing coverage in DNA-based data storage, offering insights into decoding complexity, error correction, and sequence reconstruction. We provide a Python package whose input consists of the code design and other message parameters, collectively denoted $\boldsymbol{\Theta}$, together with a desired confidence level $1-\delta$. The package computes the read coverage $R = R(\delta, \boldsymbol{\Theta})$ required to guarantee message reconstruction at that confidence level. |
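The summary describes two steps: computing, through a Markov chain over a coupon-collector process, the probability that $R$ reads suffice for reconstruction, and then finding the coverage that reaches confidence $1-\delta$. The sketch below illustrates both steps for a single combinatorial letter under a simplified model that is an assumption made here, not taken from the paper: the letter consists of $k$ distinct shortmers, and each read reveals one of them uniformly at random and independently. The chain state is the number of distinct shortmers observed so far. The function names (`step`, `reconstruction_probability`, `closed_form`, `required_reads`) and the parameter `k` are hypothetical and are not the API of the authors' Python package; the full parameter set $\boldsymbol{\Theta}$ (code design, message length, error correction) is not modeled.

```python
"""Sketch: read coverage for one combinatorial letter via a coupon-collector Markov chain.

Hypothetical simplified model (an assumption, not the paper's package):
a combinatorial letter is a set of k distinct shortmers, and every read
reveals one of those k shortmers uniformly at random, independently.
"""
from math import comb


def step(dist: list[float], k: int) -> list[float]:
    """Advance the chain by one read; dist[s] = P(exactly s distinct shortmers seen)."""
    nxt = [0.0] * (k + 1)
    for s, p in enumerate(dist):
        nxt[s] += p * s / k  # read repeats an already-observed shortmer
        if s < k:
            nxt[s + 1] += p * (k - s) / k  # read reveals a new shortmer
    return nxt


def reconstruction_probability(k: int, reads: int) -> float:
    """P(all k shortmers observed after `reads` reads), i.e. the letter is decodable."""
    if k < 1 or reads < 0:
        raise ValueError("need k >= 1 and reads >= 0")
    dist = [1.0] + [0.0] * k  # initial state: nothing observed yet
    for _ in range(reads):
        dist = step(dist, k)
    return dist[k]  # mass in the absorbing state s = k


def closed_form(k: int, reads: int) -> float:
    """Inclusion-exclusion formula for the same probability, used as a cross-check."""
    return sum((-1) ** j * comb(k, j) * (1 - j / k) ** reads for j in range(k + 1))


def required_reads(k: int, delta: float, max_reads: int = 1_000_000) -> int:
    """Smallest R with reconstruction_probability(k, R) >= 1 - delta."""
    if k < 1:
        raise ValueError("need k >= 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    dist = [1.0] + [0.0] * k
    for r in range(max_reads + 1):
        if dist[k] >= 1 - delta:
            return r
        dist = step(dist, k)  # one more read
    raise RuntimeError("confidence not reached within max_reads")


if __name__ == "__main__":
    print(required_reads(2, 0.1))  # → 5
    print(reconstruction_probability(4, 10), closed_form(4, 10))
```

With $k = 2$ the chain gives $1 - 2^{1-R}$ for $R \ge 1$, so $\delta = 0.1$ needs $R = 5$ reads ($0.9375 \ge 0.9$, whereas 4 reads give only $0.875$). Stepping the chain incrementally in `required_reads` avoids recomputing the distribution from scratch for each candidate $R$, and the inclusion-exclusion sum offers an independent check on the chain.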