Sequencing Coverage Analysis for Combinatorial DNA-Based Storage Systems

Bibliographic Details
Published in: IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, Vol. 10, no. 2, pp. 297-316
Main Authors: Preuss, Inbal; Galili, Ben; Yakhini, Zohar; Anavy, Leon
Format: Journal Article
Language: English
Published: Piscataway, IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.06.2024
ISSN: 2372-2061
DOI: 10.1109/TMBMC.2024.3408053

Summary: This study introduces a novel model for analyzing and determining the required sequencing coverage in DNA-based data storage, focusing on combinatorial DNA encoding. We seek to characterize the distribution of the number of sequencing reads required for message reconstruction. We use a variant of the coupon collector distribution for this purpose. For any given number of observed reads, $R\in \mathbb{N}$, we use a Markov Chain representation of the process to compute the probability of error-free reconstruction. We develop theoretical bounds on the decoding probability and use empirical simulations to validate these bounds and assess tightness. This work contributes to understanding sequencing coverage in DNA-based data storage, offering insights into decoding complexity, error correction, and sequence reconstruction. We provide a Python package, with its input being the code design and other message parameters, all of which are denoted as $\boldsymbol{\Theta}$, and a desired confidence level $1-\delta$. This package computes the required read coverage, guaranteeing the message reconstruction $R=R(\delta,\boldsymbol{\Theta})$.
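
Illustration: the summary's idea of using a Markov chain to get the probability of complete recovery after R reads can be sketched on the classical coupon collector problem. This is not the authors' Python package and not their combinatorial variant. It assumes n equally likely item types and error-free reads, and the names prob_all_seen, required_reads, and n are hypothetical.

```python
def _step(dist, n):
    """Advance the chain by one read. State k = number of distinct types seen."""
    nxt = [0.0] * (n + 1)
    for k, p in enumerate(dist):
        nxt[k] += p * k / n                  # read repeats an already-seen type
        if k < n:
            nxt[k + 1] += p * (n - k) / n    # read reveals a new type
    return nxt


def prob_all_seen(n, reads):
    """Probability that `reads` uniform draws from n types cover every type."""
    if n < 1 or reads < 0:
        raise ValueError("need n >= 1 and reads >= 0")
    dist = [1.0] + [0.0] * n                 # start: zero types seen
    for _ in range(reads):
        dist = _step(dist, n)
    return dist[n]


def required_reads(n, delta):
    """Smallest R such that all n types are seen with probability >= 1 - delta."""
    if n < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need n >= 1 and 0 < delta < 1")
    dist = [1.0] + [0.0] * n
    reads = 0
    while dist[n] < 1.0 - delta:             # stop once the confidence level is met
        dist = _step(dist, n)
        reads += 1
    return reads
```

For example, required_reads(n, delta) plays the role of R(δ, Θ) in this toy setting, with Θ reduced to the single parameter n. The model in the article additionally accounts for the combinatorial code design, which this sketch does not attempt.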