Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

Bibliographic Details
Published in arXiv.org
Main Authors Bar-Lev, Daniella; Orr, Itai; Sabary, Omer; Etzion, Tuvi; Yaakobi, Eitan
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 11.03.2024
Abstract DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. It offers major advantages over magnetic and optical storage, including exceptional information density, enhanced data durability, and negligible power consumption for maintaining data integrity. Accessing the data requires an information retrieval process whose main bottlenecks include scalability and accuracy, which naturally trade off against each other. Here we present a modular and holistic approach that combines Deep Neural Networks (DNN) trained on simulated data, Tensor-Product (TP) based Error-Correcting Codes (ECC), and a safety margin mechanism into a single coherent pipeline. We demonstrate our solution on 3.1 MB of information using two different sequencing technologies. Our work improves upon the current leading solutions by up to a 3200x increase in speed and a 40% improvement in accuracy, and offers a code rate of 1.6 bits per base in a high-noise regime. More broadly, our work shows a viable path to commercial DNA storage solutions, which are currently hindered by their information retrieval processes.
Copyright 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Physics
EISSN 2331-8422
Genre Working Paper/Pre-Print
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
SubjectTerms Artificial neural networks
Clustering
Codes
Deoxyribonucleic acid
DNA
Error correcting codes
Error correction
Gene sequencing
Information retrieval
Machine learning
Nanotechnology
Robustness (mathematics)
Storage systems
Synthesis
URI https://www.proquest.com/docview/2568400009