Error-Correcting Codes for Noisy Duplication Channels

Bibliographic Details
Published in IEEE Transactions on Information Theory, Vol. 67, no. 6, pp. 3452-3463
Main Authors Tang, Yuanyuan, Farnoud, Farzad
Format Journal Article
Language English
Published New York IEEE 01.06.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length $k$. An exact duplication inserts a copy of a substring of length $k$ of the sequence immediately after that substring, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{ACG}T}$, where $k=3$, while a noisy duplication inserts a copy suffering from substitution noise, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{A{\color{red}T}G}T}$. Specifically, we design codes that can correct any number of exact duplication errors and one noisy duplication error, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only.
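The two error types described in the abstract, and the notion of a duplication root, can be illustrated with a minimal Python sketch. This is not the authors' code construction or decoder. The function names, the argument conventions (0-based `start`, substitution position `pos` within the copy), and the reading of "duplication root" as the string left after deleting every exact tandem repeat of length `k` are assumptions made for illustration only.

```python
def tandem_duplicate(seq: str, start: int, k: int) -> str:
    """Exact duplication: insert a copy of seq[start:start+k] right after it."""
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("the length-k substring must lie inside seq")
    block = seq[start:start + k]
    return seq[:start + k] + block + seq[start + k:]


def noisy_tandem_duplicate(seq: str, start: int, k: int, pos: int, symbol: str) -> str:
    """Noisy duplication: the inserted copy differs from the original
    block in exactly one position (Hamming distance 1)."""
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("the length-k substring must lie inside seq")
    if not 0 <= pos < k:
        raise ValueError("pos must index a symbol inside the copy")
    block = seq[start:start + k]
    if len(symbol) != 1 or block[pos] == symbol:
        raise ValueError("symbol must be a single, different character")
    noisy = block[:pos] + symbol + block[pos + 1:]
    return seq[:start + k] + noisy + seq[start + k:]


def duplication_root(seq: str, k: int) -> str:
    """Assumed definition: delete one copy of any adjacent repeated
    length-k block until no such repeat remains."""
    s = seq
    i = 0
    while i + 2 * k <= len(s):
        if s[i:i + k] == s[i + k:i + 2 * k]:
            s = s[:i + k] + s[i + 2 * k:]  # drop the second copy
            i = 0                          # rescan from the start
        else:
            i += 1
    return s


exact = tandem_duplicate("ACGT", 0, 3)                  # the abstract's exact example
noisy = noisy_tandem_duplicate("ACGT", 0, 3, 1, "T")    # the abstract's noisy example
print(exact, duplication_root(exact, 3))
print(noisy, duplication_root(noisy, 3))
```

In this sketch, exact duplications leave the root unchanged, whereas the noisy duplication alters it, which is consistent with the abstract's statement that the codes are designed around the error patterns that appear in the root.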
Author Tang, Yuanyuan
Farnoud, Farzad
Author_xml – sequence: 1
  givenname: Yuanyuan
  orcidid: 0000-0003-2946-7782
  surname: Tang
  fullname: Tang, Yuanyuan
  email: yt5tz@virginia.edu
  organization: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA
– sequence: 2
  givenname: Farzad
  orcidid: 0000-0002-8684-4487
  surname: Farnoud
  fullname: Farnoud, Farzad
  email: farzad@virginia.edu
  organization: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA
CODEN IETTAW
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
DOI 10.1109/TIT.2021.3059095
Discipline Engineering
Computer Science
EISSN 1557-9654
EndPage 3463
ExternalDocumentID 10_1109_TIT_2021_3059095
9353552
Genre orig-research
GrantInformation_xml – fundername: NSF
  grantid: 1816409; 1755773
  funderid: 10.13039/100000001
ISSN 0018-9448
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-2946-7782
0000-0002-8684-4487
PublicationCentury 2000
PublicationDate 2021-06-01
PublicationDateYYYYMMDD 2021-06-01
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-06-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on information theory
PublicationTitleAbbrev TIT
PublicationYear 2021
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 3452
SubjectTerms Asymptotic properties
Codes
Data storage
DNA
DNA storage
Error correcting codes
Error correction
Error correction codes
exact tandem duplication
Hamming distance
Inserts
Media
Memory
Noise measurement
noisy tandem duplication
Reproduction (copying)
Transforms
Title Error-Correcting Codes for Noisy Duplication Channels
URI https://ieeexplore.ieee.org/document/9353552
https://www.proquest.com/docview/2530111105
Volume 67