Error-Correcting Codes for Noisy Duplication Channels
Published in | IEEE Transactions on Information Theory, Vol. 67, no. 6, pp. 3452-3463 |
---|---|
Main Authors | Tang, Yuanyuan; Farnoud, Farzad |
Format | Journal Article |
Language | English |
Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2021 |
Abstract | Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this article, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length $k$. An exact duplication inserts a copy of a substring of length $k$ of the sequence immediately after that substring, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{ACG}T}$, where $k=3$, while a noisy duplication inserts a copy suffering from substitution noise, e.g., $\mathsf{ACGT} \to \mathsf{ACG\underline{A\color{Red}{T}}GT}$. Specifically, we design codes that can correct any number of exact duplication and one noisy duplication errors, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our constructions rely upon recovering the duplication root of the stored codeword. We characterize the ways in which duplication errors manifest in the root of affected sequences and design efficient codes for correcting these error patterns. We show that the proposed construction is asymptotically optimal, in the sense that it has the same asymptotic rate as optimal codes correcting exact duplications only. |
---|---|
Author | Tang, Yuanyuan; Farnoud, Farzad |
Author_xml | 1. Tang, Yuanyuan (ORCID: 0000-0003-2946-7782; email: yt5tz@virginia.edu), Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA; 2. Farnoud, Farzad (ORCID: 0000-0002-8684-4487; email: farzad@virginia.edu), Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA |
CODEN | IETTAW |
CitedBy_id | crossref_primary_10_1109_MBITS_2024_3355883 crossref_primary_10_1109_TIT_2022_3233733 crossref_primary_10_1109_TIT_2022_3176371 crossref_primary_10_1109_TIT_2021_3125724 crossref_primary_10_1109_TIT_2022_3176917 crossref_primary_10_1109_MBITS_2023_3318516 crossref_primary_10_1109_TMBMC_2024_3403755 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021 |
DOI | 10.1109/TIT.2021.3059095 |
Discipline | Engineering; Computer Science |
EISSN | 1557-9654 |
EndPage | 3463 |
ExternalDocumentID | 10_1109_TIT_2021_3059095 9353552 |
Genre | orig-research |
GrantInformation_xml | NSF, grant IDs 1816409 and 1755773 (funder ID: 10.13039/100000001) |
ISSN | 0018-9448 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 6 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0003-2946-7782 0000-0002-8684-4487 |
PageCount | 12 |
PublicationCentury | 2000 |
PublicationDate | 2021-06-01 |
PublicationPlace | New York |
PublicationTitle | IEEE Transactions on Information Theory |
PublicationTitleAbbrev | TIT |
PublicationYear | 2021 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 3452 |
SubjectTerms | Asymptotic properties; Codes; Data storage; DNA; DNA storage; Error correcting codes; Error correction; Error correction codes; exact tandem duplication; Hamming distance; Inserts; Media; Memory; Noise measurement; noisy tandem duplication; Reproduction (copying); Transforms |
Title | Error-Correcting Codes for Noisy Duplication Channels |
URI | https://ieeexplore.ieee.org/document/9353552 https://www.proquest.com/docview/2530111105 |
Volume | 67 |
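
The two error models defined in the abstract can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' code construction: the function names `exact_tandem_duplication`, `noisy_tandem_duplication`, and `duplication_root` are hypothetical, and the root computation assumes that the duplication root is obtained by repeatedly deleting the second copy of any length-k tandem repeat until none remains.

```python
def exact_tandem_duplication(seq: str, start: int, k: int) -> str:
    """Insert a copy of seq[start:start+k] immediately after that substring."""
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("the length-k substring must lie inside seq")
    block = seq[start:start + k]
    return seq[:start + k] + block + seq[start + k:]


def noisy_tandem_duplication(seq: str, start: int, k: int,
                             error_pos: int, symbol: str) -> str:
    """Insert a copy at Hamming distance 1 from seq[start:start+k].

    error_pos is the offset inside the copy that is substituted by symbol.
    """
    if k <= 0 or start < 0 or start + k > len(seq):
        raise ValueError("the length-k substring must lie inside seq")
    if not 0 <= error_pos < k:
        raise ValueError("error_pos must index into the copied block")
    block = seq[start:start + k]
    if len(symbol) != 1 or symbol == block[error_pos]:
        raise ValueError("symbol must be a single, different character")
    noisy = block[:error_pos] + symbol + block[error_pos + 1:]
    return seq[:start + k] + noisy + seq[start + k:]


def duplication_root(seq: str, k: int) -> str:
    """Assumed root: delete length-k tandem repeats until none is left."""
    i = 0
    while i + 2 * k <= len(seq):
        if seq[i:i + k] == seq[i + k:i + 2 * k]:
            # Remove the second copy and rescan from the beginning,
            # because the deletion can create a repeat further left.
            seq = seq[:i + k] + seq[i + 2 * k:]
            i = 0
        else:
            i += 1
    return seq


if __name__ == "__main__":
    exact = exact_tandem_duplication("ACGT", 0, 3)
    noisy = noisy_tandem_duplication("ACGT", 0, 3, 1, "T")
    print(exact, duplication_root(exact, 3))
    print(noisy, duplication_root(noisy, 3))
```

On the abstract's example with k = 3, the exact duplication of ACGT gives ACGACGT, whose root under this sketch is again ACGT. The noisy duplication gives ACGATGT, which contains no length-3 tandem repeat and is therefore its own root, so the noisy error remains visible in the root.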