Creating and Using Minimizer Sketches in Computational Genomics
Published in | Journal of computational biology Vol. 30; no. 12; pp. 1251 - 1276 |
---|---|
Main Authors | Zheng, Hongyu; Marçais, Guillaume; Kingsford, Carl |
Format | Journal Article |
Language | English |
Published | United States: Mary Ann Liebert, Inc., publishers, 01.12.2023 |
Online Access | https://www.liebertpub.com/doi/abs/10.1089/cmb.2023.0094 |
ISSN | 1557-8666 |
DOI | 10.1089/cmb.2023.0094 |
Abstract | Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields but has led to computational challenges in processing large sequencing experiments. The minimizer sketch is a popular method for sequence sketching that underlies core steps in computational genomics such as read mapping, sequence assembly, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of a few classical approaches. More recently, efforts have been put into building minimizer sketches with desirable properties compared with the classical constructions. In this survey, we review the history of the minimizer sketch, the theories developed around the concept, and the plethora of applications taking advantage of such sketches. We aim to provide readers with a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of better fusion of theory and application in the future. |
---|---|
Author | Zheng, Hongyu; Marçais, Guillaume; Kingsford, Carl |
Author Affiliations | Zheng, Hongyu (ORCID 0000-0002-7668-2090): Computer Science Department, Princeton University, Princeton, New Jersey, USA; Marçais, Guillaume: Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Kingsford, Carl: Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA |
Copyright | Hongyu Zheng, et al., 2023; Published by Mary Ann Liebert, Inc. |
Grant Information | NHGRI NIH HHS, grant R01 HG012470 |
Keywords | minimizers; sketching; k-mer counting; read mapping; de Bruijn graphs |
License | This Open Access article is distributed under the terms of the Creative Commons License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
PMID | 37646787 |
Subject Terms | Algorithms; Genomics - methods; High-Throughput Nucleotide Sequencing - methods; Review Article; Sequence Analysis, DNA - methods; Software |