A variant selection framework for genome graphs
Published in | Bioinformatics (Oxford, England) Vol. 37; no. Supplement_1; pp. i460 - i467 |
---|---|
Main Authors | Jain, Chirag; Tavakoli, Neda; Aluru, Srinivas |
Format | Journal Article |
Language | English |
Published | Oxford, Oxford University Press, 12.07.2021, Oxford Publishing Limited (England) |
Abstract | Abstract
Motivation
Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.
Results
In this work, we propose a novel mathematical framework for variant selection, casting it as the problem of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems that differ in the types of variants considered [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)] and in whether the goal is to minimize the number of positions at which variants are listed or the total number of variants listed. We classify the computational complexity of these problems and, where feasible, provide efficient algorithms along with their software implementation. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of SVs can be safely excluded from the human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.
Availability and implementation
https://github.com/AT-CG/VF.
Supplementary information
Supplementary data are available at Bioinformatics online. |
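The constraint described in the Results paragraph can be illustrated with a toy sketch restricted to SNPs on a linear coordinate system. It assumes, as a simplification and not as a statement of the authors' algorithm, that each dropped SNP position contributes at most one difference to any path crossing it, so the requirement becomes: no window of α consecutive bases may contain more than δ dropped positions. The function name `greedy_drop_snps` and the left-to-right greedy rule are illustrative choices, and no optimality claim is made here.

```python
from collections import deque


def greedy_drop_snps(positions, alpha, delta):
    """Split SNP positions into (kept, dropped).

    Hypothetical helper: drops a position whenever doing so leaves at most
    `delta` dropped positions in every window of `alpha` consecutive bases.
    """
    recent_dropped = deque()  # dropped positions that can still share a window with `pos`
    kept, dropped = [], []
    for pos in sorted(positions):
        # Two positions share a window of length alpha only if they differ by < alpha.
        while recent_dropped and pos - recent_dropped[0] >= alpha:
            recent_dropped.popleft()
        if len(recent_dropped) < delta:
            # Dropping `pos` keeps every window ending at or after `pos` within budget.
            recent_dropped.append(pos)
            dropped.append(pos)
        else:
            # Budget exhausted for this window: the variant must stay in the graph.
            kept.append(pos)
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = greedy_drop_snps([10, 20, 30, 200, 210], alpha=100, delta=2)
    print("kept:", kept)
    print("dropped:", dropped)
```

With positions 10, 20, 30, 200 and 210, α = 100 and δ = 2, the sketch drops 10 and 20, must keep 30 because a third drop would exceed the budget in one window, and then drops 200 and 210 once the earlier drops have left the window. Indels and SVs change path lengths and need a different treatment, which this sketch does not attempt.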
---|---|
Author | Jain, Chirag (chirag@iisc.ac.in); Tavakoli, Neda; Aluru, Srinivas |
AuthorAffiliation | 1 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, KA 560012, India; 2 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA |
BackLink | https://www.osti.gov/biblio/1807640 (view this record in OSTI.GOV) |
ContentType | Journal Article |
Copyright | The Author(s) 2021. Published by Oxford University Press. |
CorporateAuthor | Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC) |
DOI | 10.1093/bioinformatics/btab302 |
Discipline | Biology; Mathematics |
DocumentTitleAlternate | ISMB/ECCB 2021 Proceedings |
EISSN | 1367-4811 |
EndPage | i467 |
ExternalDocumentID | PMC8336592 1807640 10_1093_bioinformatics_btab302 10.1093/bioinformatics/btab302 |
GrantInformation | Grant IDs: DE-AC02-05CH11231; CCF-1816027 |
ISSN | 1367-4803 1367-4811 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | Supplement_1 |
Language | English |
License | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/4.0 |
Notes | USDOE Office of Science (SC); National Science Foundation (NSF); AC02-05CH11231; CCF-1816027 |
OpenAccessLink | https://dx.doi.org/10.1093/bioinformatics/btab302 |
PMID | 34252945 |
PublicationDate | 2021-07-12 |
PublicationPlace | Oxford |
PublicationTitle | Bioinformatics (Oxford, England) |
PublicationYear | 2021 |
Publisher | Oxford University Press Oxford Publishing Limited (England) |
StartPage | i460 |
SubjectTerms | Algorithms; Availability; BASIC BIOLOGICAL SCIENCES; Bias; Bioinformatics; Chromosome 1; Chromosomes; Gene mapping; General Computational Biology; Genetic diversity; Genetic variance; Genomes; Genomic analysis; Graph representations; Graphical representations; Graphs; Mapping; Mathematics; Nucleotides; Parameters; Population genetics; Single-nucleotide polymorphism; Size reduction |
Title | A variant selection framework for genome graphs |
URI | https://www.proquest.com/docview/3128013215 https://www.proquest.com/docview/2551209588 https://www.osti.gov/biblio/1807640 https://pubmed.ncbi.nlm.nih.gov/PMC8336592 |
Volume | 37 |