A variant selection framework for genome graphs

Bibliographic Details
Published in	Bioinformatics (Oxford, England) Vol. 37; no. Supplement_1; pp. i460 - i467
Main Authors	Jain, Chirag; Tavakoli, Neda; Aluru, Srinivas
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press; Oxford Publishing Limited (England), 12.07.2021
Subjects	Algorithms; Availability; BASIC BIOLOGICAL SCIENCES; Bias; Bioinformatics; Chromosome 1; Chromosomes; Gene mapping; General Computational Biology; Genetic diversity; Genetic variance; Genomes; Genomic analysis; Graph representations; Graphical representations; Graphs; Mapping; Mathematics; Nucleotides; Parameters; Population genetics; Single-nucleotide polymorphism; Size reduction

Abstract	Motivation: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping. Results: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and on whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and, when feasible, provide efficient algorithms along with their software implementation. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of SVs can be safely excluded from the human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis. Availability and implementation: https://github.com/AT-CG/VF. Supplementary information: Supplementary data are available at Bioinformatics online.
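To make the constraint in the abstract concrete, the following is a minimal sketch for the simplest setting only: SNPs on a single linear coordinate axis, where the goal is to list variants at as few positions as possible. It assumes that each dropped SNP position contributes at most one difference to any path crossing it, so a selection is acceptable when every window of α consecutive bases contains at most δ dropped positions. The function name `select_snp_positions` and the greedy left-to-right rule are illustrative assumptions, not the authors' implementation (which is available at the repository given above), and the sketch does not cover indels or SVs.

```python
from collections import deque


def select_snp_positions(positions, alpha, delta):
    """Return the SNP positions to retain (hypothetical helper).

    Assumption: a selection is valid if every window of `alpha`
    consecutive bases contains at most `delta` dropped SNP positions.
    Positions are scanned left to right; a position is dropped unless
    doing so would put more than `delta` dropped positions in the
    window that ends at it.
    """
    retained = []
    recent_dropped = deque()  # dropped positions inside the current window
    for p in sorted(set(positions)):
        # Discard dropped positions that fall outside [p - alpha + 1, p].
        while recent_dropped and recent_dropped[0] <= p - alpha:
            recent_dropped.popleft()
        if len(recent_dropped) >= delta:
            retained.append(p)        # budget exhausted: keep this SNP
        else:
            recent_dropped.append(p)  # budget available: drop this SNP
    return retained


if __name__ == "__main__":
    # Six adjacent SNPs, windows of 3 bases, at most 1 difference per window.
    print(select_snp_positions([1, 2, 3, 4, 5, 6], alpha=3, delta=1))
    # -> [2, 3, 5, 6]
```

In this toy run positions 1 and 4 are dropped, and no window of three bases contains more than one dropped position. With sparse variants and a generous δ, as in the long-read setting quoted in the abstract, a rule of this kind would drop nearly everything, which is consistent in spirit with the large reductions reported there.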
AuthorAffiliation	1 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, KA 560012, India; 2 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Author_xml	1. Jain, Chirag (email: chirag@iisc.ac.in); 2. Tavakoli, Neda; 3. Aluru, Srinivas
BackLink	https://www.osti.gov/biblio/1807640 (record in OSTI.GOV)
Copyright	The Author(s) 2021. Published by Oxford University Press.
CorporateAuthor	Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
DOI	10.1093/bioinformatics/btab302
Discipline	Biology Mathematics
DocumentTitleAlternate	ISMB/ECCB 2021 Proceedings
EISSN	1367-4811
EndPage	i467
ExternalDocumentID	PMC8336592 1807640 10_1093_bioinformatics_btab302 10.1093/bioinformatics/btab302
GrantInformation_xml	grantid: DE-AC02-05CH11231; grantid: CCF-1816027
ISSN	1367-4803 1367-4811
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	Supplement_1
License	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Notes	USDOE Office of Science (SC); National Science Foundation (NSF); AC02-05CH11231; CCF-1816027
OpenAccessLink	https://dx.doi.org/10.1093/bioinformatics/btab302
PMID	34252945
PublicationDateYYYYMMDD	2021-07-12
StartPage	i460
URI	https://www.proquest.com/docview/3128013215 https://www.proquest.com/docview/2551209588 https://www.osti.gov/biblio/1807640 https://pubmed.ncbi.nlm.nih.gov/PMC8336592
Volume	37