A variant selection framework for genome graphs

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 37; no. Supplement_1; pp. i460-i467
Main Authors Jain, Chirag; Tavakoli, Neda; Aluru, Srinivas
Format Journal Article
Language English
Published Oxford: Oxford University Press, 12.07.2021
Oxford Publishing Limited (England)

Abstract
Motivation: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping.
Results: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short- and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% of SNPs and 73% of SVs can be safely excluded from the human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis.
Availability and implementation: https://github.com/AT-CG/VF.
Supplementary information: Supplementary data are available at Bioinformatics online.
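The selection criterion in the abstract can be illustrated with a small sketch. It is not the authors' implementation (that is at the repository linked above) and it makes strong simplifying assumptions: a linear reference, SNP sites only, and each dropped site counted as exactly one difference for any path that spans it. Under those assumptions, a site may be dropped as long as no window of α consecutive reference bases ends up containing more than δ dropped sites. The function name `select_snp_positions` and its arguments are hypothetical.

```python
from collections import deque


def select_snp_positions(positions, alpha, delta):
    """Greedy left-to-right sketch of SNP site selection.

    positions: sorted, distinct reference coordinates of SNP sites.
    alpha:     path (window) length in bases.
    delta:     maximum number of dropped sites tolerated per window.
    Returns (retained, dropped) as two lists of coordinates.
    """
    retained, dropped = [], []
    recent = deque()  # dropped sites that still share a window with the current site
    for p in positions:
        # A dropped site q shares an alpha-length window with p only if p - q < alpha.
        while recent and recent[0] <= p - alpha:
            recent.popleft()
        if len(recent) < delta:
            # Dropping p keeps every window ending at or before p within budget.
            recent.append(p)
            dropped.append(p)
        else:
            # Budget exhausted for this window: the site must stay in the graph.
            retained.append(p)
    return retained, dropped


if __name__ == "__main__":
    kept, removed = select_snp_positions([10, 12, 15, 40, 41, 42, 90], alpha=10, delta=2)
    print("retained:", kept)
    print("dropped:", removed)
```

With α = 10 and δ = 2, the third site in each dense cluster is retained because two sites within the same 10-base window have already been dropped. Isolated sites are always dropped when δ ≥ 1.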
Author Aluru, Srinivas
Jain, Chirag
Tavakoli, Neda
AuthorAffiliation 1 Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, KA 560012, India
2 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Author_xml – sequence: 1
  givenname: Chirag
  surname: Jain
  fullname: Jain, Chirag
  email: chirag@iisc.ac.in
– sequence: 2
  givenname: Neda
  surname: Tavakoli
  fullname: Tavakoli, Neda
– sequence: 3
  givenname: Srinivas
  surname: Aluru
  fullname: Aluru, Srinivas
BackLink https://www.osti.gov/biblio/1807640 (View this record in OSTI.GOV)
ContentType Journal Article
Copyright The Author(s) 2021. Published by Oxford University Press. 2021
CorporateAuthor Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
DOI 10.1093/bioinformatics/btab302
DatabaseName Oxford Academic Open Access Journals
CrossRef
Aluminium Industry Abstracts
Biotechnology Research Abstracts
Ceramic Abstracts
Computer and Information Systems Abstracts
Corrosion Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
Materials Business File
Mechanical & Transportation Engineering Abstracts
Nucleic Acids Abstracts
Oncogenes and Growth Factors Abstracts
Solid State and Superconductivity Abstracts
METADEX
Technology Research Database
ANTE: Abstracts in New Technology & Engineering
Engineering Research Database
Aerospace Database
Copper Technical Reference Library
AIDS and Cancer Research Abstracts
Materials Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Civil Engineering Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
MEDLINE - Academic
OSTI.GOV
PubMed Central (Full Participant titles)
Discipline Biology
Mathematics
DocumentTitleAlternate ISMB/ECCB 2021 Proceedings
EISSN 1367-4811
EndPage i467
ExternalDocumentID PMC8336592
1807640
10_1093_bioinformatics_btab302
10.1093/bioinformatics/btab302
GrantInformation_xml – grantid: DE-AC02-05CH11231
– grantid: CCF-1816027
ISSN 1367-4803
1367-4811
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue Supplement_1
Language English
License This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
http://creativecommons.org/licenses/by/4.0
Notes USDOE
USDOE Office of Science (SC)
National Science Foundation (NSF)
AC02-05CH11231; CCF-1816027
OpenAccessLink https://dx.doi.org/10.1093/bioinformatics/btab302
PMID 34252945
PublicationDate 2021-07-12
PublicationPlace Oxford
United States
PublicationTitle Bioinformatics (Oxford, England)
PublicationYear 2021
Publisher Oxford University Press
Oxford Publishing Limited (England)
StartPage i460
SubjectTerms Algorithms
Availability
BASIC BIOLOGICAL SCIENCES
Bias
Bioinformatics
Chromosome 1
Chromosomes
Gene mapping
General Computational Biology
Genetic diversity
Genetic variance
Genomes
Genomic analysis
Graph representations
Graphical representations
Graphs
Mapping
Mathematics
Nucleotides
Parameters
Population genetics
Single-nucleotide polymorphism
Size reduction
Title A variant selection framework for genome graphs
URI https://www.proquest.com/docview/3128013215
https://www.proquest.com/docview/2551209588
https://www.osti.gov/biblio/1807640
https://pubmed.ncbi.nlm.nih.gov/PMC8336592
Volume 37