Gaps and complex structurally variant loci in phased genome assemblies

Bibliographic Details
Published in Genome Research Vol. 33; no. 4; pp. 496-510
Main Authors Porubsky, David; Vollger, Mitchell R.; Harvey, William T.; Rozanski, Allison N.; Ebert, Peter; Hickey, Glenn; Hasenfeld, Patrick; Sanders, Ashley D.; Stober, Catherine; Korbel, Jan O.; Paten, Benedict; Marschall, Tobias; Eichler, Evan E.
Format Journal Article
Language English
Published United States Cold Spring Harbor Laboratory Press 01.04.2023
Abstract There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
Author_xml – sequence: 1
  givenname: David
  orcidid: 0000-0001-8414-8966
  surname: Porubsky
  fullname: Porubsky, David
– sequence: 2
  givenname: Mitchell R.
  orcidid: 0000-0002-8651-1615
  surname: Vollger
  fullname: Vollger, Mitchell R.
– sequence: 3
  givenname: William T.
  orcidid: 0000-0003-0646-7528
  surname: Harvey
  fullname: Harvey, William T.
– sequence: 4
  givenname: Allison N.
  orcidid: 0000-0002-5034-1773
  surname: Rozanski
  fullname: Rozanski, Allison N.
– sequence: 5
  givenname: Peter
  orcidid: 0000-0001-7441-532X
  surname: Ebert
  fullname: Ebert, Peter
– sequence: 6
  givenname: Glenn
  orcidid: 0000-0002-2280-9404
  surname: Hickey
  fullname: Hickey, Glenn
– sequence: 7
  givenname: Patrick
  orcidid: 0000-0003-2319-2482
  surname: Hasenfeld
  fullname: Hasenfeld, Patrick
– sequence: 8
  givenname: Ashley D.
  orcidid: 0000-0003-3945-0677
  surname: Sanders
  fullname: Sanders, Ashley D.
– sequence: 9
  givenname: Catherine
  orcidid: 0000-0002-9481-013X
  surname: Stober
  fullname: Stober, Catherine
– sequence: 10
  givenname: Jan O.
  orcidid: 0000-0002-2798-3794
  surname: Korbel
  fullname: Korbel, Jan O.
– sequence: 11
  givenname: Benedict
  orcidid: 0000-0001-8863-3539
  surname: Paten
  fullname: Paten, Benedict
– sequence: 12
  givenname: Tobias
  orcidid: 0000-0002-9376-1030
  surname: Marschall
  fullname: Marschall, Tobias
– sequence: 13
  givenname: Evan E.
  orcidid: 0000-0002-8246-4014
  surname: Eichler
  fullname: Eichler, Evan E.
BackLink https://www.ncbi.nlm.nih.gov/pubmed/37164484 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1101_gr_277175_122
crossref_primary_10_1038_s41588_024_02051_8
crossref_primary_10_1186_s13059_023_02995_w
crossref_primary_10_1038_s41576_024_00718_w
crossref_primary_10_1042_ETLS20230074
crossref_primary_10_1038_s41592_024_02269_8
crossref_primary_10_1038_s41586_023_05895_y
crossref_primary_10_1038_s41586_023_05896_x
crossref_primary_10_17816_fm16167
crossref_primary_10_1038_s41467_025_57505_2
crossref_primary_10_1016_j_scib_2023_06_014
crossref_primary_10_1186_s13023_024_03307_6
crossref_primary_10_1016_j_gde_2024_102233
crossref_primary_10_1186_s13059_023_02919_8
crossref_primary_10_1016_j_cell_2024_01_002
crossref_primary_10_1038_s41435_024_00279_2
crossref_primary_10_1101_gr_279346_124
crossref_primary_10_1016_j_cell_2024_01_052
ContentType Journal Article
Contributor_xml – sequence: 1
  givenname: Haley J
  surname: Abel
  fullname: Abel, Haley J
– sequence: 2
  givenname: Lucinda L
  surname: Antonacci-Fulton
  fullname: Antonacci-Fulton, Lucinda L
– sequence: 3
  givenname: Mobin
  surname: Asri
  fullname: Asri, Mobin
– sequence: 4
  givenname: Gunjan
  surname: Baid
  fullname: Baid, Gunjan
– sequence: 5
  givenname: Carl A
  surname: Baker
  fullname: Baker, Carl A
– sequence: 6
  givenname: Anastasiya
  surname: Belyaeva
  fullname: Belyaeva, Anastasiya
– sequence: 7
  givenname: Konstantinos
  surname: Billis
  fullname: Billis, Konstantinos
– sequence: 8
  givenname: Guillaume
  surname: Bourque
  fullname: Bourque, Guillaume
– sequence: 9
  givenname: Silvia
  surname: Buonaiuto
  fullname: Buonaiuto, Silvia
– sequence: 10
  givenname: Andrew
  surname: Carroll
  fullname: Carroll, Andrew
– sequence: 11
  givenname: Mark J P
  surname: Chaisson
  fullname: Chaisson, Mark J P
– sequence: 12
  givenname: Pi-Chuan
  surname: Chang
  fullname: Chang, Pi-Chuan
– sequence: 13
  givenname: Xian H
  surname: Chang
  fullname: Chang, Xian H
– sequence: 14
  givenname: Haoyu
  surname: Cheng
  fullname: Cheng, Haoyu
– sequence: 15
  givenname: Justin
  surname: Chu
  fullname: Chu, Justin
– sequence: 16
  givenname: Sarah
  surname: Cody
  fullname: Cody, Sarah
– sequence: 17
  givenname: Vincenza
  surname: Colonna
  fullname: Colonna, Vincenza
– sequence: 18
  givenname: Daniel E
  surname: Cook
  fullname: Cook, Daniel E
– sequence: 19
  givenname: Robert M
  surname: Cook-Deegan
  fullname: Cook-Deegan, Robert M
– sequence: 20
  givenname: Omar E
  surname: Cornejo
  fullname: Cornejo, Omar E
– sequence: 21
  givenname: Mark
  surname: Diekhans
  fullname: Diekhans, Mark
– sequence: 22
  givenname: Daniel
  surname: Doerr
  fullname: Doerr, Daniel
– sequence: 23
  givenname: Peter
  surname: Ebert
  fullname: Ebert, Peter
– sequence: 24
  givenname: Jana
  surname: Ebler
  fullname: Ebler, Jana
– sequence: 25
  givenname: Evan E
  surname: Eichler
  fullname: Eichler, Evan E
– sequence: 26
  givenname: Jordan M
  surname: Eizenga
  fullname: Eizenga, Jordan M
– sequence: 27
  givenname: Susan
  surname: Fairley
  fullname: Fairley, Susan
– sequence: 28
  givenname: Olivier
  surname: Fedrigo
  fullname: Fedrigo, Olivier
– sequence: 29
  givenname: Adam L
  surname: Felsenfeld
  fullname: Felsenfeld, Adam L
– sequence: 30
  givenname: Xiaowen
  surname: Feng
  fullname: Feng, Xiaowen
– sequence: 31
  givenname: Christian
  surname: Fischer
  fullname: Fischer, Christian
– sequence: 32
  givenname: Paul
  surname: Flicek
  fullname: Flicek, Paul
– sequence: 33
  givenname: Giulio
  surname: Formenti
  fullname: Formenti, Giulio
– sequence: 34
  givenname: Adam
  surname: Frankish
  fullname: Frankish, Adam
– sequence: 35
  givenname: Robert S
  surname: Fulton
  fullname: Fulton, Robert S
– sequence: 36
  givenname: Yan
  surname: Gao
  fullname: Gao, Yan
– sequence: 37
  givenname: Shilpa
  surname: Garg
  fullname: Garg, Shilpa
– sequence: 38
  givenname: Erik
  surname: Garrison
  fullname: Garrison, Erik
– sequence: 39
  givenname: Nanibaa' A
  surname: Garrison
  fullname: Garrison, Nanibaa' A
– sequence: 40
  givenname: Carlos Garcia
  surname: Giron
  fullname: Giron, Carlos Garcia
– sequence: 41
  givenname: Richard E
  surname: Green
  fullname: Green, Richard E
– sequence: 42
  givenname: Cristian
  surname: Groza
  fullname: Groza, Cristian
– sequence: 43
  givenname: Andrea
  surname: Guarracino
  fullname: Guarracino, Andrea
– sequence: 44
  givenname: Leanne
  surname: Haggerty
  fullname: Haggerty, Leanne
– sequence: 45
  givenname: Ira M
  surname: Hall
  fullname: Hall, Ira M
– sequence: 46
  givenname: William T
  surname: Harvey
  fullname: Harvey, William T
– sequence: 47
  givenname: Marina
  surname: Haukness
  fullname: Haukness, Marina
– sequence: 48
  givenname: David
  surname: Haussler
  fullname: Haussler, David
– sequence: 49
  givenname: Simon
  surname: Heumos
  fullname: Heumos, Simon
– sequence: 50
  givenname: Glenn
  surname: Hickey
  fullname: Hickey, Glenn
– sequence: 51
  givenname: Kendra
  surname: Hoekzema
  fullname: Hoekzema, Kendra
– sequence: 52
  givenname: Thibaut
  surname: Hourlier
  fullname: Hourlier, Thibaut
– sequence: 53
  givenname: Kerstin
  surname: Howe
  fullname: Howe, Kerstin
– sequence: 54
  givenname: Miten
  surname: Jain
  fullname: Jain, Miten
– sequence: 55
  givenname: Erich D
  surname: Jarvis
  fullname: Jarvis, Erich D
– sequence: 56
  givenname: Hanlee P
  surname: Ji
  fullname: Ji, Hanlee P
– sequence: 57
  givenname: Eimear E
  surname: Kenny
  fullname: Kenny, Eimear E
– sequence: 58
  givenname: Barbara A
  surname: Koenig
  fullname: Koenig, Barbara A
– sequence: 59
  givenname: Alexey
  surname: Kolesnikov
  fullname: Kolesnikov, Alexey
– sequence: 60
  givenname: Jan O
  surname: Korbel
  fullname: Korbel, Jan O
– sequence: 61
  givenname: Jennifer
  surname: Kordosky
  fullname: Kordosky, Jennifer
– sequence: 62
  givenname: Sergey
  surname: Koren
  fullname: Koren, Sergey
– sequence: 63
  givenname: HoJoon
  surname: Lee
  fullname: Lee, HoJoon
– sequence: 64
  givenname: Alexandra P
  surname: Lewis
  fullname: Lewis, Alexandra P
– sequence: 65
  givenname: Heng
  surname: Li
  fullname: Li, Heng
– sequence: 66
  givenname: Wen-Wei
  surname: Liao
  fullname: Liao, Wen-Wei
– sequence: 67
  givenname: Shuangjia
  surname: Lu
  fullname: Lu, Shuangjia
– sequence: 68
  givenname: Tsung-Yu
  surname: Lu
  fullname: Lu, Tsung-Yu
– sequence: 69
  givenname: Julian K
  surname: Lucas
  fullname: Lucas, Julian K
– sequence: 70
  givenname: Hugo
  surname: Magalhães
  fullname: Magalhães, Hugo
– sequence: 71
  givenname: Santiago
  surname: Marco-Sola
  fullname: Marco-Sola, Santiago
– sequence: 72
  givenname: Pierre
  surname: Marijon
  fullname: Marijon, Pierre
– sequence: 73
  givenname: Charles
  surname: Markello
  fullname: Markello, Charles
– sequence: 74
  givenname: Tobias
  surname: Marschall
  fullname: Marschall, Tobias
– sequence: 75
  givenname: Fergal J
  surname: Martin
  fullname: Martin, Fergal J
– sequence: 76
  givenname: Ann
  surname: McCartney
  fullname: McCartney, Ann
– sequence: 77
  givenname: Jennifer
  surname: McDaniel
  fullname: McDaniel, Jennifer
– sequence: 78
  givenname: Karen H
  surname: Miga
  fullname: Miga, Karen H
– sequence: 79
  givenname: Matthew W
  surname: Mitchell
  fullname: Mitchell, Matthew W
– sequence: 80
  givenname: Jean
  surname: Monlong
  fullname: Monlong, Jean
– sequence: 81
  givenname: Jacquelyn
  surname: Mountcastle
  fullname: Mountcastle, Jacquelyn
– sequence: 82
  givenname: Katherine M
  surname: Munson
  fullname: Munson, Katherine M
– sequence: 83
  givenname: Moses Njagi
  surname: Mwaniki
  fullname: Mwaniki, Moses Njagi
– sequence: 84
  givenname: Maria
  surname: Nattestad
  fullname: Nattestad, Maria
– sequence: 85
  givenname: Adam M
  surname: Novak
  fullname: Novak, Adam M
– sequence: 86
  givenname: Sergey
  surname: Nurk
  fullname: Nurk, Sergey
– sequence: 87
  givenname: Hugh E
  surname: Olsen
  fullname: Olsen, Hugh E
– sequence: 88
  givenname: Nathan D
  surname: Olson
  fullname: Olson, Nathan D
– sequence: 89
  givenname: Benedict
  surname: Paten
  fullname: Paten, Benedict
– sequence: 90
  givenname: Trevor
  surname: Pesout
  fullname: Pesout, Trevor
– sequence: 91
  givenname: Adam M
  surname: Phillippy
  fullname: Phillippy, Adam M
– sequence: 92
  givenname: Alice B
  surname: Popejoy
  fullname: Popejoy, Alice B
– sequence: 93
  givenname: David
  surname: Porubsky
  fullname: Porubsky, David
– sequence: 94
  givenname: Pjotr
  surname: Prins
  fullname: Prins, Pjotr
– sequence: 95
  givenname: Daniela
  surname: Puiu
  fullname: Puiu, Daniela
– sequence: 96
  givenname: Mikko
  surname: Rautiainen
  fullname: Rautiainen, Mikko
– sequence: 97
  givenname: Allison A
  surname: Regier
  fullname: Regier, Allison A
– sequence: 98
  givenname: Arang
  surname: Rhie
  fullname: Rhie, Arang
– sequence: 99
  givenname: Samuel
  surname: Sacco
  fullname: Sacco, Samuel
– sequence: 100
  givenname: Ashley D
  surname: Sanders
  fullname: Sanders, Ashley D
Copyright 2023 Porubsky et al.; Published by Cold Spring Harbor Laboratory Press.
Copyright Cold Spring Harbor Laboratory Press Apr 2023
2023
CorporateAuthor Human Pangenome Reference Consortium
CorporateAuthor_xml – name: Human Pangenome Reference Consortium
DOI 10.1101/gr.277334.122
DatabaseName CrossRef
MEDLINE
MEDLINE (Ovid)
PubMed
Nucleic Acids Abstracts
Technology Research Database
Engineering Research Database
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
PubMed Central (Full Participant titles)
Discipline Anatomy & Physiology
Chemistry
Biology
DocumentTitleAlternate Porubsky et al
EISSN 1549-5469
EndPage 510
ExternalDocumentID PMC10234299
37164484
10_1101_gr_277334_122
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NHGRI NIH HHS
  grantid: R01 HG002385
– fundername: NHGRI NIH HHS
  grantid: U01 HG010971
– fundername: NHGRI NIH HHS
  grantid: U24 HG010262
– fundername: NHGRI NIH HHS
  grantid: R01 HG010485
– fundername: NHGRI NIH HHS
  grantid: U01 HG010963
– fundername: Marie Sklodowska-Curie
  grantid: 956229
– fundername: ;
  grantid: 5R01HG002385; 5U01HG010971; 1U01HG010973
– fundername: European Union's Horizon 2020 research and innovation programme
ISSN 1088-9051
1549-5469
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License 2023 Porubsky et al.; Published by Cold Spring Harbor Laboratory Press.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
Notes A complete list of contributing Consortium members appears at the end of this paper.
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC10234299
PMID 37164484
PageCount 15
PublicationCentury 2000
PublicationDate 2023-04-00
20230401
PublicationDateYYYYMMDD 2023-04-01
PublicationDate_xml – month: 04
  year: 2023
  text: 2023-04-00
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle Genome research
PublicationTitleAlternate Genome Res
PublicationYear 2023
Publisher Cold Spring Harbor Laboratory Press
Publisher_xml – name: Cold Spring Harbor Laboratory Press
References_xml – ident: 2023060612100680000_33.4.496.11
  doi: 10.1126/science.abf7117
– ident: 2023060612100680000_33.4.496.12
  doi: 10.1038/s41588-022-01043-w
– ident: 2023060612100680000_33.4.496.20
  doi: 10.1093/bioinformatics/bty191
– ident: 2023060612100680000_33.4.496.23
  doi: 10.1038/s41586-023-05896-x
– ident: 2023060612100680000_33.4.496.27
  doi: 10.1101/gr.263566.120
– ident: 2023060612100680000_33.4.496.1
  doi: 10.1038/nature15393
– ident: 2023060612100680000_33.4.496.35
  doi: 10.1038/s41587-023-01662-6
– ident: 2023060612100680000_33.4.496.44
  doi: 10.1111/ahg.12364
– ident: 2023060612100680000_33.4.496.8
  doi: 10.1038/ng.3092
– ident: 2023060612100680000_33.4.496.22
  doi: 10.1093/bioinformatics/btp352
– ident: 2023060612100680000_33.4.496.17
  doi: 10.1038/nmeth0810-576
– ident: 2023060612100680000_33.4.496.5
  doi: 10.1038/s41592-020-01056-5
– ident: 2023060612100680000_33.4.496.2
  doi: 10.1126/science.abl4178
– ident: 2023060612100680000_33.4.496.21
  doi: 10.1093/bioinformatics/btp698
– ident: 2023060612100680000_33.4.496.31
  doi: 10.1038/nbt.4235
– ident: 2023060612100680000_33.4.496.37
  doi: 10.1038/nprot.2017.029
– ident: 2023060612100680000_33.4.496.28
  doi: 10.1126/science.abj6987
– ident: 2023060612100680000_33.4.496.29
  doi: 10.1089/cmb.2014.0157
– ident: 2023060612100680000_33.4.496.47
  doi: 10.1038/s41586-022-04601-8
– ident: 2023060612100680000_33.4.496.32
  doi: 10.1101/gr.209841.116
– ident: 2023060612100680000_33.4.496.48
  doi: 10.1038/s41587-019-0217-9
– ident: 2023060612100680000_33.4.496.10
  doi: 10.1093/nar/30.11.2478
– ident: 2023060612100680000_33.4.496.4
  doi: 10.1038/s41467-018-08148-z
– ident: 2023060612100680000_33.4.496.3
  doi: 10.1016/j.cell.2022.08.004
– ident: 2023060612100680000_33.4.496.25
  doi: 10.1016/j.gpb.2016.05.004
– ident: 2023060612100680000_33.4.496.38
  doi: 10.1038/s41587-019-0366-x
– ident: 2023060612100680000_33.4.496.26
  doi: 10.1016/j.ajhg.2022.02.014
– ident: 2023060612100680000_33.4.496.19
  doi: 10.1038/s41587-019-0072-8
– ident: 2023060612100680000_33.4.496.34
  doi: 10.1016/j.cell.2022.04.017
– ident: 2023060612100680000_33.4.496.41
  doi: 10.1126/science.1197005
– ident: 2023060612100680000_33.4.496.15
  doi: 10.1101/2023.04.05.535718
– ident: 2023060612100680000_33.4.496.39
  doi: 10.1038/s41587-020-0503-6
– ident: 2023060612100680000_33.4.496.9
  doi: 10.1038/ng.909
– ident: 2023060612100680000_33.4.496.16
  doi: 10.1038/s41586-023-05976-y
– ident: 2023060612100680000_33.4.496.45
  doi: 10.1126/science.abj6965
– ident: 2023060612100680000_33.4.496.46
  doi: 10.1038/s41586-023-05895-y
– ident: 2023060612100680000_33.4.496.18
  doi: 10.1038/s41586-022-05325-5
– ident: 2023060612100680000_33.4.496.14
  doi: 10.1038/s41587-020-0711-0
– ident: 2023060612100680000_33.4.496.40
  doi: 10.1038/ng1862
– ident: 2023060612100680000_33.4.496.42
  doi: 10.1038/nature15394
– ident: 2023060612100680000_33.4.496.7
  doi: 10.1101/705616
– ident: 2023060612100680000_33.4.496.33
  doi: 10.1038/s41587-020-0719-5
– ident: 2023060612100680000_33.4.496.36
– ident: 2023060612100680000_33.4.496.13
  doi: 10.1038/nmeth.2206
– ident: 2023060612100680000_33.4.496.24
  doi: 10.1038/s41586-021-03420-7
– ident: 2023060612100680000_33.4.496.30
  doi: 10.1186/s12915-018-0535-2
– ident: 2023060612100680000_33.4.496.6
  doi: 10.1038/s41587-022-01261-x
– ident: 2023060612100680000_33.4.496.43
  doi: 10.1093/bioinformatics/btv098
StartPage 496
SubjectTerms Chromosomes
DNA, Satellite - genetics
Genomes
Haplotypes
Humans
Polymorphism, Genetic
Satellite DNA
Segmental Duplications, Genomic
Sequence Analysis, DNA
Title Gaps and complex structurally variant loci in phased genome assemblies
URI https://www.ncbi.nlm.nih.gov/pubmed/37164484
https://www.proquest.com/docview/2814050795
https://www.proquest.com/docview/2812508700
https://pubmed.ncbi.nlm.nih.gov/PMC10234299
Volume 33