Gaps and complex structurally variant loci in phased genome assemblies

Bibliographic Details
Published in	Genome research Vol. 33; no. 4; pp. 496 - 510
Main Authors	Porubsky, David, Vollger, Mitchell R., Harvey, William T., Rozanski, Allison N., Ebert, Peter, Hickey, Glenn, Hasenfeld, Patrick, Sanders, Ashley D., Stober, Catherine, Korbel, Jan O., Paten, Benedict, Marschall, Tobias, Eichler, Evan E.
Format	Journal Article
Language	English
Published	United States: Cold Spring Harbor Laboratory Press, 2023-04-01
Subjects	Chromosomes; DNA, Satellite - genetics; Genomes; Haplotypes; Humans; Polymorphism, Genetic; Satellite DNA; Segmental Duplications, Genomic; Sequence Analysis, DNA
Abstract	There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6–7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
Author	Porubsky, David (ORCID 0000-0001-8414-8966); Vollger, Mitchell R. (ORCID 0000-0002-8651-1615); Harvey, William T. (ORCID 0000-0003-0646-7528); Rozanski, Allison N. (ORCID 0000-0002-5034-1773); Ebert, Peter (ORCID 0000-0001-7441-532X); Hickey, Glenn (ORCID 0000-0002-2280-9404); Hasenfeld, Patrick (ORCID 0000-0003-2319-2482); Sanders, Ashley D. (ORCID 0000-0003-3945-0677); Stober, Catherine (ORCID 0000-0002-9481-013X); Korbel, Jan O. (ORCID 0000-0002-2798-3794); Paten, Benedict (ORCID 0000-0001-8863-3539); Marschall, Tobias (ORCID 0000-0002-9376-1030); Eichler, Evan E. (ORCID 0000-0002-8246-4014)
Contributor	Abel, Haley J; Antonacci-Fulton, Lucinda L; Asri, Mobin; Baid, Gunjan; Baker, Carl A; Belyaeva, Anastasiya; Billis, Konstantinos; Bourque, Guillaume; Buonaiuto, Silvia; Carroll, Andrew; Chaisson, Mark J P; Chang, Pi-Chuan; Chang, Xian H; Cheng, Haoyu; Chu, Justin; Cody, Sarah; Colonna, Vincenza; Cook, Daniel E; Cook-Deegan, Robert M; Cornejo, Omar E; Diekhans, Mark; Doerr, Daniel; Ebert, Peter; Ebler, Jana; Eichler, Evan E; Eizenga, Jordan M; Fairley, Susan; Fedrigo, Olivier; Felsenfeld, Adam L; Feng, Xiaowen; Fischer, Christian; Flicek, Paul; Formenti, Giulio; Frankish, Adam; Fulton, Robert S; Gao, Yan; Garg, Shilpa; Garrison, Erik; Garrison, Nanibaa' A; Giron, Carlos Garcia; Green, Richard E; Groza, Cristian; Guarracino, Andrea; Haggerty, Leanne; Hall, Ira M; Harvey, William T; Haukness, Marina; Haussler, David; Heumos, Simon; Hickey, Glenn; Hoekzema, Kendra; Hourlier, Thibaut; Howe, Kerstin; Jain, Miten; Jarvis, Erich D; Ji, Hanlee P; Kenny, Eimear E; Koenig, Barbara A; Kolesnikov, Alexey; Korbel, Jan O; Kordosky, Jennifer; Koren, Sergey; Lee, HoJoon; Lewis, Alexandra P; Li, Heng; Liao, Wen-Wei; Lu, Shuangjia; Lu, Tsung-Yu; Lucas, Julian K; Magalhães, Hugo; Marco-Sola, Santiago; Marijon, Pierre; Markello, Charles; Marschall, Tobias; Martin, Fergal J; McCartney, Ann; McDaniel, Jennifer; Miga, Karen H; Mitchell, Matthew W; Monlong, Jean; Mountcastle, Jacquelyn; Munson, Katherine M; Mwaniki, Moses Njagi; Nattestad, Maria; Novak, Adam M; Nurk, Sergey; Olsen, Hugh E; Olson, Nathan D; Paten, Benedict; Pesout, Trevor; Phillippy, Adam M; Popejoy, Alice B; Porubsky, David; Prins, Pjotr; Puiu, Daniela; Rautiainen, Mikko; Regier, Allison A; Rhie, Arang; Sacco, Samuel; Sanders, Ashley D
Copyright	2023 Porubsky et al.; Published by Cold Spring Harbor Laboratory Press. Copyright Cold Spring Harbor Laboratory Press, Apr 2023.
CorporateAuthor	Human Pangenome Reference Consortium
DOI	10.1101/gr.277334.122
Discipline	Anatomy & Physiology; Chemistry; Biology
EISSN	1549-5469
Genre	Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't
GrantInformation	NHGRI NIH HHS: R01 HG002385, U01 HG010971, U24 HG010262, R01 HG010485, U01 HG010963; Marie Sklodowska-Curie: 956229; European Union's Horizon 2020 research and innovation programme; (funder not named): 5R01HG002385, 5U01HG010971, 1U01HG010973
ISSN	1088-9051 1549-5469
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	2023 Porubsky et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
Notes	A complete list of contributing Consortium members appears at the end of this paper.
OpenAccessLink	https://pubmed.ncbi.nlm.nih.gov/PMC10234299
PMID	37164484
URI	https://www.ncbi.nlm.nih.gov/pubmed/37164484 https://www.proquest.com/docview/2814050795 https://www.proquest.com/docview/2812508700 https://pubmed.ncbi.nlm.nih.gov/PMC10234299