Symphonizing pileup and full-alignment for deep learning-based long-read variant calling

Bibliographic Details
Published in Nature Computational Science Vol. 2; no. 12; pp. 797-803
Main Authors Zheng, Zhenxian; Li, Shumin; Su, Junhao; Leung, Amy Wing-Sze; Lam, Tak-Wah; Luo, Ruibang
Format Journal Article
Language English
Published United States Nature Publishing Group 01.12.2022
ISSN 2662-8457
DOI 10.1038/s43588-022-00387-x

Abstract Deep learning-based variant callers are becoming the standard and have achieved superior single-nucleotide polymorphism calling performance using long reads. Here we present Clair3, which leverages two major method categories: pileup calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 runs faster than any of the other state-of-the-art variant callers and demonstrates improved performance, especially at lower coverage.
By leveraging both simple pileup input and full-alignment input, Clair3 improves the speed and accuracy of small-variant calling from noisy long reads.
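
The division of labour described in the abstract, where a fast pileup pass handles most candidates and a full-alignment pass handles the complicated ones, can be illustrated with a minimal sketch. This is not Clair3's code or API: the function names, the `Call` record, the toy callers, and the quality threshold of 20 are all hypothetical and chosen only for illustration. The record does not say how Clair3 decides which candidates count as complicated, so the quality cutoff used here is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A caller maps a candidate position to (genotype, quality). Hypothetical interface.
Caller = Callable[[int], Tuple[str, float]]


@dataclass
class Call:
    position: int
    genotype: str
    quality: float
    source: str  # which stage produced the final call


def two_stage_call(
    candidates: Iterable[int],
    pileup_caller: Caller,
    full_alignment_caller: Caller,
    min_pileup_quality: float = 20.0,  # assumed cutoff, not taken from the paper
) -> List[Call]:
    """Call every candidate with the fast model; re-call uncertain ones with the slow model."""
    calls: List[Call] = []
    for pos in candidates:
        genotype, quality = pileup_caller(pos)
        if quality >= min_pileup_quality:
            # Confident pileup call: accept it without the expensive second pass.
            calls.append(Call(pos, genotype, quality, "pileup"))
        else:
            # Low-confidence candidate: defer to the full-alignment model.
            genotype, quality = full_alignment_caller(pos)
            calls.append(Call(pos, genotype, quality, "full_alignment"))
    return calls


# Toy stand-ins for the two neural networks (purely illustrative).
def toy_pileup(pos: int) -> Tuple[str, float]:
    return ("0/1", 30.0) if pos % 2 == 0 else ("0/1", 5.0)


def toy_full_alignment(pos: int) -> Tuple[str, float]:
    return ("1/1", 25.0)


calls = two_stage_call([100, 101, 102], toy_pileup, toy_full_alignment)
print([c.source for c in calls])
```

In this sketch the expensive caller runs only on the candidates the fast caller is unsure about, which is the general idea behind combining speed with precision and recall.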
Author_xml – sequence: 1
  givenname: Zhenxian
  orcidid: 0000-0002-6546-2324
  surname: Zheng
  fullname: Zheng, Zhenxian
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
– sequence: 2
  givenname: Shumin
  surname: Li
  fullname: Li, Shumin
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
– sequence: 3
  givenname: Junhao
  orcidid: 0000-0002-8560-3999
  surname: Su
  fullname: Su, Junhao
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
– sequence: 4
  givenname: Amy Wing-Sze
  surname: Leung
  fullname: Leung, Amy Wing-Sze
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
– sequence: 5
  givenname: Tak-Wah
  surname: Lam
  fullname: Lam, Tak-Wah
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
– sequence: 6
  givenname: Ruibang
  orcidid: 0000-0001-9711-6533
  surname: Luo
  fullname: Luo, Ruibang
  email: rbluo@cs.hku.hk
  organization: Department of Computer Science, The University of Hong Kong, Hong Kong, China
Copyright 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.
Copyright Nature Publishing Group Dec 2022
GrantInformation_xml – fundername: Research Grants Council, University Grants Committee (RGC, UGC)
  grantid: TRS T21-705/20-N
PMID 38177392
SubjectTerms Algorithms
Alignment
Deep learning
Genomes
Neural networks
Nucleotides
URI https://www.ncbi.nlm.nih.gov/pubmed/38177392
https://www.proquest.com/docview/3225996526