LRTK: A unified and versatile toolkit for analyzing linked-read sequencing data

Bibliographic Details
Published in bioRxiv
Main Authors Yang, Chao; Zhang, Zhenmiao; Liao, Herui; Zhang, Lu
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 13.08.2022
Abstract Linked-read sequencing technologies, which offer reads with both high base quality and long-range DNA connectedness, have shown great success in genomic studies. The mainstream platforms include 10x Genomics linked-read (10x), Single Tube Long Fragment Read (stLFR) and Transposase Enzyme-Linked Long-read Sequencing (TELL-Seq). Existing data analysis pipelines, e.g., Long Ranger, were developed to process sequencing data from particular platforms and so cannot fully utilize the unique characteristics of other platforms; as a result, users have few tools to choose from for downstream analysis. To address these limitations, we present Linked-Read ToolKit (LRTK), a unified and versatile toolkit for processing linked-read sequencing data from different platforms. LRTK provides flexible functions for data simulation, format conversion, data preprocessing, barcode-aware read alignment, variant calling and phasing. It also supports multi-sample batch processing and generates an HTML report with key statistics and plots. We applied LRTK to linked-read data of NA24385 obtained from all three platforms; the results showed that LRTK improved the structural variation recall rate for 10x linked-reads and increased the phase block N50 for 10x and stLFR linked-reads. Availability: Source code is available at https://github.com/ericcombiolab/LRTK. LRTK and its dependencies can be installed with Anaconda. Contact: ericluzhang@hkbu.edu.hk Supplementary information: Supplementary data are available at Bioinformatics online. Competing Interest Statement: The authors have declared no competing interest.
Copyright 2022. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1101/2022.08.10.503458
Genre Working Paper/Pre-Print
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
SubjectTerms Batch processing
Bioinformatics
Data processing
Genomics
Statistical analysis
Transposase