Automated evaluation of multiple sequence alignment methods to handle third generation sequencing errors

Bibliographic Details
Published in PeerJ (San Francisco, CA) Vol. 12
Main Authors Rohmer, Coralie; Touzet, Hélène; Limasset, Antoine
Format Journal Article
Language English
Published PeerJ 20.09.2024
Abstract Most third-generation sequencing (TGS) processing tools rely on multiple sequence alignment (MSA) methods to manage sequencing errors. Despite the broad range of MSA approaches available, a limited selection of implementations are commonly used in practice for this type of application, and no comprehensive comparative assessment of existing tools has been undertaken to date. In this context, we have developed an automatic pipeline, named MSA Limit, designed to facilitate the execution and evaluation of diverse MSA methods across a spectrum of conditions representative of TGS reads. MSA Limit offers insights into alignment accuracy, time efficiency, and memory utilization. It serves as a valuable resource for both users and developers, aiding in the assessment of algorithmic performance and assisting users in selecting the most appropriate tool for their specific experimental settings. Through a series of experiments using real and simulated data, we demonstrate the value of such exploration. Our findings reveal that in certain scenarios, popular methods may not consistently exhibit optimal efficiency and that the choice of the most effective method varies depending on factors such as sequencing depth, genome characteristics, and read error patterns. MSA Limit is an open source and freely available tool. All code and data pertaining to it and this manuscript are available at https://gitlab.cristal.univ-lille.fr/crohmer/msa-limit .
Authors
Rohmer, Coralie (Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189)
Touzet, Hélène (ORCID 0000-0001-5305-9987; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189)
Limasset, Antoine (ORCID 0000-0002-0669-4141; Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189)
BackLink https://hal.science/hal-04798144 (View record in HAL)
ContentType Journal Article
Copyright Attribution
DOI 10.7717/peerj.17731
DatabaseName Hyper Article en Ligne (HAL)
Hyper Article en Ligne (HAL) (Open Access)
Discipline Medicine
Computer Science
EISSN 2167-8359
ISSN 2167-8359
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Bioinformatics; Computational Biology; Long reads; Multiple sequence alignment; Sequencing errors; Heterozygosity; Pacific bioscience; Oxford nanopore; Benchmark
License Attribution: http://creativecommons.org/licenses/by
ORCID 0000-0001-5305-9987
0000-0002-0669-4141
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.7717/peerj.17731
PublicationDate 2024-09-20
PublicationTitle PeerJ (San Francisco, CA)
PublicationYear 2024
Publisher PeerJ
SourceID hal
SourceType Open Access Repository
SubjectTerms Computer Science
Life Sciences
URI https://hal.science/hal-04798144
Volume 12