Automated evaluation of multiple sequence alignment methods to handle third generation sequencing errors
Published in | PeerJ (San Francisco, CA) Vol. 12 |
---|---|
Main Authors | Rohmer, Coralie; Touzet, Hélène; Limasset, Antoine |
Format | Journal Article |
Language | English |
Published | PeerJ 20.09.2024 |
Subjects | |
Abstract | Most third-generation sequencing (TGS) processing tools rely on multiple sequence alignment (MSA) methods to manage sequencing errors. Despite the broad range of MSA approaches available, a limited selection of implementations is commonly used in practice for this type of application, and no comprehensive comparative assessment of existing tools has been undertaken to date. In this context, we have developed an automatic pipeline, named MSA Limit, designed to facilitate the execution and evaluation of diverse MSA methods across a spectrum of conditions representative of TGS reads. MSA Limit offers insights into alignment accuracy, time efficiency, and memory utilization. It serves as a valuable resource for both users and developers, aiding in the assessment of algorithmic performance and assisting users in selecting the most appropriate tool for their specific experimental settings. Through a series of experiments using real and simulated data, we demonstrate the value of such exploration. Our findings reveal that in certain scenarios, popular methods may not consistently exhibit optimal efficiency and that the choice of the most effective method varies depending on factors such as sequencing depth, genome characteristics, and read error patterns. MSA Limit is an open-source, freely available tool. All code and data pertaining to it and this manuscript are available at https://gitlab.cristal.univ-lille.fr/crohmer/msa-limit. |
---|---|
Author | Touzet, Hélène; Limasset, Antoine; Rohmer, Coralie |
Author_xml | – sequence: 1 givenname: Coralie surname: Rohmer fullname: Rohmer, Coralie organization: Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 – sequence: 2 givenname: Hélène orcidid: 0000-0001-5305-9987 surname: Touzet fullname: Touzet, Hélène organization: Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 – sequence: 3 givenname: Antoine orcidid: 0000-0002-0669-4141 surname: Limasset fullname: Limasset, Antoine organization: Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 |
BackLink | https://hal.science/hal-04798144$$DView record in HAL |
ContentType | Journal Article |
Copyright | Attribution |
Copyright_xml | – notice: Attribution |
DBID | 1XC VOOES |
DOI | 10.7717/peerj.17731 |
DatabaseName | Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Medicine Computer Science |
EISSN | 2167-8359 |
ExternalDocumentID | oai_HAL_hal_04798144v1 |
ID | FETCH-hal_primary_oai_HAL_hal_04798144v13 |
IEDL.DBID | M48 |
ISSN | 2167-8359 |
IngestDate | Fri May 09 12:28:32 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Bioinformatics; Computational Biology; Long reads; Multiple sequence alignment; Sequencing errors; Heterozygosity; Pacific bioscience; Oxford nanopore; Benchmark |
Language | English |
License | Attribution: http://creativecommons.org/licenses/by |
LinkModel | DirectLink |
MergedId | FETCHMERGED-hal_primary_oai_HAL_hal_04798144v13 |
ORCID | 0000-0001-5305-9987 0000-0002-0669-4141 |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.7717/peerj.17731 |
ParticipantIDs | hal_primary_oai_HAL_hal_04798144v1 |
PublicationCentury | 2000 |
PublicationDate | 2024-09-20 |
PublicationDateYYYYMMDD | 2024-09-20 |
PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-20 day: 20 |
PublicationDecade | 2020 |
PublicationTitle | PeerJ (San Francisco, CA) |
PublicationYear | 2024 |
Publisher | PeerJ |
Publisher_xml | – name: PeerJ |
SSID | ssj0000826083 |
Score | 4.65746 |
SourceID | hal |
SourceType | Open Access Repository |
SubjectTerms | Computer Science Life Sciences |
Title | Automated evaluation of multiple sequence alignment methods to handle third generation sequencing errors |
URI | https://hal.science/hal-04798144 |
Volume | 12 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Scholars Portal |