K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes

Bibliographic Details
Published in	PeerJ (San Francisco, CA) Vol. 9; p. e11456
Main Authors	Orozco-Arias, Simon; Candamil-Cortés, Mariana S; Jaimes, Paula A; Piña, Johan S; Tabares-Soto, Reinel; Guyot, Romain; Isaza, Gustavo
Format	Journal Article
Language	English
Published	United States: PeerJ, Inc., 19.05.2021
Subjects	Algorithms; Bioinformatics; Classification; Computational Science; Data Mining and Machine Learning; Data Science; Datasets; Discriminant analysis; Flowers & plants; Free-alignment approach; Genomes; Genomics; k-mer based method; Learning algorithms; LTR retrotransposons; Machine learning; Neural networks; Plant genomes; Plant Science; Principal components analysis; Transposable elements

More Information
Summary:	Every day more plant genomes become available in public databases, and additional massive sequencing projects (i.e., projects that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Although several bioinformatic tools follow different approaches to detect and classify them, none of these tools can obtain accurate results on its own. Here, we used machine learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-score of 95%, contributing to the development of an alignment-free and automatic method to analyze these sequences.
ISSN:	2167-8359
DOI:	10.7717/peerj.11456
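
The summary describes using k-mer counts as features for machine learning classifiers. As a rough illustration of what such a feature vector could look like, the following minimal sketch counts every overlapping k-mer in a DNA sequence. The function name `kmer_feature_vector`, the choice of k = 2, and the handling of ambiguous bases are assumptions made for this example; this record does not state the values of k or the preprocessing the authors used.

```python
from collections import Counter
from itertools import product


def kmer_feature_vector(sequence, k=2):
    """Count overlapping k-mers and return (kmers, counts).

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (for example N) are ignored,
    so the vector always has 4**k entries in lexicographic order.
    """
    sequence = sequence.upper()
    # Fixed feature order: every possible k-mer over the DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count each window of length k as the sequence is scanned.
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # A Counter returns 0 for k-mers that never occur.
    return kmers, [observed[kmer] for kmer in kmers]


kmers, counts = kmer_feature_vector("ACGTACGTNN", k=2)
features = dict(zip(kmers, counts))
```

Because every sequence maps to a vector of the same length regardless of its own length, vectors like this can be passed to standard classifiers without aligning the sequences first.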