Sequence clustering in bioinformatics: an empirical study

Bibliographic Details
Published in Briefings in bioinformatics Vol. 21; no. 1; pp. 1-10
Main Authors Zou, Quan; Lin, Gang; Jiang, Xingpeng; Liu, Xiangrong; Zeng, Xiangxiang
Format Journal Article
Language English
Published England Oxford University Press 17.01.2020
Oxford Publishing Limited (England)
Summary: Sequence clustering is a basic bioinformatics task that is attracting renewed attention with the development of metagenomics and microbiomics. The latest sequencing techniques have decreased costs and, as a result, massive amounts of DNA/RNA sequences are being produced. The challenge is to cluster the sequence data using stable, quick and accurate methods. For microbiome sequencing data, 16S ribosomal RNA operational taxonomic units are typically used. However, there is often a gap between algorithm developers and bioinformatics users. Different software tools can produce diverse results and users can find them difficult to analyze. Understanding the different clustering mechanisms is crucial to understanding the results that they produce. In this review, we selected several popular clustering tools, briefly explained the key computing principles, analyzed their characteristics and compared them using two independent benchmark datasets. Our aim is to assist bioinformatics users in employing suitable clustering tools effectively to analyze big sequencing data. Related data, codes and software tools are accessible at the link http://lab.malab.cn/~lg/clustering/.
ISSN: 1467-5463
1477-4054
DOI: 10.1093/bib/bby090