Search and clustering orders of magnitude faster than BLAST

Bibliographic Details
Published in	Bioinformatics Vol. 26; no. 19; pp. 2460 - 2461
Main Author	Edgar, Robert C
Format	Journal Article
Language	English
Published	England Oxford University Press 01.10.2010
Subjects	Algorithms; Cluster Analysis; Computational Biology - methods; Databases, Protein; Proteins - chemistry; Sequence Alignment - methods; Sequence Analysis, Protein - methods

Summary:	Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch. Contact: robert@drive5.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Bibliography:	ark:/67375/HXZ-GJV3Q6N3-6; ArticleID: btq461; Associate Editor: Alex Bateman; istex:E899EDEA476A0A93BC2365EEC6AA3B7EEEC99891
ISSN:	1367-4803; 1367-4811; 1460-2059
DOI:	10.1093/bioinformatics/btq461