Clustering in non-parametric multivariate analyses

Bibliographic Details
Published in: Journal of experimental marine biology and ecology, Vol. 483; pp. 147-155
Main Authors: Clarke, K. Robert; Somerfield, Paul J.; Gorley, Raymond N.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2016
Summary: Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
• Dendrograms may be poor representations of inter-sample dissimilarities.
• ANOSIM R and SIMPROF are combined to generate new methods of clustering.
• UNCTREE is a binary divisive clustering algorithm.
• k-R clustering is a flat clustering algorithm.
• Robustness of clustering is assessed by applying different methods to example data.
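
Two quantities named in the summary, the cophenetic correlation of a group-average (UPGMA) dendrogram and the ANOSIM R statistic, can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not reproduce UNCTREE, k-R clustering or SIMPROF. It assumes SciPy is available, assumes Bray-Curtis dissimilarity as the resemblance measure (the summary says only "appropriate measures"), and uses a small invented abundance matrix rather than the Bristol Channel data. The function names `cophenetic_correlation` and `anosim_r` are hypothetical. ANOSIM R is computed here in its standard rank-based form, R = (r_B - r_W) / (M / 2) with M = n(n - 1)/2.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from scipy.stats import rankdata


def cophenetic_correlation(dissim, method="average"):
    """Correlate original dissimilarities with dendrogram (cophenetic) distances.

    dissim : condensed dissimilarity vector, as returned by pdist.
    method : linkage rule; "average" is group-average (UPGMA) linkage.
    Returns (correlation, condensed cophenetic distances).
    """
    tree = linkage(dissim, method=method)
    corr, coph = cophenet(tree, dissim)
    return corr, coph


def anosim_r(dissim, groups):
    """ANOSIM R for a given grouping of samples.

    Ranks all dissimilarities, then contrasts the mean rank of
    between-group pairs (r_B) with that of within-group pairs (r_W).
    """
    groups = np.asarray(groups)
    n = groups.size
    ranks = rankdata(dissim)  # tied dissimilarities share an average rank
    i, j = np.triu_indices(n, k=1)  # same pair order as pdist's condensed form
    within = groups[i] == groups[j]
    m = n * (n - 1) / 2
    return (ranks[~within].mean() - ranks[within].mean()) / (m / 2)


# Invented toy data (assumption): 6 samples x 4 species, two obvious groups.
X = np.array(
    [
        [20, 5, 1, 0],
        [18, 6, 2, 1],
        [22, 4, 0, 1],
        [1, 0, 6, 19],
        [0, 2, 5, 21],
        [2, 1, 4, 18],
    ],
    dtype=float,
)
labels = ["A", "A", "A", "B", "B", "B"]

d = pdist(X, metric="braycurtis")  # inter-sample resemblances
corr, coph = cophenetic_correlation(d)
r_stat = anosim_r(d, labels)

print(f"cophenetic correlation (UPGMA): {corr:.3f}")
print(f"ANOSIM R for the A/B grouping:  {r_stat:.3f}")
```

Plotting `coph` against `d` gives the kind of diagnostic the summary describes for judging how faithfully a tree represents the original dissimilarities. Passing another `method` accepted by SciPy's `linkage` (for example `"single"` or `"complete"`) allows dendrograms from different linkage rules to be compared by their cophenetic correlations; flexible beta linkage is not among SciPy's built-in methods.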
ISSN: 0022-0981, 1879-1697
DOI: 10.1016/j.jembe.2016.07.010