Comparison of penalty functions for sparse canonical correlation analysis


Bibliographic Details
Published in: Computational Statistics & Data Analysis, Vol. 56, no. 2, pp. 245-254
Main Authors: Chalise, Prabhakar; Fridley, Brooke L.
Format: Journal Article
Language: English
Published: Amsterdam, Elsevier B.V., 01.02.2012
Series: Computational Statistics & Data Analysis
ISSN: 0167-9473, 1872-7352
DOI: 10.1016/j.csda.2011.07.012

Summary: Canonical correlation analysis (CCA) is a widely used multivariate method for assessing the association between two sets of variables. However, when the number of variables far exceeds the number of subjects, as in large-scale genomic studies, the traditional CCA method is not appropriate. In addition, when the variables are highly correlated, the sample covariance matrices become unstable or undefined. To overcome these two issues, sparse canonical correlation analysis (SCCA) methods for multiple data sets have been proposed using a Lasso-type penalty. However, these methods do not have direct control over the sparsity of the solution. An additional step that uses a Bayesian Information Criterion (BIC) has also been suggested to further filter out unimportant features. In this paper, a comparison of four penalty functions (Lasso, Elastic-net, smoothly clipped absolute deviation (SCAD), and Hard-threshold) for SCCA, with and without the BIC filtering step, has been carried out using both real and simulated genotypic and mRNA expression data. This study indicates that the SCAD penalty with a BIC filter would be a preferable penalty function for application of SCCA to genomic data.
► We compared various penalty functions for sparse canonical correlation analysis (SCCA).
► We also made an assessment using an additional Bayesian Information Criterion (BIC).
► We found the SCAD penalty with BIC filter preferable for SCCA of genomic data.
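To make the four penalties named in the summary more concrete, the sketch below shows the textbook scalar thresholding rules commonly associated with each of them. This is an illustration only: the record does not give the paper's formulas, so the function names, the Elastic-net form (soft-thresholding followed by a ridge-style rescaling), and the SCAD default `a = 3.7` are assumptions taken from standard presentations, not from the article itself.

```python
import math


def soft_threshold(z, lam):
    """Lasso-type rule: shrink z toward zero by lam, zeroing small values."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def hard_threshold(z, lam):
    """Hard-threshold rule: keep z unchanged if |z| > lam, else set it to zero."""
    return z if abs(z) > lam else 0.0


def elastic_net_threshold(z, lam1, lam2):
    """One common Elastic-net form (assumed here): soft-threshold, then rescale."""
    return soft_threshold(z, lam1) / (1.0 + lam2)


def scad_threshold(z, lam, a=3.7):
    """SCAD rule (requires a > 2): soft near zero, no shrinkage for large |z|."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    abs_z = abs(z)
    if abs_z <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs_z <= a * lam:
        # Intermediate zone: linear interpolation between the two regimes.
        return ((a - 1.0) * z - math.copysign(a * lam, z)) / (a - 2.0)
    return z


if __name__ == "__main__":
    # Hypothetical coefficient values, chosen only to show the different regimes.
    for z in (0.5, 1.5, 3.0, 5.0):
        print(
            z,
            soft_threshold(z, 1.0),
            hard_threshold(z, 1.0),
            elastic_net_threshold(z, 1.0, 1.0),
            scad_threshold(z, 1.0),
        )
```

Under these assumed forms, the Lasso-type rule shrinks every surviving coefficient by the same amount, whereas SCAD leaves large coefficients untouched, which is one commonly cited motivation for preferring it when strong signals should not be biased toward zero.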