MethylPCA: a toolkit to control for confounders in methylome-wide association studies

Bibliographic Details
Published in: BMC Bioinformatics, Vol. 14, no. 1, p. 74
Main Authors: Chen, Wenan; Gao, Guimin; Nerella, Srilaxmi; Hultman, Christina M; Magnusson, Patrik KE; Sullivan, Patrick F; Aberg, Karolina A; van den Oord, Edwin JCG
Format: Journal Article
Language: English
Published: London: BioMed Central Ltd, 02.03.2013
ISSN: 1471-2105
DOI: 10.1186/1471-2105-14-74

Summary:
Background: In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g. related to lifestyle, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging because of the extremely large number of methylation sites in the human genome.

Results: We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis, including 1) an adaptive method that performs data reduction prior to principal component analysis (PCA) by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs PCA on the ultra high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders.

Conclusions: MethylPCA provides users with a convenient tool to perform MWAS. The software effectively handles the memory and speed challenges of tasks that would be impossible to accomplish with existing software when millions of sites are interrogated with the sample sizes required for MWAS.
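The "capture the major sources of variation, then regress them out" approach summarized above can be illustrated with a small plain-Python sketch. This is an assumption-laden illustration, not MethylPCA's implementation: it uses the standard sample-by-sample (Gram matrix) formulation of PCA, which suits data with far more sites than subjects, finds the top components by power iteration, and tests a single site for association after regressing out the component scores. All function names (`sample_gram`, `top_pcs`, `adjusted_t`, and the helpers) are hypothetical, and MethylPCA's step of combining neighboring sites before PCA is not modeled.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(w, basis):
    """Return w minus its components along each orthonormal vector in basis."""
    for u in basis:
        d = dot(w, u)
        w = [wi - d * ui for wi, ui in zip(w, u)]
    return w


def center_columns(X):
    """Center each methylation site (column of X, rows = samples) to mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def sample_gram(Xc):
    """n x n matrix G = Xc Xc^T, accumulated one site (column) at a time.

    G's size depends only on the number of samples n, not on the number of
    sites p, so this formulation suits p >> n. Here the sites come from an
    in-memory list, but the same loop could consume them from disk.
    """
    n = len(Xc)
    G = [[0.0] * n for _ in range(n)]
    for j in range(len(Xc[0])):
        col = [row[j] for row in Xc]
        for a in range(n):
            for b in range(n):
                G[a][b] += col[a] * col[b]
    return G


def top_pcs(G, k, iters=200, seed=0):
    """Top-k unit eigenvectors of G, i.e. normalized PC score vectors.

    Power iteration; each new vector is kept orthogonal to those already found.
    """
    rng = random.Random(seed)
    pcs = []
    for _ in range(k):
        v = project_out([rng.gauss(0.0, 1.0) for _ in G], pcs)
        norm = math.sqrt(dot(v, v))
        v = [vi / norm for vi in v]
        for _ in range(iters):
            w = project_out([dot(row, v) for row in G], pcs)
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left outside the PCs found so far
                break
            v = [wi / norm for wi in w]
        pcs.append(v)
    return pcs


def adjusted_t(methylation, status, pcs):
    """t statistic for `status` in the regression methylation ~ 1 + PCs + status.

    By the Frisch-Waugh-Lovell theorem it equals the t statistic of the partial
    correlation of the two vectors after removing the intercept and the PCs.
    PC scores of a column-centered matrix are centered, so centering followed
    by project_out yields exactly those residuals.
    """
    def resid(y):
        mean = sum(y) / len(y)
        return project_out([yi - mean for yi in y], pcs)

    ry, rx = resid(methylation), resid(status)
    r = dot(ry, rx) / math.sqrt(dot(ry, ry) * dot(rx, rx))
    df = len(methylation) - 2 - len(pcs)
    return r * math.sqrt(df / (1.0 - r * r))
```

In use, one would compute the PCs once for the whole data set and then call `adjusted_t` for every site, comparing each statistic with a Student t distribution on n - 2 - k degrees of freedom (k = number of PCs). The n x n formulation is what keeps memory independent of the number of sites; a production tool would replace the power-iteration loop with an optimized eigensolver.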