Evaluation of pre-processing on the meta-analysis of DNA methylation data from the Illumina HumanMethylation450 BeadChip platform

Bibliographic Details
Published in PLOS ONE Vol. 15; no. 3; p. e0229763
Main Authors Sala, Claudia; Di Lena, Pietro; Fernandes Durso, Danielle; Prodi, Andrea; Castellani, Gastone; Nardini, Christine
Format Journal Article Publication
Language English
Published United States Public Library of Science (PLoS) 10.03.2020
Public Library of Science
Summary: Meta-analysis is a powerful means of combining the hundreds of experiments being run worldwide into more statistically powerful analyses. This also holds for omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, so only pre-processed values (obtained after computational manipulation) are available. Pre-processing may differ among studies and can therefore affect meta-analysis by introducing non-biological sources of variability. To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing pipelines (9 × 9), we first evaluated the between-sample variability among control subjects and then identified genomic positions that are differentially methylated between cases and controls (differential analysis). The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.
Competing Interests: The authors have read the journal’s policy and the authors of this manuscript have the following competing interests: CN is a part-time employee of SOL Group and part-time associate to CNR-IAC. This does not alter our adherence to PLOS ONE policies on sharing data and materials. There are no patents, products in development or marketed products to declare.
ISSN:1932-6203
DOI:10.1371/journal.pone.0229763