WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis
Published in | Analytica chimica acta Vol. 1061; pp. 60 - 69 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | Netherlands, Elsevier B.V, 11.07.2019 |
Summary: | Metabolomics provides new insights into disease pathogenesis and biomarker discovery. Samples from large-scale untargeted metabolomics studies are typically analyzed on a liquid chromatography-mass spectrometry platform in several batches. Batch effects caused by non-biological systematic biases are unavoidable in large-scale metabolomics studies, even with properly designed experiments, and statistical analysis that does not account for them yields misleading results. In this study, we propose a novel algorithm, WaveICA, which combines the wavelet transform with independent component analysis as the threshold processing method to capture and remove batch effects in large-scale metabolomics data. WaveICA uses the time trend of samples over the injection order: it decomposes the original data into multi-scale data with different features, extracts and removes the batch effect information in the multi-scale data, and obtains clean data. The method was tested on real metabolomics data. After applying WaveICA, the quality control samples (QCS) and the subject samples, which were scattered in a PCA score plot of the original data, each formed a tight cluster. The average Pearson correlation coefficient across all peaks of the QCS increased from 0.872 to 0.972. WaveICA also significantly improved classification accuracy for metabolomics data, and it outperformed all three representative methods with which it was compared. In conclusion, WaveICA efficiently removes batch effects while revealing more biological information, and it can be used to preprocess raw data in large-scale untargeted metabolomics studies.
• Proposing a novel method to remove batch effects for metabolomics data.
• The proposed method could efficiently remove batch effects.
• The proposed method could reveal more biological information.
• The proposed method outperformed other representative methods.
• Providing an R package to easily implement this method. |
---|---|
ISSN: | 0003-2670, 1873-4324 |
DOI: | 10.1016/j.aca.2019.02.010 |
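
The pipeline outlined in the summary (decompose along the injection order, remove batch-related components in the multi-scale data, recombine) can be illustrated with a rough NumPy sketch. This is not the WaveICA R package and makes several simplifying assumptions: an additive Haar-style moving-average split stands in for the wavelet transform, plain SVD stands in for independent component analysis, and a component is treated as a batch effect when batch membership explains more than `r2_cutoff` of its variance. The function names, `levels`, and `r2_cutoff` are all hypothetical.

```python
import numpy as np


def multiscale_split(x, levels):
    """Additive multi-scale split along axis 0 (injection order).

    Stand-in for a wavelet transform: returns `levels` detail layers plus
    one smooth layer, each shaped like `x`; their sum reproduces `x`.
    """
    layers, smooth = [], x.astype(float)
    for j in range(levels):
        step = 2 ** j
        # Pair each row with the row `step` injections earlier (row 0 repeats at the edge).
        idx = np.maximum(np.arange(len(x)) - step, 0)
        coarser = 0.5 * (smooth + smooth[idx])
        layers.append(smooth - coarser)  # detail at this scale
        smooth = coarser
    layers.append(smooth)
    return layers


def batch_r2(score, batch):
    """Fraction of a component's variance explained by batch membership."""
    total = np.sum((score - score.mean()) ** 2)
    if total == 0:
        return 0.0
    between = sum(
        np.sum(batch == b) * (score[batch == b].mean() - score.mean()) ** 2
        for b in np.unique(batch)
    )
    return between / total


def remove_batch_components(x, batch, levels=3, r2_cutoff=0.5):
    """Drop batch-associated components at every scale, then recombine.

    `x` is samples (rows, in injection order) by peaks (columns).
    SVD is used here in place of ICA (an assumption of this sketch).
    """
    cleaned = np.zeros_like(x, dtype=float)
    for layer in multiscale_split(x, levels):
        center = layer.mean(axis=0)
        u, s, vt = np.linalg.svd(layer - center, full_matrices=False)
        keep = np.array(
            [batch_r2(u[:, k], batch) < r2_cutoff for k in range(len(s))]
        )
        cleaned += (u[:, keep] * s[keep]) @ vt[keep] + center
    return cleaned
```

A call such as `remove_batch_components(x, batch)` expects `x` to be sorted by injection order and `batch` to be a one-dimensional array of batch labels of the same length. Because the split is undecimated, every layer keeps one row per sample, so batch labels can be compared with component scores at each scale without resampling.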