Analysis of Metabolomic Data Using Support Vector Machines

Bibliographic Details
Published in: Analytical Chemistry (Washington), Vol. 80, No. 19, pp. 7562-7570
Main Authors: Mahadevan, Sankar; Shah, Sirish L.; Marrie, Thomas J.; Slupsky, Carolyn M.
Format: Journal Article
Language: English
Published: Washington, DC: American Chemical Society, 1 October 2008

Summary: Metabolomics is an emerging field that provides insight into physiological processes. It is an effective tool for disease diagnosis and for toxicological studies, both of which rely on observing changes in metabolite concentrations in various biofluids. Multivariate statistical analysis is generally applied to nuclear magnetic resonance (NMR) or mass spectrometry (MS) data to determine differences between groups (for instance, diseased vs healthy). Characteristic predictive models may be built from a set of training data and subsequently used to predict whether new test samples fall into a specific class. In this study, metabolomic data were obtained by ¹H NMR spectroscopy of urine samples from healthy subjects (male and female) and from patients with pneumonia caused by Streptococcus pneumoniae. We compare the performance of traditional PLS-DA multivariate analysis with that of support vector machines (SVMs), a technique widely used in genomic studies, on two case studies: (1) one in which nearly complete distinction is seen (healthy vs pneumonia) and (2) one in which the distinction is more ambiguous (male vs female). We show that in both cases SVMs outperform PLS-DA in predictive accuracy, and that with fewer features they yield better predictive models than PLS-DA.
Supporting Information: Details regarding the kernel and parameters used for SVM-RFE; the Matlab code used to generate the results; a summary of the O-PLSDA results; and details of the age-matching statistical analysis. This material is available free of charge via the Internet at http://pubs.acs.org.
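The record above mentions SVM-RFE (recursive feature elimination driven by a support vector machine) as the feature-selection step behind the "fewer features" result, but the actual kernel and parameters are given only in the Supporting Information. The sketch below is therefore an illustration under stated assumptions, not the authors' Matlab code: it assumes a linear kernel, class labels coded as +1/-1, a primal soft-margin SVM fitted by plain full-batch subgradient descent, and the common squared-weight ranking criterion. The function names (`train_linear_svm`, `predict`, `svm_rfe`), the hyperparameters `lam`, `lr` and `epochs`, and the toy data are all hypothetical.

```python
"""Hypothetical SVM-RFE sketch: linear SVM + recursive feature elimination."""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def train_linear_svm(
    X: Matrix,
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit a soft-margin linear SVM in the primal by full-batch subgradient
    descent on lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be +1 / -1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularised
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, X: Matrix) -> List[int]:
    """Class label = sign of the decision function w . x + b (ties go to +1)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def svm_rfe(
    X: Matrix, y: Sequence[int], n_keep: int = 1, **svm_kwargs
) -> Tuple[List[int], List[int]]:
    """Recursive feature elimination: retrain on the surviving features, drop
    the one with the smallest squared weight, repeat until n_keep remain.
    Returns (kept column indices, eliminated column indices in removal order)."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y, **svm_kwargs)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Hypothetical toy data (not from the paper): rows are samples, columns are
# features. Column 0 separates the classes; columns 1-2 are noise with the
# same values in both classes.
TOY_X = [
    [2.0, 0.3, -0.2], [2.5, -0.4, 0.1], [3.0, 0.1, 0.3],
    [-2.0, 0.3, -0.2], [-2.5, -0.4, 0.1], [-3.0, 0.1, 0.3],
]
TOY_Y = [1, 1, 1, -1, -1, -1]  # e.g. +1 = pneumonia, -1 = healthy

if __name__ == "__main__":
    kept, dropped = svm_rfe(TOY_X, TOY_Y, n_keep=1)
    print("kept:", kept, "eliminated (first to last):", dropped)
```

On the toy data the informative column 0 survives and the two noise columns are eliminated. In a real metabolomics setting the columns would correspond to metabolite measurements derived from the ¹H NMR spectra, and the number of features to keep is a tuning choice (for example, guided by predictive accuracy on held-out samples); this sketch simply fixes `n_keep`.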
ark:/67375/TPS-KRBM95HS-L
istex:6B8BC72F44055EE940617D6DD91E974294E2D507
ISSN: 0003-2700 (print); 1520-6882 (electronic)
DOI: 10.1021/ac800954c