Combination of kernel PCA and linear support vector machine for modeling a nonlinear relationship between bioactivity and molecular descriptors

Bibliographic Details
Published in Journal of chemometrics Vol. 25; no. 2; pp. 92-99
Main Authors Fu, Guang-Hui; Cao, Dong-Sheng; Xu, Qing-Song; Li, Hong-Dong; Liang, Yi-Zeng
Format Journal Article
Language English
Published Chichester, UK: John Wiley & Sons, Ltd, 01.02.2011
Wiley
Wiley Subscription Services, Inc
ISSN 0886-9383, 1099-128X
DOI 10.1002/cem.1364

Summary: This paper proposes a two-step nonlinear classification algorithm, KPCA + LSVM, which combines kernel principal component analysis (KPCA) with a linear support vector machine (LSVM) to model the structure–activity relationship (SAR) between the bioactivities and molecular descriptors of compounds. KPCA is used to remove uninformative variation such as noise and then to capture the latent structure of the training dataset through new variables, the principal components in the kernel-defined feature space. LSVM then uses the maximal-margin hyperplane to give the best generalization performance in the KPCA-transformed space. The combination of KPCA and LSVM improves prediction performance compared with the linear SVM as well as two nonlinear methods. Three datasets related to different categorical bioactivities of compounds are used to evaluate the performance of KPCA + LSVM. The results show that the algorithm is competitive. Copyright © 2011 John Wiley & Sons, Ltd.
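The two-step idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of scikit-learn, the RBF kernel, the values of `gamma`, `n_components` and `C`, and the synthetic `make_circles` data (standing in for a descriptor matrix and categorical bioactivity labels) are all assumptions chosen for illustration only.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumption: toy two-class data standing in for molecular descriptors (X)
# and categorical bioactivity labels (y). The classes are not linearly
# separable in the original two-dimensional space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: KPCA projects the samples onto the leading principal components
#         of the kernel-defined feature space (kernel and settings assumed).
# Step 2: a linear SVM fits a maximal-margin hyperplane in that space.
kpca_lsvm = make_pipeline(
    KernelPCA(n_components=5, kernel="rbf", gamma=10.0),
    SVC(kernel="linear", C=10.0),
)
kpca_lsvm.fit(X_train, y_train)
kpca_acc = kpca_lsvm.score(X_test, y_test)

# Baseline: the same linear SVM applied directly to the raw inputs.
linear_svm = SVC(kernel="linear", C=10.0).fit(X_train, y_train)
linear_acc = linear_svm.score(X_test, y_test)

print(f"KPCA + LSVM test accuracy: {kpca_acc:.3f}")
print(f"Linear SVM test accuracy:  {linear_acc:.3f}")
```

On real SAR data the kernel, its parameters and the number of retained components would normally be selected by cross-validation on the training set rather than fixed as they are here.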
Bibliography:ark:/67375/WNG-0HKV7CV3-X
istex:08D606403290E3F1CED58B36EFE825803936F5FD
ArticleID:CEM1364