Building Highly Reliable Quantitative Structure–Activity Relationship Classification Models Using the Rivality Index Neighborhood Algorithm with Feature Selection

Bibliographic Details
Published in	Journal of Chemical Information and Modeling Vol. 60; no. 1; pp. 133–151
Main Authors	Ruiz, Irene Luque; Gómez-Nieto, Miguel Ángel
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 27.01.2020
Subjects	Algorithms; Classification; Machine Learning; Model accuracy; Models, Molecular; Quantitative Structure-Activity Relationship; Reproducibility of Results; Selectivity

Summary:	Dimensionality reduction of the data set representation used to build quantitative structure–activity relationship (QSAR) classification models is an important research subject, both for the interpretability of the models and for the computational cost of the classification algorithms. Feature selection techniques are appropriate because only a small number of relevant features should be used in the classification process: irrelevant and redundant features should be discarded, as should noninterpretable ones. In this paper, we propose an embedded feature selection technique for building classification models with the rivality index neighborhood (RINH) algorithm. The technique applies a filter selection in the preprocessing stage, using the selectivity of the features as the selection criterion, and a wrapper technique in the processing stage, based on the improvement in accuracy and reliability of the models generated by the RINH algorithm with the LTN and GTN functions. The results obtained with the RINH algorithm, with and without feature selection, compared with those obtained using 14 machine learning algorithms, show that the proposed feature selection technique builds clearly more accurate and reliable models, reduces data dimensionality by around 90%, and generates highly robust and interpretable models.
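The summary describes a two-stage scheme: a filter that ranks features by selectivity, followed by a wrapper that keeps a feature only if it improves the model. The record does not specify the paper's selectivity measure, the RINH classifier, or its LTN and GTN functions, so the following is only a minimal Python sketch under stated assumptions. A Fisher-style ratio of between-class to within-class variance stands in for feature selectivity. The wrapper criterion is the fraction of molecules with a negative rivality index, assuming RI_i = (d_same - d_other) / (d_same + d_other), which amounts to leave-one-out 1-nearest-neighbor accuracy rather than an actual RINH model. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def distance(a, b, features):
    """Euclidean distance restricted to the selected feature indices."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def rivality_index(X, y, features):
    """ASSUMED rivality index: RI_i = (d_same - d_other) / (d_same + d_other).

    d_same is the distance from sample i to its nearest neighbour of the same
    class, d_other to its nearest neighbour of any other class. RI < 0 means
    the sample lies closer to its own class. Each class needs >= 2 samples.
    """
    scores = []
    for i, xi in enumerate(X):
        d_same = min(distance(xi, xj, features)
                     for j, xj in enumerate(X) if j != i and y[j] == y[i])
        d_other = min(distance(xi, xj, features)
                      for j, xj in enumerate(X) if y[j] != y[i])
        total = d_same + d_other
        scores.append(0.0 if total == 0 else (d_same - d_other) / total)
    return scores


def fisher_score(X, y, f):
    """Stand-in 'selectivity' (assumption): between-class / within-class variance of feature f."""
    overall = mean(x[f] for x in X)
    between = within = 0.0
    for c in set(y):
        values = [x[f] for x, label in zip(X, y) if label == c]
        between += len(values) * (mean(values) - overall) ** 2
        within += len(values) * pvariance(values)
    if within == 0:
        # Constant inside every class: fully selective unless also constant overall.
        return math.inf if between > 0 else 0.0
    return between / within


def wrapper_accuracy(X, y, features):
    """Fraction of samples with RI < 0 (leave-one-out 1-NN accuracy; ties count as errors)."""
    ri = rivality_index(X, y, features)
    return sum(r < 0 for r in ri) / len(ri)


def select_features(X, y, keep=5):
    """Hypothetical two-stage selection: filter by score, then greedy forward wrapper."""
    n_features = len(X[0])
    # Stage 1 (filter): rank features by the stand-in selectivity and keep the top `keep`.
    ranked = sorted(range(n_features), key=lambda f: fisher_score(X, y, f), reverse=True)
    # Stage 2 (wrapper): add a candidate only if it strictly improves the accuracy.
    selected, best = [], 0.0
    for f in ranked[:keep]:
        acc = wrapper_accuracy(X, y, selected + [f])
        if acc > best:
            selected, best = selected + [f], acc
    return selected, best


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
    X = [[0.0, 5.0, 1.0], [0.1, -3.0, 1.0], [0.2, 4.0, 1.0],
         [1.0, -4.0, 1.0], [1.1, 3.0, 1.0], [1.2, -5.0, 1.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_features(X, y, keep=2))  # → ([0], 1.0)
```

Because the wrapper adds a feature only on a strict improvement and tries candidates in filter order, it tends to return compact subsets. On the toy data, two of three features are discarded, which mirrors in miniature the kind of dimensionality reduction the summary reports.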
ISSN:	1549-9596; 1549-960X
DOI:	10.1021/acs.jcim.9b00706