RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells

Bibliographic Details
Published in Journal of cheminformatics Vol. 9; no. 1; p. 34
Main Authors Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch
Format Journal Article
Language English
Published Cham Springer International Publishing 06.06.2017
BioMed Central Ltd
Springer Nature B.V
BMC

Summary: An important aspect of chemoinformatics and material-informatics is the use of machine learning algorithms to build Quantitative Structure-Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for removing noise from datasets. RANSAC could be used as a “one stop shop” algorithm for developing and validating QSAR models: it performs outlier removal, descriptor selection, model development, and predictions for test set samples using an applicability domain. For “future” predictions (i.e., for samples not included in the original test set), RANSAC provides a statistical estimate of the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that, for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic (PV) properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
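For readers unfamiliar with the technique, the outlier-removal idea behind RANSAC can be illustrated with a minimal, generic sketch. It is not the authors' implementation: the function names (`fit_line`, `ransac_line`), the single-descriptor linear model, the fixed residual `threshold`, the iteration count, and the toy data are all assumptions chosen for illustration. The sketch repeatedly fits a line to a small random sample, counts the points that fall within the threshold of that line (the consensus set), and refits on the largest consensus set found.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit y = a*x + b for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None  # degenerate sample: all x values are identical
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def ransac_line(points, n_iter=200, sample_size=2, threshold=1.0, seed=0):
    """Return ((a, b), inliers) for the largest consensus set found.

    threshold is a hypothetical absolute residual cutoff; a real study
    would choose it from the noise level of the measured property.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(n_iter):
        model = fit_line(rng.sample(points, sample_size))
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        return None, []
    # Refit on the full consensus set; the excluded points are the outliers.
    return fit_line(best_inliers), best_inliers


# Toy data (invented): ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -30.0)]
model, inliers = ransac_line(points)
outliers = [p for p in points if p not in inliers]
```

In this toy case the two invented outliers are excluded from the consensus set and the refit recovers the underlying line. Descriptor selection, applicability domain, and the probability estimate for future predictions mentioned in the summary are not covered by this sketch.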
ISSN: 1758-2946
DOI: 10.1186/s13321-017-0224-0