Estimation of pKa for Druglike Compounds Using Semiempirical and Information-Based Descriptors

Bibliographic Details
Published in: Journal of Chemical Information and Modeling, Vol. 47, no. 2, pp. 450-459
Main Authors: Jelfs, Stephen; Ertl, Peter; Selzer, Paul
Format: Journal Article
Language: English
Published: United States, American Chemical Society, 01.03.2007
Summary: A pragmatic approach has been developed for the estimation of aqueous ionization constants (pKa) for druglike compounds. The method involves an algorithm that assigns ionization constants in a stepwise manner to the acidic and basic groups present in a compound. Predictions are made for each ionizable group using models derived from semiempirical quantum chemical properties and information-based descriptors. Semiempirical properties include the partial charge and electrophilic superdelocalizability of the atom(s) undergoing protonation or deprotonation. Importantly, the latter property has been extended to allow predictions to be made for multiprotic compounds, overcoming limitations of a previous approach described by Tehan et al. The information-based descriptors include molecular-tree structured fingerprints, based on the methodology outlined by Xing et al., with the addition of 2D substructure flags indicating the presence of other important structural features. These two classes of descriptor were found to complement one another particularly well, resulting in predictive models for a range of functional groups (including alcohols, amidines, amines, anilines, carboxylic acids, guanidines, imidazoles, imines, phenols, pyridines, and pyrimidines). Combined RMSE values of 0.48 and 0.81 were obtained for the training set and an external test set of compounds, respectively. The predictive models were based on compounds selected from the commercially available BioLoom database. The resultant speed and accuracy of the approach have also enabled the development of a Web application on the Novartis intranet for pKa prediction.
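
The summary reports accuracy as a root-mean-square error (RMSE) between predicted and experimental pKa values. As a minimal sketch of how such a figure is generally computed, the following uses the standard RMSE definition; the function name `rmse` and the example values are assumptions made for illustration and are not taken from the paper or its data set.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed pKa values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one value is required")
    # Square each prediction error, average, then take the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only; NOT data from the paper.
example_predicted = [4.5, 9.5, 7.0]
example_observed = [4.0, 10.0, 7.0]
example_rmse = rmse(example_predicted, example_observed)
print(f"RMSE = {example_rmse:.2f} pKa units")
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is one reason an external test set typically shows a higher value than the training set.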
Bibliography: istex:F1767F6377880435E10F1DC4590BA36EF3DD9A67
ark:/67375/TPS-Z327L8FV-5
ISSN: 1549-9596, 1549-960X
DOI: 10.1021/ci600285n