Tree based machine learning framework for predicting ground state energies of molecules

Bibliographic Details
Published in The Journal of chemical physics Vol. 145; no. 13; p. 134101
Main Author Himmetoglu, Burak
Format Journal Article
Language English
Published United States 07.10.2016

Abstract We present an application of the boosted regression tree algorithm for predicting ground state energies of molecules made up of C, H, N, O, P, and S (CHNOPS). The PubChem chemical compound database has been incorporated to construct a dataset of 16 242 molecules, whose electronic ground state energies have been computed using density functional theory. This dataset is used to train the boosted regression tree algorithm, which allows a computationally efficient and accurate prediction of molecular ground state energies. Predictions from boosted regression trees are compared with neural network regression, a widely used method in the literature, and shown to be more accurate with significantly reduced computational cost. The performance of the regression model trained using the CHNOPS set is also tested on a set of distinct molecules that contain additional Cl and Si atoms. It is shown that the learning algorithms lead to a rich and diverse possibility of applications in molecular discovery and materials informatics.
Author Himmetoglu, Burak
Author_xml – sequence: 1
  givenname: Burak
  surname: Himmetoglu
  fullname: Himmetoglu, Burak
  organization: Center for Scientific Computing, University of California, Santa Barbara, California 93106, USA and Enterprise Technology Services, University of California, Santa Barbara, California 93106, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/27782427$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1093_nar_gkac956
crossref_primary_10_1002_ange_201910283
crossref_primary_10_1016_j_carbon_2021_11_073
crossref_primary_10_1021_acs_jpcb_7b08707
crossref_primary_10_1016_j_knosys_2019_105326
crossref_primary_10_1021_acs_jpca_0c03926
crossref_primary_10_1039_D2CS00203E
crossref_primary_10_1021_acs_jctc_3c01252
crossref_primary_10_1088_1361_648X_ac3a85
crossref_primary_10_1021_acs_jctc_8b00788
crossref_primary_10_1103_PhysRevMaterials_3_063801
crossref_primary_10_3938_jkps_77_680
crossref_primary_10_1002_anie_201910283
crossref_primary_10_1088_1742_6596_2072_1_012005
crossref_primary_10_1103_PhysRevB_102_075409
crossref_primary_10_1021_acs_jpcc_8b03405
crossref_primary_10_1039_D0CP03694C
ContentType Journal Article
DBID NPM
DOI 10.1063/1.4964093
DatabaseName PubMed
DatabaseTitle PubMed
DatabaseTitleList PubMed
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Chemistry
Physics
EISSN 1089-7690
ExternalDocumentID 27782427
Genre Journal Article
IngestDate Sat Sep 28 07:59:48 EDT 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 13
Language English
LinkModel OpenURL
PMID 27782427
ParticipantIDs pubmed_primary_27782427
PublicationCentury 2000
PublicationDate 2016-Oct-07
PublicationDateYYYYMMDD 2016-10-07
PublicationDate_xml – month: 10
  year: 2016
  text: 2016-Oct-07
  day: 07
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle The Journal of chemical physics
PublicationTitleAlternate J Chem Phys
PublicationYear 2016
SSID ssj0001724
SourceID pubmed
SourceType Index Database
StartPage 134101
Title Tree based machine learning framework for predicting ground state energies of molecules
URI https://www.ncbi.nlm.nih.gov/pubmed/27782427
Volume 145
linkProvider National Library of Medicine