MoleculeNet: A Benchmark for Molecular Machine Learning

Bibliographic Details
Main Authors	Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay
Format	Journal Article
Language	English
Published	2017-03-01
Subjects	Computer Science - Learning; Physics - Chemical Physics; Statistics - Machine Learning
Online Access	https://arxiv.org/abs/1703.00564

Abstract	Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.1703.00564
OpenAccessLink	https://arxiv.org/abs/1703.00564
SecondaryResourceType	preprint