MoleculeNet: A Benchmark for Molecular Machine Learning

Bibliographic Details
Main Authors Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay
Format Journal Article
Language English
Published 2017-03-01
Abstract Molecular machine learning has been maturing rapidly over the last few years. Improved methods and larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
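The abstract's caveat about highly imbalanced classification can be made concrete with a small sketch. The code below is an illustration only, not MoleculeNet or DeepChem code: the toy labels and scores are hypothetical, and the choice of ROC-AUC as the rank-based metric is an assumption made for the example, since this record does not list the metrics the benchmark uses. It shows how plain accuracy can look strong on a dataset with very few positives while a rank-based metric exposes an uninformative model.

```python
# Illustrative sketch only: not MoleculeNet or DeepChem code.
# Shows why accuracy can mislead on a highly imbalanced classification task,
# and how a rank-based ROC-AUC behaves on the same predictions.


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 2 active molecules among 100 (2% positives).
labels = [1, 1] + [0] * 98

# A "model" that ignores its input and always predicts inactive.
constant_scores = [0.0] * 100

# A model that ranks both actives above every inactive.
ranking_scores = [0.9, 0.8] + [0.1] * 98

print(accuracy(labels, constant_scores))  # 0.98, despite finding no actives
print(roc_auc(labels, constant_scores))   # 0.5, no better than chance
print(roc_auc(labels, ranking_scores))    # 1.0, perfect ranking
```

The constant predictor reaches 98% accuracy only because 98% of the examples are negative, which is why a benchmark covering imbalanced tasks needs to fix its evaluation metrics up front rather than leave the choice to each method's authors.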
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.1703.00564
OpenAccessLink https://arxiv.org/abs/1703.00564
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Learning; Physics - Chemical Physics; Statistics - Machine Learning