MoleculeNet: A Benchmark for Molecular Machine Learning
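Main Authors | Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay |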
---|---|
Format | Journal Article |
Language | English |
Published | 01.03.2017 |
Subjects | Computer Science - Learning; Physics - Chemical Physics; Statistics - Machine Learning |
Online Access | https://arxiv.org/abs/1703.00564 |
Abstract | Molecular machine learning has been maturing rapidly over the last few years.
Improved methods and the presence of larger datasets have enabled machine
learning algorithms to make increasingly accurate predictions about molecular
properties. However, algorithmic progress has been limited due to the lack of a
standard benchmark to compare the efficacy of proposed methods; most new
algorithms are benchmarked on different datasets making it challenging to gauge
the quality of proposed methods. This work introduces MoleculeNet, a large
scale benchmark for molecular machine learning. MoleculeNet curates multiple
public datasets, establishes metrics for evaluation, and offers high quality
open-source implementations of multiple previously proposed molecular
featurization and learning algorithms (released as part of the DeepChem open
source library). MoleculeNet benchmarks demonstrate that learnable
representations are powerful tools for molecular machine learning and broadly
offer the best performance. However, this result comes with caveats. Learnable
representations still struggle to deal with complex tasks under data scarcity
and highly imbalanced classification. For quantum mechanical and biophysical
datasets, the use of physics-aware featurizations can be more important than
the choice of a particular learning algorithm. |
---|---|
Author | Wu, Zhenqin; Geniesse, Caleb; Ramsundar, Bharath; Leswing, Karl; Feinberg, Evan N; Pande, Vijay; Pappu, Aneesh S; Gomes, Joseph |
Author_xml | 1. Wu, Zhenqin; 2. Ramsundar, Bharath; 3. Feinberg, Evan N; 4. Gomes, Joseph; 5. Geniesse, Caleb; 6. Pappu, Aneesh S; 7. Leswing, Karl; 8. Pande, Vijay |
BackLink | https://doi.org/10.48550/arXiv.1703.00564 (View paper in arXiv) |
BookMark | eNotj8tOwzAURL2giz74AFb1DyRcx8-yayteUoBN99F1fN1GBKcyD8HfE0o3MyONNDozYxdpSMTYlYBSOa3hGvN391UKC7IE0EZNmX0aemo_e3qmjxu-5htK7eEN8yuPQ-bnEseE7aFLxGvCnLq0X7BJxP6dLs8-Z7u72932oahf7h-367pAY1URA7mVQHAaQgCHGNH4ilpvtaxoRUI4AyIaNWrwqgrBefIyGrQQyTg5Z8v_2RN5c8zdyPbT_D1oTg_kL4SdQh4 |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY EPD GOX |
DOI | 10.48550/arxiv.1703.00564 |
DatabaseName | arXiv Computer Science; arXiv Statistics; arXiv.org |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1703_00564 |
GroupedDBID | AKY EPD GOX |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:41:17 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
OpenAccessLink | https://arxiv.org/abs/1703.00564 |
ParticipantIDs | arxiv_primary_1703_00564 |
PublicationCentury | 2000 |
PublicationDate | 2017-03-01 |
PublicationDateYYYYMMDD | 2017-03-01 |
PublicationDate_xml | – month: 03 year: 2017 text: 2017-03-01 day: 01 |
PublicationDecade | 2010 |
PublicationYear | 2017 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning; Physics - Chemical Physics; Statistics - Machine Learning |
Title | MoleculeNet: A Benchmark for Molecular Machine Learning |
URI | https://arxiv.org/abs/1703.00564 |
linkProvider | Cornell University |