MoleculeNet: A Benchmark for Molecular Machine Learning
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 01.03.2017 |
Summary: | Molecular machine learning has been maturing rapidly over the last few years.
Improved methods and the presence of larger datasets have enabled machine
learning algorithms to make increasingly accurate predictions about molecular
properties. However, algorithmic progress has been limited due to the lack of a
standard benchmark to compare the efficacy of proposed methods; most new
algorithms are benchmarked on different datasets, making it challenging to gauge
the quality of proposed methods. This work introduces MoleculeNet, a
large-scale benchmark for molecular machine learning. MoleculeNet curates multiple
public datasets, establishes metrics for evaluation, and offers high-quality
open-source implementations of multiple previously proposed molecular
featurization and learning algorithms (released as part of the DeepChem
open-source library). MoleculeNet benchmarks demonstrate that learnable
representations are powerful tools for molecular machine learning and broadly
offer the best performance. However, this result comes with caveats. Learnable
representations still struggle to deal with complex tasks under data scarcity
and highly imbalanced classification. For quantum mechanical and biophysical
datasets, the use of physics-aware featurizations can be more important than
the choice of a particular learning algorithm. |
---|---|
DOI: | 10.48550/arxiv.1703.00564 |
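
The summary's caveat about highly imbalanced classification can be illustrated with a small sketch. It is not taken from the paper or from DeepChem: the function names, the toy labels, and the choice of ROC-AUC as the ranking metric are assumptions made only for illustration. It shows why a benchmark needs an agreed metric: plain accuracy can look strong on imbalanced data even when a model has learned nothing.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 2 active molecules among 20 (10% positives).
labels = [1, 1] + [0] * 18

# A model that scores every molecule as "inactive".
constant_scores = [0.1] * 20

# A model that ranks both actives above every inactive.
ranked_scores = [0.9, 0.4] + [0.3] * 9 + [0.05] * 9

print("constant:", accuracy(labels, constant_scores), roc_auc(labels, constant_scores))
print("ranked:  ", accuracy(labels, ranked_scores), roc_auc(labels, ranked_scores))
```

On this toy data the constant model reaches 90% accuracy with a ROC-AUC of only 0.5, while the ranking model reaches a ROC-AUC of 1.0. Comparing methods on a shared dataset with a shared ranking-aware metric is what makes such differences visible.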