Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial i...
Main Authors | , , , , , , , , , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 27.05.2020 |
Summary: | Researchers across the globe are seeking to rapidly repurpose existing drugs
or discover new drugs to counter the novel coronavirus disease (COVID-19)
caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One
promising approach is to train machine learning (ML) and artificial
intelligence (AI) tools to screen large numbers of small molecules. As a
contribution to that effort, we are aggregating numerous small molecules from a
variety of sources, using high-performance computing (HPC) to compute diverse
properties of those molecules, using the computed properties to train ML/AI
models, and then using the resulting models for screening. In this first data
release, we make available 23 datasets collected from community sources
representing over 4.2 B molecules enriched with pre-computed: 1) molecular
fingerprints to aid similarity searches, 2) 2D images of molecules to enable
exploration and application of image-based deep learning methods, and 3) 2D and
3D molecular descriptors to speed development of machine learning models. This
data release encompasses structural information on the 4.2 B molecules and 60
TB of pre-computed data. Future releases will expand the data to include more
detailed molecular simulations, computed models, and other products. |
---|---|
DOI: | 10.48550/arxiv.2006.02431 |