ImageDataExtractor: A Tool To Extract and Quantify Data from Microscopy Images
Published in | Journal of Chemical Information and Modeling, Vol. 60, no. 5, pp. 2492-2509 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | United States, American Chemical Society, 26.05.2020 |
Summary: | The rise of data science is leading to new paradigms in data-driven materials discovery. This carries an essential notion that large data sources containing chemical structure and property information can be mined in a fashion that detects and exploits structure–property relationships, such that chemicals can be predicted to suit a given material application. The success of material predictions is predicated on these large data sources of chemical structure and property information being suited to a target application. Microscopy is commonly used to characterize chemical structure, especially in fields such as nanotechnology where material properties are highly dependent on the size and shape of nanoparticles. Large data sources of nanoparticle information stemming from microscopy images would thus be highly beneficial. Millions of microscopy images exist, but they lie fragmented across the literature, typically presented individually within a paper article and usually in a qualitative fashion therein, even though they harbor a wealth of numeric information. We present the ImageDataExtractor toolkit that autoidentifies and autoextracts microscopy images from scientific documents, whereupon it autonomously analyzes each image to produce quantitative particle size and shape information about its subject material. Each image is quantified by decoding its scale bar information using optical character recognition, with help from super-resolution convolutional neural networks where required. Individual particles are detected and profiled using various thresholding, segmentation, polygon fitting, and edge correction routines. The high-throughput operational capability of ImageDataExtractor means that it can be used to generate large-data sources of particle information for data-driven materials discovery. Evaluation metrics, precision and recall, are greater than 80% for the majority of the image processing steps, and precision is above 80% for all critical steps. The ImageDataExtractor tool is released under the MIT license and is available to download from http://www.imagedataextractor.org. |
---|---|
ISSN: | 1549-9596, 1549-960X |
DOI: | 10.1021/acs.jcim.9b00734 |
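
The summary describes turning a microscopy image into particle sizes by thresholding, segmenting, and scaling pixel measurements with the scale bar. The following is a minimal sketch of that general idea in plain Python, not the ImageDataExtractor API: all function names are hypothetical, the scale bar values are supplied by hand rather than read by optical character recognition, and polygon fitting and edge correction are omitted.

```python
import math
from collections import deque


def threshold(image, cutoff):
    """Return a binary mask that is True where intensity exceeds cutoff."""
    return [[px > cutoff for px in row] for row in image]


def label_particles(mask):
    """Group foreground pixels into 4-connected regions.

    Returns the pixel count of each region, in raster-scan order.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def particle_sizes(image, cutoff, scale_bar_length, scale_bar_pixels):
    """Return (area, equivalent circular diameter) per particle.

    scale_bar_length is in physical units (e.g. nm); scale_bar_pixels is
    the bar's width in pixels. Both are assumed known here.
    """
    unit_per_pixel = scale_bar_length / scale_bar_pixels
    sizes = []
    for area_px in label_particles(threshold(image, cutoff)):
        area = area_px * unit_per_pixel ** 2
        diameter = 2 * math.sqrt(area / math.pi)
        sizes.append((area, diameter))
    return sizes


# Toy 4x5 grayscale image containing two bright regions.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 8],
    [0, 0, 0, 0, 8],
]
# Assumed scale bar: 100 nm spanning 50 pixels, i.e. 2 nm per pixel.
for area, diameter in particle_sizes(image, 5, 100.0, 50.0):
    print(f"area = {area:.1f} nm^2, diameter = {diameter:.2f} nm")
```

A fixed global cutoff and 4-connectivity are chosen only to keep the sketch short; real micrographs generally need adaptive thresholding and handling of particles that touch the image border.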