Generative Artificial Intelligence Enhancements for Reducing Image-based Training Data Requirements

Bibliographic Details
Published in: Ophthalmology science (Online) Vol. 4; no. 5; p. 100531
Main Authors: Chen, Dake; Han, Ying; Duncan, Jacque; Jia, Lin; Shan, Jing
Format: Journal Article
Language: English
Published: Netherlands, Elsevier Inc, 01.09.2024
ISSN: 2666-9145
DOI: 10.1016/j.xops.2024.100531

Summary: Training data fuel and shape the development of artificial intelligence (AI) models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors with inherently scarce data. In health care, training data are difficult to curate, triggering growing concerns that the current lack of access to health care by underprivileged social groups will translate into future bias in health care AI. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets and so alleviate our dependence on big data. This was a computational study with open-source data. The data were obtained from 6 open-source datasets comprising patients aged 40–80 years in Singapore, China, India, and Spain. The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used the autoencoder to expand publicly available training sets of optic disc photos and evaluated the ability of the resultant datasets to train AI models in the detection of glaucomatous optic neuropathy. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the glaucoma detector; a higher AUC indicates better detection performance. Results show that enhancing datasets with synthetic images generated by the autoencoder led to superior training sets that improved the performance of AI models. Our findings help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond health care, toward empowering AI adoption for all similarly data-challenged fields. The authors have no proprietary or commercial interest in any materials discussed in this article.
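The summary names AUC as the evaluation metric without saying how it is computed. The sketch below illustrates one standard definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is not the authors' evaluation code. The function name `auc_score` and all labels and scores are hypothetical values made up for illustration and are not results from the article.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = disease present).
    scores: iterable of detector outputs, higher meaning more suspicious.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive and three negative cases,
# scored by two imaginary detectors.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.4, 0.35, 0.5, 0.3, 0.2]  # one negative outranks two positives
scores_b = [0.9, 0.7, 0.6, 0.5, 0.3, 0.2]   # every positive outranks every negative

print(auc_score(labels, scores_a))
print(auc_score(labels, scores_b))
```

On this toy data the second detector separates the classes perfectly and so scores higher, which is the sense in which a higher AUC indicates better detection performance. The pairwise loop is quadratic in the number of cases; for large test sets a rank-based implementation would be the usual choice.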