Deep learning-assisted literature mining for in vitro radiosensitivity data

Bibliographic Details
Published in Radiotherapy and oncology Vol. 139; pp. 87-93
Main Authors Komatsu, Shuichiro; Oike, Takahiro; Komatsu, Yuka; Kubota, Yoshiki; Sakai, Makoto; Matsui, Toshiaki; Nuryadi, Endang; Permata, Tiara Bunga Mayang; Sato, Hiro; Kawamura, Hidemasa; Okamoto, Masahiko; Kaminuma, Takuya; Murata, Kazutoshi; Okano, Naoko; Hirota, Yuka; Ohno, Tatsuya; Saitoh, Jun-ichi; Shibata, Atsushi; Nakano, Takashi
Format Journal Article
Language English
Published Ireland Elsevier B.V 01.10.2019
Summary:
•Integration of published radiosensitivity (RS) data is important but labor-intensive.
•We developed deep learning-aided programs to extract RS data from the literature.
•Programs #1–3 screen papers containing RS data obtained by clonogenic assays (CAs).
•Program #4 extracts CA-derived SF2 data from semi-logarithmic survival curves.
•Programs #1–4 in combination help scientists mine CA-derived RS data from papers.

Integrated analysis of existing radiosensitivity data obtained by the gold-standard clonogenic assay has the potential to improve our understanding of cancer cell radioresistance. However, extraction of radiosensitivity data from the literature is highly labor-intensive. To aid in this task, using deep convolutional neural networks (CNNs) and other computer technologies, we developed an analysis pipeline that extracts radiosensitivity data derived from clonogenic assays from the literature. Three classifiers (C1–3) were developed to identify publications containing radiosensitivity data derived from clonogenic assays. C1 uses Faster Regions CNN with Inception Resnet v2 (fRCNN-IRv2), VGG-16, and Optical Character Recognition (OCR) to identify publications that contain semi-logarithmic graphs showing radiosensitivity data derived from clonogenic assays. C2 uses fRCNN-IRv2 and OCR to identify publications that contain bar graphs showing radiosensitivity data derived from clonogenic assays. C3 is a program that identifies publications containing keywords related to radiosensitivity data derived from clonogenic assays. A program (iSF2) was developed using Mask RCNN and OCR to extract surviving fraction after 2-Gy irradiation (SF2) as assessed by clonogenic assays, presented in semi-logarithmic graphs. The efficacy of C1–3 and iSF2 was tested using seven datasets (1805 and 222 publications in total, respectively). C1–3 yielded sensitivity of 91.2% ± 3.4% and specificity of 90.7% ± 3.6%. iSF2 returned SF2 values that were within 2.9% ± 2.6% of the SF2 values determined by radiation oncologists. Our analysis pipeline is potentially useful to acquire radiosensitivity data derived from clonogenic assays from the literature.
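
The summary says SF2 is read from semi-logarithmic survival curves and compared against values determined by radiation oncologists. The sketch below illustrates only the final numerical step of such a reading. It is not the authors' iSF2 program, which relies on Mask RCNN and OCR. It assumes the curve has already been digitized into (dose, surviving fraction) pairs, interpolates linearly in log10(SF) between the two points that bracket 2 Gy, and reports a percentage deviation from a reference value. The function names, the sample points, and the deviation formula are all hypothetical; the record does not state how the 2.9% figure was defined.

```python
import math


def sf2_from_points(points, dose=2.0):
    """Estimate the surviving fraction at `dose` Gy from digitized curve points.

    `points` is an iterable of (dose_gy, surviving_fraction) pairs. Because the
    curve is drawn on a semi-logarithmic axis, interpolation is linear in
    log10(SF) rather than in SF itself. (Assumed approach, not the paper's.)
    """
    pts = sorted(points)
    for (d0, s0), (d1, s1) in zip(pts, pts[1:]):
        if d0 <= dose <= d1:
            if d1 == d0:
                return s0
            t = (dose - d0) / (d1 - d0)
            log_sf = (1 - t) * math.log10(s0) + t * math.log10(s1)
            return 10 ** log_sf
    raise ValueError("dose lies outside the digitized dose range")


def relative_error_percent(estimate, reference):
    """Absolute deviation of `estimate` from `reference`, as a percentage.

    This definition is an assumption made for illustration only.
    """
    return abs(estimate - reference) / reference * 100.0


# Hypothetical digitized points, not data from the paper.
curve = [(0.0, 1.0), (4.0, 0.01)]
sf2 = sf2_from_points(curve)
```

With these made-up points the curve is a straight line on the semi-logarithmic plot, so the value at 2 Gy is the geometric mean of the two endpoints rather than their arithmetic mean.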
ISSN: 0167-8140, 1879-0887
DOI:10.1016/j.radonc.2019.07.003