Distant supervision for treatment relation extraction by leveraging MeSH subheadings
Published in | Artificial Intelligence in Medicine, Vol. 98, pp. 18-26 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Netherlands, Elsevier B.V., 01.07.2019 |
Subjects | |
ISSN | 0933-3657, 1873-2860 |
DOI | 10.1016/j.artmed.2019.06.002 |
Summary: |
• MeSH subheadings are leveraged to generate examples from PubMed abstracts for distantly supervised treatment relation extraction.
• MeSH Subheadings based Distant Supervision (MSDS) resulted in a 17% improvement in PR-AUC over traditional distant supervision.
• Examples from MSDS can be used to augment crowd-sourced examples for additional performance gains.
• Incorporating a noise-resistant loss function improves the performance of models trained on examples generated by MSDS.
The growing body of knowledge in biomedicine is too vast for human consumption. Hence there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for relation extraction of binary treatment relationships such that high quality positive/negative training examples are generated from PubMed abstracts by leveraging associated MeSH subheadings. The quality of generated examples is assessed based on the quality of supervised models they induce; that is, the mean performance of trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and is closer to human crowd annotations in terms of annotation quality. For treatment relations, our generated training data performs at 81.38%, compared to traditional distant supervision at 64.33% and crowd-sourced annotations at 90.57% on the model-wide PR-AUC metric. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets. Augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. We lastly demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise. |
---|---|
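
The summary describes judging a training set by the mean performance, on a gold standard test set, of models trained via bootstrapped ensembling, with PR-AUC as the metric. The following is a minimal sketch of that idea, not the authors' implementation. It assumes PR-AUC is approximated by average precision, and the names `average_precision`, `data_quality_proxy`, and `fit_centroid`, as well as the toy one-dimensional data, are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each recalled positive
    return ap / total_pos


def data_quality_proxy(train, gold, fit, n_models=10, seed=0):
    """Mean PR-AUC on `gold` of models fit on bootstrap resamples of `train`."""
    rng = random.Random(seed)
    gold_labels = [y for _, y in gold]
    results = []
    for _ in range(n_models):
        # Bootstrap: resample the training set with replacement.
        sample = [rng.choice(train) for _ in train]
        model = fit(sample)  # assumed to return a callable x -> score
        preds = [model(x) for x, _ in gold]
        results.append(average_precision(gold_labels, preds))
    return mean(results)


def fit_centroid(sample):
    """Hypothetical toy learner on 1-D features (stand-in for a real model)."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    pos_mean = mean(pos) if pos else 0.0
    neg_mean = mean(neg) if neg else 0.0
    direction = pos_mean - neg_mean
    midpoint = (pos_mean + neg_mean) / 2
    return lambda x: (x - midpoint) * direction


# Toy data: (feature, label) pairs; the last training label is deliberately noisy.
train = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.4, 1)]
gold = [(0.95, 1), (0.6, 1), (0.35, 0), (0.1, 0)]
quality = data_quality_proxy(train, gold, fit_centroid, n_models=20, seed=1)
```

Under this reading, two candidate training sets (for example, one from MSDS and one from traditional distant supervision) would be compared by calling `data_quality_proxy` on each with the same gold set and learner; the higher mean score suggests the cleaner training data.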