Distant supervision for treatment relation extraction by leveraging MeSH subheadings



Bibliographic Details
Published in	Artificial Intelligence in Medicine, Vol. 98, pp. 18-26
Main Authors	Tran, Tung; Kavuluru, Ramakanth
Format	Journal Article
Language	English
Published	Netherlands: Elsevier B.V., 01.07.2019
Subjects	Abstracting and Indexing; Area Under Curve; Crowdsourcing; Data Mining; Distant supervision; Humans; Medical Subject Headings; Medical treatment relation; MeSH subheadings; PubMed; Relation extraction; Semantics; Supervised Machine Learning
ISSN	0933-3657, 1873-2860
DOI	10.1016/j.artmed.2019.06.002

Summary:
• MeSH subheadings are leveraged to generate examples from PubMed abstracts for distantly supervised treatment relation extraction.
• MeSH Subheadings based Distant Supervision (MSDS) resulted in a 17% improvement in PR-AUC over traditional distant supervision.
• Examples from MSDS can be used to augment crowd-sourced examples for additional performance gains.
• Incorporating a noise-resistant loss function improves the performance of models trained on examples generated by MSDS.

The growing body of knowledge in biomedicine is too vast for human consumption. Hence, there is a need for automated systems able to navigate and distill the emerging wealth of information. One fundamental task to that end is relation extraction, whereby linguistic expressions of semantic relationships between biomedical entities are recognized and extracted. In this study, we propose a novel distant supervision approach for extracting binary treatment relationships, in which high-quality positive and negative training examples are generated from PubMed abstracts by leveraging their associated MeSH subheadings. The quality of the generated examples is assessed through the quality of the supervised models they induce; that is, the mean performance of trained models (derived via bootstrapped ensembling) on a gold standard test set is used as a proxy for data quality. We show that our approach is preferable to traditional distant supervision for treatment relations and is closer to human crowd annotations in terms of annotation quality. For treatment relations, models trained on our generated data reach 81.38% on the model-wide PR-AUC metric, compared with 64.33% for traditional distant supervision and 90.57% for crowd-sourced annotations. We also demonstrate that examples generated using our method can be used to augment crowd-sourced datasets. Augmented models improve over non-augmented models by more than two absolute points on the more established F1 metric. We lastly demonstrate that performance can be further improved by implementing a classification loss that is resistant to label noise.
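
The evaluation idea in the summary (mean performance of several trained models on one gold test set, reported as PR-AUC) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes average precision as the PR-AUC estimator, which is one common choice, and the function names `average_precision` and `mean_pr_auc` as well as the toy labels and scores are hypothetical.

```python
from statistics import mean


def average_precision(labels, scores):
    """Average precision, a step-wise estimate of the area under the PR curve.

    Assumption: the abstract does not say which PR-AUC estimator was used.
    labels are 0/1 gold labels; scores are model confidences for class 1.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive gold label")
    # Rank examples from most to least confident (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


def mean_pr_auc(labels, per_model_scores):
    """Mean PR-AUC over several models scored on the same gold test set."""
    return mean(average_precision(labels, s) for s in per_model_scores)


# Toy gold labels and scores from two hypothetical models (illustrative only).
gold = [1, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.1]  # one negative ranked above a positive
model_b = [0.9, 0.1, 0.8, 0.2]  # perfect ranking
print(mean_pr_auc(gold, [model_a, model_b]))
```

In a setting like the one described, each entry of `per_model_scores` would come from a model trained on a different bootstrap resample of the generated training data, so the mean reflects the data rather than a single training run.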