Weakly supervised learning of information structure of scientific abstracts: is it accurate enough to benefit real-world tasks in biomedicine?

Bibliographic Details
Published in	Bioinformatics Vol. 27, no. 22, pp. 3179-3185
Main Authors	Guo, Yufan; Korhonen, Anna; Silins, Ilona; Stenius, Ulla
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 15 November 2011
Subjects	Abstracting and Indexing as Topic - methods; Artificial Intelligence; Biological and medical sciences; Data Mining - methods; Fundamental and applied biological sciences. Psychology; General aspects; Humans; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Medical and Health Sciences; Neoplasms - chemically induced; Risk Assessment; World; Acquisition process; Structure; Supervised learning; Abstract
Summary:	Motivation: Many practical tasks in biomedicine require accessing specific types of information in scientific literature, e.g. information about the methods, results or conclusions of the study in question. Several approaches have been developed to identify such information in scientific journal articles. The best of these have yielded promising results and proved useful for biomedical text mining tasks. However, because they rely on fully supervised machine learning (ML) and a large body of annotated data, existing approaches are expensive to develop and to port to different tasks. A potential solution to this problem is to employ weakly supervised learning instead. In this article, we investigate a weakly supervised approach to identifying information structure according to a scheme called Argumentative Zoning (AZ). We apply four weakly supervised classifiers to biomedical abstracts and evaluate their performance both directly and in a real-life scenario in the context of cancer risk assessment.
Results: Our best weakly supervised classifier (based on the combination of active learning and self-training) performs well on the task, outperforming our best supervised classifier: it yields a high accuracy of 81% when just 10% of the labeled data is used for training. When cancer risk assessors are presented with the resulting annotated abstracts, they find relevant information in them significantly faster than when presented with unannotated abstracts. These results suggest that weakly supervised learning could be used to improve the practical usefulness of information structure for real-life tasks in biomedicine.
Availability: The annotated dataset, classifiers and the user test for cancer risk assessment are available online at http://www.cl.cam.ac.uk/~yg244/11bioinfo.html.
Contact: anna.korhonen@cl.cam.ac.uk
ISSN:	1367-4803, 1367-4811, 1460-2059
DOI:	10.1093/bioinformatics/btr536
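
The Summary describes the best classifier as a combination of active learning and self-training. The sketch below illustrates that general idea only and is not the authors' implementation: a nearest-centroid model stands in for their classifiers, and the two-dimensional feature vectors, the labels "METHOD" and "RESULT", the `oracle` function, and the `confident` threshold are all hypothetical choices made for illustration. In each round the least certain unlabeled item is sent to a human oracle (active learning), while items the model is already confident about are added with their predicted labels (self-training).

```python
import math

def centroids(labeled):
    """Mean feature vector per label, from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, margin): nearest centroid and distance gap to the runner-up."""
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def active_self_training(seed, pool, oracle, rounds=3, confident=1.0):
    """Grow a small labeled seed set using oracle queries and pseudo-labels."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        preds = [predict(cents, x) for x in pool]
        # Active learning: query the oracle for the lowest-margin item.
        ask = min(range(len(pool)), key=lambda i: preds[i][1])
        labeled.append((pool[ask], oracle(pool[ask])))
        # Self-training: keep confident predictions as pseudo-labels.
        remaining = []
        for i, x in enumerate(pool):
            if i == ask:
                continue
            label, margin = preds[i]
            if margin >= confident:
                labeled.append((x, label))
            else:
                remaining.append(x)
        pool = remaining
    return centroids(labeled)

# Hypothetical toy data: one labeled seed item per class, five unlabeled items.
seed = [((0.0, 0.0), "METHOD"), ((4.0, 4.0), "RESULT")]
pool = [(0.5, 0.2), (3.8, 4.1), (2.2, 1.5), (0.1, 0.6), (4.2, 3.7)]

def oracle(x):
    """Stand-in for a human annotator (an assumption for this toy example)."""
    return "METHOD" if x[0] + x[1] < 4 else "RESULT"

model = active_self_training(seed, pool, oracle)
label, _ = predict(model, (0.2, 0.3))
```

In this toy run only the ambiguous item `(2.2, 1.5)` is sent to the oracle; the other four pool items are absorbed through pseudo-labels, which is the sense in which such a scheme reduces annotation cost.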