Sequence tagging for biomedical extractive question answering

Bibliographic Details
Published in	Bioinformatics Vol. 38; no. 15; pp. 3794–3801
Main Authors	Yoon, Wonjin, Jackson, Richard, Lagerberg, Aron, Kang, Jaewoo
Format	Journal Article
Language	English
Published	Oxford University Press, England, 2 August 2022
Subjects	Computational Biology; Original Papers; Software

More Information
Summary:	Motivation: Current studies in extractive question answering (EQA) have modeled the single-span extraction setting, in which a single answer span is the label to predict for a given question-passage pair. This setting is natural for general-domain EQA, as most general-domain questions can be answered with a single span. Following general-domain EQA models, current biomedical EQA (BioEQA) models adopt the single-span extraction setting combined with post-processing steps.
Results: In this article, we investigate the distribution of questions across the general and biomedical domains and find that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (a single answer). This calls for models capable of producing multiple answers for a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which casts the task as multi-span extraction. Our approach directly handles questions whose answer consists of a variable number of phrases and learns from training data to decide how many answers a question has. On the BioASQ 7b and 8b list-type questions, our approach outperformed the best-performing existing models without requiring post-processing steps.
Availability and implementation: Source code and resources are freely available for download at https://github.com/dmis-lab/SeqTagQA.
Supplementary information: Supplementary data are available at Bioinformatics online.
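The summary contrasts single-span extraction (one start/end pair per question) with a sequence tagging formulation, in which every token receives a label and the number of answers follows from the labels. The record does not specify the tagging scheme, so the sketch below assumes a BIO scheme (B = first token of an answer, I = inside an answer, O = outside) over whitespace tokens. It shows how multiple gold answer spans could be encoded as per-token labels for training and how predicted labels could be decoded back into any number of answers, including zero. The functions `spans_to_bio`, `bio_to_spans`, and `extract_answers` are hypothetical illustrations, not code from the SeqTagQA repository. In a real model, a token classifier would predict the tags.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers illustrating BIO-based multi-span extraction (assumed scheme).
# A span is (start, end) over token indices, with `end` exclusive.
Span = Tuple[int, int]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping answer spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span {(start, end)} for {num_tokens} tokens")
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags into spans; zero, one, or many spans may result."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new answer begins; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Lenient decoding: an "I" with no open span starts a new answer.
            if start is None:
                start = i
        else:
            # "O" (or any other label) closes an open answer.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def extract_answers(tokens: List[str], tags: List[str]) -> List[str]:
    """Return every tagged phrase as a separate answer string."""
    return [" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)]


if __name__ == "__main__":
    # Toy passage for illustration only (not taken from BioASQ).
    tokens = ["Mutations", "in", "BRCA1", "and", "BRCA2", "increase", "breast", "cancer", "risk"]
    gold = spans_to_bio(len(tokens), [(2, 3), (4, 5)])
    print(gold)  # ['O', 'O', 'B', 'O', 'B', 'O', 'O', 'O', 'O']
    print(extract_answers(tokens, gold))  # ['BRCA1', 'BRCA2']
```

Because the answer count falls out of the decoded tags, a list-type question can yield several phrases and an all-"O" prediction yields none. No separate threshold or top-k post-processing step is needed to choose how many answers to return.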
Note:	This work was done while Wonjin Yoon worked under the Research Collaboration project at AstraZeneca.
ISSN:	1367-4803 1460-2059 1367-4811
DOI:	10.1093/bioinformatics/btac397