A dataset for medical instructional video classification and question answering


Bibliographic Details
Published in: Scientific Data, Vol. 10; no. 1; pp. 158 - 16
Main Authors: Gupta, Deepak; Attal, Kush; Demner-Fushman, Dina
Format: Journal Article
Language: English
Published: London, Nature Publishing Group UK, 22.03.2023
ISSN: 2052-4463
DOI: 10.1038/s41597-023-02036-y

Summary: This paper introduces a new challenge and datasets to foster research toward designing systems that can understand medical videos and provide visual answers to natural language questions. We believe medical videos may provide the best possible answers to many first aid, medical emergency, and medical education questions. Toward this goal, we created the MedVidCL and MedVidQA datasets and introduce two tasks, Medical Video Classification (MVC) and Medical Visual Answer Localization (MVAL), both of which focus on cross-modal (medical language and medical video) understanding. The proposed tasks and datasets have the potential to support the development of sophisticated downstream applications that can benefit the public and medical practitioners. Our datasets consist of 6,117 fine-grained annotated videos for the MVC task and 3,010 questions with corresponding answer timestamps from 899 videos for the MVAL task. These datasets have been verified and corrected by medical informatics experts. We have also benchmarked each task on the MedVidCL and MedVidQA datasets and propose multimodal learning methods that set competitive baselines for future research.