Structured abstract summarization of scientific articles: Summarization using full‐text section information

Bibliographic Details
Published in	Journal of the Association for Information Science and Technology, Vol. 74, no. 2, pp. 234-248
Main Authors	Oh, Hanseok; Nam, Seojin; Zhu, Yongjun
Format	Journal Article
Language	English
Published	Hoboken: Wiley Periodicals, Inc., 01.02.2023
Subjects	Algorithms; Automatic summarization; Format; Text structure; Unstructured data

More Information
Summary:	Automatic summarization of scientific articles differs from summarization of other text genres because scientific articles have a structured format and are longer. Previous approaches have focused on the length of scientific articles, aiming to improve the computational efficiency of summarizing long text into a flat, unstructured abstract. However, the structured format of scientific articles and the characteristics of each section have not been fully explored, despite their importance. The lack of sufficient investigation and discussion of the characteristics of each section and their influence on summarization results has hindered the practical use of automatic summarization for scientific articles. To provide a balanced abstract that proportionally emphasizes each section of a scientific article, the community introduced the structured abstract, an abstract with distinct, labeled sections. Using this information, in this study we aim to understand tasks ranging from data preparation to model evaluation from diverse viewpoints. Specifically, we provide a preprocessed large‐scale dataset and propose a summarization method that applies the introduction, methods, results, and discussion (IMRaD) format to reflect the characteristics of each section. We also discuss objective benchmarks and perspectives on state‐of‐the‐art algorithms, and present the challenges and research directions in this area.
ISSN:	2330-1635 2330-1643
DOI:	10.1002/asi.24727
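The summary above does not give the details of the authors' method. As a rough illustration of the general idea of a section-aware structured abstract, the hypothetical Python sketch below maps full-text section headings to IMRaD labels with a keyword table and fills each labeled part of the abstract with the leading sentence(s) of the matching sections. The keyword table, the function names (`imrad_label`, `lead_sentences`, `structured_abstract`), and the lead-sentence extractor are all assumptions made for illustration; the extractor is a trivial stand-in, not the summarization model proposed in the article.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical heading keywords per IMRaD label (an assumption, not from the article).
# Dict insertion order (Python 3.7+) fixes the order of the abstract's labeled parts.
IMRAD_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "introduction": ("introduction", "background"),
    "methods": ("method", "materials", "experimental"),
    "results": ("result", "finding"),
    "discussion": ("discussion", "conclusion"),
}


def imrad_label(heading: str) -> Optional[str]:
    """Return the first IMRaD label whose keywords occur in the heading, else None."""
    h = heading.lower()
    for label, keywords in IMRAD_KEYWORDS.items():
        if any(k in h for k in keywords):
            return label
    return None  # e.g. "References" or "Acknowledgments" are skipped


def lead_sentences(text: str, k: int = 1) -> str:
    """Placeholder extractive summarizer: keep the first k sentences of the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:k])


def structured_abstract(sections: List[Tuple[str, str]], k: int = 1) -> Dict[str, str]:
    """Group (heading, body) pairs by IMRaD label and summarize each group separately."""
    grouped: Dict[str, List[str]] = {label: [] for label in IMRAD_KEYWORDS}
    for heading, body in sections:
        label = imrad_label(heading)
        if label is not None:
            grouped[label].append(body)
    # Emit only the labels that actually have matching sections.
    return {
        label: lead_sentences(" ".join(bodies), k)
        for label, bodies in grouped.items()
        if bodies
    }
```

For a toy input such as `[("1. Introduction", "Intro one. Intro two."), ("2. Materials and Methods", "Method one."), ("References", "Ref list.")]`, the sketch returns one labeled entry per IMRaD part it finds and ignores the reference list. Summarizing each section group separately is what keeps every labeled part of the abstract represented; a real system would replace `lead_sentences` with a trained summarizer.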