Structured abstract summarization of scientific articles: Summarization using full‐text section information

Bibliographic Details
Published in: Journal of the American Society for Information Science and Technology Vol. 74; no. 2; pp. 234-248
Main Authors: Oh, Hanseok; Nam, Seojin; Zhu, Yongjun
Format: Journal Article
Language: English
Published: Hoboken: Wiley Periodicals Inc, 01.02.2023
Summary: The automatic summarization of scientific articles differs from other text genres because of the structured format and longer text length. Previous approaches have focused on tackling the lengthy nature of scientific articles, aiming to improve the computational efficiency of summarizing long text using a flat, unstructured abstract. However, the structured format of scientific articles and characteristics of each section have not been fully explored, despite their importance. The lack of a sufficient investigation and discussion of various characteristics for each section and their influence on summarization results has hindered the practical use of automatic summarization for scientific articles. To provide a balanced abstract proportionally emphasizing each section of a scientific article, the community introduced the structured abstract, an abstract with distinct, labeled sections. Using this information, in this study, we aim to understand tasks ranging from data preparation to model evaluation from diverse viewpoints. Specifically, we provide a preprocessed large‐scale dataset and propose a summarization method applying the introduction, methods, results, and discussion (IMRaD) format reflecting the characteristics of each section. We also discuss the objective benchmarks and perspectives of state‐of‐the‐art algorithms and present the challenges and research directions in this area.
ISSN: 2330-1635, 2330-1643
DOI: 10.1002/asi.24727