Tipster: A Topic-Guided Language Model for Topic-Aware Text Segmentation

Bibliographic Details
Published in: Database Systems for Advanced Applications, Vol. 13247, pp. 213-221
Main Authors: Gong, Zheng; Tong, Shiwei; Wu, Han; Liu, Qi; Tao, Hanqing; Huang, Wei; Yu, Runlong
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2022
Series: Lecture Notes in Computer Science
ISBN: 3031001281, 9783031001284
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-031-00129-1_14

Summary: Accurate segmentation and topic structuring of plain documents not only suits people’s reading habits but also facilitates various downstream tasks. Recent work has consistently suggested that text segmentation and segment topic labeling can be regarded as mutually beneficial tasks, and that incorporating word distributions has the potential to better model the latent topics in a document. To this end, we present a novel model, Tipster, that solves text segmentation and segment topic labeling collaboratively. We first use a neural topic model to infer the latent topic distributions of sentences from their word distributions. Then, our model divides the document into topically coherent segments based on topic-guided contextual sentence representations from a pre-trained language model and assigns relevant topic labels to each segment. Finally, we conduct extensive experiments, which demonstrate that Tipster achieves state-of-the-art performance in both text segmentation and segment topic labeling.