Approximation of the Meaning for Thematic Subject Headings by Simple Interpretable Representations

Bibliographic Details
Published in: Lobachevskii Journal of Mathematics, Vol. 45, no. 3, pp. 1261-1274
Main Authors: Sulzhenko, R. V.; Dobrov, B. V.
Format: Journal Article
Language: English
Published: Moscow, Pleiades Publishing, 01.03.2024
Springer Nature B.V.

Summary: The paper studies methods for approximating user-labeled topics by simple representations in a text classification problem. It is assumed that in real information systems the meaning of thematic categories can be approximated by a fairly simple interpretable expression. The paper considers an algorithm that builds a representation of a text topic in the form of a Boolean formula, which is in effect a query to a full-text information system. The algorithm is based on an optimized selection of logical predicates over words and thesaurus terms. It is compared with modern machine learning techniques on real collections with noisy expert markup. The described method can be used for text classification, expert evaluation of the content of a heading, assessment of how complex a topic's description is, and correction of the markup.
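
To make the idea of a topic expressed as a Boolean query concrete, here is a minimal illustrative sketch. It is not the authors' algorithm: it assumes documents are plain sets of words, uses only word predicates and two-word conjunctions (no thesaurus), and greedily ORs together the predicates that most improve F1 against the labels. All names (`greedy_formula`, `render`, the toy `docs` and `labels`) are hypothetical.

```python
from itertools import combinations


def f1(predicted, labels):
    """F1 score of boolean predictions against boolean labels."""
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def candidate_predicates(docs):
    """Single words and two-word conjunctions; each tuple is an AND of words."""
    vocab = sorted(set().union(*docs))
    return [(w,) for w in vocab] + list(combinations(vocab, 2))


def matches(conjunction, doc):
    """True if the document contains every word of the conjunction."""
    return all(word in doc for word in conjunction)


def greedy_formula(docs, labels, max_terms=3):
    """Greedily build an OR of conjunctions that maximizes F1 (assumed criterion)."""
    chosen, covered, best = [], [False] * len(docs), 0.0
    candidates = candidate_predicates(docs)
    for _ in range(max_terms):
        best_c, best_score, best_cov = None, best, None
        for c in candidates:
            if c in chosen:
                continue
            cov = [a or matches(c, d) for a, d in zip(covered, docs)]
            score = f1(cov, labels)
            if score > best_score:  # keep only strict improvements
                best_c, best_score, best_cov = c, score, cov
        if best_c is None:  # no predicate improves the formula any further
            break
        chosen.append(best_c)
        covered, best = best_cov, best_score
    return chosen, best


def render(formula):
    """Print the formula as a full-text style query string."""
    return " OR ".join(
        "(" + " AND ".join(c) + ")" if len(c) > 1 else c[0] for c in formula
    )


# Toy collection (invented for illustration): True marks the "finance" topic.
docs = [
    {"bank", "loan", "rate"},
    {"bank", "river"},
    {"credit", "rate"},
    {"river", "fishing"},
    {"loan", "credit"},
]
labels = [True, False, True, False, True]

formula, score = greedy_formula(docs, labels)
print(render(formula), score)
```

On this toy collection the sketch selects a two-word disjunction that separates the labeled documents exactly. Because the result is a short readable query, a person can inspect it directly, which is the kind of interpretability the summary refers to.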
ISSN: 1995-0802
1818-9962
DOI: 10.1134/S1995080224600778