Approximation of the Meaning for Thematic Subject Headings by Simple Interpretable Representations
Published in | Lobachevskii Journal of Mathematics Vol. 45; no. 3; pp. 1261-1274 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Moscow: Pleiades Publishing, 01.03.2024; Springer Nature B.V |
Subjects | |
Summary: | The paper studies methods for approximating user-labeled topics by simple representations in a text classification problem. It assumes that, in real information systems, the meaning of a thematic category can be approximated by a fairly simple interpretable expression. The paper considers an algorithm that represents a text topic as a Boolean formula, which is in effect a query to a full-text information system. The algorithm is based on an optimized selection of logical predicates over words and thesaurus terms. It has been compared with modern machine learning techniques on real collections with noisy expert markup. The method can be used for text classification, expert evaluation of a heading's content, assessment of how complex a topic is to describe, and correction of the markup. |
---|---|
ISSN: | 1995-0802, 1818-9962 |
DOI: | 10.1134/S1995080224600778 |
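The summary does not give the algorithm's details. As a rough illustration only, the Python sketch below shows what "representing a topic as a Boolean formula" can look like under several assumptions that are not from the paper: documents are sets of words, a formula is a disjunction (OR) of conjunctions (AND) of word-presence predicates (no negation, no thesaurus terms), and clauses are added greedily to maximize F1 on the labeled documents. All function names and the greedy F1 criterion are hypothetical choices, not the authors' method.

```python
from itertools import combinations

# Assumption (not from the paper): each document is a set of normalized words,
# and a topic formula is a list of clauses; each clause is a frozenset of words
# that must all be present (AND), and the clauses are joined with OR.

def evaluate(formula, docs):
    """Return, for each document, whether the OR-of-ANDs formula matches it."""
    return [any(clause <= doc for clause in formula) for doc in docs]

def f1(pred, gold):
    """F1 score of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def candidate_clauses(docs, max_len=2):
    """Enumerate conjunctions of 1..max_len vocabulary words in a fixed order."""
    vocab = sorted(set().union(*docs))
    for n in range(1, max_len + 1):
        for combo in combinations(vocab, n):
            yield frozenset(combo)

def greedy_dnf(docs, gold, max_clauses=5, max_len=2):
    """Hypothetical greedy search: repeatedly add the clause that most improves F1."""
    candidates = list(candidate_clauses(docs, max_len))
    formula = []
    best = f1(evaluate(formula, docs), gold)  # empty formula matches nothing
    for _ in range(max_clauses):
        scored = [(f1(evaluate(formula + [c], docs), gold), c)
                  for c in candidates if c not in formula]
        if not scored:
            break
        score, clause = max(scored, key=lambda sc: sc[0])  # first best on ties
        if score <= best:
            break  # no candidate improves F1, so stop growing the formula
        formula.append(clause)
        best = score
    return formula, best

def to_query(formula):
    """Render the formula as a Boolean query string, e.g. '(a AND b) OR (c)'."""
    return " OR ".join("(" + " AND ".join(sorted(c)) + ")" for c in formula)

if __name__ == "__main__":
    docs = [{"goal", "match"}, {"coach", "team"}, {"team", "vote"}, {"goal", "economy"}]
    gold = [True, True, False, False]  # toy labels for one topic
    formula, score = greedy_dnf(docs, gold)
    print(to_query(formula), score)  # (coach) OR (match) 1.0
```

Enumerating every word pair is quadratic in vocabulary size and only suits toy data. The sketch also leaves out the thesaurus terms and the optimized predicate selection that the summary mentions; it only shows how a learned formula doubles as a readable full-text query.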