An automatic and association-based procedure for hierarchical publication subject categorization


Bibliographic Details
Published in Journal of Informetrics Vol. 18; no. 1; p. 101466
Main Authors Urdiales, Cristina; Guzmán, Eduardo
Format Journal Article
Language English
Published Elsevier Ltd 01.02.2024
Summary:
• We propose a novel procedure for hierarchical categorization of publication subjects, based on the repetition and absence of relevant terms in association rules defined among sets of individual topics.
• Our topics are extracted from the category list provided by the SCImago Journal Rank dataset, in combination with the prior journal classification provided in the Scopus database, to construct a data-driven hierarchy of publication categories.
• The procedure can suggest new potential fields, corresponding to sets of strongly interrelated publication topics that do not fit any existing field.
• The procedure also reveals outliers that are excluded from the emerging rules and, since they do not fit in any other field, should lead to the creation of their own subfield.
• Our proposal has been validated using the Jensen–Shannon divergence and supervised machine learning techniques, which relate the categories of a publication to its fields in the hierarchy. Our categorization procedure outperforms the results of the prior preassigned hierarchy.
Subject categorization of scientific publications, i.e., journals, book series or conference proceedings, has become a major concern in academia, as publication impact and ranking are considered a basic criterion for evaluating paper quality. Publishers usually propose their own categorization, but these often cover only their own publications, and their categories might not be coherent with other proposals. Also, due to the dynamic nature of science, new categories may appear frequently. As traditional categorization mechanisms have been questioned by many authors, a new research line has emerged to improve the category assignment process. Approaches usually rely on assessing publication similarity in terms of topics, co-citation, editorial boards, and/or shared author profiles.
In this work, we propose a novel procedure for hierarchical categorization of scientific publications based on the repetition or absence of relevant descriptors in association rules among publications. The key idea is that publication categories can be automatically defined by strong associations of nuclear topics. In addition, some very specific subcategories can be defined by exclusion from any set of rules. This process can be used to construct a data-driven hierarchy of scientific publication categories from scratch or to improve any existing categorization by discovering new fields. In this paper, the proposed algorithm uses the SJR descriptors of all journals in the SCImago dataset and the three-level classification in the Scopus dataset (covering only 35% of the publications in the SCImago dataset) to discover new categories and assign every journal to the resulting enhanced hierarchy. We have focused on the field of “Physical Sciences and Engineering”, using the SCImago and Scopus datasets from 2019 (30,883 scientific publications). Our procedure combines data engineering techniques with association rules and, as a result, generates potential new categories and outlier subcategories. To evaluate the suitability of our proposal, we have compared classification results based on the original category list with those based on our extended two-level categorization, using the Jensen–Shannon divergence and supervised machine-learning techniques. Results reveal the consistency and suitability of our categorization procedure.
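The two building blocks named in the summary, association rules among topics and the Jensen–Shannon divergence, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy publications, the thresholds, the restriction to rules between single topics, and the function names `pair_rules` and `jensen_shannon` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy data: each publication is a set of topic descriptors.
publications = [
    {"robotics", "control", "artificial intelligence"},
    {"robotics", "control"},
    {"robotics", "artificial intelligence"},
    {"optics", "photonics"},
    {"optics", "photonics", "control"},
    {"acoustics"},
]


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for topic pairs."""
    n = len(transactions)
    item_counts = Counter(topic for tx in transactions for topic in tx)
    pair_counts = Counter(
        pair for tx in transactions for pair in combinations(sorted(tx), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of publications containing both topics
        if support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = count / item_counts[ante]  # estimate of P(cons | ante)
            if confidence >= min_confidence:
                rules.append((ante, cons, support, confidence))
    return rules


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


rules = pair_rules(publications)

# Topics that appear in no rule are outlier candidates in this toy setting.
all_topics = set().union(*publications)
topics_in_rules = {t for ante, cons, _, _ in rules for t in (ante, cons)}
outliers = all_topics - topics_in_rules

for ante, cons, support, confidence in rules:
    print(f"{ante} -> {cons} (support={support:.2f}, confidence={confidence:.2f})")
print("outlier topics:", outliers)
```

In this toy setting, strongly associated topics group together through the rules, while a topic that appears in no rule is flagged as an outlier candidate. With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with disjoint support), so it can serve as a bounded measure of how much two category distributions differ.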
ISSN: 1751-1577, 1875-5879
DOI: 10.1016/j.joi.2023.101466