Terminology Saturation Detection, Measurement and Use

This book highlights an innovative approach for extracting terminological cores from subject domain-bounded collections of professional texts. The approach is based on exploiting the phenomenon of terminological saturation. The book presents the formal framework for the method of detecting and measu...

Bibliographic Details
Main Authors	Kosa, Victoria; Ermolayev, Vadim
Format	eBook
Language	English
Published	Singapore: Springer, 2022 (Springer Nature Singapore; Springer Singapore)
Edition	1
Series	Cognitive Science and Technology
Subjects	Artificial Intelligence; Cognitive Psychology; Computational Intelligence; Computer Science; Data mining; Data Mining and Knowledge Discovery; Engineering; Neurosciences

Table of Contents:

Intro -- Preface -- Acknowledgements -- Contents -- About the Authors -- List of Figures -- List of Tables -- 1 Introduction -- 1.1 Representativeness Challenge in Ontology Engineering -- 1.2 Phenomenon of Saturation -- 1.3 Structure of the Book -- References -- 2 Related Work and Our Approach -- 2.1 Methodology for Literature Sampling -- 2.2 Domain Ontology Engineering and Requirements Elicitation -- 2.3 Ontology Learning from Texts and Community Consensus -- 2.4 Collecting Relevant Documents of Good Quality -- 2.5 Terminological Saturation and Representativeness -- 2.6 Theoretical Saturation and Ontology Learning -- 2.7 Ordering of Documents for Processing -- 2.8 Automated Term Extraction Methods -- 2.9 Software Implementations of ATE Methods -- 2.10 Text Similarity Measurement -- 2.11 Efficient Strings Matching for Searching Nested Terms -- 2.12 Research Gaps and Motivation -- 2.13 Research Questions and Objectives -- 2.13.1 Envisioned Approach for Terminological Saturation Detection and Measurement -- 2.13.2 Research Questions and Objectives -- 2.14 Summary -- References -- 3 Formal Framework -- 3.1 Preliminaries -- 3.2 Research Hypotheses -- 3.3 Terminological Difference Function (thd) -- 3.4 Metric Properties of the thd Function -- 3.5 Existence Conditions for Terminological Saturation -- 3.6 Scalability and Optimization -- 3.7 Summary -- References -- 4 Algorithmic Suite -- 4.1 Computation Flow -- 4.2 Preparatory Steps and Algorithms -- 4.2.1 Catalogue Generation -- 4.2.2 Documents Download -- 4.3 Pre-processing Steps and Algorithms -- 4.3.1 Documents Conversion (PDF to Plain Text) -- 4.3.2 Configuration and Datasets Generation -- 4.4 Algorithms for the Optimized Computation Pipeline -- 4.4.1 Use of the Aho-Corasick Algorithm in Computing C-Values -- 4.4.2 Merging Partial C-Values
4.5 Baseline Algorithm for Terminological Difference Measurement -- 4.6 Algorithms for Terms Grouping -- 4.6.1 Choice of String Similarity Measures -- 4.6.2 Term Similarity Cases and Thresholds -- 4.6.3 Terms Grouping and Similarity Measurement -- 4.6.4 Refined Algorithm for Terminological Difference Measurement -- 4.7 Algorithm for Accumulated Regular Noise Removal -- 4.8 Implementation in Software -- 4.9 Summary -- References -- 5 Experimental Evaluation -- 5.1 Experimental Objectives -- 5.2 General Experimental Settings -- 5.2.1 Experimental Workflow and Instrumental Software Toolset -- 5.2.2 Document Collections and Datasets -- 5.2.3 Measurable Aspects and Measures -- 5.2.4 Experimental Environment -- 5.3 Correctness Check Using Synthetic Collections -- 5.3.1 Results and Discussion -- 5.3.2 Recommendation of the ATE Tool -- 5.4 Choice of Software for ATE -- 5.4.1 Results of Experiments -- 5.4.2 Discussion and Recommendation -- 5.5 Influence of Document Ordering -- 5.5.1 Particularities in Experimental Settings -- 5.5.2 Results of Terminological Saturation Study -- 5.5.3 Results of the Regular Noise Sensitivity Study -- 5.5.4 Overall Ranking and Recommendation -- 5.6 Influence of Term Grouping -- 5.6.1 Particularities in Experimental Settings -- 5.6.2 Results and Discussion -- 5.6.3 Overall Ranking and Recommendation -- 5.7 Validity and Scalability of the Optimized Term Extraction Pipeline -- 5.7.1 Particularities in Experimental Settings -- 5.7.2 Results and Discussion -- 5.8 Summary -- References -- 6 Saturated Terminology Extraction and Analysis in Use -- 6.1 Checking Gartner Trend Prediction -- 6.1.1 Questions and Method of the Study -- 6.1.2 Experimental Results and Discussion -- 6.2 Instrumenting the Literature Review Activity of Master Students -- 6.2.1 Task for Students -- 6.2.2 Method Adoption Results
6.3 Practical Benefits and Limitations -- 6.4 Potential Business Use Scenarios in Scientific Publishing -- 6.5 Summary -- References -- 7 Conclusions and Outlook -- 7.1 Summary of Findings and Results -- 7.2 Open Research Issues and Future Work