Automated Prediction of Good Dictionary EXamples (GDEX): A Comprehensive Experiment with Distant Supervision, Machine Learning, and Word Embedding-Based Deep Learning Techniques

Bibliographic Details
Published in Complexity (New York, N.Y.) Vol. 2021; no. 1
Main Authors Khan, Muhammad Yaseen; Qayoom, Abdul; Nizami, Muhammad Suffian; Siddiqui, Muhammad Shoaib; Wasi, Shaukat; Raazi, Syed Muhammad Khaliq-ur-Rahman
Format Journal Article
Language English
Published Hoboken Hindawi 2021
Hindawi Limited
Hindawi-Wiley
Summary: Dictionaries are not only a source of word meanings; they also help readers understand the context in which words are used. For this purpose, comprehensive print dictionaries, and more recently online dictionaries, illustrate each word with a short example sentence. Lexicographers take meticulous care in eliciting Good Dictionary EXamples (GDEX), i.e., sentences that best fit a word's definition in a dictionary. The rules for eliciting GDEX are demanding, and the manual process is time-consuming. This paper therefore focuses on two major tasks: developing labelled corpora for the top 3K English words using a distant supervision approach, and devising a state-of-the-art artificial intelligence-based automated procedure for discriminating Good Dictionary EXamples from bad ones. The proposed methodology involves a suite of five machine learning (ML) and five word embedding-based deep learning (DL) architectures. A thorough analysis of the results shows that GDEX elicitation can be done by both ML and DL models; however, DL-based models show only a marginal improvement of 3.5% over the conventional ML models. Random forests with parts-of-speech information and a word2vec-based bidirectional LSTM are the best-performing ML and DL configurations for automated GDEX elicitation; on the test set, these models achieved balanced accuracies of 73% and 77%, respectively.
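The summary reports results as balanced accuracy, which is commonly defined as the mean of the per-class recalls. The following is a minimal sketch of that general definition, not the authors' evaluation code; the function name and the toy labels (1 for a good example, 0 for a bad one) are hypothetical and chosen only for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Number of items that truly belong to this class.
        total = sum(1 for t in y_true if t == cls)
        # Number of those items that were predicted correctly.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 1 = good example, 0 = bad example.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

On imbalanced labels like these, plain accuracy is pulled toward the majority class, while balanced accuracy weights each class equally, which is presumably why it suits a good-versus-bad example classification task.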
ISSN: 1076-2787, 1099-0526
DOI: 10.1155/2021/2553199