Using Word2Vec for news articles recommendations: Considering evaluation options for hyperparameter optimization and different input options

Bibliographic Details
Published in: 2022 IEEE 16th International Scientific Conference on Informatics (Informatics), pp. 358-367
Main Authors: Walek, Bogdan; Muller, Patrik
Format: Conference Proceeding
Language: English
Published: IEEE, 23.11.2022
Summary: Evaluating unsupervised and semi-supervised learning methods, especially in information retrieval and recommender systems, is a difficult and resource-intensive task. Often, a machine learning model cannot be evaluated until user testing is performed. We investigated hyperparameter optimization options for Gensim's Word2Vec implementation by evaluating model performance on word analogy and word pair tests and on out-of-vocabulary ratio statistics. These automatic, task-independent offline (pre-)evaluation techniques could provide a simple way to reduce the set of final model variants passed on to resource-demanding user testing or to hybrid recommender models. We therefore investigated whether these tests were useful predictors of accuracy on our final task: providing articles similar to a chosen article. We also consider using Wikipedia articles as model training input, and using a pre-trained FastText model.
DOI: 10.1109/Informatics57926.2022.10083395
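
The out-of-vocabulary ratio mentioned in the summary can be illustrated with a minimal sketch. The record does not give the paper's exact definition, so this assumes the ratio is the share of held-out tokens missing from the training vocabulary, and that the vocabulary is pruned by a minimum frequency threshold in the way Gensim's `min_count` parameter prunes it. The helper names and the toy sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_vocabulary(sentences, min_count):
    """Keep tokens that occur at least min_count times in the training data."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}


def oov_ratio(sentences, vocabulary):
    """Share of tokens in `sentences` that are absent from `vocabulary`."""
    tokens = [token for sentence in sentences for token in sentence]
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Toy data (hypothetical): tokenized training and held-out article sentences.
train = [["the", "market", "fell"], ["the", "market", "rose"], ["rain", "fell"]]
heldout = [["the", "market", "crashed"], ["rain", "rose"]]

# Compare candidate frequency thresholds before any user testing.
ratios = {
    min_count: oov_ratio(heldout, build_vocabulary(train, min_count))
    for min_count in (1, 2)
}
```

A higher threshold shrinks the vocabulary and raises the ratio, so a statistic like this could serve as one inexpensive filter on model variants alongside the word analogy and word pair tests.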