ArSphere: Arabic word vectors embedded in a polar sphere

Bibliographic Details
Published in International journal of speech technology Vol. 26; no. 1; pp. 95-111
Main Authors Rizkallah, Sandra, Atiya, Amir F., Shaheen, Samir, Mahgoub, Hossam ElDin
Format Journal Article
Language English
Published New York Springer US 01.03.2023
Springer Nature B.V.
Summary: Word embeddings map words to vectors in an N-dimensional space. ArSphere is an approach to designing word embeddings for the Arabic language. It overcomes one of the shortcomings of word embeddings (one that affects English as well), namely their inability to handle opposites and to differentiate them from unrelated word pairs. To achieve this, the vectors are embedded onto the unit sphere rather than in the entire space. The sphere is a suitable choice because polarity can be addressed by embedding vectors at opposite poles of the sphere. The proposed approach has several advantages. First, it utilizes the extensive resources developed by linguistic experts, including classic dictionaries, in contrast to the prevailing approach of designing word embeddings from word co-occurrence. Second, it successfully distinguishes between synonyms, antonyms, and unrelated word pairs. The word embedding is designed by a simple relaxation algorithm derived for this purpose. Because the algorithm is fast, the word vector collection is easy to update when new words or synonyms are added. The vectors are tested against a number of other published models, and the results show that ArSphere outperforms them.
ISSN: 1381-2416, 1572-8110
DOI: 10.1007/s10772-022-09966-9
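
The summary's central idea, that unit-sphere vectors can separate synonyms, antonyms, and unrelated pairs, can be illustrated with a small sketch. This is not the ArSphere algorithm or its vectors: the toy 3-dimensional vectors, the function names, and the 0.5 threshold are all assumptions made for illustration. It only shows how the cosine between two unit vectors could separate the three cases, with opposites near opposite poles (cosine near -1) and unrelated words near orthogonal (cosine near 0).

```python
import math

def normalize(v):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Dot product of two unit vectors, i.e. the cosine of the angle between them."""
    return sum(a * b for a, b in zip(u, v))

def relation(u, v, threshold=0.5):
    """Label a pair by its cosine. The threshold is an arbitrary illustrative choice."""
    c = cosine(u, v)
    if c >= threshold:
        return "synonym"    # vectors point in nearly the same direction
    if c <= -threshold:
        return "antonym"    # vectors sit near opposite poles
    return "unrelated"      # vectors are roughly orthogonal

# Hypothetical toy vectors, not taken from ArSphere.
hot = normalize([1.0, 0.1, 0.0])
warm = normalize([0.9, 0.2, 0.0])
cold = normalize([-1.0, -0.1, 0.0])
chair = normalize([0.0, 0.0, 1.0])

print(relation(hot, warm))
print(relation(hot, cold))
print(relation(hot, chair))
```

In an unconstrained embedding space, a low similarity score does not by itself say whether two words are opposites or simply unrelated. On the sphere, the sign and magnitude of the cosine carry that distinction.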