Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis

Bibliographic Details
Published in: Natural Language Engineering, Vol. 29, no. 1, pp. 138–161
Main Authors: Chebel, Mohamed; Latiri, Chiraz; Gaussier, Eric
Format: Journal Article
Language: English
Published: Cambridge, UK: Cambridge University Press (CUP), 01.01.2023

Summary: Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.
ISSN: 1351-3249; 1469-8110
DOI: 10.1017/S135132492100022X
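The summary names formal concept analysis (FCA) as the first step. As a minimal, generic sketch (not the authors' implementation), the code below enumerates the formal concepts of a tiny binary context, treating hypothetical documents `d1` to `d3` as objects and terms as attributes. Each concept is a pair (extent, intent): the extent is exactly the set of documents containing every term in the intent, and the intent is exactly the set of terms shared by every document in the extent. Intents are found as all intersections of object intents, a standard approach that is only practical for toy inputs. The `concept_vector` helper is a hypothetical simplification that marks which concepts a document falls into; the concept vectors and clustering used in the paper may be built quite differently.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a binary object-attribute context."""
    all_attributes = frozenset().union(*context.values())
    # The intent of the (possibly empty) bottom extent is the full attribute set.
    intents = {all_attributes}
    # Concept intents are exactly the intersections of object intents,
    # so close the family under intersection one object at a time.
    for attrs in context.values():
        obj_intent = frozenset(attrs)
        intents |= {obj_intent & b for b in intents}
    concepts = []
    for intent in intents:
        # Extent: every object whose attribute set includes the whole intent.
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from most specific (small extent) to most general (large extent).
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


def concept_vector(obj: str, concepts: List[Concept]) -> Tuple[int, ...]:
    """Hypothetical encoding: 1 if obj lies in a concept's extent, else 0."""
    return tuple(1 if obj in extent else 0 for extent, _ in concepts)


# Hypothetical toy context: documents described by the terms they contain.
context = {
    "d1": {"corpus", "lexicon"},
    "d2": {"corpus", "translation"},
    "d3": {"corpus", "lexicon", "translation"},
}
concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
print(concept_vector("d1", concepts))  # (0, 1, 0, 1)
```

Because the number of concepts can grow exponentially with the number of terms, a realistic corpus would call for a dedicated algorithm such as NextClosure, or for pruning of the concept lattice, neither of which this sketch attempts.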