Cross Language Information Retrieval for Accessing the English Web in Sinhala

Bibliographic Details
Published in 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer), pp. 244-249
Main Authors Hisan, M. H. M., Weerasinghe, A. R., Pushpananda, B. H. R.
Format Conference Proceeding
Language English
Published IEEE 04.11.2020
Summary: Searching the web in Sinhala does not give satisfactory results, so Sri Lankans who are not fluent in English find it difficult to browse the web for knowledge. Cross Language Information Retrieval (CLIR) can address this by matching a Sinhala query with English documents through a query translation approach. This study experimented with different models that use word embeddings to transform the Sinhala query into English; results were then retrieved through the Google Search API using the equivalent English query. The retrieved results were translated back to Sinhala and re-ranked using two different approaches. A user evaluation showed that re-ranking did not have a positive impact, but retrieving results with the equivalent English query proved effective. Hence, this study shows that the quality of results obtained when searching the web in Sinhala can be improved by performing CLIR.
ISSN:2472-7598
DOI:10.1109/ICTer51097.2020.9325441
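
The query translation step described in the summary can be illustrated with a minimal sketch. This is not the authors' model: it assumes a shared Sinhala-English embedding space and picks, for each query token, the English word with the highest cosine similarity. The vectors, the romanized Sinhala tokens, and the names `SI_VECS`, `EN_VECS`, and `translate_query` are all invented for illustration; a real system would load trained, aligned embeddings and would then pass the translated query to a search API.

```python
import math

# Toy 3-dimensional vectors in a hypothetical shared embedding space.
# All numbers are invented; the Sinhala tokens are romanized placeholders.
SI_VECS = {
    "wathura": [0.9, 0.1, 0.0],  # intended meaning: water
    "potha": [0.1, 0.9, 0.1],    # intended meaning: book
}
EN_VECS = {
    "water": [0.8, 0.2, 0.1],
    "book": [0.0, 1.0, 0.2],
    "tree": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def translate_query(query):
    """Translate a query token by token via nearest English neighbour."""
    translated = []
    for token in query.split():
        vec = SI_VECS.get(token)
        if vec is None:
            # Out-of-vocabulary tokens are passed through unchanged.
            translated.append(token)
            continue
        best = max(EN_VECS, key=lambda word: cosine(vec, EN_VECS[word]))
        translated.append(best)
    return " ".join(translated)


if __name__ == "__main__":
    print(translate_query("wathura potha"))
```

Word-by-word nearest-neighbour lookup ignores word order and multi-word expressions, so it is only a starting point for the kind of embedding-based translation the summary mentions.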