Urdu Named Entity Recognition System Using Deep Learning Approaches

Bibliographic Details
Published in: Computer Journal, Vol. 66, no. 8, pp. 1856-1869
Main Authors: Haq, Rafiul; Zhang, Xiaowang; Khan, Wahab; Feng, Zhiyong
Format: Journal Article
Language: English
Publisher: Oxford University Press, 14.08.2023

Summary: Named entity recognition (NER) is a fundamental component of other natural language processing tasks such as information retrieval, question answering systems and machine translation. Considerable progress has already been made in research on English NER systems. However, Urdu NER is still in its infancy because of the complexity and morphological richness of the Urdu language. Existing Urdu NER systems depend heavily on manual feature engineering and on word embeddings to capture similarity, and their performance lags when words are previously unseen or infrequent. Feature-based models suffer from complicated feature engineering and are often highly reliant on external resources. To overcome these limitations, this study presents several deep neural approaches that learn features automatically from the data and eliminate manual feature engineering. Our extension uses a convolutional neural network to extract character-level features and combines them with word embeddings to handle out-of-vocabulary words. The study also presents an Urdu tweets dataset, manually annotated for five named entity classes. The effectiveness of the deep learning approaches is demonstrated on four benchmark datasets. The proposed method shows notable improvement over current state-of-the-art Urdu NER approaches, with an improvement of 6.26% in F1 score.
ISSN: 0010-4620; 1460-2067
DOI: 10.1093/comjnl/bxac047
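The summary describes combining character-level features extracted by a convolutional neural network with word embeddings so that out-of-vocabulary words still receive an informative representation. The following is a minimal pure-Python sketch of that general idea under stated assumptions, not the authors' implementation: the embedding sizes, the filter width of 3, ReLU with max-over-time pooling, the `<unk>`/`<pad>` tokens and every function name are hypothetical, and all weights are random rather than trained.

```python
import random
from typing import Dict, List

Vector = List[float]


def random_vectors(keys, dim: int, rng: random.Random) -> Dict[str, Vector]:
    """Map each key to a small random vector (stand-in for trained embeddings)."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def char_cnn(word: str, char_emb: Dict[str, Vector],
             filters: List[List[Vector]], width: int) -> Vector:
    """1-D convolution over character embeddings, ReLU, then max-over-time pooling.

    filters[f] holds `width` weight vectors, one per position in the window.
    The output has one feature per filter, independent of len(word).
    """
    pad = ["<pad>"] * (width // 2)
    chars = pad + list(word) + pad
    # Characters never seen before fall back to the hypothetical <unk> vector.
    seq = [char_emb.get(c, char_emb["<unk>"]) for c in chars]
    features = []
    for kernel in filters:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(seq) - width + 1):
            # Dot product of the kernel with the current character window.
            s = sum(w * x for k in range(width)
                    for w, x in zip(kernel[k], seq[start + k]))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


def token_representation(word: str, word_emb: Dict[str, Vector],
                         char_emb: Dict[str, Vector],
                         filters: List[List[Vector]], width: int) -> Vector:
    """Concatenate the word embedding (or <unk> for OOV words) with char-CNN features."""
    w = word_emb.get(word, word_emb["<unk>"])
    return w + char_cnn(word, char_emb, filters, width)


# Hypothetical toy setup: two known Urdu words, random (untrained) weights.
rng = random.Random(0)
WORD_DIM, CHAR_DIM, N_FILTERS, WIDTH = 4, 3, 5, 3
vocab = ["لاہور", "پاکستان"]
word_emb = random_vectors(["<unk>"] + vocab, WORD_DIM, rng)
alphabet = sorted({c for w in vocab for c in w} | {"<pad>", "<unk>"})
char_emb = random_vectors(alphabet, CHAR_DIM, rng)
filters = [[[rng.uniform(-0.5, 0.5) for _ in range(CHAR_DIM)]
            for _ in range(WIDTH)] for _ in range(N_FILTERS)]

known = token_representation("لاہور", word_emb, char_emb, filters, WIDTH)
oov = token_representation("لاہوری", word_emb, char_emb, filters, WIDTH)  # not in vocab
print(len(known), len(oov))  # 9 9
```

In this sketch an out-of-vocabulary word such as "لاہوری" shares the generic `<unk>` word vector with every other unseen word, but its character-level features are still computed from its actual spelling, which is the property the summary relies on. In a real system the embeddings and filters would be learned jointly with the tagger.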