Named Entity Recognition for Code-Mixed Indian Corpus using Meta Embedding

Bibliographic Details
Published in International Conference on Advanced Computing and Communication Systems (Online), pp. 68-72
Main Authors Priyadharshini, Ruba; Chakravarthi, Bharathi Raja; Vegupatti, Mani; McCrae, John P.
Format Conference Proceeding
Language English
Published IEEE 01.03.2020
Summary: In this paper, we combine pre-trained embeddings, sub-word embeddings, and embeddings of languages closely related to those in the code-mixed corpus to create a meta-embedding. We then use a Transformer to encode the code-mixed sentence and a Conditional Random Field to predict the named entities in the code-mixed text. In contrast to classical named entity recognition, where the text is monolingual, our approach can predict named entities in a code-mixed corpus written in both the native script and Roman script. Our method is a novel way of combining the embeddings of closely related languages to identify named entities in code-mixed Indian social media text written in native script and Roman script.
ISBN: 1728151961, 9781728151960
ISSN: 2575-7288
DOI: 10.1109/ICACCS48705.2020.9074379
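
The summary above does not say how the individual embeddings are merged into a meta-embedding. As a minimal sketch only, the following assumes one common option: L2-normalize each source vector and concatenate them, using a zero vector when a source does not contain the token. The function names, the toy lookup tables, and their dimensions are all hypothetical and are not taken from the paper; the Transformer encoder and Conditional Random Field stages are not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def meta_embed(token, sources):
    """Concatenate the normalized vectors for `token` from every source.

    `sources` is a list of (lookup_table, dimension) pairs. A token missing
    from a table contributes a zero vector of that table's dimension.
    Concatenation is an assumption here, not the paper's stated method.
    """
    parts = []
    for table, dim in sources:
        vec = table.get(token, [0.0] * dim)
        parts.extend(l2_normalize(vec))
    return parts


# Hypothetical toy tables standing in for the three kinds of embedding
# named in the summary.
word_vectors = {"chennai": [3.0, 4.0]}
subword_vectors = {"chennai": [0.0, 2.0, 0.0]}
related_language_vectors = {}  # token absent from this source

SOURCES = [
    (word_vectors, 2),
    (subword_vectors, 3),
    (related_language_vectors, 2),
]

vec = meta_embed("chennai", SOURCES)
print(len(vec))
```

Under this assumption the meta-embedding has as many dimensions as all sources combined (7 in the toy setup), and each source contributes at most unit length, so no single embedding dominates because of its scale.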