Named Entity Recognition for Code-Mixed Indian Corpus using Meta Embedding

Bibliographic Details
Published in	International Conference on Advanced Computing and Communication Systems (Online), pp. 68-72
Main Authors	Priyadharshini, Ruba; Chakravarthi, Bharathi Raja; Vegupatti, Mani; McCrae, John P.
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2020
Subjects	code-mixing; code-switching; conditional random field; Data science; embedding; Encoding; Indian code mixing; meta embedding; named entity recognition; Natural language processing; Syntactics; Task analysis; Text recognition; Vocabulary

More Information
Summary:	In this paper, we combine pre-trained embeddings, sub-word embeddings, and embeddings of languages closely related to those in the code-mixed corpus to create a meta-embedding. We then use a Transformer to encode the code-mixed sentence and a Conditional Random Field to predict the named entities in the code-mixed text. In contrast to classical named entity recognition, where the text is monolingual, our approach can predict named entities in code-mixed corpora written in both the native script and the Roman script. Our method is a novel way of combining the embeddings of closely related languages to identify named entities in code-mixed Indian social media text written in native and Roman scripts.
ISBN:	1728151961 9781728151960
ISSN:	2575-7288
DOI:	10.1109/ICACCS48705.2020.9074379
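
Illustration:	The summary states that several embedding sources are combined into one meta-embedding but does not say how. The sketch below shows one common way to do this, concatenating L2-normalized vectors, purely as an assumption for illustration; it is not necessarily the authors' method. The function names, the toy lookup tables, and the zero-vector fallback for missing tokens are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def meta_embed(token, sources):
    """Build a meta-embedding for one token by concatenation.

    sources is a list of (lookup_table, dimension) pairs, one per
    embedding source (e.g. word-level, sub-word, related language).
    A token missing from a source contributes a zero vector
    (an assumption made for this sketch).
    """
    parts = []
    for table, dim in sources:
        vec = table.get(token, [0.0] * dim)
        parts.extend(l2_normalize(vec))
    return parts


# Toy, made-up lookup tables standing in for real embedding models.
word_table = {"chennai": [3.0, 4.0]}
subword_table = {"chennai": [1.0, 0.0, 0.0]}
related_language_table = {}  # token absent from this source

sources = [(word_table, 2), (subword_table, 3), (related_language_table, 2)]
vec = meta_embed("chennai", sources)
print(len(vec))
```

In a full pipeline of the kind the summary describes, vectors like these would be fed token by token into a Transformer encoder, with a Conditional Random Field layer predicting the entity tags; those stages are omitted here.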