A Survey of Current Datasets for Code-Switching Research
Published in | 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS) pp. 136 - 141 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.03.2020 |
Subjects | |
Summary: | Code-switching is a prevalent phenomenon in multilingual communities and in social media interaction. Over the past ten years, we have witnessed an explosion of code-switched data on social media that brings together low-resource and high-resource languages in the same text, sometimes written in a non-native script. This increases the demand for processing code-switched data to assist users in various natural language processing tasks such as part-of-speech tagging, named entity recognition, sentiment analysis, conversational systems, and machine translation. The available corpora for code-switching research have played a major role in advancing this area of research. In this paper, we propose a set of quality metrics to evaluate these datasets and categorize them accordingly. |
---|---|
ISBN: | 1728151961 9781728151960 |
ISSN: | 2575-7288 |
DOI: | 10.1109/ICACCS48705.2020.9074205 |