A Survey of Current Datasets for Code-Switching Research

Bibliographic Details
Published in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pp. 136-141
Main Authors Jose, Navya, Chakravarthi, Bharathi Raja, Suryawanshi, Shardul, Sherly, Elizabeth, McCrae, John P.
Format Conference Proceeding
Language English
Published IEEE 01.03.2020
Abstract Code switching is a prevalent phenomenon in multilingual communities and in social media interaction. Over the past ten years, we have witnessed an explosion of code-switched data on social media that brings together languages ranging from low-resourced to high-resourced in the same text, sometimes written in a non-native script. This increases the demand for processing code-switched data to assist users in natural language processing tasks such as part-of-speech tagging, named entity recognition, sentiment analysis, conversational systems, and machine translation. The available corpora for code-switching research have played a major role in advancing this area. In this paper, we propose a set of quality metrics to evaluate these datasets and categorize them accordingly.
Author_xml – sequence: 1
  givenname: Navya
  surname: Jose
  fullname: Jose, Navya
  organization: Indian Institute of Information Technology and Management-Kerala,Machine Intelligence,Trivandrum,India
– sequence: 2
  givenname: Bharathi Raja
  surname: Chakravarthi
  fullname: Chakravarthi, Bharathi Raja
  organization: Data Science Institute, National University of Ireland,Galway,Ireland
– sequence: 3
  givenname: Shardul
  surname: Suryawanshi
  fullname: Suryawanshi, Shardul
  organization: Data Science Institute, National University of Ireland,Galway,Ireland
– sequence: 4
  givenname: Elizabeth
  surname: Sherly
  fullname: Sherly, Elizabeth
  organization: Indian Institute of Information Technology and Management-Kerala,Machine Intelligence,Trivandrum,India
– sequence: 5
  givenname: John P.
  surname: McCrae
  fullname: McCrae, John P.
  organization: Data Science Institute, National University of Ireland,Galway,Ireland
DOI 10.1109/ICACCS48705.2020.9074205
EISBN 172815197X
9781728151977
EISSN 2575-7288
ISBN 1728151961
9781728151960
OpenAccessLink https://aran.library.nuigalway.ie/bitstream/10379/16090/5/A_Survey_of_Current_Datasets_for_Code_Switching_research.pdf
SubjectTerms code switching
dataset
Measurement
Natural language processing
Social network services
Switches
Tagging
Task analysis
Vocabulary
URI https://ieeexplore.ieee.org/document/9074205