INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multi...
Published in | arXiv.org |
---|---|
Main Authors | Romanou, Angelika; Foroutan, Negar; Sotnikova, Anna; et al. |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 29.11.2024 |
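Subjects | Benchmarks; English language; Generative artificial intelligence; Large language models; Non-English languages; Performance evaluation; Quality assessment; Regional development |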
Abstract | The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. Our novel resource, INCLUDE, is a comprehensive knowledge- and reasoning-centric benchmark across 44 written languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed. |
---|---|
Author | Romanou, Angelika; Foroutan, Negar; Sotnikova, Anna; Chen, Zeming; Nelaturu, Sree Harsha; Singh, Shivalika; Maheshwary, Rishabh; Altomare, Micol; Haggag, Mohamed A; Snegha, A; Amayuelas, Alfonso; Azril Hafizi Amirudin; Aryabumi, Viraat; Boiko, Danylo; Chang, Michael; Chim, Jenny; Cohen, Gal; Dalmia, Aditya Kumar; Diress, Abraham; Duwal, Sharad; Dzenhaliou, Daniil; Daniel Fernando Erazo Florez; Farestam, Fabian; Imperial, Joseph Marvin; Shayekh Bin Islam; Isotalo, Perttu; Jabbarishiviari, Maral; Karlsson, Börje F; Khalilov, Eldar; Klamm, Christopher; Koto, Fajri; Krzemiński, Dominik; de Melo, Gabriel Adriano; Montariol, Syrielle; Yiyang Nan; Niklaus, Joel; Novikova, Jekaterina; Johan Samir Obando Ceron; Debjit, Paul; Ploeger, Esther; Purbey, Jebish; Rajwal, Swati; Selvan Sunitha Ravi; Rydell, Sara; Roshan Santhosh; Sharma, Drishti; Skenduli, Marjana Prifti; Arshia Soltani Moakhar; Bardia Soltani Moakhar; Tamir, Ran; Tarun, Ayush Kumar; Wasi, Azmine Toushik; Weerasinghe, Thenuka Ovin; Yilmaz, Serhan; Zhang, Mike; Schlag, Imanol; Fadaee, Marzieh; Hooker, Sara; Bosselut, Antoine |
ContentType | Paper |
Copyright | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PublicationDateYYYYMMDD | 2024-11-29 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2024 |
Publisher | Cornell University Library, arXiv.org |
SecondaryResourceType | preprint |
SubjectTerms | Benchmarks; English language; Generative artificial intelligence; Large language models; Non-English languages; Performance evaluation; Quality assessment; Regional development |
URI | https://www.proquest.com/docview/3134992021 |