Data Governance in the Age of Large-Scale Data-Driven Language Technology
Jernite, Yacine, Nguyen, Huu, Biderman, Stella, Rogers, Anna, Masoud, Maraim, Danchev, Valentin, Tan, Samson, Luccioni, Alexandra Sasha, Subramani, Nishant, Dupont, Gérard, Dodge, Jesse, Lo, Kyle, Talat, Zeerak, Johnson, Isaac, Radev, Dragomir, Nikpoor, Somaieh, Frohberg, Jörg, Gokaslan, Aaron, Henderson, Peter, Bommasani, Rishi, Mitchell, Margaret
Published in arXiv.org (02.11.2022)
Paper
Journal Article
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Laurençon, Hugo, Saulnier, Lucile, Wang, Thomas, Akiki, Christopher, Villanova del Moral, Albert, Le Scao, Teven, Von Werra, Leandro, Mou, Chenghao, González Ponferrada, Eduardo, Nguyen, Huu, Frohberg, Jörg, Šaško, Mario, Lhoest, Quentin, McMillan-Major, Angelina, Dupont, Gerard, Biderman, Stella, Rogers, Anna, Ben Allal, Loubna, De Toni, Francesco, Pistilli, Giada, Nguyen, Olivier, Nikpoor, Somaieh, Masoud, Maraim, Colombo, Pierre, de la Rosa, Javier, Villegas, Paulo, Thrush, Tristan, Longpre, Shayne, Nagel, Sebastian, Weber, Leon, Muñoz, Manuel, Zhu, Jian, Van Strien, Daniel, Alyafeai, Zaid, Almubarak, Khalid, Vu, Minh Chien, Gonzalez-Dios, Itziar, Soroa, Aitor, Lo, Kyle, Dey, Manan, Ortiz Suarez, Pedro, Gokaslan, Aaron, Bose, Shamik, Adelani, David, Phan, Long, Tran, Hieu, Yu, Ian, Pai, Suhas, Chim, Jenny, Lepercq, Violette, Ilic, Suzana, Mitchell, Margaret, Luccioni, Sasha Alexandra, Jernite, Yacine
Year of Publication 07.03.2023
Journal Article
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Laurençon, Hugo, Saulnier, Lucile, Wang, Thomas, Akiki, Christopher, Villanova del Moral, Albert, Le Scao, Teven, Von Werra, Leandro, Mou, Chenghao, González Ponferrada, Eduardo, Nguyen, Huu, Frohberg, Jörg, Šaško, Mario, Lhoest, Quentin, McMillan-Major, Angelina, Dupont, Gerard, Biderman, Stella, Rogers, Anna, Ben Allal, Loubna, De Toni, Francesco, Pistilli, Giada, Nguyen, Olivier, Nikpoor, Somaieh, Masoud, Maraim, Colombo, Pierre, de la Rosa, Javier, Villegas, Paulo, Thrush, Tristan, Longpre, Shayne, Nagel, Sebastian, Weber, Leon, Muñoz, Manuel, Zhu, Jian, Van Strien, Daniel, Alyafeai, Zaid, Almubarak, Khalid, Vu, Minh Chien, Gonzalez-Dios, Itziar, Soroa, Aitor, Lo, Kyle, Dey, Manan, Ortiz Suarez, Pedro, Gokaslan, Aaron, Bose, Shamik, Adelani, David, Phan, Long, Tran, Hieu, Yu, Ian, Pai, Suhas, Chim, Jenny, Lepercq, Violette, Ilic, Suzana, Mitchell, Margaret, Luccioni, Sasha Alexandra, Jernite, Yacine
Published in arXiv.org (07.03.2023)
Paper