The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Penedo, Guilherme, Kydlíček, Hynek, Allal, Loubna Ben, Lozhkov, Anton, Mitchell, Margaret, Raffel, Colin, Von Werra, Leandro, Wolf, Thomas
Year of Publication 25.06.2024
Journal Article
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Cassano, Federico, Li, Luisa, Sethi, Akul, Shinn, Noah, Brennan-Jones, Abby, Ginesin, Jacob, Berman, Edward, Chakhnashvili, George, Lozhkov, Anton, Anderson, Carolyn Jane, Guha, Arjun
Year of Publication 10.12.2023
Journal Article
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Laurençon, Hugo, Saulnier, Lucile, Tronchon, Léo, Bekman, Stas, Singh, Amanpreet, Lozhkov, Anton, Wang, Thomas, Karamcheti, Siddharth, Rush, Alexander M, Kiela, Douwe, Cord, Matthieu, Sanh, Victor
Year of Publication 21.06.2023
Journal Article
XTREME-S: Evaluating Cross-lingual Speech Representations
Conneau, Alexis, Bapna, Ankur, Zhang, Yu, Ma, Min, von Platen, Patrick, Lozhkov, Anton, Cherry, Colin, Jia, Ye, Rivera, Clara, Kale, Mihir, Van Esch, Daan, Axelrod, Vera, Khanuja, Simran, Clark, Jonathan H, Firat, Orhan, Auli, Michael, Ruder, Sebastian, Riesa, Jason, Johnson, Melvin
Year of Publication 21.03.2022
Journal Article
StarCoder 2 and The Stack v2: The Next Generation
Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, de Vries, Harm
Year of Publication 29.02.2024
Journal Article
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
Cassano, Federico, Li, Luisa, Sethi, Akul, Shinn, Noah, Brennan-Jones, Abby, Ginesin, Jacob, Berman, Edward, Chakhnashvili, George, Lozhkov, Anton, Anderson, Carolyn Jane, Guha, Arjun
Published in arXiv.org (20.03.2024)
Paper
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Laurençon, Hugo, Saulnier, Lucile, Tronchon, Léo, Bekman, Stas, Singh, Amanpreet, Lozhkov, Anton, Wang, Thomas, Karamcheti, Siddharth, Rush, Alexander M, Kiela, Douwe, Cord, Matthieu, Sanh, Victor
Published in arXiv.org (21.08.2023)
Paper
XTREME-S: Evaluating Cross-lingual Speech Representations
Conneau, Alexis, Bapna, Ankur, Zhang, Yu, Ma, Min, von Platen, Patrick, Lozhkov, Anton, Cherry, Colin, Jia, Ye, Rivera, Clara, Kale, Mihir, Van Esch, Daan, Axelrod, Vera, Khanuja, Simran, Clark, Jonathan H, Firat, Orhan, Auli, Michael, Ruder, Sebastian, Riesa, Jason, Johnson, Melvin
Published in arXiv.org (13.04.2022)
Paper
StarCoder 2 and The Stack v2: The Next Generation
Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, de Vries, Harm
Published in arXiv.org (29.02.2024)
Paper