OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Bibliographic Details
Main Authors Huang, Zhen; Wang, Zengzhi; Xia, Shijie; Liu, Pengfei
Format Journal Article
Language English
Published 24.06.2024
Abstract In this report, we pose the following question: which AI model is the most intelligent to date, as measured by OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models by their comprehensive performance across disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet is highly competitive with GPT-4o overall, even surpassing GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V rank consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community lags significantly behind these proprietary models. (4) The performance of all these models on this benchmark remains less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
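The medal-table ranking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact rules, so the following are assumptions for illustration only: the top three models in each discipline receive gold, silver, and bronze; models are ordered by gold count, then silver, then bronze; and any remaining tie is broken by mean score. The function name `medal_table`, the model names, and all scores are hypothetical placeholders, not results from the report.

```python
from collections import defaultdict


def medal_table(scores):
    """Rank models Olympic-style from {discipline: {model: score}}.

    Assumed rules (not confirmed by the abstract): top three per
    discipline earn gold/silver/bronze; order by golds, then silvers,
    then bronzes, then mean score across disciplines.
    """
    medals = defaultdict(lambda: [0, 0, 0])  # [gold, silver, bronze]
    totals = defaultdict(float)
    counts = defaultdict(int)

    for by_model in scores.values():
        # Highest score first; equal scores keep their insertion order.
        ranked = sorted(by_model, key=by_model.get, reverse=True)
        for place, model in enumerate(ranked[:3]):
            medals[model][place] += 1
        for model, score in by_model.items():
            medals[model]  # register models that win no medal
            totals[model] += score
            counts[model] += 1

    def sort_key(model):
        gold, silver, bronze = medals[model]
        return (gold, silver, bronze, totals[model] / counts[model])

    ordered = sorted(medals, key=sort_key, reverse=True)
    return [(model, *medals[model]) for model in ordered]


# Hypothetical placeholder scores, purely for illustration.
example_scores = {
    "physics": {"model_a": 0.40, "model_b": 0.35, "model_c": 0.30, "model_d": 0.10},
    "chemistry": {"model_a": 0.50, "model_b": 0.55, "model_c": 0.20, "model_d": 0.25},
    "math": {"model_a": 0.30, "model_b": 0.25, "model_c": 0.10, "model_d": 0.20},
}

table = medal_table(example_scores)
for model, gold, silver, bronze in table:
    print(model, gold, silver, bronze)
```

Ordering by medal counts before any average rewards a model for leading in individual disciplines, which is the usual reading of an Olympic-style table; a ranking based purely on mean score could order the same models differently.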
Author Wang, Zengzhi
Xia, Shijie
Huang, Zhen
Liu, Pengfei
Author_xml – sequence: 1
  givenname: Zhen
  surname: Huang
  fullname: Huang, Zhen
– sequence: 2
  givenname: Zengzhi
  surname: Wang
  fullname: Wang, Zengzhi
– sequence: 3
  givenname: Shijie
  surname: Xia
  fullname: Xia, Shijie
– sequence: 4
  givenname: Pengfei
  surname: Liu
  fullname: Liu, Pengfei
BackLink https://doi.org/10.48550/arXiv.2406.16772 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
Copyright_xml – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DBID AKY
GOX
DOI 10.48550/arxiv.2406.16772
DatabaseName arXiv Computer Science
arXiv.org
DatabaseTitleList
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2406_16772
GroupedDBID AKY
GOX
ID FETCH-LOGICAL-a672-bac572eccf2201928e0b6ea37cdc4c0159ba8ddf050e7402d0323ea9cf2d42eb3
IEDL.DBID GOX
IngestDate Sat Jun 29 03:45:00 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a672-bac572eccf2201928e0b6ea37cdc4c0159ba8ddf050e7402d0323ea9cf2d42eb3
OpenAccessLink https://arxiv.org/abs/2406.16772
ParticipantIDs arxiv_primary_2406_16772
PublicationCentury 2000
PublicationDate 2024-06-24
PublicationDateYYYYMMDD 2024-06-24
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-06-24
  day: 24
PublicationDecade 2020
PublicationYear 2024
Score 1.920449
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Title OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
URI https://arxiv.org/abs/2406.16772
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link.rule.ids 228,230,786,891
linkProvider Cornell University