OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
Main Authors | Huang, Zhen; Wang, Zengzhi; Xia, Shijie; Liu, Pengfei |
---|---|
Format | Journal Article |
Language | English |
Published | 24.06.2024 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Online Access | https://arxiv.org/abs/2406.16772 |
Abstract | In this report, we pose the following question: Who is the most intelligent
AI model to date, as measured by the OlympicArena (an Olympic-level,
multi-discipline, multi-modal benchmark for superintelligent AI)? We
specifically focus on the most recently released models: Claude-3.5-Sonnet,
Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic
medal table approach to rank AI models based on their comprehensive performance
across various disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet
performs highly competitively against GPT-4o overall, even surpassing
GPT-4o on a few subjects (i.e., Physics, Chemistry, and Biology). (2)
Gemini-1.5-Pro and GPT-4V are ranked consecutively just behind GPT-4o and
Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The
performance of AI models from the open-source community significantly lags
behind these proprietary models. (4) The performance of these models on this
benchmark has been less than satisfactory, indicating that we still have a long
way to go before achieving superintelligence. We remain committed to
continuously tracking and evaluating the performance of the latest powerful
models on this benchmark (available at
https://github.com/GAIR-NLP/OlympicArena). |
---|---|
Author | Wang, Zengzhi; Xia, Shijie; Huang, Zhen; Liu, Pengfei |
Author_xml | – sequence: 1 givenname: Zhen surname: Huang fullname: Huang, Zhen – sequence: 2 givenname: Zengzhi surname: Wang fullname: Wang, Zengzhi – sequence: 3 givenname: Shijie surname: Xia fullname: Xia, Shijie – sequence: 4 givenname: Pengfei surname: Liu fullname: Liu, Pengfei |
BackLink | https://doi.org/10.48550/arXiv.2406.16772$$DView paper in arXiv |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY GOX |
DOI | 10.48550/arxiv.2406.16772 |
DatabaseName | arXiv Computer Science arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2406_16772 |
GroupedDBID | AKY GOX |
ID | FETCH-LOGICAL-a672-bac572eccf2201928e0b6ea37cdc4c0159ba8ddf050e7402d0323ea9cf2d42eb3 |
IEDL.DBID | GOX |
IngestDate | Sat Jun 29 03:45:00 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a672-bac572eccf2201928e0b6ea37cdc4c0159ba8ddf050e7402d0323ea9cf2d42eb3 |
OpenAccessLink | https://arxiv.org/abs/2406.16772 |
ParticipantIDs | arxiv_primary_2406_16772 |
PublicationCentury | 2000 |
PublicationDate | 2024-06-24 |
PublicationDateYYYYMMDD | 2024-06-24 |
PublicationDate_xml | – month: 06 year: 2024 text: 2024-06-24 day: 24 |
PublicationDecade | 2020 |
PublicationYear | 2024 |
Score | 1.920449 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
Title | OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far? |
URI | https://arxiv.org/abs/2406.16772 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link.rule.ids | 228,230,786,891 |
linkProvider | Cornell University |