OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

Bibliographic Details
Main Authors	Huang, Zhen; Wang, Zengzhi; Xia, Shijie; Liu, Pengfei
Format	Journal Article (arXiv preprint)
Language	English
Published	2024-06-24
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language

Abstract	In this report, we pose the following question: Who is the most intelligent AI model to date, as measured by OlympicArena (an Olympic-level, multi-discipline, multi-modal benchmark for superintelligent AI)? We focus on the most recently released models: Claude-3.5-Sonnet, Gemini-1.5-Pro, and GPT-4o. For the first time, we propose using an Olympic medal table approach to rank AI models based on their comprehensive performance across disciplines. Empirical results reveal: (1) Claude-3.5-Sonnet is highly competitive with GPT-4o in overall performance and even surpasses it on a few subjects (i.e., Physics, Chemistry, and Biology). (2) Gemini-1.5-Pro and GPT-4V rank consecutively just behind GPT-4o and Claude-3.5-Sonnet, but with a clear performance gap between them. (3) The performance of AI models from the open-source community lags significantly behind these proprietary models. (4) The performance of all these models on the benchmark is less than satisfactory, indicating that we still have a long way to go before achieving superintelligence. We remain committed to continuously tracking and evaluating the performance of the latest powerful models on this benchmark (available at https://github.com/GAIR-NLP/OlympicArena).
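The medal-table ranking mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions, since the record does not give the exact rules: the top three models in each discipline receive gold, silver, and bronze, and models are then ordered by gold count, then silver, then bronze, as in a conventional Olympic medal table. The function name `medal_table`, the model names, and all scores are hypothetical placeholders, not results from the report.

```python
from collections import Counter

MEDALS = ("gold", "silver", "bronze")

def medal_table(scores):
    """Build a medal table from {discipline: {model: score}}.

    Assumed rule (not confirmed by the record): the three best-scoring
    models per discipline get gold, silver, bronze; rows are sorted by
    gold, then silver, then bronze, with the model name as a tie-breaker.
    """
    counts = {medal: Counter() for medal in MEDALS}
    models = set()
    for per_model in scores.values():
        models.update(per_model)
        # Highest score first within this discipline.
        ranked = sorted(per_model, key=per_model.get, reverse=True)
        for medal, model in zip(MEDALS, ranked):
            counts[medal][model] += 1
    rows = [(m, counts["gold"][m], counts["silver"][m], counts["bronze"][m])
            for m in models]
    rows.sort(key=lambda r: (-r[1], -r[2], -r[3], r[0]))
    return rows

# Hypothetical scores, for illustration only.
EXAMPLE_SCORES = {
    "math":      {"model_a": 0.40, "model_b": 0.35, "model_c": 0.20, "model_d": 0.10},
    "physics":   {"model_a": 0.30, "model_b": 0.45, "model_c": 0.25, "model_d": 0.05},
    "chemistry": {"model_a": 0.50, "model_b": 0.40, "model_c": 0.45, "model_d": 0.10},
}

if __name__ == "__main__":
    for model, gold, silver, bronze in medal_table(EXAMPLE_SCORES):
        print(f"{model}\t{gold}\t{silver}\t{bronze}")
```

Sorting on medal counts rather than on an averaged score rewards winning individual disciplines, which is why a model can rank first overall without leading every subject.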
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2406.16772
OpenAccessLink	https://arxiv.org/abs/2406.16772