MLPerf Inference Benchmark

Bibliographic Details
Published in	2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) pp. 446 - 459
Main Authors	Reddi, Vijay Janapa; Cheng, Christine; Kanter, David; Mattson, Peter; Schmuelling, Guenther; Wu, Carole-Jean; Anderson, Brian; Breughe, Maximilien; Charlebois, Mark; Chou, William; Chukka, Ramesh; Coleman, Cody; Davis, Sam; Deng, Pan; Diamos, Greg; Duke, Jared; Fick, Dave; Gardner, J. Scott; Hubara, Itay; Idgunji, Sachin; Jablin, Thomas B.; Jiao, Jeff; St. John, Tom; Kanwar, Pankaj; Lee, David; Liao, Jeffery; Lokhmotov, Anton; Massa, Francisco; Meng, Peng; Micikevicius, Paulius; Osborne, Colin; Pekhimenko, Gennady; Rajan, Arun Tejusve Raghunath; Sequeira, Dilip; Sirasao, Ashish; Sun, Fei; Tang, Hanlin; Thomson, Michael; Wei, Frank; Wu, Ephrem; Xu, Lingjie; Yamada, Koichi; Yu, Bing; Yuan, George; Zhong, Aaron; Zhang, Peizhao; Zhou, Yuchen
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2020
Subjects	Accuracy; Benchmark testing; Benchmarking; Computer architecture; Degradation; Hardware; Inference; Machine Learning; Optimization; Organizations; Servers; Standards organizations; Throughput
DOI	10.1109/ISCA45697.2020.00045

Abstract	Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.
Author Affiliations
1. Reddi, Vijay Janapa: Harvard University
2. Cheng, Christine: Intel
3. Kanter, David: Real World Insights
4. Mattson, Peter: Google
5. Schmuelling, Guenther: Microsoft
6. Wu, Carole-Jean: Facebook
7. Anderson, Brian: Google
8. Breughe, Maximilien: NVIDIA
9. Charlebois, Mark: Qualcomm
10. Chou, William: Qualcomm
11. Chukka, Ramesh: Intel
12. Coleman, Cody: Stanford University
13. Davis, Sam: Myrtle
14. Deng, Pan: Tencent
15. Diamos, Greg: Landing AI
16. Duke, Jared: Google
17. Fick, Dave: Mythic
18. Gardner, J. Scott: Advantage Engineering
19. Hubara, Itay: Habana Labs
20. Idgunji, Sachin: NVIDIA
21. Jablin, Thomas B.: Google
22. Jiao, Jeff: Alibaba T-Head
23. St. John, Tom: Tesla
24. Kanwar, Pankaj: Google
25. Lee, David: Facebook (formerly at MediaTek)
26. Liao, Jeffery: OPPO (formerly at Synopsys)
27. Lokhmotov, Anton: dividiti
28. Massa, Francisco: Facebook
29. Meng, Peng: Tencent
30. Micikevicius, Paulius: NVIDIA
31. Osborne, Colin: Arm
32. Pekhimenko, Gennady: University of Toronto & Vector Institute
33. Rajan, Arun Tejusve Raghunath: Intel
34. Sequeira, Dilip: NVIDIA
35. Sirasao, Ashish: Xilinx
36. Sun, Fei: Alibaba (formerly at Facebook)
37. Tang, Hanlin: Intel
38. Thomson, Michael: Centaur Technology
39. Wei, Frank: Alibaba Cloud
40. Wu, Ephrem: Xilinx
41. Xu, Lingjie: Biren Research (formerly at Alibaba)
42. Yamada, Koichi: Intel
43. Yu, Bing: Facebook (formerly at MediaTek)
44. Yuan, George: NVIDIA
45. Zhong, Aaron: Alibaba T-Head
46. Zhang, Peizhao: Facebook
47. Zhou, Yuchen: General Motors
EISBN	1728146615 9781728146614
PageCount	14
URI	https://ieeexplore.ieee.org/document/9138989