High Performance and Portable Convolution Operators for Multicore Processors

Bibliographic Details
Published in	Proceedings (Symposium on Computer Architecture and High Performance Computing), pp. 91-98
Main Authors	San Juan, Pablo; Castello, Adrian; Dolz, Manuel F.; Alonso-Jorda, Pedro; Quintana-Orti, Enrique S.
Format	Conference Proceeding
Language	English
Published	IEEE, 1 September 2020
Subjects	Convolutional neural networks; high performance; multicore processors
ISSN	2643-3001
DOI	10.1109/SBAC-PAD49847.2020.00023

Abstract	The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of network. One of these approaches leverages the IM2COL transform followed by a general matrix multiplication (GEMM) to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform, while maintaining the portability and performance of the underlying realization of GEMM in BLIS.
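
The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' BLIS-based algorithm: it is a plain-Python toy, assuming stride 1, no padding, and a single input image, and every name in it (`im2col`, `gemm`, `conv_im2col`, `conv_implicit`) is hypothetical. The first path materializes the IM2COL matrix and multiplies it by the filter matrix; the second computes each entry of that matrix on demand from the input, so no workspace is allocated.

```python
def im2col(x, C, H, W, kh, kw):
    """Explicit IM2COL for a C x H x W input (stride 1, no padding).
    Returns a (C*kh*kw) x (Ho*Wo) matrix as a list of rows."""
    Ho, Wo = H - kh + 1, W - kw + 1
    rows = []
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                rows.append([x[c][oy + i][ox + j]
                             for oy in range(Ho) for ox in range(Wo)])
    return rows


def gemm(A, B):
    """Naive matrix product, standing in for an optimized GEMM kernel."""
    n = len(B[0])
    return [[sum(a * B[k][col] for k, a in enumerate(row)) for col in range(n)]
            for row in A]


def conv_im2col(x, filters, C, H, W, kh, kw):
    """Convolution as filters (K x C*kh*kw) times the explicit IM2COL matrix."""
    return gemm(filters, im2col(x, C, H, W, kh, kw))


def conv_implicit(x, filters, C, H, W, kh, kw):
    """Same result, but IM2COL entries are read directly from x on demand,
    so the intermediate matrix is never stored."""
    Ho, Wo = H - kh + 1, W - kw + 1
    out = []
    for f in filters:
        row = []
        for oy in range(Ho):
            for ox in range(Wo):
                acc = 0
                for r, w in enumerate(f):
                    # Decode the IM2COL row index r into (channel, i, j).
                    c, rem = divmod(r, kh * kw)
                    i, j = divmod(rem, kw)
                    acc += w * x[c][oy + i][ox + j]
                row.append(acc)
        out.append(row)
    return out


# Toy problem (assumed sizes): 2 channels, 4x4 image, two 3x3 filters.
C, H, W, kh, kw = 2, 4, 4, 3, 3
x = [[[c * 16 + y * 4 + z for z in range(W)] for y in range(H)] for c in range(C)]
filters = [[1] * (C * kh * kw), list(range(C * kh * kw))]

workspace = im2col(x, C, H, W, kh, kw)
explicit = conv_im2col(x, filters, C, H, W, kh, kw)
implicit = conv_implicit(x, filters, C, H, W, kh, kw)

print("input elements:    ", C * H * W)
print("workspace elements:", len(workspace) * len(workspace[0]))
print("results match:     ", explicit == implicit)
```

Even at this toy size the explicit IM2COL matrix holds more elements than the input, because overlapping filter windows replicate input values. That replication is the memory workspace cost the abstract refers to.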
Author affiliations	San Juan, Pablo (Universitat Politècnica de València); Castello, Adrian (Universitat Jaume I); Dolz, Manuel F. (Universitat Jaume I); Alonso-Jorda, Pedro (Universitat Politècnica de València); Quintana-Orti, Enrique S. (Universitat Politècnica de València)
CODEN	IEEPAD
Discipline	Computer Science
EISBN	1728199247 9781728199245
EISSN	2643-3001
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PublicationTitleAbbrev	SBAC-PAD
URI	https://ieeexplore.ieee.org/document/9235053