High Performance and Portable Convolution Operators for Multicore Processors

Bibliographic Details
Published in Proceedings (Symposium on Computer Architecture and High Performance Computing), pp. 91-98
Main Authors San Juan, Pablo; Castello, Adrian; Dolz, Manuel F.; Alonso-Jorda, Pedro; Quintana-Orti, Enrique S.
Format Conference Proceeding
Language English
Published IEEE 01.09.2020
ISSN 2643-3001
DOI 10.1109/SBAC-PAD49847.2020.00023

Abstract The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these approaches leverages the IM2COL transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform, while maintaining the portability and performance of the underlying realization of GEMM in BLIS.
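As a rough illustration of the baseline the abstract describes (explicit IM2COL followed by GEMM), the sketch below unfolds an input into an intermediate matrix and multiplies it by the flattened filters. It is not the paper's BLIS-based algorithm, which is stated to avoid this intermediate matrix. The C x H x W layout, unit stride, absence of padding, and all function names are assumptions made for the example, and the naive `gemm` merely stands in for an optimized library kernel.

```python
# Illustrative baseline only: explicit IM2COL followed by a naive GEMM.
# Assumptions (not from the paper): C x H x W layout, stride 1, no padding.
# Every function name below is hypothetical.

def im2col(image, kh, kw):
    """Unfold a C x H x W image into a (C*kh*kw) x (Ho*Wo) matrix."""
    c, h, w = len(image), len(image[0]), len(image[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    # One row per (channel, kernel offset); one column per output pixel.
    cols = [
        [image[ch][y + i][x + j] for y in range(ho) for x in range(wo)]
        for ch in range(c) for i in range(kh) for j in range(kw)
    ]
    return cols, ho, wo


def gemm(a, b):
    """Naive matrix product standing in for an optimized GEMM kernel."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conv_im2col(image, filters):
    """Convolve with K x C x kh x kw filters; returns K x Ho x Wo."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    cols, ho, wo = im2col(image, kh, kw)
    # Flatten each filter in the same (channel, i, j) order as the im2col rows.
    fmat = [[v for ch in f for row in ch for v in row] for f in filters]
    out = gemm(fmat, cols)
    # Reshape each flat output row back to Ho x Wo.
    return [[o[y * wo:(y + 1) * wo] for y in range(ho)] for o in out]


def conv_direct(image, filters):
    """Reference direct convolution used only to check the result."""
    c = len(image)
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    ho = len(image[0]) - kh + 1
    wo = len(image[0][0]) - kw + 1
    return [[[sum(f[ch][i][j] * image[ch][y + i][x + j]
                  for ch in range(c) for i in range(kh) for j in range(kw))
              for x in range(wo)] for y in range(ho)] for f in filters]


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3 x 3
filters = [[[[1, 0], [0, 1]]]]                # 1 filter, 1 channel, 2 x 2
result = conv_im2col(image, filters)
workspace, _, _ = im2col(image, 2, 2)
workspace_elems = len(workspace) * len(workspace[0])
```

In this toy case the intermediate matrix holds 16 elements for a 9-element input, which hints at the first problem named in the abstract: the workspace grows as C*kh*kw*Ho*Wo, and building it is the extra transform cost named as the second problem.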
Author Dolz, Manuel F.
Alonso-Jorda, Pedro
Castello, Adrian
San Juan, Pablo
Quintana-Orti, Enrique S.
Author_xml – sequence: 1
  givenname: Pablo
  surname: San Juan
  fullname: San Juan, Pablo
  organization: Universitat Politècnica de València
– sequence: 2
  givenname: Adrian
  surname: Castello
  fullname: Castello, Adrian
  organization: Universitat Jaume I
– sequence: 3
  givenname: Manuel F.
  surname: Dolz
  fullname: Dolz, Manuel F.
  organization: Universitat Jaume I
– sequence: 4
  givenname: Pedro
  surname: Alonso-Jorda
  fullname: Alonso-Jorda, Pedro
  organization: Universitat Politècnica de València
– sequence: 5
  givenname: Enrique S.
  surname: Quintana-Orti
  fullname: Quintana-Orti, Enrique S.
  organization: Universitat Politècnica de València
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/SBAC-PAD49847.2020.00023
Discipline Computer Science
EISBN 1728199247
9781728199245
EISSN 2643-3001
EndPage 98
ExternalDocumentID 9235053
Genre orig-research
IngestDate Wed Aug 27 02:27:29 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_9235053
PublicationCentury 2000
PublicationDate 2020-09-01
PublicationDateYYYYMMDD 2020-09-01
PublicationDate_xml – month: 09
  year: 2020
  text: 2020-09-01
  day: 01
PublicationDecade 2020
PublicationTitle Proceedings (Symposium on Computer Architecture and High Performance Computing)
PublicationTitleAbbrev SBAC-PAD
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 91
SubjectTerms Convolutional neural networks
high performance
multicore processors
Title High Performance and Portable Convolution Operators for Multicore Processors
URI https://ieeexplore.ieee.org/document/9235053