High Performance and Portable Convolution Operators for Multicore Processors
The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these approaches leverages the IM2COL transform followed by a general...
Published in | Proceedings (Symposium on Computer Architecture and High Performance Computing) pp. 91 - 98 |
---|---|
Main Authors | San Juan, Pablo; Castello, Adrian; Dolz, Manuel F.; Alonso-Jorda, Pedro; Quintana-Orti, Enrique S. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.09.2020 |
Subjects | Convolutional neural networks; high performance; multicore processors |
ISSN | 2643-3001 |
DOI | 10.1109/SBAC-PAD49847.2020.00023 |
Abstract | The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these approaches leverages the IM2COL transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform, while maintaining the portability and performance of the underlying realization of GEMM in BLIS. |
Author | Dolz, Manuel F.; Alonso-Jorda, Pedro; Castello, Adrian; San Juan, Pablo; Quintana-Orti, Enrique S. |
Author_xml | 1. San Juan, Pablo (Universitat Politècnica de València); 2. Castello, Adrian (Universitat Jaume I); 3. Dolz, Manuel F. (Universitat Jaume I); 4. Alonso-Jorda, Pedro (Universitat Politècnica de València); 5. Quintana-Orti, Enrique S. (Universitat Politècnica de València) |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
Discipline | Computer Science |
EISBN | 1728199247; 9781728199245 |
EISSN | 2643-3001 |
EndPage | 98 |
ExternalDocumentID | 9235053 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
PageCount | 8 |
PublicationDate | 2020-09-01 |
PublicationTitle | Proceedings (Symposium on Computer Architecture and High Performance Computing) |
PublicationTitleAbbrev | SBAC-PAD |
PublicationYear | 2020 |
Publisher | IEEE |
StartPage | 91 |
SubjectTerms | Convolutional neural networks; high performance; multicore processors |
Title | High Performance and Portable Convolution Operators for Multicore Processors |
URI | https://ieeexplore.ieee.org/document/9235053 |
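
The baseline that the abstract describes, an explicit IM2COL transform followed by a GEMM, can be illustrated with a minimal pure-Python sketch. This is not the paper's algorithm and does not use BLIS; the function names (`im2col`, `gemm`, `conv_im2col`, `conv_direct`), the stride-1 / no-padding setting, and the toy data are all assumptions chosen for illustration. It only shows why the intermediate matrix is larger than the input it is built from.

```python
# Illustrative sketch only: convolution via explicit im2col + GEMM.
# All names here are hypothetical; nothing below comes from the paper or from BLIS.

def im2col(x, kh, kw):
    """Unroll a C x H x W input into a (C*kh*kw) x (Ho*Wo) matrix (stride 1, no padding)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel row, kernel column) triple.
                cols.append([x[ci][oy + i][ox + j]
                             for oy in range(ho) for ox in range(wo)])
    return cols, ho, wo

def gemm(a, b):
    """Naive matrix product standing in for an optimized GEMM kernel."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conv_im2col(x, filters, kh, kw):
    """Return (output, workspace): the convolution result and the im2col matrix."""
    cols, ho, wo = im2col(x, kh, kw)
    # Flatten each C x kh x kw filter into one row, in the same order as im2col rows.
    fmat = [[f[ci][i][j] for ci in range(len(x)) for i in range(kh) for j in range(kw)]
            for f in filters]
    out = gemm(fmat, cols)  # K x (Ho*Wo)
    # Reshape each output row back into an Ho x Wo feature map.
    return [[row[oy * wo:(oy + 1) * wo] for oy in range(ho)] for row in out], cols

def conv_direct(x, filters, kh, kw):
    """Reference direct convolution that needs no intermediate matrix."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    ho, wo = h - kh + 1, w - kw + 1
    return [[[sum(f[ci][i][j] * x[ci][oy + i][ox + j]
                  for ci in range(c) for i in range(kh) for j in range(kw))
              for ox in range(wo)] for oy in range(ho)] for f in filters]

# Toy data: 2 input channels of 4 x 4, and 3 filters of size 2 x 3 x 3 (integers, so equality is exact).
x = [[[(ci * 16 + r * 4 + col) % 7 for col in range(4)] for r in range(4)] for ci in range(2)]
filters = [[[[k + ci + i - j for j in range(3)] for i in range(3)] for ci in range(2)]
           for k in range(3)]

y_gemm, cols = conv_im2col(x, filters, 3, 3)
y_ref = conv_direct(x, filters, 3, 3)
workspace_elems = len(cols) * len(cols[0])          # size of the im2col matrix
input_elems = len(x) * len(x[0]) * len(x[0][0])     # size of the original input
```

In this toy case the im2col workspace holds 18 x 4 = 72 elements against 32 input elements, because overlapping patches are copied repeatedly. That duplication, and the time spent building it, are the two costs the abstract says the proposed BLIS-based algorithm avoids.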