LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU
Published in | Parallel Computational Technologies Vol. 1063; pp. 139–151 |
---|---|
Main Authors | Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2019 |
Series | Communications in Computer and Information Science |
ISBN | 9783030281625 3030281620 |
ISSN | 1865-0929 (print); 1865-0937 (electronic) |
DOI | 10.1007/978-3-030-28163-2_10 |
Abstract | We present an implementation of the Lattice Boltzmann Method (LBM) with Locally Recursive non-Locally Asynchronous (LRnLA) algorithms on GPU and CPU. The algorithm is based on the recursive subdivision of the domain of the dD1T space-time simulation (d spatial dimensions plus one time dimension) and loosens the memory-bound limit for numerical schemes with local dependencies. We show that the LRnLA algorithm makes it possible to overcome the main memory bandwidth limitations in both the CPU and the GPU implementations. For CPU, we find a data layout that provides the alignment needed for full use of AVX2/AVX512 vectorization. For GPU, we devise a procedure for pairwise CUDA-block synchronization and apply it to the implementation of the LRnLA algorithm, which previously worked only on CPU. As is usual in modern implementations, the performance on GPU is higher. However, the performance gap in our implementation is smaller, thanks to a more efficient CPU version. Through a detailed comparison, we show possible future applications of both the CPU and the GPU implementations of the lattice Boltzmann method in complex settings. |
---|---|
Author | Zakirov, Andrey; Levchenko, Vadim; Perepelkina, Anastasia |
Author_xml | – sequence: 1 givenname: Vadim orcidid: 0000-0003-3623-0556 surname: Levchenko fullname: Levchenko, Vadim email: lev@keldysh.ru organization: Keldysh Institute of Applied Mathematics, Moscow, Russia – sequence: 2 givenname: Andrey orcidid: 0000-0001-7346-6635 surname: Zakirov fullname: Zakirov, Andrey email: mogmi@narod.ru organization: Keldysh Institute of Applied Mathematics, Moscow, Russia – sequence: 3 givenname: Anastasia orcidid: 0000-0003-2517-6064 surname: Perepelkina fullname: Perepelkina, Anastasia email: zakirov@kintechlab.ru organization: Kintech Lab Ltd., Moscow, Russia |
ContentType | Book Chapter |
Copyright | Springer Nature Switzerland AG 2019 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2019 |
DBID | FFUUA |
DEWEY | 004.35 |
DOI | 10.1007/978-3-030-28163-2_10 |
DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 3030281639 9783030281632 |
EISSN | 1865-0937 |
Editor | Zymbler, Mikhail; Sokolinsky, Leonid |
Editor_xml | – sequence: 1 fullname: Zymbler, Mikhail – sequence: 2 fullname: Sokolinsky, Leonid |
EndPage | 151 |
ExternalDocumentID | EBC5923258_109_148 |
GroupedDBID | 38. 9-X AABBV AEJLV AEKFX AIFIR ALEXF ALMA_UNASSIGNED_HOLDINGS AYMPB BBABE CXBFT CZZ EXGDT FCSXQ FFUUA I4C IEZ MGZZY NSQWD OORQV SBO SNUHX TPJZQ Z83 Z84 Z88 AAJYQ AATVQ ABBUY ABCYT ACDTA ACDUY AEHEY AHNNE ATJMZ |
ID | FETCH-LOGICAL-p243t-bb307957fd9f8bb19e6edb11bedfb88e2cf384422bebf35caa934862fe90d73d3 |
ISBN | 9783030281625 3030281620 |
ISSN | 1865-0929 |
IngestDate | Tue Jul 29 20:04:34 EDT 2025 Thu May 29 16:16:58 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | QA76.9.S88 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-p243t-bb307957fd9f8bb19e6edb11bedfb88e2cf384422bebf35caa934862fe90d73d3 |
OCLC | 1114336499 |
ORCID | 0000-0003-2517-6064 0000-0001-7346-6635 0000-0003-3623-0556 |
PQID | EBC5923258_109_148 |
PageCount | 13 |
ParticipantIDs | springer_books_10_1007_978_3_030_28163_2_10 proquest_ebookcentralchapters_5923258_109_148 |
PublicationCentury | 2000 |
PublicationDate | 2019 |
PublicationDateYYYYMMDD | 2019-01-01 |
PublicationDate_xml | – year: 2019 text: 2019 |
PublicationDecade | 2010 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesTitle | Communications in Computer and Information Science |
PublicationSeriesTitleAlternate | Communic.Comp.Inf.Science |
PublicationSubtitle | 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2-4, 2019, Revised Selected Papers |
PublicationTitle | Parallel Computational Technologies |
PublicationYear | 2019 |
Publisher | Springer International Publishing AG; Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Barbosa, Simone Diniz Junqueira; Zhou, Lizhu; Kotenko, Igor; Filipe, Joaquim; Ghosh, Ashish; Yuan, Junsong |
RelatedPersons_xml | – sequence: 1 givenname: Simone Diniz Junqueira surname: Barbosa fullname: Barbosa, Simone Diniz Junqueira organization: Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil – sequence: 2 givenname: Joaquim surname: Filipe fullname: Filipe, Joaquim organization: Polytechnic Institute of Setúbal, Setúbal, Portugal – sequence: 3 givenname: Ashish surname: Ghosh fullname: Ghosh, Ashish organization: Indian Statistical Institute, Kolkata, India – sequence: 4 givenname: Igor surname: Kotenko fullname: Kotenko, Igor organization: St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia – sequence: 5 givenname: Junsong surname: Yuan fullname: Yuan, Junsong organization: University at Buffalo, The State University of New York, Buffalo, USA – sequence: 6 givenname: Lizhu surname: Zhou fullname: Zhou, Lizhu organization: Tsinghua University, Beijing, China |
SSID | ssj0002209137 ssj0000580895 ssib054953581 |
Score | 1.9267608 |
SourceID | springer proquest |
SourceType | Publisher |
StartPage | 139 |
SubjectTerms | GPU; LBM; LRnLA; Temporal blocking; Time skewing; Vectorization |
Title | LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5923258&ppg=148 http://link.springer.com/10.1007/978-3-030-28163-2_10 |
Volume | 1063 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnZ1Nb5wwEIatZnupeui3mqatfOht5QpsA3Zum9U2UbStVlW2inqxbLBPCRsl9JJf3zE2LNBc0gtCFiDLD5jxjN8ZhL4w6lzBdELS0uYELHBBdKLhw8t1LjUThqde7_z9R3625eeX2WVX3j2qSxrztbx_UFfyP1ShDbh6lewjyPYPhQY4B75wBMJwnBi_YzdrSHqhb30dlKt5KMzQOfV6X3m_OzCKnX_W68V8rRu_221-srtq7q917SM1voR0EKhvBiqC5bBAYUgifB11SnUbYjjdbNvIw3KzHboOvFpp5DroXIcT5-PA_7U4HS034XcH5kiaB6lyP38mYYr6ZzIe7r-AO4m_lRGq4j7WUe7rNOTbnOS-Xp0sM7BAaSZ8pByWKeIAHRSCz9DTxep8_av3plHq85v6Aox9H2NCr32fB8LJh_o0WmJMouKtsXHxEj33AhTslSHQy1foia1foxdd-Q0cZ-M36HeLFEekuEeKA9JjvMADoHgPFO8cngDF0ApAMQDFAPQt2n5bXSzPSCyVQW4oZw0xBuZqmRWukk4Yk0qb28qkqbGVM0JYWjomOKfUWONYVmotGYfFrLMyqQpWsXdoVu9q-x5hzYtCZK7UVOQ8oVqWknMtq6RkOofv_RCRbpxUG9CPu4jLMCp3akLsEM27wVT-8jvVZcoGCoopoKBaCspT-PDIpx-hZ_s3-yOaNbd_7CcwExvzOb4jfwGygmMz |
linkProvider | Library Specific Holdings |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Parallel+Computational+Technologies&rft.atitle=LRnLA+Lattice+Boltzmann+Method%3A+A+Performance+Comparison+of+Implementations+on+GPU+and+CPU&rft.date=2019-01-01&rft.pub=Springer+International+Publishing+AG&rft.isbn=9783030281625&rft.volume=1063&rft_id=info:doi/10.1007%2F978-3-030-28163-2_10&rft.externalDBID=148&rft.externalDocID=EBC5923258_109_148 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F5923258-l.jpg |
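The abstract attributes the loosened memory-bound limit to a recursive subdivision of the dD1T space-time domain for schemes with local dependencies. As a rough illustration of that general idea only, the sketch below recursively cuts the 1D1T domain of a three-point stencil into trapezoids, in the style of the Frigo-Strumpen cache-oblivious stencil algorithm (one form of temporal blocking). It is not the chapter's LRnLA algorithm and not an LBM kernel; the stencil weights and the names `kernel`, `naive`, `walk` and `recursive` are hypothetical. Because each update reads only its three neighbours from the previous time level, any order that respects those dependencies gives identical results, even when only two time-level buffers are stored.

```python
# Sketch only: recursive trapezoidal traversal of the 1D1T space-time domain
# of a three-point stencil (Frigo-Strumpen style), NOT the LRnLA algorithm.
# The kernel weights and all function names here are hypothetical.


def kernel(buf, t, x):
    """Advance cell x from time level t to t + 1 (two buffers, indexed t % 2)."""
    src, dst = buf[t % 2], buf[(t + 1) % 2]
    dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]


def naive(u0, steps):
    """Reference order: sweep the whole interior once per time step."""
    buf = [list(u0), list(u0)]  # both buffers hold the fixed boundary cells
    for t in range(steps):
        for x in range(1, len(u0) - 1):
            kernel(buf, t, x)
    return buf[steps % 2]


def walk(buf, t0, t1, x0, dx0, x1, dx1):
    """Visit the trapezoid {(t, x): t0 <= t < t1,
    x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0)} in dependency order."""
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(buf, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Wide trapezoid: cut in space along a line of slope -1 through xm.
            # Nothing in the left piece depends on the right piece, so it runs first.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk(buf, t0, t1, x0, dx0, xm, -1)
            walk(buf, t0, t1, xm, -1, x1, dx1)
        else:
            # Tall trapezoid: cut in time, lower half before upper half.
            s = dt // 2
            walk(buf, t0, t0 + s, x0, dx0, x1, dx1)
            walk(buf, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def recursive(u0, steps):
    """Same result as naive(), but with a cache-friendlier update order."""
    buf = [list(u0), list(u0)]
    walk(buf, 0, steps, 1, 0, len(u0) - 1, 0)
    return buf[steps % 2]


if __name__ == "__main__":
    u0 = [0.0] * 20 + [1.0] + [0.0] * 20
    assert recursive(u0, 30) == naive(u0, 30)
    print("recursive traversal matches the naive sweep")
```

Both functions perform the same floating-point operations on the same operands, only in a different order, so the comparison can use exact equality. In a compiled implementation the benefit would come from the small trapezoids fitting in cache; this pure-Python version only demonstrates that the reordering is legal.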