LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU

Bibliographic Details
Published in	Parallel Computational Technologies, Vol. 1063, pp. 139–151
Main Authors	Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2019
Series	Communications in Computer and Information Science
Subjects	GPU; LBM; LRnLA; Temporal blocking; Time skewing; Vectorization
ISBN	9783030281625 3030281620
ISSN	1865-0929 1865-0937
DOI	10.1007/978-3-030-28163-2_10

Abstract	We present an implementation of the Lattice Boltzmann Method (LBM) with Locally Recursive non-Locally Asynchronous (LRnLA) algorithms on GPU and CPU. The algorithm is based on the recursive subdivision of the domain of the dD1T space-time simulation and loosens the memory-bound limit for numerical schemes with local dependencies. We show that the LRnLA algorithm makes it possible to overcome the main memory bandwidth limitations in both CPU and GPU implementations. For the CPU, we find a data layout that provides the alignment needed for full use of AVX2/AVX512 vectorization. For the GPU, we devise a procedure for pairwise CUDA-block synchronization and apply it to the implementation of the LRnLA algorithm, which previously worked only on CPU. The performance on GPU is higher, as is usual in modern implementations. However, the performance gap in our implementation is smaller, thanks to a more efficient CPU version. Through a detailed comparison, we show possible future applications for both the CPU and the GPU implementations of the lattice Boltzmann method in complex settings.
Author	Zakirov, Andrey; Levchenko, Vadim; Perepelkina, Anastasia
Author_xml	– sequence: 1 givenname: Vadim orcidid: 0000-0003-3623-0556 surname: Levchenko fullname: Levchenko, Vadim email: lev@keldysh.ru organization: Keldysh Institute of Applied Mathematics, Moscow, Russia – sequence: 2 givenname: Andrey orcidid: 0000-0001-7346-6635 surname: Zakirov fullname: Zakirov, Andrey email: mogmi@narod.ru organization: Keldysh Institute of Applied Mathematics, Moscow, Russia – sequence: 3 givenname: Anastasia orcidid: 0000-0003-2517-6064 surname: Perepelkina fullname: Perepelkina, Anastasia email: zakirov@kintechlab.ru organization: Kintech Lab Ltd., Moscow, Russia
ContentType	Book Chapter
Copyright	Springer Nature Switzerland AG 2019
Copyright_xml	– notice: Springer Nature Switzerland AG 2019
DEWEY	004.35
DOI	10.1007/978-3-030-28163-2_10
Discipline	Computer Science
EISBN	3030281639 9783030281632
EISSN	1865-0937
Editor	Zymbler, Mikhail; Sokolinsky, Leonid
Editor_xml	– sequence: 1 fullname: Zymbler, Mikhail – sequence: 2 fullname: Sokolinsky, Leonid
EndPage	151
ISBN	9783030281625 3030281620
ISSN	1865-0929
IsPeerReviewed	true
IsScholarly	true
LCCallNum	QA76.9.S88
Language	English
OCLC	1114336499
ORCID	0000-0003-2517-6064 0000-0001-7346-6635 0000-0003-3623-0556
PageCount	13
PublicationCentury	2000
PublicationDate	2019
PublicationDateYYYYMMDD	2019-01-01
PublicationDate_xml	– year: 2019 text: 2019
PublicationDecade	2010
PublicationPlace	Switzerland
PublicationPlace_xml	– name: Switzerland – name: Cham
PublicationSeriesTitle	Communications in Computer and Information Science
PublicationSeriesTitleAlternate	Communic.Comp.Inf.Science
PublicationSubtitle	13th International Conference, PCT 2019, Kaliningrad, Russia, April 2-4, 2019, Revised Selected Papers
PublicationTitle	Parallel Computational Technologies
PublicationYear	2019
Publisher	Springer International Publishing AG; Springer International Publishing
Publisher_xml	– name: Springer International Publishing AG – name: Springer International Publishing
RelatedPersons	Barbosa, Simone Diniz Junqueira; Zhou, Lizhu; Kotenko, Igor; Filipe, Joaquim; Ghosh, Ashish; Yuan, Junsong
RelatedPersons_xml	– sequence: 1 givenname: Simone Diniz Junqueira surname: Barbosa fullname: Barbosa, Simone Diniz Junqueira organization: Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil – sequence: 2 givenname: Joaquim surname: Filipe fullname: Filipe, Joaquim organization: Polytechnic Institute of Setúbal, Setúbal, Portugal – sequence: 3 givenname: Ashish surname: Ghosh fullname: Ghosh, Ashish organization: Indian Statistical Institute, Kolkata, India – sequence: 4 givenname: Igor surname: Kotenko fullname: Kotenko, Igor organization: St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia – sequence: 5 givenname: Junsong surname: Yuan fullname: Yuan, Junsong organization: University at Buffalo, The State University of New York, Buffalo, USA – sequence: 6 givenname: Lizhu surname: Zhou fullname: Zhou, Lizhu organization: Tsinghua University , Beijing, China
StartPage	139
SubjectTerms	GPU; LBM; LRnLA; Temporal blocking; Time skewing; Vectorization
Title	LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU
Volume	1063