LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU

Bibliographic Details
Published in Parallel Computational Technologies Vol. 1063; pp. 139-151
Main Authors Levchenko, Vadim; Zakirov, Andrey; Perepelkina, Anastasia
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2019
Springer International Publishing
Series Communications in Computer and Information Science
ISBN 9783030281625
3030281620
ISSN 1865-0929
1865-0937
DOI 10.1007/978-3-030-28163-2_10

Abstract We present an implementation of the Lattice Boltzmann Method (LBM) with Locally Recursive non-Locally Asynchronous (LRnLA) algorithms on GPU and CPU. The algorithm is based on the recursive subdivision of the dD1T space-time domain of the simulation and loosens the memory-bound limit for numerical schemes with local dependencies. We show that the LRnLA algorithm makes it possible to overcome the main-memory bandwidth limitations in both the CPU and the GPU implementations. For the CPU, we find a data layout that provides the alignment needed to make full use of AVX2/AVX512 vectorization. For the GPU, we devise a procedure for pairwise CUDA-block synchronization and apply it to the implementation of the LRnLA algorithm, which previously worked only on CPUs. As is usual for modern implementations, performance is higher on the GPU; however, the CPU-GPU performance gap is smaller in our implementation, thanks to a more efficient CPU version. Through a detailed comparison, we outline possible future applications of both the CPU and the GPU implementations of the lattice Boltzmann method in complex settings.
Author Zakirov, Andrey
Levchenko, Vadim
Perepelkina, Anastasia
Author_xml – sequence: 1
  givenname: Vadim
  orcidid: 0000-0003-3623-0556
  surname: Levchenko
  fullname: Levchenko, Vadim
  email: lev@keldysh.ru
  organization: Keldysh Institute of Applied Mathematics, Moscow, Russia
– sequence: 2
  givenname: Andrey
  orcidid: 0000-0001-7346-6635
  surname: Zakirov
  fullname: Zakirov, Andrey
  email: mogmi@narod.ru
  organization: Keldysh Institute of Applied Mathematics, Moscow, Russia
– sequence: 3
  givenname: Anastasia
  orcidid: 0000-0003-2517-6064
  surname: Perepelkina
  fullname: Perepelkina, Anastasia
  email: zakirov@kintechlab.ru
  organization: Kintech Lab Ltd., Moscow, Russia
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2019
DBID FFUUA
DEWEY 004.35
DOI 10.1007/978-3-030-28163-2_10
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 3030281639
9783030281632
EISSN 1865-0937
Editor Zymbler, Mikhail
Sokolinsky, Leonid
EndPage 151
ExternalDocumentID EBC5923258_109_148
GroupedDBID 38.
9-X
AABBV
AEJLV
AEKFX
AIFIR
ALEXF
ALMA_UNASSIGNED_HOLDINGS
AYMPB
BBABE
CXBFT
CZZ
EXGDT
FCSXQ
FFUUA
I4C
IEZ
MGZZY
NSQWD
OORQV
SBO
SNUHX
TPJZQ
Z83
Z84
Z88
AAJYQ
AATVQ
ABBUY
ABCYT
ACDTA
ACDUY
AEHEY
AHNNE
ATJMZ
ID FETCH-LOGICAL-p243t-bb307957fd9f8bb19e6edb11bedfb88e2cf384422bebf35caa934862fe90d73d3
ISBN 9783030281625
3030281620
ISSN 1865-0929
IngestDate Tue Jul 29 20:04:34 EDT 2025
Thu May 29 16:16:58 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum QA76.9.S88
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-p243t-bb307957fd9f8bb19e6edb11bedfb88e2cf384422bebf35caa934862fe90d73d3
OCLC 1114336499
ORCID 0000-0003-2517-6064
0000-0001-7346-6635
0000-0003-3623-0556
PQID EBC5923258_109_148
PageCount 13
ParticipantIDs springer_books_10_1007_978_3_030_28163_2_10
proquest_ebookcentralchapters_5923258_109_148
PublicationCentury 2000
PublicationDate 2019
PublicationDateYYYYMMDD 2019-01-01
PublicationDecade 2010
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Communications in Computer and Information Science
PublicationSeriesTitleAlternate Communic.Comp.Inf.Science
PublicationSubtitle 13th International Conference, PCT 2019, Kaliningrad, Russia, April 2-4, 2019, Revised Selected Papers
PublicationTitle Parallel Computational Technologies
PublicationYear 2019
Publisher Springer International Publishing AG
Springer International Publishing
RelatedPersons Barbosa, Simone Diniz Junqueira
Zhou, Lizhu
Kotenko, Igor
Filipe, Joaquim
Ghosh, Ashish
Yuan, Junsong
RelatedPersons_xml – sequence: 1
  givenname: Simone Diniz Junqueira
  surname: Barbosa
  fullname: Barbosa, Simone Diniz Junqueira
  organization: Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
– sequence: 2
  givenname: Joaquim
  surname: Filipe
  fullname: Filipe, Joaquim
  organization: Polytechnic Institute of Setúbal, Setúbal, Portugal
– sequence: 3
  givenname: Ashish
  surname: Ghosh
  fullname: Ghosh, Ashish
  organization: Indian Statistical Institute, Kolkata, India
– sequence: 4
  givenname: Igor
  surname: Kotenko
  fullname: Kotenko, Igor
  organization: St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
– sequence: 5
  givenname: Junsong
  surname: Yuan
  fullname: Yuan, Junsong
  organization: University at Buffalo, The State University of New York, Buffalo, USA
– sequence: 6
  givenname: Lizhu
  surname: Zhou
  fullname: Zhou, Lizhu
  organization: Tsinghua University, Beijing, China
SSID ssj0002209137
ssj0000580895
ssib054953581
SourceID springer
proquest
SourceType Publisher
StartPage 139
SubjectTerms GPU
LBM
LRnLA
Temporal blocking
Time skewing
Vectorization
Title LRnLA Lattice Boltzmann Method: A Performance Comparison of Implementations on GPU and CPU
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5923258&ppg=148
http://link.springer.com/10.1007/978-3-030-28163-2_10
Volume 1063
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnZ1Nb5wwEIatZnupeui3mqatfOht5QpsA3Zum9U2UbStVlW2inqxbLBPCRsl9JJf3zE2LNBc0gtCFiDLD5jxjN8ZhL4w6lzBdELS0uYELHBBdKLhw8t1LjUThqde7_z9R3625eeX2WVX3j2qSxrztbx_UFfyP1ShDbh6lewjyPYPhQY4B75wBMJwnBi_YzdrSHqhb30dlKt5KMzQOfV6X3m_OzCKnX_W68V8rRu_221-srtq7q917SM1voR0EKhvBiqC5bBAYUgifB11SnUbYjjdbNvIw3KzHboOvFpp5DroXIcT5-PA_7U4HS034XcH5kiaB6lyP38mYYr6ZzIe7r-AO4m_lRGq4j7WUe7rNOTbnOS-Xp0sM7BAaSZ8pByWKeIAHRSCz9DTxep8_av3plHq85v6Aox9H2NCr32fB8LJh_o0WmJMouKtsXHxEj33AhTslSHQy1foia1foxdd-Q0cZ-M36HeLFEekuEeKA9JjvMADoHgPFO8cngDF0ApAMQDFAPQt2n5bXSzPSCyVQW4oZw0xBuZqmRWukk4Yk0qb28qkqbGVM0JYWjomOKfUWONYVmotGYfFrLMyqQpWsXdoVu9q-x5hzYtCZK7UVOQ8oVqWknMtq6RkOofv_RCRbpxUG9CPu4jLMCp3akLsEM27wVT-8jvVZcoGCoopoKBaCspT-PDIpx-hZ_s3-yOaNbd_7CcwExvzOb4jfwGygmMz
linkProvider Library Specific Holdings
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Parallel+Computational+Technologies&rft.atitle=LRnLA+Lattice+Boltzmann+Method%3A+A+Performance+Comparison+of+Implementations+on+GPU+and+CPU&rft.date=2019-01-01&rft.pub=Springer+International+Publishing+AG&rft.isbn=9783030281625&rft.volume=1063&rft_id=info:doi/10.1007%2F978-3-030-28163-2_10&rft.externalDBID=148&rft.externalDocID=EBC5923258_109_148
thumbnail_s http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F5923258-l.jpg