Hybrid Reinforcement Learning for STAR-RISs: A Coupled Phase-Shift Model Based Beamformer

Bibliographic Details
Published in IEEE Journal on Selected Areas in Communications, Vol. 40, no. 9, pp. 2556-2569
Main Authors Zhong, Ruikang; Liu, Yuanwei; Mu, Xidong; Chen, Yue; Wang, Xianbin; Hanzo, Lajos
Format Journal Article
Language English
Published New York IEEE 01.09.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) assisted multi-user downlink multiple-input single-output (MISO) communication system is investigated. In contrast to the existing ideal STAR-RIS model, which assumes independent control of the transmission and reflection phase shifts, a practical coupled phase-shift model is considered. A joint active and passive beamforming optimization problem is then formulated for minimizing the long-term transmission power consumption, subject to the coupled phase-shift constraint and the minimum data rate constraint. Despite the coupled nature of the phase-shift model, the formulated problem can be solved by invoking a hybrid continuous and discrete phase-shift control policy. Inspired by this observation, a pair of hybrid reinforcement learning (RL) algorithms is proposed, namely the hybrid deep deterministic policy gradient (hybrid DDPG) algorithm and the joint DDPG & deep-Q network (DDPG-DQN) based algorithm. The hybrid DDPG algorithm controls the associated high-dimensional continuous and discrete actions by relying on a hybrid action mapping. By contrast, the joint DDPG-DQN algorithm constructs two Markov decision processes (MDPs) relying on an inner and an outer environment, thereby amalgamating the two agents to accomplish joint hybrid control. Simulation results demonstrate that the STAR-RIS consumes less energy than conventional RISs. Furthermore, both proposed algorithms outperform the baseline DDPG algorithm, and the joint DDPG-DQN algorithm achieves superior performance, albeit at an increased computational complexity.
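The hybrid continuous and discrete action space summarized in the abstract can be illustrated with a short Python sketch. It assumes a coupled phase-shift model of the form used in the STAR-RIS literature for lossless elements, in which the transmission and reflection amplitudes satisfy beta_t^2 + beta_r^2 = 1 and the phases satisfy cos(phi_t - phi_r) = 0. Under that assumption the amplitude and the reflection phase are continuous, while the phase offset of +pi/2 or -pi/2 is a discrete choice. The actor-output layout, the function names (map_hybrid_action, satisfies_coupling, transmit_power, user_rates, reward), unit noise power and the penalty-based reward are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only: names, action layout and reward are assumptions.
import cmath
import math


def map_hybrid_action(raw):
    """Map a flat actor output in [-1, 1] to STAR-RIS coefficients.

    Hypothetical layout (not the paper's exact mapping): three outputs per
    element, (reflection amplitude, reflection phase, discrete branch).
    Returns (transmission_coeffs, reflection_coeffs) as lists of complex.
    """
    if len(raw) % 3 != 0:
        raise ValueError("expected 3 actor outputs per STAR-RIS element")
    t_coeffs, r_coeffs = [], []
    for i in range(0, len(raw), 3):
        a, p, d = raw[i], raw[i + 1], raw[i + 2]
        # Continuous amplitude: beta_r in [0, 1]; energy conservation of a
        # lossless element then fixes beta_t = sqrt(1 - beta_r^2).
        beta_r = (a + 1.0) / 2.0
        beta_t = math.sqrt(1.0 - beta_r ** 2)
        # Continuous phase: phi_r in [0, 2*pi).
        phi_r = ((p + 1.0) * math.pi) % (2.0 * math.pi)
        # Discrete part: cos(phi_t - phi_r) = 0 leaves two branches,
        # phi_t = phi_r + pi/2 or phi_r - pi/2; the sign of d picks one.
        phi_t = phi_r + (math.pi / 2.0 if d >= 0.0 else -math.pi / 2.0)
        t_coeffs.append(beta_t * cmath.exp(1j * phi_t))
        r_coeffs.append(beta_r * cmath.exp(1j * phi_r))
    return t_coeffs, r_coeffs


def satisfies_coupling(t_coeffs, r_coeffs, tol=1e-9):
    """Check beta_t^2 + beta_r^2 = 1 and cos(phi_t - phi_r) = 0 per element."""
    for ct, cr in zip(t_coeffs, r_coeffs):
        if abs(abs(ct) ** 2 + abs(cr) ** 2 - 1.0) > tol:
            return False
        # The phase relation only matters when both amplitudes are non-zero.
        if abs(ct) > tol and abs(cr) > tol:
            if abs(math.cos(cmath.phase(ct) - cmath.phase(cr))) > tol:
                return False
    return True


def transmit_power(w):
    """Total BS transmit power sum_k ||w_k||^2 of the active beamformers."""
    return sum(abs(x) ** 2 for wk in w for x in wk)


def user_rates(h_eff, w, noise_power=1.0):
    """Rates log2(1 + SINR_k) in bit/s/Hz for a MISO downlink.

    h_eff[k] is user k's effective channel, assumed to already include the
    STAR-RIS transmission or reflection coefficients of its region.
    """
    rates = []
    for k, hk in enumerate(h_eff):
        # |h_k^H w_j|^2 for every beamformer j.
        gains = [abs(sum(h.conjugate() * x for h, x in zip(hk, wj))) ** 2
                 for wj in w]
        sinr = gains[k] / (sum(gains) - gains[k] + noise_power)
        rates.append(math.log2(1.0 + sinr))
    return rates


def reward(w, rates, r_min, penalty=10.0):
    """Hypothetical reward: negative power minus a rate-shortfall penalty."""
    shortfall = sum(max(0.0, r_min - r) for r in rates)
    return -transmit_power(w) - penalty * shortfall
```

In a DDPG-style agent, `raw` would typically come from a tanh output layer so that every entry lies in [-1, 1]. For the joint DDPG-DQN variant, one possible split (again an assumption) is to let the DQN agent choose the discrete +pi/2 or -pi/2 branch per element while the DDPG agent outputs only the continuous amplitudes, phases and beamformers.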
Author Mu, Xidong
Zhong, Ruikang
Hanzo, Lajos
Liu, Yuanwei
Chen, Yue
Wang, Xianbin
Author_xml – sequence: 1
  givenname: Ruikang
  orcidid: 0000-0003-4914-6425
  surname: Zhong
  fullname: Zhong, Ruikang
  email: r.zhong@qmul.ac.uk
  organization: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.
– sequence: 2
  givenname: Yuanwei
  orcidid: 0000-0002-6389-8941
  surname: Liu
  fullname: Liu, Yuanwei
  email: yuanwei.liu@qmul.ac.uk
  organization: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.
– sequence: 3
  givenname: Xidong
  orcidid: 0000-0001-8351-360X
  surname: Mu
  fullname: Mu, Xidong
  email: xidong.mu@qmul.ac.uk
  organization: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.
– sequence: 4
  givenname: Yue
  surname: Chen
  fullname: Chen, Yue
  email: yue.chen@qmul.ac.uk
  organization: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.
– sequence: 5
  givenname: Xianbin
  orcidid: 0000-0003-4890-0748
  surname: Wang
  fullname: Wang, Xianbin
  email: xianbin.wang@uwo.ca
  organization: Department of Electrical and Computer Engineering, Western University, London, ON, Canada
– sequence: 6
  givenname: Lajos
  orcidid: 0000-0002-2636-5214
  surname: Hanzo
  fullname: Hanzo, Lajos
  email: lh@ecs.soton.ac.uk
  organization: School of Electronics and Computer Science, University of Southampton, Southampton, U.K.
CODEN ISACEM
CitedBy_id crossref_primary_10_1109_JIOT_2024_3376543
crossref_primary_10_1109_TCOMM_2024_3418910
crossref_primary_10_1109_JPROC_2024_3405351
crossref_primary_10_1109_TCOMM_2024_3364988
crossref_primary_10_1109_TSP_2024_3413017
crossref_primary_10_1109_LCOMM_2023_3324488
crossref_primary_10_1109_TWC_2023_3321395
crossref_primary_10_1016_j_adhoc_2023_103370
crossref_primary_10_1109_TVT_2024_3349509
crossref_primary_10_1109_MNET_129_2200389
crossref_primary_10_3390_e27020210
crossref_primary_10_1016_j_comnet_2024_110960
crossref_primary_10_1109_LWC_2023_3242449
crossref_primary_10_1109_JSTSP_2024_3449124
crossref_primary_10_1016_j_jksuci_2024_102215
crossref_primary_10_1109_TVT_2023_3336260
crossref_primary_10_1145_3571072
crossref_primary_10_1109_LCOMM_2024_3462798
crossref_primary_10_1109_JIOT_2023_3309859
crossref_primary_10_1109_MNET_004_2300271
crossref_primary_10_1109_JIOT_2023_3279357
crossref_primary_10_1109_TWC_2024_3476383
crossref_primary_10_1109_TVT_2024_3419554
crossref_primary_10_1109_LWC_2023_3251357
crossref_primary_10_1109_TWC_2023_3349230
crossref_primary_10_26599_TST_2024_9010086
crossref_primary_10_1109_JIOT_2024_3416334
crossref_primary_10_1109_TCCN_2024_3384500
crossref_primary_10_1109_JIOT_2023_3297241
crossref_primary_10_1016_j_jiixd_2023_06_003
crossref_primary_10_1109_TCOMM_2023_3335411
Cites_doi 10.1109/TVT.2021.3058995
10.1109/LCOMM.2021.3063464
10.1109/JSAC.2020.3018823
10.1109/TWC.2019.2922609
10.1109/COMST.2020.3004197
10.1109/JSAC.2020.3000814
10.1038/s41598-021-99722-x
10.1109/TAP.2015.2481479
10.1109/COMST.2021.3063822
10.1109/TCOMM.2020.3001125
10.1109/TCOMM.2021.3106686
10.1109/LCOMM.2020.3025345
10.1109/LWC.2021.3107547
10.1038/srep04971
10.1109/icc45855.2022.9838767
10.1109/MWC.011.2100016
10.1109/LCOMM.2021.3082214
10.1109/JIOT.2019.2921159
10.1109/TWC.2020.3006915
10.1109/LCOMM.2020.3041510
10.1109/JSAC.2020.3000835
10.1109/TWC.2020.3024860
10.1109/TVT.2021.3109786
10.1109/LCOMM.2021.3091807
10.1109/MSP.2017.2743240
10.1109/TVT.2020.3024756
10.1109/ACCESS.2019.2957706
10.1109/TCCN.2020.2992604
10.1109/TASE.2020.3043636
10.1109/TWC.2021.3118225
10.1109/TVT.2021.3063953
10.1109/SPAWC51858.2021.9593172
10.1109/COMST.2020.2965856
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DBID 97E
RIA
RIE
AAYXX
CITATION
7SP
8FD
L7M
DOI 10.1109/JSAC.2022.3192053
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE/IET Electronic Library
CrossRef
Electronics & Communications Abstracts
Technology Research Database
Advanced Technologies Database with Aerospace
DatabaseTitle CrossRef
Technology Research Database
Advanced Technologies Database with Aerospace
Electronics & Communications Abstracts
DatabaseTitleList
Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-0008
EndPage 2569
ExternalDocumentID 10_1109_JSAC_2022_3192053
9837935
Genre orig-research
GrantInformation_xml – fundername: China Scholarship Council
  grantid: 201908610187
  funderid: 10.13039/501100004543
– fundername: Engineering and Physical Sciences Research Council
  grantid: EP/P034284/1; EP/P003990/1 (COALESCE)
  funderid: 10.13039/501100000266
– fundername: European Research Council’s Advanced Fellow
  grantid: QuantCom (789028)
  funderid: 10.13039/501100000781
– fundername: Engineering and Physical Sciences Research Council
  grantid: EP/W035588/1
  funderid: 10.13039/501100000266
IEDL.DBID RIE
ISSN 0733-8716
IngestDate Mon Jun 30 10:20:21 EDT 2025
Tue Jul 01 02:06:32 EDT 2025
Thu Apr 24 23:06:29 EDT 2025
Wed Aug 27 02:22:58 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 9
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-6389-8941
0000-0003-4914-6425
0000-0001-8351-360X
0000-0002-2636-5214
0000-0003-4890-0748
PQID 2704097564
PQPubID 85481
PageCount 14
ParticipantIDs ieee_primary_9837935
crossref_primary_10_1109_JSAC_2022_3192053
crossref_citationtrail_10_1109_JSAC_2022_3192053
proquest_journals_2704097564
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2022-09-01
PublicationDateYYYYMMDD 2022-09-01
PublicationDate_xml – month: 09
  year: 2022
  text: 2022-09-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE Journal on Selected Areas in Communications
PublicationTitleAbbrev J-SAC
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref12
ref15
ref37
ref14
ref36
Delalleau (ref30) 2019
ref11
ref10
ref32
ref2
ref1
ref17
ref39
ref16
ref38
ref19
ref18
Neunert (ref29); 100
Li (ref31) 2021
Ni (ref28) 2021
ref24
ref23
ref26
ref25
ref20
(ref33) 2018
ref22
ref21
ref27
Li (ref35) 2021
ref8
ref7
ref9
ref4
ref3
ref6
ref5
ref40
Lillicrap (ref34) 2015
References_xml – ident: ref40
  doi: 10.1109/TVT.2021.3058995
– ident: ref22
  doi: 10.1109/LCOMM.2021.3063464
– ident: ref26
  doi: 10.1109/JSAC.2020.3018823
– ident: ref6
  doi: 10.1109/TWC.2019.2922609
– ident: ref11
  doi: 10.1109/COMST.2020.3004197
– ident: ref32
  doi: 10.1109/JSAC.2020.3000814
– ident: ref7
  doi: 10.1038/s41598-021-99722-x
– ident: ref15
  doi: 10.1109/TAP.2015.2481479
– ident: ref36
  doi: 10.1109/COMST.2021.3063822
– ident: ref20
  doi: 10.1109/TCOMM.2020.3001125
– ident: ref5
  doi: 10.1109/TCOMM.2021.3106686
– ident: ref13
  doi: 10.1109/LCOMM.2020.3025345
– ident: ref4
  doi: 10.1109/LWC.2021.3107547
– ident: ref14
  doi: 10.1038/srep04971
– ident: ref19
  doi: 10.1109/icc45855.2022.9838767
– ident: ref10
  doi: 10.1109/MWC.011.2100016
– volume: 100
  start-page: 735
  volume-title: Proc. CoRL
  ident: ref29
  article-title: Continuous-discrete reinforcement learning for hybrid control in robotics
– volume-title: Study on 3D Channel Model for LTE
  year: 2018
  ident: ref33
– ident: ref16
  doi: 10.1109/LCOMM.2021.3082214
– ident: ref39
  doi: 10.1109/JIOT.2019.2921159
– ident: ref2
  doi: 10.1109/TWC.2020.3006915
– ident: ref23
  doi: 10.1109/LCOMM.2020.3041510
– ident: ref24
  doi: 10.1109/JSAC.2020.3000835
– ident: ref25
  doi: 10.1109/TWC.2020.3024860
– ident: ref9
  doi: 10.1109/TVT.2021.3109786
– ident: ref18
  doi: 10.1109/LCOMM.2021.3091807
– ident: ref38
  doi: 10.1109/MSP.2017.2743240
– ident: ref17
  doi: 10.1109/TVT.2020.3024756
– year: 2019
  ident: ref30
  article-title: Discrete and continuous action representation for practical RL in video games
  publication-title: arXiv:1912.11077
– ident: ref1
  doi: 10.1109/ACCESS.2019.2957706
– year: 2021
  ident: ref28
  article-title: STAR-RIS integrated non-orthogonal multiple access and over-the-air federated learning: Framework, analysis, and optimization
  publication-title: arXiv:2106.08592
– ident: ref3
  doi: 10.1109/TCCN.2020.2992604
– ident: ref37
  doi: 10.1109/TASE.2020.3043636
– year: 2015
  ident: ref34
  article-title: Continuous control with deep reinforcement learning
  publication-title: arXiv:1509.02971
– ident: ref12
  doi: 10.1109/TWC.2021.3118225
– year: 2021
  ident: ref35
  article-title: Radio resource management for cellular-connected UAV: A DRL solution
  publication-title: arXiv:2102.13222
– ident: ref27
  doi: 10.1109/TVT.2021.3063953
– ident: ref8
  doi: 10.1109/SPAWC51858.2021.9593172
– ident: ref21
  doi: 10.1109/COMST.2020.2965856
– year: 2021
  ident: ref31
  article-title: HyAR: Addressing discrete-continuous action reinforcement learning via hybrid action representation
  publication-title: arXiv:2109.05490
SSID ssj0014482
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 2556
SubjectTerms Algorithms
Array signal processing
Beamforming
Channel estimation
Communications systems
Computational modeling
deep reinforcement learning (DRL)
Energy consumption
Hybrid control
Machine learning
Markov processes
MISO (control systems)
Optimization
Phase shift
Power consumption
reconfigurable intelligent surfaces (RISs)
Reinforcement learning
simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs)
Stars
Surface waves
Title Hybrid Reinforcement Learning for STAR-RISs: A Coupled Phase-Shift Model Based Beamformer
URI https://ieeexplore.ieee.org/document/9837935
https://www.proquest.com/docview/2704097564
Volume 40
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Hybrid+Reinforcement+Learning+for+STAR-RISs%3A+A+Coupled+Phase-Shift+Model+Based+Beamformer&rft.jtitle=IEEE+journal+on+selected+areas+in+communications&rft.au=Zhong%2C+Ruikang&rft.au=Liu%2C+Yuanwei&rft.au=Mu%2C+Xidong&rft.au=Chen%2C+Yue&rft.date=2022-09-01&rft.pub=IEEE&rft.issn=0733-8716&rft.volume=40&rft.issue=9&rft.spage=2556&rft.epage=2569&rft_id=info:doi/10.1109%2FJSAC.2022.3192053&rft.externalDocID=9837935