Multi-Agent Deep Reinforcement Learning for Persistent Monitoring With Sensing, Communication, and Localization Constraints

Bibliographic Details
Published in: IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 2831–2843
Main Authors: Mishra, Manav; Poddar, Prithvi; Agrawal, Rajat; Chen, Jingxi; Tokekar, Pratap; Sujit, P. B.
Format: Journal Article
Language: English
Published: IEEE, 01.01.2025
ISSN: 1545-5955
EISSN: 1558-3783
DOI: 10.1109/TASE.2024.3385412

Abstract Determining multi-robot motion policies for persistently monitoring a region under limited sensing, communication, and localization constraints in non-GPS environments is a challenging problem. To take the localization constraints into account, in this paper, we consider a heterogeneous robotic system consisting of two types of agents: anchor agents with accurate localization capability and auxiliary agents with low localization accuracy. To localize itself, an auxiliary agent must be within the communication range of an anchor, directly or indirectly. The robotic team's objective is to minimize environmental uncertainty through persistent monitoring. We propose a multi-agent deep reinforcement learning (MARL) based architecture with graph convolution, called Graph Localized Proximal Policy Optimization (GALOPP), which incorporates the agents' limited sensor field-of-view, communication, and localization constraints along with the persistent monitoring objective to determine motion policies for each agent. We evaluate the performance of GALOPP on open maps with obstacles, varying the number of anchor and auxiliary agents. We further 1) study the effect of communication range, obstacle density, and sensing range on performance and 2) compare GALOPP with area-partition, greedy-search, random-search, and communication-constrained random-search strategies. To assess its generalization capability, we also evaluate GALOPP in two additional environments: 2-room and 4-room. The results show that GALOPP learns effective policies and monitors the area well. As a proof of concept, we perform hardware experiments to demonstrate the performance of GALOPP. Note to Practitioners: Persistent monitoring arises in applications such as search and rescue, border patrol, and wildlife monitoring. Typically, these applications are large-scale, and hence a multi-robot system helps achieve the mission objectives effectively.
Often, the robots are subject to limited sensing and communication ranges, and they may need to operate in GPS-denied areas. In such scenarios, developing motion planning policies for the robots is difficult. Due to the lack of GPS, alternative localization mechanisms, such as SLAM, high-accuracy INS, or UWB radio, are essential. Because SLAM or a highly accurate INS is expensive, we use a combination of agents with expensive, accurate localization systems (anchor agents) and agents with low-cost INS (auxiliary agents), whose localization can be made accurate using cooperative localization techniques. To determine efficient motion policies, we use a multi-agent deep reinforcement learning technique (GALOPP) that takes into account the heterogeneity in vehicle localization capability, limited sensing, and communication constraints. GALOPP is evaluated in simulation and compared with baselines: random search, random search with ensured communication, greedy search, and area partitioning. The results show that GALOPP outperforms the baselines. The GALOPP approach offers a generic solution that can be adapted to various other applications.
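The cooperative-localization constraint described in the abstract (an auxiliary agent can localize only if it reaches an anchor through the communication graph, directly or via multi-hop relays) amounts to a graph-reachability test. A minimal sketch, assuming a disc communication model; the function and variable names are illustrative, not from the paper:

```python
from collections import deque

def localized_agents(positions, anchors, comm_range):
    """Return the set of agents that can localize: every anchor, plus any
    auxiliary agent connected to an anchor through the communication graph.

    positions:  dict agent_id -> (x, y)          (illustrative data layout)
    anchors:    set of agent ids with accurate localization
    comm_range: maximum distance at which two agents can communicate
    """
    def connected(a, b):
        (ax, ay), (bx, by) = positions[a], positions[b]
        return (ax - bx) ** 2 + (ay - by) ** 2 <= comm_range ** 2

    # Breadth-first search outward from all anchors simultaneously.
    localized = set(anchors)
    frontier = deque(anchors)
    while frontier:
        current = frontier.popleft()
        for other in positions:
            if other not in localized and connected(current, other):
                localized.add(other)  # reachable via a multi-hop relay chain
                frontier.append(other)
    return localized
```

A planner can run such a check at every step and penalize motions that leave an auxiliary agent disconnected from all anchors.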
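The persistent-monitoring objective of minimizing environmental uncertainty is commonly modeled with a grid map whose cells accumulate uncertainty over time and are reset when sensed. A minimal sketch under that assumption (illustrative only; the paper's exact uncertainty model may differ):

```python
import numpy as np

def step_uncertainty(uncertainty, agent_cells, sensing_radius, growth=1.0):
    """One persistent-monitoring step on a grid uncertainty map: uncertainty
    grows everywhere, then cells inside any agent's circular sensing
    footprint are reset to zero. Names and model are illustrative."""
    uncertainty = uncertainty + growth       # unobserved cells grow staler
    h, w = uncertainty.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for (ay, ax) in agent_cells:
        seen = (ys - ay) ** 2 + (xs - ax) ** 2 <= sensing_radius ** 2
        uncertainty[seen] = 0.0              # freshly observed cells
    return uncertainty
```

In a MARL setting, the negative of the summed map can serve as a shared team reward that drives agents to keep revisiting the whole region.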
Author_xml – sequence: 1
  givenname: Manav
  orcidid: 0009-0000-7733-607X
  surname: Mishra
  fullname: Mishra, Manav
  email: mishra20@iiserb.ac.in
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
– sequence: 2
  givenname: Prithvi
  orcidid: 0000-0003-1172-8294
  surname: Poddar
  fullname: Poddar, Prithvi
  email: prithvi.poddar99@gmail.com
  organization: Department of Mechanical and Aerospace Engineering, University at Buffalo, Buffalo, NY, USA
– sequence: 3
  givenname: Rajat
  orcidid: 0009-0005-8184-2537
  surname: Agrawal
  fullname: Agrawal, Rajat
  email: rajatagrawal1307@gmail.com
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
– sequence: 4
  givenname: Jingxi
  orcidid: 0000-0002-1953-8041
  surname: Chen
  fullname: Chen, Jingxi
  email: ianchen@umd.edu
  organization: Department of Computer Science, University of Maryland, College Park, MD, USA
– sequence: 5
  givenname: Pratap
  orcidid: 0000-0002-3715-0382
  surname: Tokekar
  fullname: Tokekar, Pratap
  email: tokekar@umd.edu
  organization: Department of Computer Science, University of Maryland, College Park, MD, USA
– sequence: 6
  givenname: P. B.
  orcidid: 0000-0002-7297-1493
  surname: Sujit
  fullname: Sujit, P. B.
  email: sujit@iiserb.ac.in
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
CODEN ITASC7
CitedBy_id crossref_primary_10_3390_s25020350
ContentType Journal Article
DOI 10.1109/TASE.2024.3385412
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-3783
EndPage 2843
ExternalDocumentID 10_1109_TASE_2024_3385412
10494985
Genre orig-research
GrantInformation_xml – fundername: ONR
  grantid: N00014-18-1-2829
  funderid: 10.13039/100000006
– fundername: Amazon Research Award
– fundername: Prime Minister Research Fellowship (PMRF)
– fundername: NSF
  grantid: 1943368
  funderid: 10.13039/100000001
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0000-7733-607X
0000-0003-1172-8294
0000-0002-1953-8041
0009-0005-8184-2537
0000-0002-3715-0382
0000-0002-7297-1493
PageCount 13
PublicationCentury 2000
PublicationDate 2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – month: 01
  year: 2025
  text: 2025-01-01
  day: 01
PublicationDecade 2020
PublicationTitle IEEE transactions on automation science and engineering
PublicationTitleAbbrev TASE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 2831
SubjectTerms Deep reinforcement learning
graph neural networks
Location awareness
Monitoring
Multi-agent deep reinforcement learning (MARL)
persistent monitoring (PM)
Robot sensing systems
Sensors
Surveillance
Uncertainty
Title Multi-Agent Deep Reinforcement Learning for Persistent Monitoring With Sensing, Communication, and Localization Constraints
URI https://ieeexplore.ieee.org/document/10494985
Volume 22