Multi-Agent Deep Reinforcement Learning for Persistent Monitoring With Sensing, Communication, and Localization Constraints

Bibliographic Details
Published in: IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 2831–2843
Main Authors: Mishra, Manav; Poddar, Prithvi; Agrawal, Rajat; Chen, Jingxi; Tokekar, Pratap; Sujit, P. B.
Format: Journal Article
Language: English
Published: IEEE, 01.01.2025
ISSN: 1545-5955
EISSN: 1558-3783
DOI: 10.1109/TASE.2024.3385412

Abstract Determining multi-robot motion policies for persistently monitoring a region under limited sensing, communication, and localization constraints in non-GPS environments is a challenging problem. To take the localization constraints into account, in this paper, we consider a heterogeneous robotic system consisting of two types of agents: anchor agents with accurate localization capability and auxiliary agents with low localization accuracy. To localize itself, an auxiliary agent must be within the communication range of an anchor, directly or indirectly. The robotic team's objective is to minimize environmental uncertainty through persistent monitoring. We propose a multi-agent deep reinforcement learning (MARL) based architecture with graph convolution, called Graph Localized Proximal Policy Optimization (GALOPP), which incorporates the agents' limited sensor field-of-view, communication, and localization constraints along with the persistent monitoring objective to determine motion policies for each agent. We evaluate the performance of GALOPP on open maps with obstacles, varying the number of anchor and auxiliary agents. We further 1) study the effect of communication range, obstacle density, and sensing range on performance and 2) compare GALOPP with area-partition, greedy-search, random-search, and communication-constrained random-search strategies. To assess its generalization capability, we also evaluate GALOPP in two additional environments: 2-room and 4-room. The results show that GALOPP learns effective policies and monitors the area well. As a proof of concept, we perform hardware experiments to demonstrate the performance of GALOPP. Note to Practitioners: Persistent monitoring arises in applications such as search and rescue, border patrol, and wildlife monitoring. Typically, these applications are large-scale, and hence a multi-robot system helps achieve the mission objectives effectively.
Often, the robots are subject to limited sensing and communication ranges, and they may need to operate in GPS-denied areas. In such scenarios, developing motion planning policies for the robots is difficult. Due to the lack of GPS, alternative localization mechanisms, such as SLAM, high-accuracy INS, or UWB radio, are essential. Because SLAM or a highly accurate INS is expensive, we use a combination of agents with expensive, accurate localization systems (anchor agents) and agents with low-cost INS (auxiliary agents), whose localization can be made accurate using cooperative localization techniques. To determine efficient motion policies, we use a multi-agent deep reinforcement learning technique (GALOPP) that takes into account the heterogeneity in vehicle localization capability, limited sensing, and communication constraints. GALOPP is evaluated in simulation and compared with baselines: random search, random search with ensured communication, greedy search, and area partitioning. The results show that GALOPP outperforms the baselines. The GALOPP approach offers a generic solution that can be adapted to various other applications.
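The cooperative-localization constraint described in the abstract (an auxiliary agent can localize only if it reaches an anchor through the communication graph, directly or via multi-hop relays) amounts to a graph-reachability test. A minimal sketch, assuming a disc communication model; the function and variable names are illustrative, not from the paper:

```python
from collections import deque

def localized_agents(positions, anchors, comm_range):
    """Return the set of agents that can localize: every anchor, plus any
    auxiliary agent connected to an anchor through the communication graph.

    positions:  dict agent_id -> (x, y)          (illustrative data layout)
    anchors:    set of agent ids with accurate localization
    comm_range: maximum distance at which two agents can communicate
    """
    def connected(a, b):
        (ax, ay), (bx, by) = positions[a], positions[b]
        return (ax - bx) ** 2 + (ay - by) ** 2 <= comm_range ** 2

    # Breadth-first search outward from all anchors simultaneously.
    localized = set(anchors)
    frontier = deque(anchors)
    while frontier:
        current = frontier.popleft()
        for other in positions:
            if other not in localized and connected(current, other):
                localized.add(other)  # reachable via a multi-hop relay chain
                frontier.append(other)
    return localized
```

A planner can run such a check at every step and penalize motions that leave an auxiliary agent disconnected from all anchors.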
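The persistent-monitoring objective of minimizing environmental uncertainty is commonly modeled with a grid map whose cells accumulate uncertainty over time and are reset when sensed. A minimal sketch under that assumption (illustrative only; the paper's exact uncertainty model may differ):

```python
import numpy as np

def step_uncertainty(uncertainty, agent_cells, sensing_radius, growth=1.0):
    """One persistent-monitoring step on a grid uncertainty map: uncertainty
    grows everywhere, then cells inside any agent's circular sensing
    footprint are reset to zero. Names and model are illustrative."""
    uncertainty = uncertainty + growth       # unobserved cells grow staler
    h, w = uncertainty.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for (ay, ax) in agent_cells:
        seen = (ys - ay) ** 2 + (xs - ax) ** 2 <= sensing_radius ** 2
        uncertainty[seen] = 0.0              # freshly observed cells
    return uncertainty
```

In a MARL setting, the negative of the summed map can serve as a shared team reward that drives agents to keep revisiting the whole region.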
Author_xml – sequence: 1
  givenname: Manav
  orcidid: 0009-0000-7733-607X
  surname: Mishra
  fullname: Mishra, Manav
  email: mishra20@iiserb.ac.in
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
– sequence: 2
  givenname: Prithvi
  orcidid: 0000-0003-1172-8294
  surname: Poddar
  fullname: Poddar, Prithvi
  email: prithvi.poddar99@gmail.com
  organization: Department of Mechanical and Aerospace Engineering, University at Buffalo, Buffalo, NY, USA
– sequence: 3
  givenname: Rajat
  orcidid: 0009-0005-8184-2537
  surname: Agrawal
  fullname: Agrawal, Rajat
  email: rajatagrawal1307@gmail.com
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
– sequence: 4
  givenname: Jingxi
  orcidid: 0000-0002-1953-8041
  surname: Chen
  fullname: Chen, Jingxi
  email: ianchen@umd.edu
  organization: Department of Computer Science, University of Maryland, College Park, MD, USA
– sequence: 5
  givenname: Pratap
  orcidid: 0000-0002-3715-0382
  surname: Tokekar
  fullname: Tokekar, Pratap
  email: tokekar@umd.edu
  organization: Department of Computer Science, University of Maryland, College Park, MD, USA
– sequence: 6
  givenname: P. B.
  orcidid: 0000-0002-7297-1493
  surname: Sujit
  fullname: Sujit, P. B.
  email: sujit@iiserb.ac.in
  organization: Department of Electrical Engineering and Computer Science, IISER Bhopal, Bhopal, India
CODEN ITASC7
CitedBy_id crossref_primary_10_3390_s25020350
ContentType Journal Article
DOI 10.1109/TASE.2024.3385412
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-3783
EndPage 2843
ExternalDocumentID 10_1109_TASE_2024_3385412
10494985
Genre orig-research
GrantInformation_xml – fundername: ONR
  grantid: N00014-18-1-2829
  funderid: 10.13039/100000006
– fundername: Amazon Research Award
– fundername: Prime Minister Research Fellowship (PMRF)
– fundername: NSF
  grantid: 1943368
  funderid: 10.13039/100000001
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0009-0000-7733-607X
0000-0003-1172-8294
0000-0002-1953-8041
0009-0005-8184-2537
0000-0002-3715-0382
0000-0002-7297-1493
PageCount 13
PublicationCentury 2000
PublicationDate 2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – month: 01
  year: 2025
  text: 2025-01-01
  day: 01
PublicationDecade 2020
PublicationTitle IEEE transactions on automation science and engineering
PublicationTitleAbbrev TASE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 2831
SubjectTerms Deep reinforcement learning
graph neural networks
Location awareness
Monitoring
Multi-agent deep reinforcement learning (MARL)
persistent monitoring (PM)
Robot sensing systems
Sensors
Surveillance
Uncertainty
Title Multi-Agent Deep Reinforcement Learning for Persistent Monitoring With Sensing, Communication, and Localization Constraints
URI https://ieeexplore.ieee.org/document/10494985
Volume 22