A Configurable Intrinsic Curiosity Module for a Testbed for Developing Intelligent Swarm UAVs

Bibliographic Details
Published in	Machine learning with applications Vol. 21; p. 100714
Main Authors	Mahmood, Jawad, Raja, Muhammad Adil, Loane, John, McCaffery, Fergal
Format	Journal Article
Language	English
Published	Elsevier Ltd, 01.09.2025
Subjects	A3C; FlightGear; ICM; RL; UAVs
ISSN	2666-8270
DOI	10.1016/j.mlwa.2025.100714

Abstract	This paper introduces an Intrinsic Curiosity Module (ICM) based Reinforcement Learning (RL) framework for swarm Unmanned Aerial Vehicles (UAVs) target tracking, leveraging the actor–critic architecture to control the roll, pitch, yaw, and throttle motions of UAVs. A key challenge in RL-based UAV coordination is the delayed reward problem, which hinders effective learning in dynamic environments. Existing UAV testbeds rely primarily on extrinsic rewards and lack mechanisms for adaptive exploration and efficient UAV coordination. To address these limitations, we propose a testbed that integrates an ICM with the Asynchronous Advantage Actor-Critic (A3C) algorithm for tracking UAVs. It incorporates the Self-Reflective Curiosity-Weighted (SRCW) hyperparameter tuning mechanism for the ICM, which adaptively modifies hyperparameters based on the ongoing RL agent’s performance. In this testbed, the target UAV is guided by the Advantage Actor-Critic (A2C) model, while a swarm of two tracking UAVs is controlled by using the A3C-ICM approach. The proposed framework facilitates real-time autonomous coordination among UAVs within a simulated environment. This system is developed using the FlightGear flight simulator and the JSBSim Flight Dynamics Model (FDM), which enables dynamic simulations and continuous interaction between UAVs. Experimental results demonstrate that the tracking UAVs can effectively coordinate and maintain precise paths even under complex conditions.
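The abstract says that an ICM supplements the extrinsic reward to ease the delayed reward problem. A minimal sketch of that general idea follows. It assumes the widely used curiosity formulation in which the intrinsic reward is the scaled squared error of a forward model's prediction of the next state features. The function names and the parameters `eta` and `beta` are hypothetical and are not taken from the paper; the sketch does not reproduce the authors' SRCW tuning mechanism or their A3C implementation.

```python
from typing import Sequence


def intrinsic_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float],
                     eta: float = 0.5) -> float:
    """Curiosity bonus: scaled squared error of a forward model's prediction.

    eta is an assumed scaling factor, not a value from the paper.
    """
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.2) -> float:
    """Blend the task reward with the curiosity bonus (beta is an assumed weight)."""
    return extrinsic + beta * intrinsic


# Example: the environment returns no reward yet, but the forward model
# mispredicted the next feature vector, so the agent still gets a learning signal.
predicted = [0.0, 1.0, 2.0]
observed = [1.0, 1.0, 0.0]
bonus = intrinsic_reward(predicted, observed, eta=0.5)
print(total_reward(extrinsic=0.0, intrinsic=bonus, beta=0.2))
```

In a scheme like the one the abstract describes, a tuning mechanism could adjust weights such as `eta` and `beta` during training according to agent performance; how the paper does this is not stated in this record.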
Author details	Mahmood, Jawad (ORCID 0009-0004-7318-4960; Jawad.Mahmood@dkit.ie); Raja, Muhammad Adil (Adil.Raja@dkit.ie); Loane, John (John.Loane@dkit.ie); McCaffery, Fergal (Fergal.McCaffery@dkit.ie)
Copyright	2025 The Authors
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	This is an open access article under the CC BY license.
OpenAccessLink	https://doaj.org/article/43ed45722502400cb104a58cf475d63f