A Configurable Intrinsic Curiosity Module for a Testbed for Developing Intelligent Swarm UAVs

Bibliographic Details
Published in Machine learning with applications, Vol. 21; p. 100714
Main Authors Mahmood, Jawad; Raja, Muhammad Adil; Loane, John; McCaffery, Fergal
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2025
Elsevier
ISSN 2666-8270
DOI 10.1016/j.mlwa.2025.100714

Abstract This paper introduces an Intrinsic Curiosity Module (ICM) based Reinforcement Learning (RL) framework for swarm Unmanned Aerial Vehicles (UAVs) target tracking, leveraging the actor–critic architecture to control the roll, pitch, yaw, and throttle motions of UAVs. A key challenge in RL-based UAV coordination is the delayed reward problem, which hinders effective learning in dynamic environments. Existing UAV testbeds rely primarily on extrinsic rewards and lack mechanisms for adaptive exploration and efficient UAV coordination. To address these limitations, we propose a testbed that integrates an ICM with the Asynchronous Advantage Actor-Critic (A3C) algorithm for tracking UAVs. It incorporates the Self-Reflective Curiosity-Weighted (SRCW) hyperparameter tuning mechanism for the ICM, which adaptively modifies hyperparameters based on the ongoing RL agent’s performance. In this testbed, the target UAV is guided by the Advantage Actor-Critic (A2C) model, while a swarm of two tracking UAVs is controlled by using the A3C-ICM approach. The proposed framework facilitates real-time autonomous coordination among UAVs within a simulated environment. This system is developed using the FlightGear flight simulator and the JSBSim Flight Dynamics Model (FDM), which enables dynamic simulations and continuous interaction between UAVs. Experimental results demonstrate that the tracking UAVs can effectively coordinate and maintain precise paths even under complex conditions.
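
The curiosity mechanism named in the abstract can be illustrated with a minimal Python sketch. It assumes the common ICM formulation, in which the intrinsic reward is the scaled squared error of a forward model's prediction of the next state features and is added to the extrinsic reward. The function names, the scaling factor `eta`, and the `adapt_eta` rule are hypothetical illustrations. This record does not describe the paper's actual SRCW update, so the sketch should not be read as the authors' implementation.

```python
# Hypothetical sketch: names, defaults, and the adaptation rule are
# illustrative assumptions, not taken from the paper.

def intrinsic_reward(predicted_next, actual_next, eta=0.5):
    """Curiosity bonus: (eta / 2) * squared error of the forward-model prediction."""
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * sq_err


def total_reward(extrinsic, predicted_next, actual_next, eta=0.5):
    """Reward passed to the actor-critic learner: extrinsic plus curiosity bonus."""
    return extrinsic + intrinsic_reward(predicted_next, actual_next, eta)


def adapt_eta(eta, previous_return, current_return, factor=1.1, lo=0.01, hi=1.0):
    """Assumed performance-based rule: explore less while returns improve, more otherwise."""
    if current_return > previous_return:
        eta = eta / factor  # returns improving: shrink the curiosity weight
    else:
        eta = eta * factor  # returns stalled or falling: grow the curiosity weight
    return min(hi, max(lo, eta))  # keep eta inside [lo, hi]
```

A tracker whose forward model mispredicts the next features receives a larger bonus, which may help when the extrinsic tracking reward arrives late, the delayed reward problem the abstract mentions.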
Author_xml – sequence: 1
  givenname: Jawad
  orcidid: 0009-0004-7318-4960
  surname: Mahmood
  fullname: Mahmood, Jawad
  email: Jawad.Mahmood@dkit.ie
– sequence: 2
  givenname: Muhammad Adil
  surname: Raja
  fullname: Raja, Muhammad Adil
  email: Adil.Raja@dkit.ie
– sequence: 3
  givenname: John
  surname: Loane
  fullname: Loane, John
  email: John.Loane@dkit.ie
– sequence: 4
  givenname: Fergal
  surname: McCaffery
  fullname: McCaffery, Fergal
  email: Fergal.McCaffery@dkit.ie
ContentType Journal Article
Copyright 2025 The Authors
EISSN 2666-8270
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords A3C
ICM
FlightGear
RL
UAVs
Language English
License This is an open access article under the CC BY license.
ORCID 0009-0004-7318-4960
OpenAccessLink https://doaj.org/article/43ed45722502400cb104a58cf475d63f
PublicationDate September 2025
PublicationTitle Machine learning with applications
PublicationYear 2025
Publisher Elsevier Ltd
Elsevier
URI https://dx.doi.org/10.1016/j.mlwa.2025.100714
https://doaj.org/article/43ed45722502400cb104a58cf475d63f
Volume 21