A Configurable Intrinsic Curiosity Module for a Testbed for Developing Intelligent Swarm UAVs
Published in | Machine learning with applications Vol. 21; p. 100714 |
---|---|
Main Authors | Mahmood, Jawad; Raja, Muhammad Adil; Loane, John; McCaffery, Fergal |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd; Elsevier; 01.09.2025 |
Subjects | A3C; FlightGear; ICM; UAVs |
Online Access | Get full text |
ISSN | 2666-8270 |
DOI | 10.1016/j.mlwa.2025.100714 |
Abstract | This paper introduces an Intrinsic Curiosity Module (ICM)-based Reinforcement Learning (RL) framework for target tracking with swarms of Unmanned Aerial Vehicles (UAVs), using the actor–critic architecture to control the roll, pitch, yaw, and throttle of each UAV. A key challenge in RL-based UAV coordination is the delayed reward problem, which hinders effective learning in dynamic environments. Existing UAV testbeds rely primarily on extrinsic rewards and lack mechanisms for adaptive exploration and efficient UAV coordination. To address these limitations, we propose a testbed that integrates an ICM with the Asynchronous Advantage Actor-Critic (A3C) algorithm for tracking UAVs. The testbed incorporates a Self-Reflective Curiosity-Weighted (SRCW) hyperparameter tuning mechanism for the ICM, which adaptively modifies hyperparameters based on the RL agent’s ongoing performance. In this testbed, the target UAV is guided by an Advantage Actor-Critic (A2C) model, while a swarm of two tracking UAVs is controlled using the A3C-ICM approach. The proposed framework facilitates real-time autonomous coordination among UAVs within a simulated environment. The system is built on the FlightGear flight simulator and the JSBSim Flight Dynamics Model (FDM), which together enable dynamic simulations and continuous interaction between UAVs. Experimental results demonstrate that the tracking UAVs can effectively coordinate and maintain precise paths even under complex conditions. |
---|---|
ArticleNumber | 100714 |
Author | Raja, Muhammad Adil Loane, John Mahmood, Jawad McCaffery, Fergal |
Author_xml | – sequence: 1 givenname: Jawad orcidid: 0009-0004-7318-4960 surname: Mahmood fullname: Mahmood, Jawad email: Jawad.Mahmood@dkit.ie – sequence: 2 givenname: Muhammad Adil surname: Raja fullname: Raja, Muhammad Adil email: Adil.Raja@dkit.ie – sequence: 3 givenname: John surname: Loane fullname: Loane, John email: John.Loane@dkit.ie – sequence: 4 givenname: Fergal surname: McCaffery fullname: McCaffery, Fergal email: Fergal.McCaffery@dkit.ie |
Cites_doi | 10.1007/s10994-019-05845-8 10.1016/j.ifacol.2015.05.071 10.3390/math10142523 10.1109/MRA.2010.937855 10.2514/6.2006-6263 10.3390/e23030274 10.1016/j.oceaneng.2024.118342 |
ContentType | Journal Article |
Copyright | 2025 The Authors |
Copyright_xml | – notice: 2025 The Authors |
DBID | 6I. AAFTH AAYXX CITATION DOA |
DOI | 10.1016/j.mlwa.2025.100714 |
DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website |
DeliveryMethod | fulltext_linktorsrc |
EISSN | 2666-8270 |
ExternalDocumentID | oai_doaj_org_article_43ed45722502400cb104a58cf475d63f 10_1016_j_mlwa_2025_100714 S2666827025000970 |
GroupedDBID | 0R~ 6I. AAEDW AAFTH AALRI AAXUO AAYWO ACVFH ADCNI ADVLN AEUPX AEXQZ AFJKZ AFPUW AIGII AITUG AKBMS AKYEP ALMA_UNASSIGNED_HOLDINGS AMRAJ APXCP EBS FDB GROUPED_DOAJ M~E OK1 AAYXX CITATION |
ID | FETCH-LOGICAL-c2064-34c82efc81e4c0304d456efb527ec89696142b2528de22b0f9a8c2b22c0c90a63 |
IEDL.DBID | DOA |
ISSN | 2666-8270 |
IngestDate | Wed Aug 27 01:29:13 EDT 2025 Wed Aug 06 18:55:50 EDT 2025 Sat Sep 06 17:18:22 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | A3C ICM FlightGear RL UAVs |
Language | English |
License | This is an open access article under the CC BY license. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c2064-34c82efc81e4c0304d456efb527ec89696142b2528de22b0f9a8c2b22c0c90a63 |
ORCID | 0009-0004-7318-4960 |
OpenAccessLink | https://doaj.org/article/43ed45722502400cb104a58cf475d63f |
ParticipantIDs | doaj_primary_oai_doaj_org_article_43ed45722502400cb104a58cf475d63f crossref_primary_10_1016_j_mlwa_2025_100714 elsevier_sciencedirect_doi_10_1016_j_mlwa_2025_100714 |
PublicationCentury | 2000 |
PublicationDate | September 2025 2025-09-00 2025-09-01 |
PublicationDateYYYYMMDD | 2025-09-01 |
PublicationDate_xml | – month: 09 year: 2025 text: September 2025 |
PublicationDecade | 2020 |
PublicationTitle | Machine learning with applications |
PublicationYear | 2025 |
Publisher | Elsevier Ltd Elsevier |
Publisher_xml | – name: Elsevier Ltd – name: Elsevier |
SSID | ssj0002811334 |
Score | 2.3020535 |
SourceID | doaj crossref elsevier |
SourceType | Open Website Index Database Publisher |
StartPage | 100714 |
SubjectTerms | A3C FlightGear ICM UAVs |
Title | A Configurable Intrinsic Curiosity Module for a Testbed for Developing Intelligent Swarm UAVs |
URI | https://dx.doi.org/10.1016/j.mlwa.2025.100714 https://doaj.org/article/43ed45722502400cb104a58cf475d63f |
Volume | 21 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Directory of Open Access Journals |
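The abstract describes supplementing delayed extrinsic rewards with an ICM curiosity signal. As a rough illustration only, the sketch below uses the standard ICM formulation, in which the intrinsic reward is the scaled squared error between a forward model's predicted next-state features and the observed ones. It is not the authors' implementation: the function names, the scaling factor `eta`, and the toy forward model are all hypothetical, and the action is assumed to be a (roll, pitch, yaw, throttle) tuple as in the abstract.

```python
from typing import List, Sequence


def intrinsic_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float],
                     eta: float = 0.1) -> float:
    """Curiosity bonus: (eta / 2) * squared forward-model prediction error.

    `eta` is a hypothetical scaling hyperparameter, not a value from the paper.
    """
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * squared_error


def toy_forward_model(features: Sequence[float],
                      action: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a learned forward model.

    Shifts every feature by the mean control input; a real ICM would use a
    trained neural network here.
    """
    shift = sum(action) / len(action)
    return [f + shift for f in features]


def shaped_reward(extrinsic: float,
                  features: Sequence[float],
                  action: Sequence[float],
                  next_features: Sequence[float],
                  eta: float = 0.1) -> float:
    """Reward passed to the actor-critic learner: extrinsic plus curiosity bonus."""
    predicted = toy_forward_model(features, action)
    return extrinsic + intrinsic_reward(predicted, next_features, eta)


if __name__ == "__main__":
    state = [0.0, 1.0]                 # assumed encoded state features
    controls = [0.5, 0.5, 0.5, 0.5]    # assumed roll, pitch, yaw, throttle
    observed = [0.7, 1.2]              # assumed next-state features
    print(shaped_reward(0.0, state, controls, observed))
```

When the extrinsic reward is zero for long stretches, the bonus is largest in transitions the forward model predicts poorly, which is what pushes the agent toward unexplored states.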