Policy Selection and Scheduling of Cyber-Physical Systems with Denial-of-Service Attacks via Reinforcement Learning

Bibliographic Details
Published in	Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 28, no. 4, pp. 962-973
Main Authors	Jin, Zengwang; Li, Qian; Zhang, Huixiang; Liu, Zhiqiang; Wang, Zhen
Format	Journal Article
Language	English
Published	Tokyo: Fuji Technology Press Co. Ltd., 01.07.2024
Subjects	Algorithms; Control methods; Cyber-physical systems; Cybersecurity; Damage assessment; Denial of service attacks; Dynamic programming; Game theory; Machine learning; Optimization; Real time operation; Robust control; Scheduling; Sensors; State estimation

Summary:	This paper addresses policy selection and scheduling for sensors and attackers in multi-sensor cyber-physical systems (CPSs) under denial-of-service (DoS) attacks. DoS attacks have severely disrupted the regular operation of CPSs, so the resulting damage needs to be assessed. State estimation plays a vital role here: it provides real-time information about the operational status of a CPS and supports accurate prediction and assessment of its security. Rather than using robust control methods to characterize the system state under DoS attacks, this paper instead analyzes the optimal policy selection of the sensors and the attackers through a dynamic programming approach. Game theory is used to study the dynamic interaction between the sensors and the attackers and to optimize the strategies of both sides. During iterative policy optimization, the sensors and attackers learn and adjust their strategies dynamically through reinforcement learning. To explore more state information, the restriction on the set of states is relaxed, i.e., state transitions are not forcibly constrained. In addition, the complexity of the proposed algorithm is reduced by introducing a penalty into the reward function. Simulation results show that the proposed algorithm can effectively optimize policy selection and scheduling for CPSs with multiple sensors.
ISSN:	1343-0130; 1883-8014
DOI:	10.20965/jaciii.2024.p0962