A Smart Flight Controller based on Reinforcement Learning for Unmanned Aerial Vehicle (UAV)

Bibliographic Details
Published in: 2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 203-208
Main Authors: Khan, Fawad Salam; Mohd, Mohd Norzali Haji; Larik, Raja Masood; Khan, Muhammad Danial; Abbasi, Muhammad Inam; Bagchi, Susama
Format: Conference Proceeding
Language: English
Published: IEEE, 13.09.2021

More Information
Summary: Traditional flight controllers are based on Proportional-Integral-Derivative (PID) control, which, although it provides dominant stability control, requires high human intervention. In this study, a smart flight controller is developed for controlling UAVs, providing an operator-less mechanism for flight control. It uses a neural network trained with reinforcement learning techniques. Engineered with a variety of actuators (pitch, yaw, roll, and speed), the next-generation flight controller is trained directly to make its own control decisions in flight. It also optimizes learning algorithms different from the traditional Actor and Critic networks. The agent obtains state information from the environment and computes the reward function from the environment's sensor data. The agent then uses the observations to identify the state and reward, and executes the algorithm to perform actions. The paper shows the performance of a trained neural network with this reward function in both simulation and real-time UAV control. Experimental results show that it responds with relative precision. Using the same framework, UAVs can reliably hover in the air, even under adverse initialization conditions with obstacles. The reward functions computed during flight for 2500, 5000, 7500, and 10000 episodes fall between the normalized values 0 and −4000, and the computation time observed for each episode is 15 microseconds.
ISSN: 2642-6471
DOI: 10.1109/ICSIPA52582.2021.9576806
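
The summary describes a standard reinforcement-learning loop: the agent observes the state, a reward is computed from the environment's sensor data, and a policy network maps observations to the four actuator commands (pitch, yaw, roll, speed). The sketch below is a minimal Python illustration of such a loop; the environment model, the linear policy, and the random-search update used for training are hypothetical stand-ins, since the record does not specify the paper's own learning algorithm (which the authors note differs from traditional Actor-Critic).

```python
# Minimal sketch of the observe -> reward -> act loop described above.
# Everything here (HoverEnv, the linear policy, the random-search update)
# is a hypothetical stand-in, not the authors' implementation.
import numpy as np

class HoverEnv:
    """Toy stand-in for the UAV plant: the state is a 4-vector of
    (pitch, yaw, roll, speed) errors relative to a stable hover."""
    def __init__(self, rng):
        self.rng = rng
        self.state = None

    def reset(self):
        # Adverse initialization: random attitude and speed errors.
        self.state = self.rng.uniform(-1.0, 1.0, size=4)
        return self.state

    def step(self, action):
        # Each actuator command nudges its error; the residual error plus
        # noise models imperfect dynamics.
        self.state = (0.9 * self.state + 0.1 * action
                      + self.rng.normal(0.0, 0.01, size=4))
        # Reward from "sensor" data: penalize deviation from hover, so
        # returns are <= 0, consistent with the reported 0 to -4000 range.
        return self.state, -float(np.sum(self.state ** 2))

def policy(weights, state):
    # Single-layer policy mapping the observed state to bounded
    # actuator commands for pitch, yaw, roll, and speed.
    return np.tanh(weights @ state)

def episode_return(env, weights, steps=200):
    state, total = env.reset(), 0.0
    for _ in range(steps):
        action = policy(weights, state)   # agent acts on its observation
        state, reward = env.step(action)  # environment returns next state
        total += reward                   # accumulate the reward function
    return total

rng = np.random.default_rng(0)
env = HoverEnv(rng)
weights = np.zeros((4, 4))
best = episode_return(env, weights)
for _ in range(200):  # placeholder random-search update, not the paper's rule
    candidate = weights + rng.normal(0.0, 0.1, size=(4, 4))
    score = episode_return(env, candidate)
    if score > best:
        weights, best = candidate, score
print("best episode return after training:", best)
```

Because the reward is the negative squared hover error, a policy that learns to damp the errors drives the per-episode return upward toward 0 from below, mirroring the normalized 0 to −4000 range reported in the abstract.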