Improved Q-Learning Algorithm Based on Approximate State Matching in Agricultural Plant Protection Environment


Bibliographic Details
Published in	Entropy (Basel, Switzerland) Vol. 23; no. 6; p. 737
Main Authors	Sun, Fengjie; Wang, Xianchang; Zhang, Rui
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 11.06.2021
Subjects	Algorithms; Decision making; Decision-making support system; Machine learning; Manpower; Markov analysis; Matching; Methods; Norms; Performance evaluation; Pesticides; Planning; Q-learning; Reinforcement learning; Researchers; Spraying; Support systems; Unknown environments; Unmanned aerial vehicles
Summary:	An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps a UAV choose the correct action in each state according to a policy. In an unknown environment, hand-crafting rules for action selection is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the policy learned by classic Q-learning. The proposed algorithm is implemented and tested on evenly distributed datasets based on real UAV parameters and real farm information. The performance of the algorithm is evaluated in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
ISSN:	1099-4300
DOI:	10.3390/e23060737
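
Illustration:	The summary names classic Q-learning and a variant based on similar state matching but does not give the algorithm itself. The sketch below is only one plausible reading, not the authors' method: it uses the standard tabular Q-learning update and, when the current state has never been visited, falls back to the Q-values of the nearest visited state. The state encoding (numeric tuples), the Euclidean distance, the `max_distance` threshold, and the action names `spray` and `move` are all assumptions made for illustration.

```python
import random


def nearest_known_state(state, q_table, max_distance):
    """Return the visited state closest to `state`, or None if none is near.

    Assumption: states are numeric tuples compared by Euclidean distance.
    """
    best, best_dist = None, max_distance
    for known in q_table:
        dist = sum((a - b) ** 2 for a, b in zip(state, known)) ** 0.5
        if dist <= best_dist:
            best, best_dist = known, dist
    return best


def choose_action(state, q_table, actions, epsilon=0.0, max_distance=1.0, rng=random):
    """Epsilon-greedy choice that borrows Q-values from a similar state."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    if state in q_table:
        key = state
    else:
        # Hypothetical matching step: reuse the nearest visited state.
        key = nearest_known_state(state, q_table, max_distance)
    if key is None:
        return rng.choice(actions)  # nothing similar has been seen yet
    values = q_table[key]
    return max(actions, key=lambda a: values.get(a, 0.0))


def q_update(q_table, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (unseen entries default to 0)."""
    next_values = q_table.get(next_state, {})
    best_next = max((next_values.get(a, 0.0) for a in actions), default=0.0)
    row = q_table.setdefault(state, {})
    old = row.get(action, 0.0)
    row[action] = old + alpha * (reward + gamma * best_next - old)


# Usage: learn from one transition, then act in a nearby, unvisited state.
ACTIONS = ["spray", "move"]  # hypothetical action names
q = {}
q_update(q, (0, 0), "spray", 1.0, (0, 1), ACTIONS, alpha=0.5, gamma=0.9)
nearby_choice = choose_action((0.2, 0.1), q, ACTIONS)
```

In this sketch the state `(0.2, 0.1)` was never visited, so the greedy choice is taken from the Q-values stored for `(0, 0)`. Classic Q-learning has no learned values for an unvisited state, which is the kind of gap a state-matching step could plausibly address.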