A Coverage-guided Fuzzing Method for Automatic Software Vulnerability Detection using Reinforcement Learning-enabled Multi-Level Input Mutation

Bibliographic Details
Published in	IEEE Access, Vol. 12, p. 1
Main Authors	Pham, Van-Hau; Hien, Do Thi Thu; Chuong, Nguyen Phuc; Thai, Pham Thanh; Duy, Phan The
Format	Journal Article
Language	English
Published	IEEE 01.07.2024
Subjects	Codes; Coverage Fuzzing; Fault diagnosis; Fuzzing; Q-learning; Reinforcement learning; Software; Source coding; Vulnerability Detection

More Information
Summary:	Fuzzing is a popular and effective software testing technique that automatically generates or modifies inputs to test the stability of a software system and expose its vulnerabilities; it has been widely applied and improved by security researchers and experts. The goal of fuzzing is to uncover potential weaknesses in software by feeding unexpected or invalid inputs to the target program and monitoring its behavior for errors or unintended outcomes. Recently, researchers have also integrated machine learning algorithms, such as reinforcement learning (RL), to enhance the fuzzing process. RL has been shown to improve the effectiveness of fuzzing by selecting and prioritizing the transformation actions that yield higher coverage, which reduces the effort required to uncover vulnerabilities. However, RL-based fuzzing models also have limitations, including an imbalance between exploitation and exploration. In this study, we propose a coverage-guided, RL-based model that enhances grey-box fuzzing: it uses deep Q-learning to predict and select input variations that maximize code coverage, with code coverage serving as the reward signal. The model is complemented by simple input selection and scheduling algorithms that promote a better balance between exploiting and exploring the software. Furthermore, we introduce a multi-level input mutation model that is combined with RL to create a sequence of actions for comprehensive input variation. We compare the proposed model with other fuzzing tools on various real-world programs; the results indicate a notable improvement in code coverage, discovered paths, and execution speed.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3421989
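
Illustrative sketch (not from the article): the summary describes choosing mutation actions with Q-learning and using code coverage as the reward. The Python below shows that loop in a deliberately reduced form. It is an assumption-laden toy, not the authors' method: it swaps deep Q-learning for a single-state tabular update (effectively a bandit), uses epsilon-greedy selection as one simple way to trade off exploration and exploitation, and replaces a real instrumented target with a hypothetical `toy_target` function. The mutation operators, parameter values, and all names are invented for illustration.

```python
import random


# Hypothetical byte-level mutation operators (not taken from the article).
def flip_bit(data, rng):
    if not data:
        return data
    i = rng.randrange(len(data))
    out = bytearray(data)
    out[i] ^= 1 << rng.randrange(8)
    return bytes(out)


def insert_byte(data, rng):
    i = rng.randrange(len(data) + 1)
    return data[:i] + bytes([rng.randrange(256)]) + data[i:]


def delete_byte(data, rng):
    if len(data) <= 1:
        return data
    i = rng.randrange(len(data))
    return data[:i] + data[i + 1:]


MUTATIONS = [flip_bit, insert_byte, delete_byte]


def toy_target(data):
    """Stand-in for an instrumented program: returns the branch ids it hit."""
    covered = {0}
    if len(data) > 4:
        covered.add(1)
        if data[0] == ord("F"):
            covered.add(2)
            if data[1] == ord("U"):
                covered.add(3)
    return covered


def fuzz(seed, steps=2000, epsilon=0.2, alpha=0.5, rng_seed=0):
    rng = random.Random(rng_seed)
    q = [0.0] * len(MUTATIONS)        # one Q-value per mutation action
    queue = [seed]                    # inputs that discovered new coverage
    global_cov = set(toy_target(seed))
    for _ in range(steps):
        parent = rng.choice(queue)    # simplistic input selection
        if rng.random() < epsilon:
            action = rng.randrange(len(MUTATIONS))               # explore
        else:
            action = max(range(len(MUTATIONS)), key=q.__getitem__)  # exploit
        child = MUTATIONS[action](parent, rng)
        new_branches = toy_target(child) - global_cov
        reward = float(len(new_branches))   # coverage gain as reward signal
        q[action] += alpha * (reward - q[action])
        if new_branches:
            global_cov |= new_branches
            queue.append(child)
    return global_cov, q, queue


if __name__ == "__main__":
    cov, q_values, corpus = fuzz(b"AAAA")
    print("branches covered:", sorted(cov))
    print("Q-values per mutation:", q_values)
    print("corpus size:", len(corpus))
```

A fuller version along the lines the summary describes would condition the action choice on the input or program state, learn the values with a neural network, and chain several mutations per step; none of that is attempted here.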