Machine learning steered symbolic execution framework for complex software code

During program traversing, symbolic execution collects path conditions and feeds them to a constraint solver to obtain feasible solutions. However, complex path conditions, like nonlinear constraints, which widely appear in programs, are hard to be handled efficiently by the existing solvers. In thi...

Full description

Saved in:
Bibliographic Details
Published inFormal aspects of computing Vol. 33; no. 3; pp. 301 - 323
Main Authors Bu, Lei, Liang, Yongjuan, Xie, Zhunyi, Qian, Hong, Hu, Yi-Qi, Yu, Yang, Chen, Xin, Li, Xuandong
Format Journal Article
LanguageEnglish
Published London Springer London 01.06.2021
Association for Computing Machinery
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:During program traversing, symbolic execution collects path conditions and feeds them to a constraint solver to obtain feasible solutions. However, complex path conditions, like nonlinear constraints, which widely appear in programs, are hard to be handled efficiently by the existing solvers. In this paper, we adapt the classical symbolic execution framework with a machine learning approach for constraint satisfaction. The approach samples and learns from different solutions to identify potentially feasible area. This sampling-learning style solving can be applied in different class of complex problems easily. Therefore, incorporating this approach, our framework, MLBSE, supports the symbolic execution of not only simple linear path conditions, but also nonlinear arithmetic operations, and even black-box function calls of library methods. Meanwhile, thanks to the theoretical foundation of the machine learning based approach, when the solver fails to solve a path condition, we can have an estimation of the confidence in the satisfiability (ECS) of the problem to give users insights about how the problem is analyzed and whether they could ultimately find a solution. We implement MLBSE on the basis of Symbolic Path Finder (SPF) into a fully automatic Java symbolic execution engine. Users can feed their code to MLBSE directly, which is very convenient to use. To evaluate its performance, 22 real case programs are used as the benchmarks for MLBSE to generate test cases, which involve a total number of 1042 methods that are full of nonlinear operations, floating-point arithmetic as well as native method calls. Experiment results show that the coverage achieved by MLBSE is much higher than the state-of-the-art tools.
ISSN:0934-5043
1433-299X
DOI:10.1007/s00165-021-00538-3