Machine learning steered symbolic execution framework for complex software code

Bibliographic Details
Published in	Formal aspects of computing Vol. 33; no. 3; pp. 301-323
Main Authors	Bu, Lei; Liang, Yongjuan; Xie, Zhunyi; Qian, Hong; Hu, Yi-Qi; Yu, Yang; Chen, Xin; Li, Xuandong
Format	Journal Article
Language	English
Published	London: Springer London, 01.06.2021; Association for Computing Machinery
Subjects	Computer programming; Computer Science; Floating point arithmetic; Machine learning; Math Applications in Computer Science; Original Article; Solvers; Theory of Computation; Symbolic execution; Nonlinear path condition; Constraint solving
Summary:	While traversing a program, symbolic execution collects path conditions and feeds them to a constraint solver to obtain feasible solutions. However, complex path conditions such as nonlinear constraints, which appear widely in programs, are hard for existing solvers to handle efficiently. In this paper, we adapt the classical symbolic execution framework with a machine learning approach to constraint satisfaction. The approach samples candidate solutions and learns from them to identify potentially feasible areas. This sampling-and-learning style of solving can be applied easily to different classes of complex problems. By incorporating this approach, our framework, MLBSE, supports the symbolic execution not only of simple linear path conditions, but also of nonlinear arithmetic operations and even black-box calls to library methods. Moreover, thanks to the theoretical foundation of the machine learning based approach, when the solver fails to solve a path condition, we can give an estimation of the confidence in the satisfiability (ECS) of the problem, which gives users insight into how the problem was analyzed and whether a solution could ultimately be found. We implement MLBSE on top of Symbolic Path Finder (SPF) as a fully automatic Java symbolic execution engine, so users can feed their code to MLBSE directly. To evaluate its performance, we use 22 real-world programs as benchmarks for test case generation with MLBSE; they comprise a total of 1042 methods that are full of nonlinear operations, floating-point arithmetic, and native method calls. Experimental results show that the coverage achieved by MLBSE is much higher than that of state-of-the-art tools.
ISSN:	0934-5043 1433-299X
DOI:	10.1007/s00165-021-00538-3
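
Illustration:	The sampling-and-learning idea described in the summary can be pictured with a small Python sketch. It is not the MLBSE algorithm or its API. The path condition, the names `violation` and `solve`, and the box-shrinking heuristic are all assumptions made for illustration, with `math.sin` standing in for a black-box library call. The solver only evaluates the condition on sampled inputs, so it needs no symbolic model of the nonlinear terms.

```python
import math
import random


def violation(x, y):
    """Score that is 0.0 exactly when the (hypothetical) path condition holds.

    Assumed path condition: x*x + y*y <= 4 and sin(x) * y >= 1.
    math.sin plays the role of a black-box library call.
    """
    v1 = max(0.0, x * x + y * y - 4.0)
    v2 = max(0.0, 1.0 - math.sin(x) * y)
    return v1 + v2


def solve(score, low, high, dim=2, rounds=50, batch=200, keep=20, seed=0):
    """Sample, rank, and refocus: a crude sampling-based search.

    Each round draws `batch` points from the current box, returns the best
    point if its score is 0.0, and otherwise shrinks the box to the bounding
    box of the `keep` best points. Returns None if no solution is found;
    that does NOT prove the condition is unsatisfiable.
    """
    rng = random.Random(seed)
    box = [(low, high)] * dim
    for _ in range(rounds):
        samples = [tuple(rng.uniform(a, b) for a, b in box) for _ in range(batch)]
        samples.sort(key=lambda p: score(*p))
        if score(*samples[0]) == 0.0:
            return samples[0]
        best = samples[:keep]
        # "Learn" a smaller region from the most promising samples.
        box = [(min(p[i] for p in best), max(p[i] for p in best)) for i in range(dim)]
    return None


if __name__ == "__main__":
    solution = solve(violation, -10.0, 10.0)
    if solution is None:
        print("no solution found within the sampling budget")
    else:
        print("candidate test input:", solution)
```

A returned point can be used directly as a test input for the corresponding path. A `None` result only means the sampling budget ran out, which is the situation in which the summary's ECS estimate is meant to inform the user.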