Gpass: A Goal-Adaptive Neural Theorem Prover Based on Coq for Automated Formal Verification

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering, pp. 29-41
Main Authors	Chen, Yizhou; Sun, Zeyu; Wang, Guoqing; Hao, Dan
Format	Conference Proceeding
Language	English
Published	IEEE 26.04.2025
Subjects	Automated Formal Verification; Benchmark testing; Cognition; Deep learning; Formal verification; Manuals; Optimization; Proof Synthesis; Software quality
Summary:	Formal verification is a crucial means of assuring software quality, but writing verification scripts by hand is laborious and time-consuming. Researchers have proposed automated theorem provers in response; however, these approaches still have several limitations: insufficient handling of lengthy proof steps, difficulty in aligning the various components of a Coq program with the requirements and constraints of the proof goal, and inefficiency. To overcome these limitations, we present Gpass, a goal-adaptive neural theorem prover based on deep learning. First, we design a unique sequence encoder for Gpass that scans the previous proof tactics in full through multiple sliding windows and provides information related to the current proof step. Second, Gpass incorporates a goal-adaptive feature integration module to align the reasoning process with the requirements of the proof goal. Finally, we devise a parameter selection method based on loss values and loss slopes to obtain parameter sets with diverse distributions, thereby facilitating the exploration of various proof tactics. Experimental results show that Gpass performs better on the extensive CoqGym benchmark, proving 11.03%-96.37% more theorems than the most closely related prior work. We also find that the orthogonality between Gpass and CoqHammer indicates complementary capabilities: together they prove a total of 3,774 theorems, which is state-of-the-art performance. In addition, we propose an efficiency optimization approach that allows Gpass to outperform Diva with one-sixth of the parameter sets.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00116