CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries

Bibliographic Details
Published in	2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 1027-1038
Main Authors	Pham, Hung Viet; Lutellier, Thibaud; Qi, Weizhen; Tan, Lin
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2019
Subjects	Atmospheric modeling; bugs detection; Computer bugs; cross-implementation testing; Deep learning; deep learning software testing; Libraries; software testing; Task analysis; Testing; Training

Summary:	Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous driving cars. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and KGS Go game), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies.
ISSN:	1558-1225
DOI:	10.1109/ICSE.2019.00107
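
The two steps named in the Summary, cross-implementation inconsistency checking and localization of where an inconsistency starts, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the backends are toy lists of plain Python functions rather than real DL libraries, the distance is a simple mean absolute difference, the threshold is arbitrary, and "first layer whose outputs diverge" is a simplified stand-in for the anomaly propagation analysis the Summary mentions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def run_layers(layers: Sequence[Layer], x: Vector) -> List[Vector]:
    """Run a model layer by layer and keep every intermediate output."""
    outputs = []
    for layer in layers:
        x = layer(x)
        outputs.append(x)
    return outputs


def cross_check(
    backends: Dict[str, Sequence[Layer]], x: Vector, threshold: float = 1e-3
) -> List[Tuple[str, str, float, int]]:
    """Compare every pair of backends on the same input.

    Assumes all backends implement the same non-empty sequence of layers.
    Returns (name_a, name_b, final_distance, first_divergent_layer) for
    each pair whose final outputs differ by more than the threshold.
    """
    traces = {name: run_layers(layers, x) for name, layers in backends.items()}
    names = sorted(traces)
    findings = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            per_layer = [mean_abs_diff(u, v) for u, v in zip(traces[a], traces[b])]
            if per_layer[-1] > threshold:
                # Index of the earliest layer where the two traces disagree.
                first = next(k for k, d in enumerate(per_layer) if d > threshold)
                findings.append((a, b, per_layer[-1], first))
    return findings


# Hypothetical toy "layers"; buggy_relu is deliberately wrong for negatives.
def scale(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def buggy_relu(v: Vector) -> Vector:
    return [abs(x) for x in v]


def shift(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


backends = {
    "backend_a": [scale, relu, shift],
    "backend_b": [scale, relu, shift],
    "backend_c": [scale, buggy_relu, shift],
}
findings = cross_check(backends, [1.0, -2.0, 3.0])
for name_a, name_b, distance, layer_index in findings:
    print(f"{name_a} vs {name_b}: distance={distance:.3f}, first divergent layer={layer_index}")
```

In this toy setup the pair of agreeing backends is not reported, while both pairs involving the deliberately faulty backend are reported with the index of the layer at which their intermediate outputs first differ. With no ground-truth output available, agreement between independent implementations serves as the test oracle, which is the idea the Summary describes.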