Collective program analysis

Bibliographic Details
Published in	2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pp. 620-631
Main Authors	Upadhyaya, Ganesha; Rajan, Hridesh
Format	Conference Proceeding
Language	English
Published	New York, NY, USA: ACM, 27.05.2018
Series	ACM Conferences
Subjects	Analytical models; Boa; Cloning; Clustering; Labeling; Software and its engineering > Software creation and management > Software verification and validation > Formal software verification; Software and its engineering > Software notations and tools > Software maintenance tools; Software engineering; Source code analysis; Syntactics; Task analysis; Transfer functions
Summary:	The popularity of data-driven software engineering has increased demand for infrastructure that can efficiently execute tasks requiring deeper source code analysis. While task optimization and parallelization are the commonly adopted solutions, other research directions are less explored. We present collective program analysis (CPA), a technique for scaling large-scale source code analyses, especially those that use control and data flow analysis, by leveraging analysis-specific similarity. Analysis-specific similarity concerns whether two or more programs can be considered similar for a given analysis. The key idea of collective program analysis is to cluster programs based on analysis-specific similarity, such that running the analysis on one candidate in each cluster is sufficient to produce the result for the others. To determine analysis-specific similarity and cluster analysis-equivalent programs, we use a sparse representation and a canonical labeling scheme. Our evaluation shows that, for a variety of source code analyses on a large dataset of programs, a substantial reduction in analysis time can be achieved: on average a 69% reduction compared to a baseline and on average a 36% reduction compared to a prior technique. We also found that a large number of analysis-equivalent programs exist in large datasets.
ISBN:	9781450356381; 1450356389
ISSN:	1558-1225
DOI:	10.1145/3180155.3180252
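
Illustration:	The clustering idea described in the summary can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the authors' implementation: programs are modeled as flat lists of statement kinds rather than control flow graphs, the "sparse representation" simply drops statements irrelevant to the analysis, and the tuple of remaining statements stands in for a canonical label. The names `sparse_key`, `collective_analysis`, and `unclosed_resources` are hypothetical.

```python
from collections import defaultdict


def sparse_key(statements, relevant_kinds):
    """Toy stand-in for a sparse representation plus canonical label:
    keep only the statement kinds the analysis cares about."""
    return tuple(s for s in statements if s in relevant_kinds)


def collective_analysis(programs, relevant_kinds, analysis):
    """Cluster programs by their sparse key and run the analysis once
    per cluster, reusing the result for every member of the cluster."""
    clusters = defaultdict(list)
    for name, statements in programs.items():
        clusters[sparse_key(statements, relevant_kinds)].append(name)

    results = {}
    runs = 0
    for key, members in clusters.items():
        result = analysis(key)  # one analysis run per cluster
        runs += 1
        for name in members:
            results[name] = result
    return results, runs


def unclosed_resources(statements):
    """Hypothetical analysis: count 'open' statements that are never
    matched by a later 'close'."""
    depth = 0
    for s in statements:
        if s == "open":
            depth += 1
        elif s == "close" and depth > 0:
            depth -= 1
    return depth


# Hypothetical input: three programs, two of which are equivalent
# for this particular analysis once irrelevant statements are dropped.
programs = {
    "a": ["open", "log", "close"],
    "b": ["open", "close", "assign"],
    "c": ["open", "assign"],
}
results, runs = collective_analysis(programs, {"open", "close"}, unclosed_resources)
```

In this sketch, programs "a" and "b" fall into the same cluster, so the analysis runs twice for three programs. The approach is only sound when the analysis result depends solely on what the key retains, which is the role analysis-specific similarity plays in the summary.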