Collective program analysis

Popularity of data-driven software engineering has led to an increasing demand on the infrastructures to support efficient execution of tasks that require deeper source code analysis. While task optimization and parallelization are the adopted solutions, other research directions are less explored....

Full description

Saved in:
Bibliographic Details
Published in2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE) pp. 620 - 631
Main Authors Upadhyaya, Ganesha, Rajan, Hridesh
Format Conference Proceeding
LanguageEnglish
Published New York, NY, USA ACM 27.05.2018
SeriesACM Conferences
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Popularity of data-driven software engineering has led to an increasing demand on the infrastructures to support efficient execution of tasks that require deeper source code analysis. While task optimization and parallelization are the adopted solutions, other research directions are less explored. We present collective program analysis (CPA), a technique for scaling large scale source code analyses, especially those that make use of control and data flow analysis, by leveraging analysis specific similarity. Analysis specific similarity is about, whether two or more programs can be considered similar for a given analysis. The key idea of collective program analysis is to cluster programs based on analysis specific similarity, such that running the analysis on one candidate in each cluster is sufficient to produce the result for others. For determining analysis specific similarity and clustering analysis-equivalent programs, we use a sparse representation and a canonical labeling scheme. Our evaluation shows that for a variety of source code analyses on a large dataset of programs, substantial reduction in the analysis time can be achieved; on average a 69% reduction when compared to a baseline and on average a 36% reduction when compared to a prior technique. We also found that a large amount of analysis-equivalent programs exists in large datasets.
ISBN:9781450356381
1450356389
ISSN:1558-1225
DOI:10.1145/3180155.3180252