IBFET: Index‐based features extraction technique for scalable code clone detection at file level granularity

Bibliographic Details
Published in Software, practice & experience Vol. 50; no. 1; pp. 22-46
Main Authors Akram, Junaid, Mumtaz, Majid, Luo, Ping
Format Journal Article
Language English
Published Bognor Regis: Wiley Subscription Services, Inc, 01.01.2020
Summary: Many techniques have been developed over the years to detect code clones in different software systems to maintain security measures. These techniques often require source code in order to compare the subject system against a very large data set of big code. This paper presents an index-based features extraction technique (IBFET) that detects code clones at very large scale, up to billions of LOC, at file-level granularity. We performed preprocessing, indexing, and clone detection for more than 324 billion LOC in a Hadoop distributed environment, which is considerably faster and more efficient than existing distributed indexing and clone detection techniques; meanwhile, it detects all three types of clones efficiently. The MapReduce divide-and-conquer principle is used to count and retrieve the similar features between different systems. We evaluated the execution time, scalability, precision, and recall of IBFET using the well-known clone detection data sets IJaDataset and BigCloneBench; furthermore, we compared the results with other state-of-the-art tools. Our approach is faster, flexible, and scalable; it provides accurate results with high authenticity and can be implemented at large scale.
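The idea of counting and retrieving shared features through an index can be illustrated with a small single-machine sketch. This is not the authors' implementation: the record does not specify how IBFET defines a feature or scores similarity, so the feature definition (one whitespace-normalized line per feature), the Jaccard similarity, the 0.7 threshold, and all function names below are assumptions chosen for illustration. The map and reduce steps are plain Python functions standing in for a Hadoop job.

```python
from collections import defaultdict
from itertools import combinations


def extract_features(source: str) -> set[str]:
    """Hypothetical feature extractor: one feature per non-empty,
    whitespace-normalized line (an assumption, not IBFET's definition)."""
    features = set()
    for line in source.splitlines():
        normalized = " ".join(line.split())
        if normalized:
            features.add(normalized)
    return features


def map_phase(files: dict[str, str]):
    """Map step: emit one (feature, file_id) pair per feature of each file."""
    for file_id, source in files.items():
        for feature in extract_features(source):
            yield feature, file_id


def reduce_phase(pairs) -> dict[str, set[str]]:
    """Reduce step: group file ids by feature, producing an inverted index."""
    index = defaultdict(set)
    for feature, file_id in pairs:
        index[feature].add(file_id)
    return index


def count_shared_features(index: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Count, for every pair of files, how many indexed features they share."""
    shared = defaultdict(int)
    for file_ids in index.values():
        for a, b in combinations(sorted(file_ids), 2):
            shared[(a, b)] += 1
    return shared


def find_clone_candidates(files: dict[str, str], threshold: float = 0.7):
    """Return (file_a, file_b, similarity) for pairs whose Jaccard
    similarity over features meets the (assumed) threshold."""
    index = reduce_phase(map_phase(files))
    sizes = {fid: len(extract_features(src)) for fid, src in files.items()}
    candidates = []
    for (a, b), n_shared in count_shared_features(index).items():
        union = sizes[a] + sizes[b] - n_shared  # >= 1 because n_shared >= 1
        similarity = n_shared / union
        if similarity >= threshold:
            candidates.append((a, b, similarity))
    return candidates


if __name__ == "__main__":
    demo_files = {
        "A.java": "int x = 1;\nreturn x;",
        "B.java": "int   x = 1;\nreturn x;",  # differs only in whitespace
        "C.java": "System.out.println(42);",
    }
    for pair in find_clone_candidates(demo_files):
        print(pair)
```

Only files that share at least one indexed feature are ever compared, which is the property that lets an index-based approach avoid an all-pairs comparison of file contents.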
Present Address: Junaid Akram, Gujranwala, Punjab, Pakistan
ISSN: 0038-0644, 1097-024X
DOI: 10.1002/spe.2759