Code similarity detection through control statement and program features

Bibliographic Details
Published in Expert Systems with Applications Vol. 132; pp. 63-75
Main Authors Sudhamani, M., Rangarajan, Lalitha
Format Journal Article
Language English
Published New York Elsevier Ltd 15.10.2019
Elsevier BV
Summary:
• Methods to identify duplicate code (code clones) are introduced.
• All four types of clones can be identified.
• No external lexer or parser is required to process the code.
• The approach is less complex than AST- and PDG-based approaches.

Software clone detection is an emerging research area in software engineering. Software systems undergo continuous source code modification to improve performance, which may lead to code redundancy. Duplicate code, or a code clone, is a piece of code reused several times in software programs because of copy-and-paste activity or reuse of existing software. Code clones are a prime subject in software evolution. Detecting clones during software evolution may improve software performance and reduce maintenance cost and effort. Because software clones are a universal problem in large-scale programming environments, this paper introduces two metric-based approaches that detect code clones by comparing (i) control statement features and (ii) program features such as the different types of statements, operators and operands. To demonstrate the effectiveness of the proposed approaches, extensive experiments are conducted on two datasets: the C projects of Bellon's benchmark dataset and student lab programs (SLP). The methods efficiently identify similar functional clones. The proposed models compute similarity only for whole programs, but they are intelligent enough to highlight similar code segments across program files.
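The summary does not give the feature definitions or the similarity measure, so the following is only a minimal sketch of the general idea of comparing control statement features without a lexer or parser. The keyword list, the function names `control_features` and `cosine_similarity`, and the choice of cosine similarity are all assumptions made for illustration, not the authors' method.

```python
import math
import re

# Assumed feature set: C control keywords (not taken from the paper).
CONTROL_KEYWORDS = ("if", "else", "for", "while", "do", "switch", "case")


def control_features(source):
    """Count whole-word occurrences of each control keyword using a regex.

    Comments and string literals are not stripped, so keywords inside
    them are counted too; a real tool would need to handle that.
    """
    return [len(re.findall(rf"\b{kw}\b", source)) for kw in CONTROL_KEYWORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two renamed copies of the same loop, and one structurally different function.
prog_a = "int f(int n){int s=0;for(int i=0;i<n;i++){if(i%2)s+=i;}return s;}"
prog_b = "int g(int m){int t=0;for(int k=0;k<m;k++){if(k%2)t+=k;}return t;}"
prog_c = "int h(int n){while(n>1){n/=2;}return n;}"

sim_ab = cosine_similarity(control_features(prog_a), control_features(prog_b))
sim_ac = cosine_similarity(control_features(prog_a), control_features(prog_c))
```

In this toy case the renamed copies share identical control keyword counts, so `sim_ab` is approximately 1, while `prog_a` and `prog_c` share no control keywords, so `sim_ac` is 0. Counts alone ignore nesting and ordering, which a fuller feature set would presumably need to capture.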
ISSN: 0957-4174
1873-6793
DOI: 10.1016/j.eswa.2019.04.045