On the effectiveness of clone detection by string matching

Bibliographic Details
Published in: Journal of software maintenance and evolution, Vol. 18, no. 1, pp. 37-58
Main Authors: Ducasse, Stéphane; Nierstrasz, Oscar; Rieger, Matthias
Format: Journal Article
Language: English
Published: Chichester, UK: John Wiley & Sons, Ltd, 01.01.2006
Summary: Although duplicated code is known to pose severe problems for software maintenance, it is difficult to identify in large systems. Many different techniques have been developed to detect software clones, some of which are very sophisticated, but are also expensive to implement and adapt. Lightweight techniques based on simple string matching are easy to implement, but how effective are they? We present a simple string‐based approach which we have successfully applied to a number of different languages such as COBOL, JAVA, C++, PASCAL, PYTHON, SMALLTALK, C and PDP‐11 ASSEMBLER. In each case the maximum time to adapt the approach to a new language was less than 45 minutes. In this paper we investigate a number of simple variants of string‐based clone detection that normalize differences due to common editing operations, and assess the quality of clone detection for very different case studies. Our results confirm that this inexpensive clone detection technique generally achieves high recall and acceptable precision. Over‐zealous normalization of the code before comparison, however, can result in an unacceptable number of false positives. Copyright © 2005 John Wiley & Sons, Ltd.
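As a rough illustration of what string-based clone detection with light normalization can look like, the following is a minimal sketch and not the authors' implementation. It assumes line-by-line comparison, whitespace removal as the only normalization step, and a minimum clone length of three lines. The names `normalize`, `find_clones` and `min_length` are hypothetical and do not come from this record.

```python
import re
from collections import defaultdict


def normalize(line: str) -> str:
    """Drop all whitespace so that re-indentation and spacing edits
    do not hide a copy (an illustrative normalization choice)."""
    return re.sub(r"\s+", "", line)


def find_clones(lines, min_length=3):
    """Return (start_a, start_b, length) triples, 0-based, start_a < start_b,
    for runs of at least `min_length` consecutive lines that are equal
    after normalization. Blank lines never match, so they end a run."""
    norm = [normalize(line) for line in lines]

    # Index every non-empty normalized line by its text.
    positions = defaultdict(list)
    for i, text in enumerate(norm):
        if text:
            positions[text].append(i)

    # Collect all pairs (i, j), i < j, of lines with identical text.
    matches = set()
    for idxs in positions.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                matches.add((idxs[a], idxs[b]))

    clones = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # this pair continues an earlier run
        length = 0
        # Follow the diagonal; stop before the two copies overlap.
        while (i + length, j + length) in matches and i + length < j:
            length += 1
        if length >= min_length:
            clones.append((i, j, length))
    return clones


if __name__ == "__main__":
    source = [
        "total = 0",
        "for x in items:",
        "    total += x",
        "print(total)",
        "",
        "total=0",
        "for x in items:",
        "  total += x",
        "print( total )",
    ]
    print(find_clones(source))
```

Stronger normalization in the same spirit, for example replacing every identifier with a placeholder, would match more edited copies but would also make unrelated lines look alike, which is the false-positive risk the summary warns about.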
Bibliography:ark:/67375/WNG-W0SJFBSQ-V
istex:ABBC0B57B0F80471CDAAD524E8287089B102EC79
ArticleID:SMR317
Swiss National Science Foundation and Swiss Federal Office for Education and Science - No. ESPRIT Project 21975/Swiss BBW 96.0015; No. 20-53711.98; No. 20-61655.00; No. 2000-061655.00/1
ISSN: 1532-060X, 1532-0618
DOI: 10.1002/smr.317