A graph-based clustering algorithm for software systems modularization


Bibliographic Details
Published in Information and software technology Vol. 133; p. 106469
Main Authors Pourasghar, Babak, Izadkhah, Habib, Isazadeh, Ayaz, Lotfi, Shahriar
Format Journal Article
Language English
Published Elsevier B.V 01.05.2021
Summary: Clustering algorithms are used as a modularization technique to support the understanding and refactoring of large software systems. These algorithms partition the source code of a software system into smaller, easy-to-manage modules (clusters). The resulting decomposition is called the software system structure (or software architecture). Because the modularization problem is NP-hard, evolutionary clustering approaches such as the genetic algorithm have been used to solve it. These methods make little use of the information available in the artifact dependency graph, which is extracted from the source code. To overcome this limitation of existing modularization techniques, this paper presents a new graph-based clustering algorithm for software modularization named GMA (Graph-based Modularization Algorithm). The depth of relationships is used to compute the similarity between artifacts, which enables the algorithm to use graph-theoretic information, and seven new criteria are proposed to evaluate the quality of a modularization. To demonstrate the applicability of the proposed algorithm, ten folders of Mozilla Firefox with different domains and functions, along with four other applications, are selected. The experimental results demonstrate that the proposed algorithm produces modularizations closer to the human expert’s decomposition (i.e., the directory structure) than other existing algorithms. The proposed algorithm is expected to help software designers in the reverse engineering process to extract easy-to-manage and understandable modules from source code.
ISSN: 0950-5849, 1873-6025
DOI: 10.1016/j.infsof.2020.106469
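
The summary states that the depth of relationships in the artifact dependency graph is used to compute similarity between artifacts, but it does not give the formula. The following is only a minimal sketch of that general idea, not the paper's GMA definition: it assumes a hypothetical similarity of 1/d, where d is the smallest number of dependency hops between two artifacts in either direction, cut off at an assumed `max_depth`. The function names, the cutoff, and the sample file names are all illustrative assumptions.

```python
from collections import deque


def bfs_depths(graph, source, max_depth):
    """Return {artifact: hop count} for artifacts reachable from source
    within max_depth dependency hops (breadth-first search)."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth cutoff
        for neighbor in graph.get(node, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


def depth_similarity(graph, a, b, max_depth=3):
    """Hypothetical depth-based similarity (an assumption, not GMA's formula):
    1/d for the shortest dependency path between a and b in either
    direction, or 0.0 if neither reaches the other within max_depth."""
    if a == b:
        return 1.0
    d_ab = bfs_depths(graph, a, max_depth).get(b)
    d_ba = bfs_depths(graph, b, max_depth).get(a)
    candidates = [d for d in (d_ab, d_ba) if d is not None]
    if not candidates:
        return 0.0
    return 1.0 / min(candidates)


# Illustrative dependency graph: artifact -> artifacts it depends on.
example_graph = {
    "main.c": ["parser.c", "util.c"],
    "parser.c": ["lexer.c"],
    "lexer.c": [],
    "util.c": [],
}
```

In this sketch a direct dependency (`main.c` on `parser.c`) scores higher than an indirect one (`main.c` on `lexer.c`, two hops away), and artifacts with no dependency path between them score zero. A clustering step could then group artifacts by such scores; how GMA actually does this is described in the paper itself.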