Do Code Clones Matter? (1701.05472v1)

Published 19 Jan 2017 in cs.SE

Abstract: Code cloning is not only assumed to inflate maintenance costs but also considered defect-prone as inconsistent changes to code duplicates can lead to unexpected behavior. Consequently, the identification of duplicated code, clone detection, has been a very active area of research in recent years. Up to now, however, no substantial investigation of the consequences of code cloning on program correctness has been carried out. To remedy this shortcoming, this paper presents the results of a large-scale case study that was undertaken to find out if inconsistent changes to cloned code can indicate faults. For the analyzed commercial and open source systems we not only found that inconsistent changes to clones are very frequent but also identified a significant number of faults induced by such changes. The clone detection tool used in the case study implements a novel algorithm for the detection of inconsistent clones. It is available as open source to enable other researchers to use it as basis for further investigations.

Citations (463)

Summary

The paper finds that 15% of inconsistent code clones are fault-inducing, a significant risk to software reliability.
The paper uses a novel suffix-tree-based algorithm to detect inconsistent clones in five software systems, including both commercial and open-source projects.
The study suggests that reducing clone proliferation and using robust detection tools can improve software maintainability and quality.

Analyzing the Impacts of Code Cloning on Software Correctness

The paper "Do Code Clones Matter?" addresses a critical issue in software maintenance: the presence and impact of duplicated code in software systems. The authors investigate whether inconsistent changes to cloned code indicate faults and therefore affect program correctness.

Core Findings and Methodology

The research presents a large-scale case study of five software systems, comprising both commercial and open-source projects written in C#, Cobol, and Java. To find clones, the authors use a novel suffix-tree-based algorithm that detects inconsistent clones, i.e., copies whose code differs beyond consistently renamed identifiers and literals.
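The paper's detector is suffix-tree-based, and the sketch below does not reproduce it. As a minimal illustration of what "inconsistent clone" means, it swaps in plain pairwise comparison: two fragments are tokenized with a simple, assumed tokenizer, identifiers and literals are normalized to `ID` and `LIT`, and the token sequences are compared by Levenshtein distance. Distance 0 is an identical clone; a small non-zero distance, bounded by a hypothetical threshold `max_gap_ratio`, is an inconsistent clone. All names here are hypothetical, and pairwise comparison is quadratic, so this illustrates the classification, not a scalable detector.

```python
import re

# Hypothetical, tiny keyword set; everything else that looks like a name is an identifier.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "new"}
# Identifiers, integers, double-quoted strings, or any single non-space character.
TOKEN_RE = re.compile(r'[A-Za-z_]\w*|\d+|"[^"]*"|\S')


def normalize(code: str) -> list[str]:
    """Tokenize a fragment and replace identifiers/literals by placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit() or tok[0] == '"':
            tokens.append("LIT")
        else:
            tokens.append(tok)
    return tokens


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over token lists, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ta
                           cur[j - 1] + 1,            # insert tb
                           prev[j - 1] + (ta != tb))) # substitute (or match)
        prev = cur
    return prev[-1]


def classify(a: str, b: str, max_gap_ratio: float = 0.2) -> str:
    """Classify a fragment pair; max_gap_ratio is an assumed, illustrative threshold."""
    ta, tb = normalize(a), normalize(b)
    d = edit_distance(ta, tb)
    if d == 0:
        return "identical clone"
    if d <= max_gap_ratio * max(len(ta), len(tb)):
        return "inconsistent clone"
    return "no clone"
```

For example, `if (x > 0) { total = total + x; }` and `if (x >= 0) { total = total + x; }` differ by one token after normalization and are classified as an inconsistent clone, the typical pattern of a condition fixed in only one copy.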

The key results are:

Prevalence of Inconsistencies: 52% of the detected clones are inconsistent.
Unintentional Inconsistencies: About 28% of these inconsistent clones are unintentionally inconsistent, pointing to lapses in developer awareness during maintenance.
Fault Potential: 15% of the inconsistent clones were identified as fault-inducing, underlining the risk clones pose to software reliability.
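The three percentages can be combined into two derived figures. This is a back-of-the-envelope calculation, not a result stated in this summary, and it assumes that all three percentages are pooled over the same set of clones and that every fault-inducing clone is among the unintentionally inconsistent ones. Let C be the clones, IC the inconsistent clones, UIC the unintentionally inconsistent clones, and F the fault-inducing clones:

```latex
\frac{|F|}{|\mathit{UIC}|}
  = \frac{|F| / |\mathit{IC}|}{|\mathit{UIC}| / |\mathit{IC}|}
  \approx \frac{0.15}{0.28} \approx 0.54,
\qquad
\frac{|F|}{|C|}
  = \frac{|\mathit{IC}|}{|C|} \cdot \frac{|F|}{|\mathit{IC}|}
  \approx 0.52 \times 0.15 \approx 0.08
```

Under these assumptions, roughly every second unintentional inconsistency is a fault, and roughly 8% of all detected clones are fault-inducing.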

These findings indicate that inconsistent changes frequently stem from incomplete developer understanding or oversight, and that the fault density in inconsistent clones often exceeds typical average fault densities.

Contributions and Implications

The authors make two contributions. First, a large-scale study provides empirical evidence of the harmful effects of inconsistent clones on software quality. Second, they release a scalable, open-source tool suite for detecting such inconsistencies, enabling further research and application in other software environments.

To address these risks, the paper recommends practices that limit clone proliferation, together with tool support that tracks and manages clones. Such measures may reduce the risk of faults and improve maintainability and quality.
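One way such tool support could flag inconsistent changes is to check, for each commit, which copies of each known clone group were modified. The sketch below assumes the clone groups are already known (for instance from a clone detector) and that a commit is given as a map from file path to changed line numbers; `Fragment` and `inconsistent_groups` are hypothetical names, not part of the authors' tool. A group is flagged when some, but not all, of its copies were touched. A real tool would also have to track how clone positions shift between versions, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One copy of a clone: a file and an inclusive line range (hypothetical model)."""
    path: str
    start: int
    end: int

    def touched_by(self, changed: dict[str, set[int]]) -> bool:
        """True if any changed line of this file falls inside the fragment."""
        lines = changed.get(self.path, set())
        return any(self.start <= n <= self.end for n in lines)


def inconsistent_groups(groups: list[list[Fragment]],
                        changed: dict[str, set[int]]) -> list[list[Fragment]]:
    """Return the clone groups in which some, but not all, copies were modified."""
    flagged = []
    for group in groups:
        touched = [f.touched_by(changed) for f in group]
        if any(touched) and not all(touched):
            flagged.append(group)
    return flagged
```

For example, with one group spanning `Billing.java` lines 10 to 30 and `Invoice.java` lines 55 to 75, a commit that changes only lines 12 and 13 of `Billing.java` flags that group, whereas a group whose copies were all edited in the same commit is not flagged.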

Speculative Future Directions

Future research could classify the defects associated with inconsistent clones in more detail and examine how visible and detectable they are through conventional testing. Comparing the effectiveness of different clone detection parameters and algorithms across programming languages could also show how to tune clone detection.

Conclusion

This paper provides strong evidence that code clones matter for software correctness. By identifying and addressing inconsistencies, developers and organizations can better manage software quality and maintenance costs. Future work should refine our understanding and management of clones.