Overview

As part of my graduation, I wrote a thesis titled "Tracing Evolutionary Changes of APIs". In it, I implemented machine learning techniques to categorise release log messages and applied empirical methods to analyse the functionality and architecture of software codebases.

Abstract

In modern software development, APIs are essential tools that enable developers to use the capabilities of software libraries efficiently. Like any software artifact, APIs evolve, undergoing changes in functionality and architecture. Understanding this evolution is therefore essential for building robust and adaptable systems. This study addresses why APIs change and, in particular, how these changes affect software functionality and architecture. To explore this, we examine Java APIs distributed through Maven, including JUnit, Project Lombok, Log4j and Apache Commons IO. The study involves an in-depth analysis of these codebases and of corresponding datasets of release log messages for selected API versions. We implement machine-learning-based techniques to categorise the release log messages and use empirical methods to analyse the functionality and architecture of the codebases, focusing on class-level architecture. Our findings show that machine learning models can classify release log messages with an accuracy above 70% in most cases. Moreover, detailed and descriptive release log messages significantly improve classification accuracy, whereas brief messages reduce it. Regarding architecture and functionality, the results suggest that the codebases evolve predominantly by adding new classes and layers of abstraction. Design patterns such as Visitor and Strategy are used and extended as the codebases evolve, indicating a structured approach to evolution. Overall, this research contributes to the understanding of API evolution, in particular by highlighting the reasons for API changes and their impact on architectural and functional aspects. These insights can help software professionals manage the effects of such changes.

GitHub

My repository with the results is available at the following link.