Stable Prediction of Defect-Inducing Software Changes (SPDISC)
Principal investigator: Dr. Leandro Minku.
Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, particularly given the strong pressure towards rapid delivery. Such pressure prevents all parts of the software source code from receiving an equally large amount of inspection and testing effort.
With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes are implemented. Such approaches could enable software engineers to target extra testing and inspection effort at the parts of the source code most likely to induce defects, reducing the risk of committing defective changes.
Problem: the predictive performance of existing approaches is unstable, because the underlying defect generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the prediction ability of existing approaches -- at any given point in time, predictive models may be performing very well or failing dramatically.
Aim and vision: SPDISC aims at creating more stable models for predicting defect-inducing changes, through the development of a novel machine learning approach for automatically adapting to concept drift. When integrated with software versioning systems, the models will provide early, reliable and automated defect-inducing change alerts throughout the lifetime of software projects.
Impact: SPDISC will enable a transformation in the way software developers review and commit their changes. By creating stable models that make software developers aware of defect-inducing changes as soon as these are implemented, it will allow inspection and testing effort to be targeted at defect-inducing code throughout the lifetime of software projects. This will reduce debugging costs and ultimately lead to better software quality.
Proposed approach: an online learning algorithm will be developed to process incoming data as they become available, enabling fast reaction to concept drift. Concept drift will be detected using methods designed to cope with class imbalance, which typically occurs in prediction of defect-inducing software changes. Class imbalance refers to the issue of having a much smaller number of defect-inducing changes than the number of safe changes. The proposed approach will also make use of data from different projects (i.e., transfer learning between domains) to speed up adaptation to concept drift.
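To illustrate why drift detection must account for class imbalance, the sketch below tracks an exponentially faded recall estimate on each class separately, so degradation on the rare defect-inducing class cannot be masked by high accuracy on the majority class. This is a toy illustration only, not the project's actual method; the class name, decay factor and threshold are assumptions of mine.

```python
class ClassWiseDriftMonitor:
    """Toy concept-drift monitor for class-imbalanced streams.

    Keeps an exponentially faded recall estimate per class, so a drop in
    recall on the rare 'defect-inducing' class (label 1) signals drift even
    while overall accuracy stays high thanks to the majority 'safe' class.
    """

    def __init__(self, decay=0.99, threshold=0.6):
        self.decay = decay          # fading factor for old observations
        self.threshold = threshold  # drift is signalled below this recall
        self.recall = {0: 1.0, 1: 1.0}

    def update(self, true_label, predicted_label):
        """Feed one labelled prediction; return True if drift is signalled."""
        correct = 1.0 if predicted_label == true_label else 0.0
        self.recall[true_label] = (self.decay * self.recall[true_label]
                                   + (1.0 - self.decay) * correct)
        # React if recall on *either* class degrades, protecting the minority.
        return min(self.recall.values()) < self.threshold
```

A plain faded accuracy would average the two classes together; with, say, 1% defect-inducing changes, the minority class could fail completely while accuracy stayed near 99%.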
Novelty: SPDISC is the first proposal to look into the stability of predictive performance over time in the context of defect-inducing software changes. Most previous work ignored the fact that predictions are required over time, remaining oblivious to the instability of predictive performance in this problem. To deal with instability, SPDISC will develop the first online transfer learning approach for predicting defect-inducing software changes.
Ambitiousness: online transfer learning between domains with concept drift is a very new area of research not only in software engineering, but also in machine learning. Very few such approaches exist, and none of them can deal with class-imbalanced problems. Therefore, SPDISC will not only advance software engineering by enabling a transformation in the way software developers review and commit their changes, but also advance the area of machine learning itself.
Timeliness: given the current size and complexity of software systems, the increased number of life-critical applications, and the high competitiveness of the software industry, approaches for improving software quality and reducing the cost of producing and maintaining software are currently of utmost importance.
Principal investigator: Prof. Xin Yao.
My work on this project was completed in August 2015.
DAASE aims to create a new approach to software engineering which places computational search at the heart of the process and products it creates and embeds adaptivity into both. This new approach will produce software that is dynamically adaptive, being not only able to respond to and fix problems that arise before deployment and during operation, but also to continually optimise, re-configure and evolve to adapt to new operating conditions, platforms and environmental challenges. DAASE will create an array of new processes, methods, techniques and tools for this new kind of software engineering, radically transforming both theory and practice of software engineering. As part of it, DAASE will develop a hyper-heuristic approach to adaptive automation. A hyper-heuristic is a methodology for selecting or generating heuristics. Most heuristic methods in the literature operate on a search space of potential solutions to a particular problem. However, a hyper-heuristic operates on a search space of heuristics.
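The distinction drawn above can be sketched concretely: a selection hyper-heuristic keeps a score per low-level heuristic and searches that space of heuristics, rather than mutating solutions with one fixed heuristic. The toy problem (maximising the number of ones in a bit string) and the scoring scheme below are illustrative assumptions of mine, not DAASE's actual methods.

```python
import random

def selection_hyper_heuristic(evaluate, heuristics, solution, iters=2000, seed=0):
    """Toy selection hyper-heuristic (illustrative sketch).

    Instead of searching the space of solutions with a fixed heuristic, it
    searches the space of heuristics: each low-level heuristic keeps a score,
    heuristics are chosen in proportion to their scores, and a score rises
    whenever its heuristic improves the incumbent solution.
    """
    rng = random.Random(seed)
    scores = [1.0] * len(heuristics)
    best, best_fit = solution, evaluate(solution)
    for _ in range(iters):
        # roulette-wheel choice over heuristic scores
        i = rng.choices(range(len(heuristics)), weights=scores)[0]
        candidate = heuristics[i](best, rng)
        fit = evaluate(candidate)
        if fit >= best_fit:
            best, best_fit = candidate, fit
            scores[i] += 1.0                       # reward a useful heuristic
        else:
            scores[i] = max(0.1, 0.9 * scores[i])  # gently penalise failure
    return best, best_fit
```

On a one-max problem with, for example, single-bit-flip and double-bit-flip low-level heuristics, the selection mechanism learns which operator is currently paying off, without ever manipulating the bit strings itself.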
Currently, I am researching adaptive software prediction. Software prediction tasks are of strategic importance for software-developing companies. An example of such a task is software effort estimation. Overestimations may result in a company losing contracts or wasting resources, whereas underestimations may result in poor quality, delayed or unfinished software systems. Most software prediction research neglects the fact that software prediction tasks operate in online changing environments. Models are typically trained on a set of projects and evaluated on another set of projects, without considering whether the training projects were really available before the testing projects. Besides possibly leading to incorrect conclusions, this results in inflexible prediction approaches that become obsolete with time. I am currently investigating the types of changes undergone by software prediction tasks and proposing new approaches to quickly adapt to these changes.
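The chronology issue can be made concrete with a "test-then-train" (prequential) evaluation sketch: each project's effort is predicted using only projects completed before it, after which the project joins the training data. The windowed-mean predictor below is a deliberately trivial stand-in for a real effort estimator; the function name and window size are assumptions of mine.

```python
def prequential_mae(efforts, window=5):
    """Chronology-respecting evaluation sketch (test-then-train).

    `efforts` lists project efforts in chronological order. Each effort is
    predicted from earlier projects only (here: the mean of the last `window`
    efforts), then added to the history, mimicking an online setting. A
    shuffled train/test split would instead leak future projects into training.
    """
    history, errors = [], []
    for effort in efforts:
        if history:
            recent = history[-window:]
            prediction = sum(recent) / len(recent)
            errors.append(abs(prediction - effort))
        history.append(effort)
    return sum(errors) / len(errors)
```

Comparing this chronological error against the error from a shuffled split exposes how much a model's reported performance depended on seeing "future" projects during training.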
Principal investigator: Prof. Xin Yao.
Project completed in December 2011.
[Section of the home page under construction]
Supervisor: Prof. Xin Yao.
Degree congregation: 2011.
Prof. Teresa B. Ludermir.
Keywords: online parameters optimisation, numeric parameters optimisation, fuzzy neural networks, ensembles of neural networks.
Funding: Brazilian Council for Scientific and Technological Development (CNPq).
Degree congregation: 2006.
Evolving Connectionist Systems (ECoSs) are systems composed of one or more neural networks whose structures adapt according to the data, in continuous interaction with the environment. Evolving Fuzzy Neural Networks (EFuNNs) are ECoSs that combine the functional characteristics of neural networks with the power of fuzzy logic. Fuzzy systems have been shown to be very effective at representing and reasoning about uncertain knowledge. This is important because human knowledge is often uncertain.
A key challenge in Artificial Intelligence is to create systems that are able not only to represent human knowledge and reason about it, but also to evolve and adapt their structures in a changing environment. This kind of system is able to model processes that continually develop and change over time, e.g., biological data processing, electricity load forecasting and adaptive speech recognition. A system with these characteristics needs to be able to tune its parameters in an on-line manner, according to the environment. EFuNNs have some adaptable parameters, and their structures can also adapt according to incoming data. However, they still have many parameters that are fixed before learning and have a great influence on their results. The problem with using a fixed set of parameters is that a set that is optimal for a particular state of the environment can become unsuitable when that state changes.
In this work, two new techniques that use evolutionary algorithms to evolve the EFuNN parameters in an on-line manner were developed. These techniques are able to create fuzzy systems that are completely tunable according to unpredictable and unknown environments. The techniques achieved better accuracy than the existing techniques in the literature for evolving EFuNN parameters in an on-line manner.
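The flavour of on-line parameter evolution can be illustrated with a (1+1) evolution strategy that keeps re-tuning a single numeric parameter as the environment's optimum shifts. This is a minimal sketch under my own assumptions; the thesis techniques, which evolve full EFuNN parameter sets, are not reproduced here.

```python
import random

def online_one_plus_one_es(loss, init_param, steps=400, sigma=0.3, seed=0):
    """(1+1) evolution strategy sketch for on-line numeric parameter tuning.

    At every step the current parameter value is mutated with Gaussian noise,
    and the mutant replaces it if it does at least as well on the *current*
    environment (the loss may change with the time step t). The parameter
    therefore keeps tracking a moving optimum instead of staying fixed.
    """
    rng = random.Random(seed)
    param = init_param
    for t in range(steps):
        mutant = param + rng.gauss(0.0, sigma)
        if loss(mutant, t) <= loss(param, t):
            param = mutant
    return param
```

A parameter fixed before learning would stay at its pre-drift value after the environment changes; the on-line strategy simply resumes hill climbing towards the new optimum.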
Besides the need for new techniques that allow changing environments to be represented, it is always important to develop approaches with greater generalization capability and lower execution time. Ensembles of neural networks have been formally and empirically shown to outperform systems composed of only one neural network. Thus, this work also proposes a new approach to create ensembles of neural networks, e.g., ensembles of EFuNNs. The approach uses a clustering method and co-evolutionary algorithms to create the ensembles in an innovative way, explicitly partitioning the input space so that the networks composing the ensemble specialise in different parts of it and work in a divide-and-conquer manner. The approach improved accuracy over single EFuNNs generated using evolutionary algorithms similar to the co-evolutionary algorithms used in the approach. Furthermore, its execution time is lower than that of the evolutionary algorithms used to generate single EFuNNs.
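A minimal sketch of the divide-and-conquer idea follows: a simple one-dimensional k-means partitioning of the input space, with a local mean predictor standing in for each specialised ensemble member. The clustering method and local models here are illustrative assumptions of mine, not the thesis's co-evolved EFuNNs.

```python
import random

class ClusteredEnsemble:
    """Divide-and-conquer ensemble sketch.

    The input space is explicitly partitioned by a simple 1-D k-means-style
    clustering, one specialist model (here: a local mean predictor) is
    trained per partition, and each query is routed to the specialist for
    its region of the input space.
    """

    def __init__(self, k=2, iters=20, seed=0):
        self.k, self.iters, self.rng = k, iters, random.Random(seed)

    def fit(self, xs, ys):
        # Lloyd-style k-means on the 1-D inputs to partition the space.
        self.centroids = self.rng.sample(xs, self.k)
        for _ in range(self.iters):
            groups = [[] for _ in range(self.k)]
            for x in xs:
                groups[self._nearest(x)].append(x)
            self.centroids = [sum(g) / len(g) if g else c
                              for g, c in zip(groups, self.centroids)]
        # Train one specialist per partition: the mean target in its region.
        sums, counts = [0.0] * self.k, [0] * self.k
        for x, y in zip(xs, ys):
            i = self._nearest(x)
            sums[i] += y
            counts[i] += 1
        self.models = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        return self

    def _nearest(self, x):
        return min(range(self.k), key=lambda i: abs(x - self.centroids[i]))

    def predict(self, x):
        return self.models[self._nearest(x)]
```

Because each member only ever answers queries from its own partition, the members can stay small and specialised, which is the source of the execution-time advantage described above.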