Improving model performance for software defect detection and prediction using ensemble method and cross validation techniques

Emmanuel U. Oyo-Ita 1, *, Emmanuel A. Edim 2, Anthony Otiko 3 and Darlington E. Izuki 4

1 Department of Computer Science, University of Cross River State, Nigeria.
2 Department of Computer Science, University of Calabar, Nigeria.
3 Department of Computer Science, University of Cross River State, Nigeria.
4 Directorate of Information & Communication Technology, University of Cross River State, Nigeria.
 
Review
International Journal of Science and Research Archive, 2024, 12(02), 2363–2373.
Article DOI: 10.30574/ijsra.2024.12.2.1518
Publication history: 
Received on 07 July 2024; revised on 16 August 2024; accepted on 19 August 2024
 
Abstract: 
Software defects and quality assurance are crucial aspects of software development that should be considered throughout the software development life cycle. Ensuring high-quality software requires a robust quality assurance process. System reliability and quality are key concerns in software development, and they can only be achieved when software undergoes thorough testing for errors, anomalies, defects, omissions, and bugs. Early software defect prediction and detection play an essential role in ensuring the reliability and quality of software systems: they help software companies discover errors or defects early and allocate more resources to defect-prone modules. This study proposes an enhanced classifier model for software defect prediction and detection. The aim is to harness the collective intelligence of selected base classifiers, namely Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, K-Nearest Neighbor, Gaussian Naive Bayes (GaussianNB), and Multi-Layer Perceptron, through a soft voting ensemble technique to improve accuracy, robustness, and generalization in identifying potential defects. The ensemble model combines the class-probability (confidence) estimates used by soft voting with the generalization advantage of cross-validation, yielding a more robust and dynamic model. The performance of the proposed model was compared with that of existing classifiers using accuracy, F1 score, precision, and area under the ROC curve (ROC-AUC) as evaluation metrics. The experimental results show that the proposed classifier achieved an overall accuracy of 93% and a ROC-AUC of 98%. These results demonstrate the effectiveness of our enhanced ensemble classifier for software defect detection and prediction.
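Soft voting, as named above, averages the class probabilities produced by the base classifiers instead of counting their hard labels. The minimal Python sketch below (standard library only) illustrates the difference for a binary defective/non-defective label. The function names `average_proba`, `soft_vote`, and `hard_vote` and the probability values are hypothetical illustrations, not taken from the study.

```python
from typing import List, Optional, Sequence


def average_proba(probas: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean of each classifier's P(defective), computed per module.

    probas[i][j] is classifier i's estimated probability that module j is defective.
    """
    if weights is None:
        weights = [1.0] * len(probas)  # plain (unweighted) soft voting
    total = sum(weights)
    n_modules = len(probas[0])
    return [sum(w * p[j] for w, p in zip(weights, probas)) / total
            for j in range(n_modules)]


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> List[int]:
    """Label a module defective (1) when its averaged probability reaches the threshold."""
    return [1 if a >= threshold else 0 for a in average_proba(probas, weights)]


def hard_vote(probas: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Majority vote over each classifier's own 0/1 label (ties resolve to 0)."""
    n_clf = len(probas)
    labels = []
    for j in range(len(probas[0])):
        votes = sum(1 for p in probas if p[j] >= threshold)
        labels.append(1 if votes > n_clf / 2 else 0)
    return labels


# Hypothetical probabilities from three base classifiers for two software modules.
probas = [
    [0.45, 0.10],  # classifier A: slightly below the threshold on module 0
    [0.45, 0.20],  # classifier B: slightly below the threshold on module 0
    [0.95, 0.30],  # classifier C: very confident that module 0 is defective
]
print("hard:", hard_vote(probas))  # hard: [0, 0]
print("soft:", soft_vote(probas))  # soft: [1, 0]
```

Module 0 shows why the confidence information matters: two classifiers are only marginally unsure while one is highly confident, so majority (hard) voting labels the module clean, whereas the averaged probability (about 0.62) flags it as defective.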
By harnessing the strengths of diverse base classifiers, our approach provides a robust and adaptive solution to the challenges of early defect detection and mitigation in software systems. This research contributes to the advancement of reliable software development practices and lays the foundation for future enhancements in ensemble-based defect detection methodologies.
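The overall pipeline described in the abstract, a soft voting ensemble of the nine listed base classifiers evaluated with cross-validation on accuracy, F1 score, precision, and ROC-AUC, can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the synthetic, imbalanced dataset from `make_classification` stands in for a real defect dataset, the number of folds, the feature scaling, and all hyperparameters are assumptions, and the helper names `build_soft_voting_ensemble`, `_scaled`, and `evaluate` are hypothetical.

```python
import warnings
from typing import Dict

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

METRICS = ("accuracy", "f1", "precision", "roc_auc")


def _scaled(model):
    """Standardize features before scale-sensitive models (assumed preprocessing)."""
    return make_pipeline(StandardScaler(), model)


def build_soft_voting_ensemble(seed: int = 0) -> VotingClassifier:
    """Soft voting over the nine base classifiers named in the abstract.

    Hyperparameters are library defaults except where noted; they are assumptions,
    not the settings used in the study.
    """
    estimators = [
        ("svm", _scaled(SVC(probability=True, random_state=seed))),  # enables predict_proba
        ("lr", _scaled(LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(random_state=seed)),
        ("ada", AdaBoostClassifier(random_state=seed)),
        ("gb", GradientBoostingClassifier(random_state=seed)),
        ("knn", _scaled(KNeighborsClassifier())),
        ("gnb", GaussianNB()),
        ("mlp", _scaled(MLPClassifier(max_iter=1000, random_state=seed))),
    ]
    # voting="soft" averages predict_proba outputs and picks the most probable class.
    return VotingClassifier(estimators=estimators, voting="soft")


def evaluate(n_splits: int = 5, seed: int = 0) -> Dict[str, float]:
    """Mean stratified k-fold scores of the ensemble on a synthetic stand-in dataset."""
    # Synthetic, imbalanced data (about 20% "defective" modules); replace with real metrics.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                               weights=[0.8, 0.2], random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        scores = cross_validate(build_soft_voting_ensemble(seed), X, y, cv=cv,
                                scoring=list(METRICS))
    return {m: float(scores[f"test_{m}"].mean()) for m in METRICS}


# DecisionTreeClassifier import kept next to its use for readability.
from sklearn.tree import DecisionTreeClassifier  # noqa: E402

if __name__ == "__main__":
    for name, value in evaluate().items():
        print(f"{name}: {value:.3f}")
```

Wrapping the scale-sensitive models (SVM, Logistic Regression, K-Nearest Neighbor, MLP) in a pipeline with `StandardScaler` means the scaler is refit inside each training fold, so no information from the held-out fold leaks into training. `SVC` needs `probability=True` because soft voting requires `predict_proba` from every estimator.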
 
Keywords: 
Base Classifier; Cross-Validation; Ensemble; Machine learning; Software Defect; Soft Voting
 