PROMISE repository datasets for defect prediction

4/13/2023

[...] Code (PSC) dataset, which was derived from the PROMISE repository [30]. In this research work, the Stacking Ensemble technique gave the best results among the algorithms used in the experiment, with a defect prediction accuracy above 0.9 on every dataset. Software defect prediction is used to find defect-prone modules and so improve software reliability. For datasets from the PROMISE repository, multiple software metrics have been evaluated with feature selection (FS) techniques such as Recursive Feature Elimination (RFE) and correlation-based FS, combined with the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. Artificial Neural Networks (ANN), Decision Trees, k-nearest neighbours, Support Vector Machines (SVM) and Ensemble Learning are some of the machine learning algorithms that have been used to classify software modules as defect-prone or not defect-prone. This paper aims to compare different classification algorithms while taking into account the class imbalance and high dimensionality of the defect datasets. We use an enhanced SZZ algorithm to extract fault information and calculate metrics using JHawk. Early prediction of the defects present in the various modules of the software can reduce the effort and resources required for testing. We reduce 50,000 potential candidates to 23 that are suitable for defect prediction, using selection criteria based on each system's software repository and its defect tracking system. Developing good-quality software requires rigorous testing of the software modules.
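To make the oversampling step more concrete, here is a minimal sketch of the idea behind SMOTE, written with the Python standard library only. It is a simplified illustration, not the implementation used in any of the studies above: the function name `smote`, the parameters `n_new`, `k` and `seed`, and the sample metric values are all assumptions made for this example. Each synthetic sample is placed at a random point on the line segment between a defective module and one of its nearest defective neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical module metrics: (lines of code, cyclomatic complexity).
defective = [(120.0, 14.0), (200.0, 22.0), (150.0, 18.0), (310.0, 35.0)]
clean_count = 10  # assumed size of the majority (non-defective) class

# Generate enough synthetic defective rows to match the majority class.
new_rows = smote(defective, n_new=clean_count - len(defective))
balanced_defective = defective + new_rows
print(len(balanced_defective))  # 10
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the data used to measure accuracy.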
