Assessing the Quality and Cleaning of a Software Project Dataset: An Experience Report

Authors: Liebchen, G., Twala, B., Shepperd, M. and Cartwright, M.

Journal: Proceedings of the 10th International Conference on Evaluation and Assessment in Software Engineering (EASE 2006)

Publisher: British Computer Society

DOI: 10.14236/ewic/EASE2006.14

Abstract:

OBJECTIVE - The aim is to report on an assessment of the impact noise has on predictive accuracy by comparing noise-handling techniques. METHOD - We describe the process of cleaning a large software management dataset initially comprising more than 10,000 projects. Data quality was assessed mainly through feedback from the data provider and manual inspection of the data. Three noise-correction methods (polishing, noise elimination and robust algorithms) were compared in terms of their accuracy, with noise detection undertaken using a regression tree model. RESULTS - The three noise-correction methods differed in the accuracy they achieved. CONCLUSIONS - The results demonstrate that polishing improves classification accuracy compared with the noise elimination and robust algorithm approaches.
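For orientation, the sketch below illustrates in Python (with scikit-learn) how the three noise-handling strategies named in the abstract can be arranged around a regression-tree noise detector. This is not the authors' implementation: the synthetic dataset, tree settings and residual threshold are assumptions made purely for illustration.

    # Minimal sketch, assuming a regression tree as the noise detector
    # (as the abstract describes); data and thresholds are hypothetical.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)

    # Hypothetical project data: two predictors and a noisy target.
    X = rng.uniform(1, 100, size=(500, 2))
    y = X[:, 0] * 3 + X[:, 1] * 0.5 + rng.normal(0, 5, size=500)
    noisy = rng.choice(500, size=50, replace=False)  # inject artificial noise
    y[noisy] += rng.normal(0, 100, size=50)

    # Noise detection: flag instances the regression tree fits poorly.
    tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
    residuals = np.abs(y - tree.predict(X))
    flagged = residuals > 3 * residuals.std()  # assumed threshold

    # 1. Noise elimination: drop the flagged instances entirely.
    X_elim, y_elim = X[~flagged], y[~flagged]

    # 2. Polishing: keep the instances but replace the suspect target
    #    values with the model's predictions.
    y_polished = y.copy()
    y_polished[flagged] = tree.predict(X[flagged])

    # 3. Robust algorithm: leave the data untouched and rely on a learner
    #    configured to tolerate noise (here, a heavily pruned tree).
    robust = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20,
                                   random_state=0).fit(X, y)

Each strategy then feeds its resulting training set (or model) into the same downstream evaluation, which is how their accuracies can be compared head to head.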

Source: Scopus
