Data sets and data quality in software engineering

This source preferred by Gernot Liebchen

Authors: Liebchen, G.A. and Shepperd, M.

Journal: PROMISE ’08: Proceedings of the 4th International Workshop on Predictor Models in Software Engineering

Pages: 39-44

Publisher: ACM

ISBN: 978-1-60558-036-4

This data was imported from Scopus:

Journal: Proceedings - International Conference on Software Engineering

ISSN: 0270-5257

DOI: 10.1145/1370788.1370799

OBJECTIVE - To assess the extent and types of techniques used to manage quality within software engineering data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets.

METHOD - We perform a systematic review of available empirical software engineering studies.

RESULTS - Only 23 of the many hundreds of studies assessed explicitly considered data quality.

CONCLUSIONS - First, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need more research into means of identifying, and ideally repairing, noisy cases. Third, it should become routine to use sensitivity analysis to assess how stable conclusions are with respect to the assumptions that must be made concerning noise levels.

Copyright 2008 ACM.
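The third conclusion in the abstract, using sensitivity analysis to check how stable a conclusion is under different assumed noise levels, could be sketched roughly as follows. This is a minimal illustration and not the authors' method. The data are synthetic, the function names (`fit_slope`, `inject_noise`, `sensitivity`) are hypothetical, and modelling noise as a Gaussian perturbation of a fraction of cases is an assumption made only for the example.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def inject_noise(ys, noise_level, rng, scale):
    """Return a copy of ys with a fraction `noise_level` of cases perturbed.

    Assumption: noise is additive Gaussian with standard deviation `scale`.
    """
    noisy = list(ys)
    n_noisy = round(noise_level * len(ys))
    for i in rng.sample(range(len(ys)), n_noisy):
        noisy[i] += rng.gauss(0.0, scale)
    return noisy


def sensitivity(xs, ys, noise_levels, trials=200, seed=0):
    """For each assumed noise level, refit the model on many perturbed
    copies of the data and return (min, median, max) of the slope."""
    rng = random.Random(seed)
    scale = statistics.pstdev(ys)
    results = {}
    for level in noise_levels:
        slopes = [
            fit_slope(xs, inject_noise(ys, level, rng, scale))
            for _ in range(trials)
        ]
        results[level] = (min(slopes), statistics.median(slopes), max(slopes))
    return results


# Synthetic, made-up "size vs. effort" data, used only for illustration.
data_rng = random.Random(42)
sizes = [float(s) for s in range(10, 110, 5)]
efforts = [3.0 * s + data_rng.gauss(0.0, 20.0) for s in sizes]

baseline = fit_slope(sizes, efforts)
report = sensitivity(sizes, efforts, noise_levels=[0.0, 0.1, 0.3, 0.5])

print(f"baseline slope: {baseline:.3f}")
for level, (low, mid, high) in report.items():
    print(f"noise {level:.0%}: slope min={low:.3f} median={mid:.3f} max={high:.3f}")
```

If the range of refitted slopes stays narrow and keeps the same sign as the assumed noise level grows, the conclusion drawn from the baseline fit can be treated as comparatively stable. A range that widens sharply or crosses zero suggests the conclusion depends on the noise assumption.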
