Can We Trust Our Results? A Mapping Study on Data Quality
Published in | 2013 20th Asia-Pacific Software Engineering Conference (APSEC) Vol. 1; pp. 116 - 123 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2013 |
Subjects | |
Summary: | Background: The quality of data sets used in software engineering research is of the utmost importance. To ensure the credibility of results obtained from data sets, the quality of the data must be examined. Objective: This study provides an overview of recent research (2008-2012) involving data quality in software engineering data sets. The goal is to understand what research addresses data quality and, in particular, to determine to what degree researchers have addressed data quality issues when evaluating the trustworthiness of their results. Method: We performed a systematic mapping study of how software engineering research treats data quality issues. A total of 64 papers published from 2008 to 2012 use software engineering data sets and explicitly address issues with the quality of the data. These studies were classified by data quality topic, data set, and data quality problem. Results: We found that only 31 studies gave serious consideration to how the quality of the data affected their results. We observed a lack of clear and consistent terminology regarding data quality, especially with respect to the kinds of quality problems a data set might have. As a first step toward addressing this problem, we propose a model that describes the lifecycle that research data goes through when used in research. Conclusions: The results suggest that researchers should give more attention to the quality of data sets in order to produce trustworthy data for reliable empirical research, and that the research community needs to better understand and communicate issues with data quality. |
---|---|
ISSN: | 1530-1362, 2640-0715 |
DOI: | 10.1109/APSEC.2013.26 |