Kurz erklärt: Measuring Data Changes in Data Engineering and their Impact on Explainability and Algorithm Fairness

Bibliographic Details
Published in	Datenbank-Spektrum : Zeitschrift für Datenbanktechnologie : Organ der Fachgruppe Datenbanken der Gesellschaft für Informatik e.V., Vol. 21, no. 3, pp. 245-249
Main Authors	Klettke, Meike; Lutsch, Adrian; Störl, Uta
Format	Journal Article
Language	English
Published	Springer Berlin Heidelberg, Berlin/Heidelberg, 2021; Springer Nature B.V.
Subjects	Algorithms; Computer Science; Computer Systems Organization and Communication Networks; Data Mining and Knowledge Discovery; Data science; Data Structures and Information Theory; Database Management; Engineering; Information Storage and Retrieval; IT in Business; Kurz Erklärt; Machine learning; Measurement methods; Data engineering pipelines; Reliability; Degree of data changes; Explainability; Data bias

More Information
Summary:	Data engineering is an integral part of any data science and ML process. It consists of several subtasks that are performed to improve data quality and to transform data into a target format suitable for analysis. The quality and correctness of the data engineering steps are therefore important to ensure the quality of the overall process. In machine learning processes, requirements such as fairness and explainability are essential, and the data engineering subtasks must address them as well. In this article, we show how these requirements can be met by logging, monitoring and controlling the data changes in order to evaluate their correctness. Moreover, since data preprocessing algorithms are part of every machine learning pipeline, they must also guarantee that they do not introduce data biases. We briefly introduce three classes of methods for measuring data changes in data engineering and outline which research questions in this area remain open.
ISSN:	1618-2162; 1610-1995
DOI:	10.1007/s13222-021-00392-w
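The summary describes logging and monitoring the data changes made by preprocessing steps and checking that those steps do not introduce data biases. The Python sketch below illustrates only that general idea; it is not one of the three classes of methods from the article, which the summary does not name. The names `run_logged_step`, `StepReport` and `shifted_groups`, the requirement of a unique row key, and the "degree of change" metric (share of input cells deleted or updated) are hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class StepReport:
    """Hypothetical change log for a single preprocessing step."""
    step: str
    rows_deleted: int
    rows_inserted: int
    cells_updated: int
    degree_of_change: float  # assumed metric: share of input cells deleted or updated
    group_share_before: Dict[Any, float]
    group_share_after: Dict[Any, float]


def _group_shares(rows: List[Row], attr: str) -> Dict[Any, float]:
    """Relative frequency of each value of a (sensitive) attribute."""
    counts = Counter(r[attr] for r in rows)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def run_logged_step(step: str, fn: Callable[[List[Row]], List[Row]],
                    rows: List[Row], key: str,
                    sensitive: str) -> Tuple[List[Row], StepReport]:
    """Apply one preprocessing step and log how much it changed the data."""
    out = fn([dict(r) for r in rows])  # shallow copies keep the input rows unchanged
    before = {r[key]: r for r in rows}  # assumes `key` is unique per row
    after = {r[key]: r for r in out}
    deleted = before.keys() - after.keys()
    inserted = after.keys() - before.keys()
    # Count cell-level updates on rows that survived the step.
    updated = sum(
        1
        for k in before.keys() & after.keys()
        for col, old in before[k].items()
        if after[k].get(col) != old
    )
    cells_in = sum(len(r) for r in rows)
    deleted_cells = sum(len(before[k]) for k in deleted)
    degree = (deleted_cells + updated) / cells_in if cells_in else 0.0
    report = StepReport(step, len(deleted), len(inserted), updated, degree,
                        _group_shares(rows, sensitive),
                        _group_shares(out, sensitive))
    return out, report


def shifted_groups(report: StepReport, tolerance: float = 0.05) -> Dict[Any, float]:
    """Groups whose share moved by more than `tolerance` (a possible data bias)."""
    shifts: Dict[Any, float] = {}
    for g in report.group_share_before.keys() | report.group_share_after.keys():
        delta = report.group_share_after.get(g, 0.0) - report.group_share_before.get(g, 0.0)
        if abs(delta) > tolerance:
            shifts[g] = delta
    return shifts


def drop_missing(rs: List[Row]) -> List[Row]:
    """Example step: drop rows whose income is missing."""
    return [r for r in rs if r["income"] is not None]


if __name__ == "__main__":
    rows = [
        {"id": 1, "group": "A", "income": 50},
        {"id": 2, "group": "A", "income": 60},
        {"id": 3, "group": "B", "income": None},
        {"id": 4, "group": "B", "income": 40},
    ]
    cleaned, report = run_logged_step("drop_missing", drop_missing, rows,
                                      key="id", sensitive="group")
    print(report.degree_of_change)  # 0.25
    print(shifted_groups(report, tolerance=0.1))
```

In the example, dropping the row with a missing `income` changes a quarter of the input cells and raises group A's share from 1/2 to 2/3, so both groups are flagged at a tolerance of 0.1. Imputing the missing value instead would update a single cell and leave the group shares unchanged, which is the kind of difference such change logs are meant to make visible.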