Journal:Towards a contextual approach to data quality
Full article title | Towards a contextual approach to data quality |
---|---|
Journal | Data |
Author(s) | Canali, Stefano |
Author affiliation(s) | Leibniz University Hannover |
Primary contact | Email: stefano dot canali at philos dot uni-hannover dot de |
Year published | 2020 |
Volume and issue | 5(4) |
Article # | 90 |
DOI | 10.3390/data5040090 |
ISSN | 2306-5729 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.mdpi.com/2306-5729/5/4/90/htm |
Download | https://www.mdpi.com/2306-5729/5/4/90/pdf (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
This essay delves into the need for a framework for approaching data quality in the context of scientific research. First, the concept of "quality" as a property of information, evidence, and data is presented, and research on the philosophy of information, science, and biomedicine is reviewed. Based on this review, the essay argues for a more purpose-dependent and contextual approach to data quality in scientific research, whereby the quality of a dataset depends on the context in which the dataset is used as much as on the dataset itself. The rationale for the approach is then exemplified by discussing current critiques of and debates about scientific quality, thus showcasing how data quality can be approached contextually.
Keywords: research data management, scientific epistemology, data quality, FAIR, reproducibility crisis
Introduction
Determining the quality of scientific data is a task of key importance for any research project and involves considerations at conceptual, practical, and methodological levels. The task has arguably become even more pressing in recent years, as a result of the ways in which the volume, variety, value, volatility, veracity, and validity of scientific data have changed with the rise of data-intensive methods in the sciences.[1] At the start of the last decade, many commentators argued that these changes would bring dramatic shifts to the scientific method and would per se make science better, thanks to fully automated reasoning, more data-driven methods, less theorizing, and more objectivity.[2] However, analyses of the use of data-intensive methods in the sciences have shown that the feasibility and benefits of these methods are not automatic results of these changes, but crucially rest upon the transparency, validity, and quality of data practices.[3] As a consequence, there are currently various attempts at implementing guidelines to maintain and promote the quality of datasets, developing ways and tools to measure it, and conceptualizing the notion of quality.[4][5][6]
This essay focuses on the last of these lines of research and discusses the following question: what are high-quality data? At the essay's core is a framework for data quality that suggests a contextual approach, whereby quality should be seen as a result of the context where a dataset is used, and not only of the intrinsic features of the data. This approach is based on the integration of philosophical discussions on the quality of data, information, and evidence. The next section begins by reviewing analyses of quality in different areas of philosophical research, particularly in the philosophy of information, science, and biomedicine. Then, shared results from this review are identified and integrated, with those results arguably pointing towards the need for a contextual approach. A discussion of what the approach entails and how it can be used in practice follows, looking at current debates on quality in the scientific and philosophical literature. Finally, the conclusion summarizes the discussion and proposes directions for future research.
Quality as a property of information, evidence, and data
Quality has been discussed in areas of philosophical work highly engaged with research practices and debates in the sciences. In this context, three main areas of research were identified, whose results are particularly significant for conceptualizations of quality and yet have only partially been applied to issues in data quality. These results and their integration as important contributions for more general and interdisciplinary discussions on data quality are worthy of discussion. As such, this essay proposes that quality can be discussed as a property of three closely related notions: information, data, and evidence.
First, research on quality has traditionally focused on information quality, which became prominent in computer science in the 1990s. In this context, an influential line of research started to move beyond traditional interpretations of quality in terms of accuracy alone, developing a multi-dimensional and purpose-dependent view whereby a piece of information is of high quality insofar as it is fit for a certain purpose.[7] This line of research has developed into two main approaches since the 1990s: surveying opinions and definitions of academics and practices from an “empirical” point of view; and studying the different dimensions of quality and the interrelations between them from a theoretical and “ontological” perspective.[8] The empirical approach has expanded conceptualizations of information quality to include not only traditional dimensions such as accuracy, but also objectivity, completeness, relevance, security, access, and timeliness; here, the goal has primarily been to categorize these dimensions, rather than to define them.[9] On the other hand, the goal of the ontological approach has been to understand how to connect different dimensions of information quality (such as those surveyed through the empirical approach[10]) and to conceptualize and measure potential disconnections as errors.[11]
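The purpose-dependent view described above can be made concrete with a small sketch. The Python below is illustrative only: the dimension names are taken from the list above, but the numeric scores, the thresholds, the two example purposes, and the function names are hypothetical and are not drawn from the cited literature. It shows how the same set of dimension scores can count as fit for one purpose and unfit for another.

```python
from dataclasses import dataclass


@dataclass
class Purpose:
    """A hypothetical context of use with minimum requirements per dimension."""
    name: str
    minimums: dict[str, float]


def unmet_dimensions(scores: dict[str, float], purpose: Purpose) -> list[str]:
    """Return the dimensions on which the scores fall short for this purpose.

    A dimension that was never assessed is treated as scoring 0.0
    (an assumption made for this sketch).
    """
    return sorted(
        dimension
        for dimension, minimum in purpose.minimums.items()
        if scores.get(dimension, 0.0) < minimum
    )


def fit_for(scores: dict[str, float], purpose: Purpose) -> bool:
    """High quality, in this sketch, means no requirement of the purpose is unmet."""
    return not unmet_dimensions(scores, purpose)


# Hypothetical scores for one dataset, each on a 0-1 scale.
dataset_scores = {"accuracy": 0.95, "completeness": 0.60, "timeliness": 0.90}

# Two hypothetical purposes with different requirements.
exploratory = Purpose("exploratory analysis", {"accuracy": 0.80, "timeliness": 0.70})
registry = Purpose("population registry", {"accuracy": 0.90, "completeness": 0.95})

for purpose in (exploratory, registry):
    print(purpose.name, fit_for(dataset_scores, purpose),
          unmet_dimensions(dataset_scores, purpose))
```

In this sketch nothing about the dataset changes between the two assessments; only the requirements attached to the purpose do, which is the sense in which quality is treated as relative to use rather than as an intrinsic score.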
References
1. Leonelli, S. (2020). "Scientific Research and Big Data". Stanford Encyclopedia of Philosophy Archive (Summer 2020). https://plato.stanford.edu/archives/sum2020/entries/science-big-data/.
2. Canali, S. (2016). "Big Data, epistemology and causality: Knowledge in and knowledge out in EXPOsOMICS". Big Data & Society 3 (2). doi:10.1177/2053951716669530.
3. Leonelli, S. (2014). "What difference does quantity make? On the epistemology of Big Data in biology". Big Data & Society 1 (1). doi:10.1177/2053951714534395.
4. Cai, L.; Zhu, Y. (2015). "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era". Data Science Journal 14: 2. doi:10.5334/dsj-2015-002.
5. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC PMC4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175.
6. Illari, P.; Floridi, L. (2014). "Chapter 2: Information Quality, Data and Philosophy". In Floridi, L.; Illari, P. The Philosophy of Information Quality. Springer International Publishing. pp. 5–23. doi:10.1007/978-3-319-07121-3. ISBN 9783319071213.
7. Wang, R.Y.; Reddy, M.P.; Kon, H.B. (1995). "Toward quality data: An attribute-based approach". Decision Support Systems 13 (3–4): 349–72. doi:10.1016/0167-9236(93)E0050-N.
8. Wang, R.Y. (1998). "A product perspective on total data quality management". Communications of the ACM 41 (2): 58–65. doi:10.1145/269012.269022.
9. Batini, C.; Scannapieco, M. (2006). Data Quality: Concepts, Methodologies and Techniques. Springer. ISBN 9783540331728.
10. Wand, Y.; Wang, R.Y. (1996). "Anchoring data quality dimensions in ontological foundations". Communications of the ACM 39 (11): 86–95. doi:10.1145/240455.240479.
11. Primiero, G. (2014). "Chapter 7: Algorithmic Check of Standards for Information Quality Dimensions". In Floridi, L.; Illari, P. The Philosophy of Information Quality. Springer International Publishing. pp. 107–34. doi:10.1007/978-3-319-07121-3. ISBN 9783319071213.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.