Journal:Towards a contextual approach to data quality
Full article title | Towards a contextual approach to data quality |
---|---|
Journal | Data |
Author(s) | Canali, Stefano |
Author affiliation(s) | Leibniz University Hannover |
Primary contact | Email: stefano dot canali at philos dot uni-hannover dot de |
Year published | 2020 |
Volume and issue | 5(4) |
Article # | 90 |
DOI | 10.3390/data5040090 |
ISSN | 2306-5729 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.mdpi.com/2306-5729/5/4/90/htm |
Download | https://www.mdpi.com/2306-5729/5/4/90/pdf (PDF) |
Abstract
This essay examines the need for a framework for approaching data quality in the context of scientific research. First, the concept of "quality" as a property of information, evidence, and data is presented, and research in the philosophy of information, science, and biomedicine is reviewed. Based on this review, the essay argues for a more purpose-dependent and contextual approach to data quality in scientific research, whereby the quality of a dataset depends on the context in which the dataset is used as much as on the dataset itself. The rationale for the approach is then illustrated through a discussion of current critiques of and debates about scientific quality, showing how data quality can be approached contextually.
Keywords: research data management, scientific epistemology, data quality, FAIR, reproducibility crisis
Introduction
Determining the quality of scientific data is a task of key importance for any research project and involves considerations at conceptual, practical, and methodological levels. The task has arguably become even more pressing in recent years, as a result of the ways in which the volume, variety, value, volatility, veracity, and validity of scientific data have changed with the rise of data-intensive methods in the sciences.[1] At the start of the last decade, many commentators argued that these changes would bring dramatic shifts to the scientific method and would per se make science better, thanks to fully automated reasoning, more data-driven methods, less theorizing, and more objectivity.[2] However, analyses of the use of data-intensive methods in the sciences have shown that the feasibility and benefits of these methods are not automatic results of these changes, but crucially rest upon the transparency, validity, and quality of data practices.[3] As a consequence, there are currently various attempts at implementing guidelines to maintain and promote the quality of datasets, developing ways and tools to measure it, and conceptualizing the notion of quality.[4][5][6]
This essay focuses on the last of these lines of research, the conceptualization of quality, and discusses the following question: what are high-quality data? At the essay's core is a framework for data quality that suggests a contextual approach, whereby quality should be seen as a result of the context in which a dataset is used, and not only of the intrinsic features of the data. This approach is based on the integration of philosophical discussions on the quality of data, information, and evidence. The next section begins by reviewing analyses of quality in different areas of philosophical research, particularly in the philosophy of information, science, and biomedicine. Then, shared results from this review are identified and integrated, and it is argued that they point towards the need for a contextual approach. A discussion of what the approach entails and how it can be used in practice follows, looking at current debates on quality in the scientific and philosophical literature. Finally, the conclusion reflects on the discussion and proposes future research.
References
- Leonelli, S. (2020). "Scientific Research and Big Data". Stanford Encyclopedia of Philosophy Archive (Summer 2020). https://plato.stanford.edu/archives/sum2020/entries/science-big-data/.
- Canali, S. (2016). "Big Data, epistemology and causality: Knowledge in and knowledge out in EXPOsOMICS". Big Data & Society 3 (2). doi:10.1177/2053951716669530.
- Leonelli, S. (2014). "What difference does quantity make? On the epistemology of Big Data in biology". Big Data & Society 1 (1). doi:10.1177/2053951714534395.
- Cai, L.; Zhu, Y. (2015). "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era". Data Science Journal 14: 2. doi:10.5334/dsj-2015-002.
- Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175.
- Illari, P.; Floridi, L. (2014). "Chapter 2: Information Quality, Data and Philosophy". In Floridi, L.; Illari, P. The Philosophy of Information Quality. Springer International Publishing. pp. 5–23. doi:10.1007/978-3-319-07121-3. ISBN 9783319071213.
Notes
This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.