Journal:Data without software are just numbers
Full article title | Data without software are just numbers |
---|---|
Journal | Data Science Journal |
Author(s) | Davenport, James H.; Grant, James; Jones, Catherine M. |
Author affiliation(s) | University of Bath, Science and Technology Facilities Council |
Primary contact | Email: J dot H dot Davenport at bath dot ac dot uk |
Year published | 2020 |
Volume and issue | 19(1) |
Article # | 3 |
DOI | 10.5334/dsj-2020-003 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://datascience.codata.org/articles/10.5334/dsj-2020-003/ |
Download | https://datascience.codata.org/articles/10.5334/dsj-2020-003/galley/929/download/ (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
Great strides have been made to encourage researchers to archive data created by research and provide the necessary systems to support their storage. Additionally, it is recognized that data are meaningless unless their provenance is preserved, through appropriate metadata. Alongside this is a pressing need to ensure the quality and archiving of the software that generates data, through simulation and control of experiment or data collection, and that which analyzes, modifies, and draws value from raw data. In order to meet the aims of reproducibility, we argue that data management alone is insufficient: it must be accompanied by good software practices, the training to facilitate it, and the support of stakeholders, including appropriate recognition for software as a research output.
Keywords: software citation, software management, reproducibility, archiving, research software engineer
Introduction
Context
In the last decade, there has been a drive towards improved research data management in academia, moving away from the model of "supplementary material" that did not fit in publications, to the requirement that all data supporting research be made available at the time of publication. In the U.K., for example, the Research Councils have a Concordat on Open Research Data[1], and the E.U.’s Horizon 2020 program incorporates similar policies on data availability.[2] The FAIR principles[3]—that state data be findable, accessible, interoperable, and re-usable—embody the philosophy underlying this: data should be preserved through archiving with a persistent identifier, it should be well described with suitable metadata, and it should be done in a way that is relevant to the domain. Together with the OpenAccess movement, there has been a profound transformation in the availability of research and the data supporting it.
While this is a great stride towards transparency, it does not by itself improve the quality of research, and even what exactly transparency entails remains debated.[4] A common theme discussed in many disciplines is the need for a growing emphasis on "reproducibility."[5][6][7] This goes beyond data itself, requiring software and analysis pipelines to be published in a usable state alongside papers. In order to spread such good practices, a coordinated effort towards training in professional programming methods in academia, recognizing the role of research software and the effort required to develop it, and storing the software instance itsels as well as the data it creates and operates on.
References
- ↑ Higher Education Funding Council for England, Research Councils UK, Universities UK, Wellcome (28 July 2016). "Concordat on Open Research Data" (PDF). https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/.
- ↑ Directorate-General for Research & Innovation (26 July 2016). "Guidelines on FAIR Data Management in Horizon 2020" (PDR). H2020 Programme. European Commission. https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 12 August 2019.
- ↑ Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC PMC4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175.
- ↑ Lyon, L.; Jeng, W.; Mattern, E. (2017). "Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services". International Journal of Digital Curation 12 (1): 46–64. doi:10.2218/ijdc.v12i1.530.
- ↑ Chen, X.; Dallmeier-Tiessen, S.; Dasler, R. et al. (2019). "Open is not enough". Nature Physics 15: 113–19. doi:10.1038/s41567-018-0342-2.
- ↑ Mesnard, O.; Barba, L.A. (2017). "Reproducible and Replicable Computational Fluid Dynamics: It’s Harder Than You Think". Computing in Science & Engineering 19 (4): 44–55. doi:10.1109/MCSE.2017.3151254.
- ↑ Allison, D.B.; Shiffrin, R.M.; Stodden, V. (2018). "Reproducibility of research: Issues and proposed remedies". Proceedings of the National Academy of Sciences of the United States of America 115 (11): 2561–62. doi:10.1073/pnas.1802324115. PMC PMC5856570. PMID 29531033. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5856570.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.