Revision as of 15:35, 1 October 2020
Full article title | Data without software are just numbers |
---|---|
Journal | Data Science Journal |
Author(s) | Davenport, James H.; Grant, James; Jones, Catherine M. |
Author affiliation(s) | University of Bath, Science and Technology Facilities Council |
Primary contact | Email: J dot H dot Davenport at bath dot ac dot uk |
Year published | 2020 |
Volume and issue | 19(1) |
Article # | 3 |
DOI | 10.5334/dsj-2020-003 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://datascience.codata.org/articles/10.5334/dsj-2020-003/ |
Download | https://datascience.codata.org/articles/10.5334/dsj-2020-003/galley/929/download/ (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
Great strides have been made to encourage researchers to archive data created by research and provide the necessary systems to support their storage. Additionally, it is recognized that data are meaningless unless their provenance is preserved, through appropriate metadata. Alongside this is a pressing need to ensure the quality and archiving of the software that generates data, through simulation and control of experiment or data collection, and that which analyzes, modifies, and draws value from raw data. In order to meet the aims of reproducibility, we argue that data management alone is insufficient: it must be accompanied by good software practices, the training to facilitate it, and the support of stakeholders, including appropriate recognition for software as a research output.
Keywords: software citation, software management, reproducibility, archiving, research software engineer
Introduction
Context
In the last decade, there has been a drive towards improved research data management in academia, moving away from the model of "supplementary material" that did not fit in publications, to the requirement that all data supporting research be made available at the time of publication. In the U.K., for example, the Research Councils have a Concordat on Open Research Data[1], and the E.U.’s Horizon 2020 program incorporates similar policies on data availability.[2] The FAIR principles[3], which state that data should be findable, accessible, interoperable, and re-usable, embody the philosophy underlying this: data should be preserved through archiving with a persistent identifier, they should be well described with suitable metadata, and both the archiving and the description should be done in a way that is relevant to the domain. Together with the Open Access movement, this has brought about a profound transformation in the availability of research and the data supporting it.
While this is a great stride towards transparency, it does not by itself improve the quality of research, and even what exactly transparency entails remains debated (Lyon, Jeng, & Mattern, 2017). A common theme discussed in many disciplines is the need for a growing emphasis on "reproducibility" (Chen et al., 2019; Mesnard & Barba, 2017; Allison, Shiffrin, & Stodden, 2018). This goes beyond data itself, requiring software and analysis pipelines to be published in a usable state alongside papers. In order to spread such good practices, a coordinated effort is needed towards training in professional programming methods in academia, recognizing the role of research software and the effort required to develop it, and storing the software instance itself as well as the data it creates and operates on.
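As an illustration only, the idea of storing a record of the software alongside the data it creates could be sketched as follows. This is a minimal sketch and not a method described in the original article: the function names, the sidecar file naming, and the metadata fields are all assumptions made for the example, and a real archive would follow the metadata schema of the relevant domain or repository.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path: Path, software_name: str, software_version: str) -> dict:
    """Describe a data file together with the software that produced it.

    The field names are hypothetical, chosen for this example only.
    """
    return {
        "data_file": data_path.name,
        "data_sha256": sha256_of(data_path),
        "software": {"name": software_name, "version": software_version},
        "python_version": platform.python_version(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(data_path: Path, record: dict) -> Path:
    """Write the record next to the data file as '<name>.provenance.json'."""
    sidecar = data_path.with_name(data_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

A pipeline might call `write_sidecar(path, provenance_record(path, "analysis-pipeline", "0.1.0"))` after producing each output file, where the software name and version shown here are placeholders. The checksum ties the record to one exact version of the data, while the software fields point to the version of the code that would need to be archived with it.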
References
- ↑ Higher Education Funding Council for England, Research Councils UK, Universities UK, Wellcome (28 July 2016). "Concordat on Open Research Data" (PDF). https://www.ukri.org/files/legacy/documents/concordatonopenresearchdata-pdf/.
- ↑ Directorate-General for Research & Innovation (26 July 2016). "Guidelines on FAIR Data Management in Horizon 2020" (PDF). H2020 Programme. European Commission. https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 12 August 2019.
- ↑ Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175.
Notes
This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.