Journal:Recommended versus certified repositories: Mind the gap
Full article title | Recommended versus certified repositories: Mind the gap |
---|---|
Journal | Data Science Journal |
Author(s) | Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena |
Author affiliation(s) | Leiden University, Elsevier |
Primary contact | Email: s dot e dot husen at hum dot leidenuniv dot nl |
Year published | 2017 |
Volume and issue | 16(1) |
Page(s) | 42 |
DOI | 10.5334/dsj-2017-042 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://datascience.codata.org/article/10.5334/dsj-2017-042/ |
Download | https://datascience.codata.org/articles/10.5334/dsj-2017-042/galley/710/download/ (PDF) |
Abstract
Researchers are increasingly required to make research data publicly available in data repositories. Although several organizations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus on what constitutes a good data repository. In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organizations) and second, which repositories are certified by a number of organizations. We then compare these two lists of repositories, and the criteria for recommendation and certification. We find that criteria used by organizations recommending and certifying repositories are similar, although the certification criteria are generally more detailed. We distill the lists of criteria into seven main categories: “Mission,” “Community/Recognition,” “Legal and Contractual Compliance,” “Access/Accessibility,” “Technical Structure/Interface,” “Retrievability,” and “Preservation.” Although the criteria are similar, the lists of repositories that are recommended by the various agencies are very different. Out of all of the recommended repositories, fewer than six percent obtained certification. As certification is becoming more important, steps should be taken to decrease this gap between recommended and certified repositories, and ensure that certification standards become applicable, and applied, to the repositories which researchers are currently using.
Keywords: data repositories, data management, certification, data quality, data fitness
Introduction
Data sharing and data management are becoming increasingly important, and evidence of their benefits is growing, such as increased citation rates for research papers with associated shared datasets.[1][2] A growing number of funding bodies such as the NIH and the Wellcome Trust[3][4], as well as several journals[5], have instituted policies that require research data to be shared.[6] To be shareable, both now and in the future, datasets not only need to be preserved but also need to be comprehensible and usable for others. To ensure these qualities, research data needs to be managed[7], and data repositories can play a role in maintaining the data in a usable structure.[8] However, using a data repository does not guarantee that the data is usable, since not every repository uses the same procedures and quality metrics, such as applying proper metadata tags.[9] As many repositories have not yet adopted generally accepted standards, it can be difficult for researchers to choose the right repository for their dataset.[7]
References
- Piwowar, H.A.; Vision, T.J. (2013). "Data reuse and the open data citation advantage". PeerJ 1: e175. doi:10.7717/peerj.175. PMC 3792178. PMID 24109559. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3792178.
- Piwowar, H.A.; Day, R.S.; Fridsma, D.B. (2007). "Sharing detailed research data is associated with increased citation rate". PLoS One 2 (3): e308. doi:10.1371/journal.pone.0000308. PMC 1817752. PMID 17375194. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1817752.
- "NIH Grants Policy Statement" (PDF). National Institutes of Health. 2015. https://grants.nih.gov/grants/policy/nihgps/nihgps.pdf. Retrieved 27 January 2017.
- "Policy on data, software and materials management and sharing". Wellcome Trust. https://wellcome.ac.uk/funding/managing-grant/policy-data-software-materials-management-and-sharing. Retrieved 27 January 2017.
- Borgman, C.L. (2012). "The conundrum of sharing research data". Journal of the American Society for Information Science and Technology 63 (6): 1059–1078. doi:10.1002/asi.22634.
- Mayernik, M.S.; Callaghan, S.; Leigh, R. et al. (2014). "Peer Review of Datasets: When, Why, and How". Bulletin of the American Meteorological Society 96 (2): 191–201. doi:10.1175/BAMS-D-13-00083.1.
- Dobratz, S.; Rödig, P.; Borghoff, U.M. et al. (2010). "The Use of Quality Management Standards in Trustworthy Digital Archives". International Journal of Digital Curation 5 (1): 46–63. doi:10.2218/ijdc.v5i1.143.
- Assante, M.; Candela, L.; Castelli, D. et al. (2016). "Are Scientific Data Repositories Coping with Research Data Publishing?". Data Science Journal 15 (6): 1–24. doi:10.5334/dsj-2016-006.
- Merson, L.; Gaye, O.; Guerin, P.J. (2016). "Avoiding Data Dumpsters--Toward Equitable and Useful Data Sharing". New England Journal of Medicine 374 (25): 2414–5. doi:10.1056/NEJMp1605148. PMID 27168351.
Notes
This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance.