Difference between revisions of "Journal:Recommended versus certified repositories: Mind the gap"

From LIMSWiki

Revision as of 22:30, 17 October 2017

Full article title Recommended versus certified repositories: Mind the gap
Journal Data Science Journal
Author(s) Husen, Sean Edward; de Wilde, Zoë G.; de Waard, Anita; Cousijn, Helena
Author affiliation(s) Leiden University, Elsevier
Primary contact Email: s dot e dot husen at hum dot leidenuniv dot nl
Year published 2017
Volume and issue 16(1)
Page(s) 42
DOI 10.5334/dsj-2017-042
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International
Website https://datascience.codata.org/article/10.5334/dsj-2017-042/
Download https://datascience.codata.org/articles/10.5334/dsj-2017-042/galley/710/download/ (PDF)

Abstract

Researchers are increasingly required to make research data publicly available in data repositories. Although several organizations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus on what constitutes a good data repository. In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organizations) and second, which repositories are certified by a number of organizations. We then compare these two lists of repositories, as well as the criteria for recommendation and certification. We find that criteria used by organizations recommending and certifying repositories are similar, although the certification criteria are generally more detailed. We distill the lists of criteria into seven main categories: “Mission,” “Community/Recognition,” “Legal and Contractual Compliance,” “Access/Accessibility,” “Technical Structure/Interface,” “Retrievability,” and “Preservation.” Although the criteria are similar, the lists of repositories that are recommended by the various agencies are very different. Of all the recommended repositories, fewer than six percent have obtained certification. As certification is becoming more important, steps should be taken to narrow this gap between recommended and certified repositories, and to ensure that certification standards become applicable, and applied, to the repositories that researchers are currently using.

Keywords: data repositories, data management, certification, data quality, data fitness

Introduction

Data sharing and data management are topics that are becoming increasingly important. More information is appearing about their benefits, such as increased citation rates for research papers with associated shared datasets.[1][2] A growing number of funding bodies, such as the NIH and the Wellcome Trust[3][4], as well as several journals[5], have introduced policies that require research data to be shared.[6] To be shareable, both now and in the future, datasets must not only be preserved but also be comprehensible and usable by others. To ensure these qualities, research data needs to be managed[7], and data repositories can play a role in maintaining the data in a usable structure.[8] However, using a data repository does not guarantee that the data is usable, since not every repository applies the same procedures and quality metrics, such as assigning proper metadata tags.[9] As many repositories have not yet adopted generally accepted standards, it can be difficult for researchers to choose the right repository for their dataset.[7]

Several organizations, including funding agencies, academic publishers, and data organizations, provide researchers with lists of supported or recommended repositories, e.g., BioSharing.[10] These lists vary in length, in the number and type of repositories they list, and in their selection criteria for recommendation. In addition, recommendations for data and data sharing are emerging, such as the FAIR Data Principles: guidelines intended to establish a common ground so that all data are findable, accessible, interoperable, and reusable.[11] Some data repositories, such as the UK Data Service[12], are beginning to incorporate the FAIR principles into their policies, as are several funders, such as the EU Horizon 2020 program and the NIH.[13][14] Lists of recommended repositories and guidelines such as these can help researchers decide how and where to store and share their data.

Alongside lists of recommended repositories, there are a number of schemes that specifically certify the quality of data repositories. One of the first of these certification schemes is the Data Seal of Approval (DSA), whose objective is "to safeguard data, to ensure high quality and to guide reliable management of data for the future without requiring the implementation of new standards, regulations or high costs."[15] Building upon the DSA certification, but with more elaborate and detailed guidelines[16], are the Network of Expertise in Long-Term Storage of Digital Resources (NESTOR) and the ISO 16363 standard/Trustworthy Digital Repository (TDR). DSA, NESTOR, and TDR form a three-step framework for data repository certification.[16] The ICSU World Data System (ICSU-WDS) membership incorporates guidelines from DSA, NESTOR, and Trustworthy Repositories Audit & Certification (TRAC), among others, into its data repository framework.[17] Furthermore, the TRAC guidelines were used as a basis for the ISO 16363/TDR guidelines.[18]

Given the multitude of recommendations and certification schemes, we set out to map the current landscape to compare criteria and analyze which repositories are recommended and certified by different parties. This paper is structured as follows: first, we investigate which repositories have been recommended and certified by different organizations. Next, we provide an analysis of the criteria used by organizations recommending repositories and the criteria used by certification schemes, and then derive a set of shared criteria for recommendation and certification. Lastly, we explore what this tells us about the overlap between recommendations and certifications.

Methods

Lists of repositories

Recommended repositories

To examine which repositories are being recommended, we looked at the recommendations of 17 different organizations, including academic publishers, funding agencies, and data organizations. These include all recommendation lists currently available on the BioSharing (now "FAIRsharing") website under the Recommendations tab[19], as well as lists found through a web search for the term “recommended data repositories.” The lists have been compiled by the American Geophysical Union, BBSRC[20], BioSharing[21], COPDESS[22], DataMed[23], Elsevier[24], EMBO Press[25], F1000Research[26], GigaScience[27], NIH[28], PLOS[29], Scientific Data[30], Springer Nature/BioMed Central (both share the same list)[31], Web of Science[32], Wellcome Trust[33], and Wiley. All lists, including links to the online lists, were compiled into one list to compare recommendations (http://dx.doi.org/10.17632/zx2kcyvvwm.1). Not all data repositories indexed by the Web of Science’s Data Citation Index (DCI) could be included, because there is no publicly available list of all DCI-indexed repositories; recommended repositories were therefore retrieved from the DCI through individual searches. Repositories indexed by Re3Data were not included in our list of recommended repositories, as Re3Data functions as “a global registry of research data repositories”[34] and thus does not recommend repositories. However, Re3Data was used to verify each repository’s status, persistent identifiers, and obtained certifications.

References

  1. Piwowar, H.A.; Vision, T.J. (2013). "Data reuse and the open data citation advantage". PeerJ 1: e175. doi:10.7717/peerj.175. PMC3792178. PMID 24109559. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3792178.
  2. Piwowar, H.A.; Day, R.S.; Fridsma, D.B. (2007). "Sharing detailed research data is associated with increased citation rate". PLoS One 2 (3): e308. doi:10.1371/journal.pone.0000308. PMC1817752. PMID 17375194. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1817752.
  3. "NIH Grants Policy Statement" (PDF). National Institutes of Health. 2015. https://grants.nih.gov/grants/policy/nihgps/nihgps.pdf. Retrieved 27 January 2017. 
  4. "Policy on data, software and materials management and sharing". Wellcome Trust. https://wellcome.ac.uk/funding/managing-grant/policy-data-software-materials-management-and-sharing. Retrieved 27 January 2017. 
  5. Borgman, C.L. (2012). "The conundrum of sharing research data". Journal of the American Society for Information Science and Technology 63 (6): 1059–1078. doi:10.1002/asi.22634.
  6. Mayernik, M.S.; Callaghan, S.; Leigh, R. et al. (2014). "Peer Review of Datasets: When, Why, and How". Bulletin of the American Meteorological Society 96 (2): 191–201. doi:10.1175/BAMS-D-13-00083.1.
  7. Dobratz, S.; Rödig, P.; Borghoff, U.M. et al. (2010). "The Use of Quality Management Standards in Trustworthy Digital Archives". International Journal of Digital Curation 5 (1): 46–63. doi:10.2218/ijdc.v5i1.143.
  8. Assante, M.; Candela, L.; Castelli, D. et al. (2016). "Are Scientific Data Repositories Coping with Research Data Publishing?". Data Science Journal 15 (6): 1–24. doi:10.5334/dsj-2016-006.
  9. Merson, L.; Gaye, O.; Guerin, P.J. (2016). "Avoiding Data Dumpsters--Toward Equitable and Useful Data Sharing". New England Journal of Medicine 374 (25): 2414–2415. doi:10.1056/NEJMp1605148. PMID 27168351.
  10. McQuilton, P.; Gonzalez-Beltran, A.; Rocca-Serra, P. et al. (2016). "BioSharing: Curated and crowd-sourced metadata standards, databases and data policies in the life sciences". Database 2016: baw075. doi:10.1093/database/baw075. PMC4869797. PMID 27189610. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4869797.
  11. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175.
  12. "The 'FAIR' principles for scientific data management". UK Data Service. 8 June 2016. https://www.ukdataservice.ac.uk/news-and-events/newsitem/?id=4615. Retrieved 28 October 2016. 
  13. "Guidelines on FAIR Data Management in Horizon 2020" (PDF). European Commission. 26 July 2016. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 27 October 2016. 
  14. "Big Data to Knowledge". National Institutes of Health. 2017. https://commonfund.nih.gov/bd2k. 
  15. "Data Seal of Approval: About". DSA Board. https://www.datasealofapproval.org/en/information/about/. Retrieved 18 January 2017. 
  16. Dillo, I.; de Leeuw, L. (2015). "Ten Years Back, Five Years Forward: The Data Seal of Approval". International Journal of Digital Curation 10 (1): 363. doi:10.2218/ijdc.v10i1.363. 
  17. "Certification of WDS Members" (PDF). ICSU World Data System. 11 June 2012. https://www.icsu-wds.org/files/wds-certification-summary-11-june-2012.pdf. Retrieved 28 October 2016. 
  18. "Audit and Certification of Trustworthy Digital Repositories" (PDF). CCSDS. September 2011. https://public.ccsds.org/pubs/652x0m1.pdf. Retrieved 28 October 2016. 
  19. "Recommendations". FAIRsharing.org. University of Oxford. https://fairsharing.org/recommendations/. 
  20. "Resources". BBSRC. http://www.bbsrc.ac.uk/research/resources/. Retrieved 27 January 2017. 
  21. "Databases". FAIRsharing.org. University of Oxford. https://fairsharing.org/databases/?q=&selected_facets=recommended:true. 
  22. "Search for Repositories". COPDESS. https://copdessdirectory.osf.io/search/. Retrieved 27 January 2017. 
  23. "Repository List". bioCADDIE. https://datamed.org/repository_list.php. Retrieved 27 January 2017. 
  24. "Supported Data Repositories". Elsevier. https://www.elsevier.com/authors/author-services/research-data/data-base-linking/supported-data-repositories. Retrieved 27 January 2017. 
  25. "Data Deposition". Author Guidelines. EMBO Press. http://msb.embopress.org/authorguide#datadeposition. Retrieved 27 January 2017. 
  26. "Data Guidelines". How to Publish. F1000 Research. https://f1000research.com/for-authors/data-guidelines. Retrieved 27 January 2017. 
  27. "Editorial Policies & Reporting Standards". Oxford University Press. https://academic.oup.com/gigascience/pages/editorial_policies_and_reporting_standards. Retrieved 27 January 2017. 
  28. "NIH Data Sharing Repositories". U.S. National Library of Medicine. https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html. Retrieved 27 January 2017. 
  29. "Data Availability". PLOS. http://journals.plos.org/plosbiology/s/data-availability. Retrieved 27 January 2017. 
  30. "Recommended Data Repositories". Scientific Data. Macmillan Publishers Limited. https://www.nature.com/sdata/policies/repositories. Retrieved 27 January 2017. 
  31. "Recommended Repositories". Springer Nature. Springer-Verlag GmbH. http://www.springernature.com/gp/authors/research-data-policy/repositories/12327124?countryChanged=true. Retrieved 27 January 2017. 
  32. "Master Data Repository List". Web of Science. Clarivate Analytics. http://wokinfo.com/cgi-bin/dci/search.cgi. Retrieved 27 January 2017. 
  33. "Data repositories and database resources". Wellcome Trust. https://wellcome.ac.uk/funding/managing-grant/data-repositories-and-database-resources. Retrieved 27 January 2017. 
  34. "About". re3data.org. Karlsruhe Institute of Technology. https://www.re3data.org/about. Retrieved 27 January 2017. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases, important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance. The BioSharing website has since become the FAIRsharing website, so the original BioSharing links have been updated to point to the new website. Several other website URLs have also changed, and the updated URLs are used here. The original includes several inline citations that are not listed in its references section; they have been omitted here.