Journal:Towards a risk catalog for data management plans

From LIMSWiki
Full article title Towards a risk catalog for data management plans
Journal International Journal of Digital Curation
Author(s) Weng, Franziska; Thoben, Stella
Author affiliation(s) Kiel University
Primary contact Email: franziskaweng at web dot de
Year published 2020
Volume and issue 15(1)
Page(s) 18
DOI 10.2218/ijdc.v15i1.697
ISSN 1746-8256
Distribution license Creative Commons Attribution 4.0 International
Website http://www.ijdc.net/article/view/697
Download http://www.ijdc.net/article/view/697/614 (PDF)

Abstract

Although data management and its careful planning are not new topics, there is little published research on risk mitigation in data management plans (DMPs). We consider it a problem that DMPs do not include a structured approach for the identification or mitigation of risks, because such an approach would instill confidence and trust in the data and its stewards and foster the successful conduct of data-generating projects, which are often funded research projects. In this paper, we present a lightweight approach for identifying general risks in DMPs. We introduce an initial version of a generic risk catalog for funded research and similar projects. By analyzing a selection of 13 DMPs for projects from multiple disciplines published in the Research Ideas and Outcomes (RIO) journal, we demonstrate that our approach is applicable to DMPs and transferable to multiple institutional constellations. As a result, the effort for integrating risk management in data management planning can be reduced.

Keywords: data management plan, data management, risk management, risk assessment, information security

Introduction

University of New Mexico's William Michener describes a data management plan (DMP) as “a document that describes how you will treat your data during a project and what happens with the data after the project ends.”[1] The Digital Curation Centre's (DCC) Martin Donnelly notes that DMPs “serve to mitigate risks and help instill confidence and trust in the data and its stewards.”[2] Sarah Jones, also of the DCC, adds that “planning for the effective creation, management, and sharing of your data enables you to get the most out of your research.”[3] As such, a DMP should be created not only to obtain a grant but also to conduct the proposed project successfully.

According to ISO 31000[4], a risk is “an effect of uncertainty on objectives.” Data management plans should help to decrease the effects of uncertainty on project objectives. We consider it a problem that neither DMPs nor funders’ DMP evaluation schemes include a structured approach for the identification or mitigation of risks, since such an approach would foster the successful conduct of data-generating projects, which are often funded research projects. We believe our approach will help funders evaluate the risks of proposed projects and hence the risks of their investment options.

Data management maturity models like the Data Management Maturity (DMM) model[5] or the Enterprise Information Management (EIM) maturity model[6] are primarily designed for enterprises and may not be feasible for higher education institutions (HEIs). A rigid model for HEIs to coordinate support of data management and sharing across a diverse range of actors and processes to deliver the necessary technological and human infrastructures “cannot be prescribed since individual organizations and cultures occupy a spectrum of differences.”[7] Also, there is a potential conflict between organizational demands and scientific freedom. The Charter of Fundamental Rights of the E.U. enshrines scientific freedom as a constitutional right, and researchers may view the imposition of specific data management processes as a restriction of their scientific freedom. At the international level, UNESCO recommends that “each Member State should institute procedures adapted to its needs for ensuring that, in the performance of research and development, scientific researchers respect public accountability while at the same time enjoying the degree of autonomy appropriate to their task and to the advancement of science and technology.”[8]

We consider it important that researchers commit themselves to data management practices such as ISO 31000. However, ISO 31000 defines the risk management process as a feedback loop to be conducted in organizations.[4] Projects tend to have a much more limited scope than organizations with regard to funding and duration. Therefore, we regard the ISO 31000 risk management process as too time-consuming and of limited suitability for funded research and similar projects.

In this paper, we propose a lightweight approach for the identification of general risks in DMPs. We introduce an initial version of a generic risk catalog for funded research and similar projects. By analyzing a selection of 13 DMPs for projects from multiple disciplines published in the Research Ideas and Outcomes (RIO) journal[9][10][11][12][13][14][15][16][17][18][19][20][21], we demonstrate that our approach is applicable and transferable to multiple institutional constellations. As a result, the effort for integrating risk management in data management planning can be reduced.

Related work

Jones et al. developed a guide for HEIs “to help institutions understand the key aims and issues associated with planning and implementing research data management (RDM) services.”[7] In this guide, the authors mention data management risks for HEIs. They note that while the upfront costs for cheap storage of active data “may be only a fraction of those quoted by central services, the risks of data loss and security breaches are significantly higher, potentially leading to far greater costs in the long term.”[7] Additionally, there are “potential legal risks from using third-party services.”[7] However, data selection counters the risks of “reputational damage from exposing dirty, confidential, or undocumented data that has been retained long after the researchers who created it have left.”[7]

The OSCRP working group developed the Open Science Cyber Risk Profile (OSCRP), which “is designed to help principal investigators (PI) and their supporting information technology (IT) professionals assess cybersecurity risks related to open science projects.”[22] The OSCRP working group proposes that principal investigators examine risks, consequences, and avenues of attack for each mission-critical science asset on an inventory list, where assets include devices, systems, data, personnel, workflows, and other kinds of resources.[22] We regard this as a very detailed alternative to our approach, but the FAIR Guiding Principles[23] and long-term preservation would need to be added to it.

In 2014, Ferreira et al.[24] “propose an analysis process for eScience projects using a data management plan and ISO 31000 in order to create a risk management plan that can complement the data management plan.” The authors describe an analytical process for creating a risk management plan and “present the previous process’ validation, based on the MetaGen-FRAME project.”[24] Within this validation Ferreira et al. also identify a project’s task-specific risks, e.g., “R6: Loss of metadata, denying the representation of the output information to the user via Taverna.”[24] This risk is tailored to the use of Taverna and hence may not be relevant for the majority of funded research and similar projects. There may be projects for which analyzing specific risks for all resources may be crucial. However, a detailed risk analysis may require a considerable amount of work.

Methods

We propose a lightweight approach that can serve as a starting point for including risk management in research data management planning. It does not preclude detailed approaches like OSCRP[22] or ISO 31000.[4] Instead, it tries to reduce, and possibly avoid, the burden of a full risk management process such as ISO 31000. Our approach is based on a pre-tailored and extensible general risk catalog (Table 1) to lessen the effort required for risk management. We derived part of this risk catalog from 29 interviews with researchers from multiple disciplines[a], which we conducted as part of project SynFo: Creating synergies on the operational level of research data management.[25] One goal of project SynFo was the development of a transferable approach to improve research data management in multiple organizational constellations. In generalized content from the interviews, we identified risks entailed by interfaces of information, e.g., between researchers and data subjects or between researchers and external service providers. For the development of our approach, we also consulted the catalogs for threats and measures from the supplement of the “IT-Grundschutz” catalogs[26] by the German Federal Office for Information Security (BSI), the FAIR Guiding Principles[23], and the report and action plan from the European Commission expert group on FAIR data.[27]

Table 1. General risk catalog

Risk category | Risk [CODE] | Possible risk source
Legal | Penalty for conducting unreported notifiable practices [RLEGU] | Physical sample collection
Legal | Penalty for unpermitted usage of external data [RLEGE] | Processing external data
Legal | Penalty for unpermitted usage of personal data [RLEGP] | Processing personal data
Legal | Penalty for conducting inadequate data protection practices [RLEGD] | Using an external service provider for processing personal data
Privacy | Loss of confidentiality through sending data to an unintended recipient [RPRIR] | Correspondence
Privacy | Loss of confidentiality through interception or eavesdropping of information [RPRII] | Online data transmission
Privacy | Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS] | Portable storage media or devices
Privacy | Loss of confidentiality through careless data handling by an external party [RPRIE] | Sharing data with an external party without publication purposes
Technical | Unavailability through data corruption [RTECC] | Data processing
Technical | Unavailability through data loss [RTECL] | Data storage
Science | Poor knowledge discovery or reusability because stakeholders cannot find the data [RSCIF] | Searchable information not planned
Science | Poor knowledge discovery or reusability because stakeholders cannot access the data [RSCIA] | Sharing location not planned
Science | Poor knowledge discovery or reusability because stakeholders cannot integrate the data [RSCII] | File format not planned
Science | Poor knowledge discovery or reusability because stakeholders cannot reuse the data [RSCIR] | Licensing and context information not planned
Preservation | Unsustainability in the long term through unavailability or discontinuity of financial support [RPREU] | Preservation location not planned

Our risk identification includes risks, their possible risk sources, mitigation approaches, and consequences. By analyzing occurrences and mitigations of risks from our catalog within a selection of 13 DMPs from multiple disciplines[b], published in the RIO journal, we demonstrate that our lightweight approach is applicable to DMPs and transferable to multiple institutional constellations. We evaluate the occurrences of the 15 risks in our catalog by identifying possible risk sources in each of the selected DMPs, and we analyze the risk mitigations in accordance with what the authors wrote.
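The catalog lookup described above can be illustrated with a small Python sketch. This is only one possible encoding, not the procedure used in the study: the codes, categories, and risk sources are taken from Table 1, while the names Risk, CATALOG, identify_risks, and unmitigated are hypothetical, and matching risk sources by exact (case-insensitive) text is a simplifying assumption. In practice, a reader of the DMP decides which risk sources are present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    code: str      # risk code from Table 1
    category: str  # risk category from Table 1
    source: str    # possible risk source from Table 1


# The 15 risks of Table 1, in table order.
CATALOG = [
    Risk("RLEGU", "Legal", "Physical sample collection"),
    Risk("RLEGE", "Legal", "Processing external data"),
    Risk("RLEGP", "Legal", "Processing personal data"),
    Risk("RLEGD", "Legal",
         "Using an external service provider for processing personal data"),
    Risk("RPRIR", "Privacy", "Correspondence"),
    Risk("RPRII", "Privacy", "Online data transmission"),
    Risk("RPRIS", "Privacy", "Portable storage media or devices"),
    Risk("RPRIE", "Privacy",
         "Sharing data with an external party without publication purposes"),
    Risk("RTECC", "Technical", "Data processing"),
    Risk("RTECL", "Technical", "Data storage"),
    Risk("RSCIF", "Science", "Searchable information not planned"),
    Risk("RSCIA", "Science", "Sharing location not planned"),
    Risk("RSCII", "Science", "File format not planned"),
    Risk("RSCIR", "Science", "Licensing and context information not planned"),
    Risk("RPREU", "Preservation", "Preservation location not planned"),
]


def identify_risks(observed_sources):
    """Return the catalog risks whose possible risk source was noted in a DMP.

    observed_sources is a list of risk-source labels that a human reader
    assigned to the DMP (hypothetical input format).
    """
    observed = {s.strip().lower() for s in observed_sources}
    return [r for r in CATALOG if r.source.lower() in observed]


def unmitigated(risks, mitigations):
    """Return codes of identified risks with no mitigation recorded.

    mitigations maps a risk code to the mitigation text found in the DMP.
    """
    return [r.code for r in risks if not mitigations.get(r.code)]


if __name__ == "__main__":
    found = identify_risks(["Data storage", "Processing personal data"])
    print([r.code for r in found])
    print(unmitigated(found, {"RTECL": "Backups described by the DMP authors"}))
```

Keeping the catalog as plain data makes it easy to extend with project-specific risks, which matches the extensibility the approach aims for.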

Risks

Legal risks

A breach of a regulation like the General Data Protection Regulation (GDPR) or the Nagoya Protocol can result in high fines. At worst, compliance breaches can lead to reputational damage, legal disputes, and enormous costs.

Penalty for conducting unreported notifiable practices [RLEGU]

Research may include reportable research practices like the collection of physical samples regulated by the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization, which was transposed into E.U. law by Regulation (EU) No 511/2014. Under this regulation, there is a reporting obligation if the research on genetic resources is financially supported (Art. 7, Sec. 1) and if the research is in the final stage of development of a product that is based on the utilization of genetic resources (Art. 7, Sec. 2).[28] Article 11 says that “Member States shall lay down the rules on penalties applicable to infringements of Articles 4 and 7 and shall take all the measures necessary to ensure that they are applied.”[28] The Nagoya Protocol “and EU documents themselves give no guidance on penalties, each country has the liberty to determine these.”[29] Consequences may be fines of up to EUR 810,000 or even imprisonment.[29] To avoid penalties, the parties should comply strictly with the rules. The Convention on Biological Diversity publishes a detailed list of parties to the Nagoya Protocol.[c]

Penalty for unpermitted usage of external data [RLEGE]

In many countries, data by themselves do not have inherent legal protection. License contracts can stipulate various terms of use. Free licenses make (data) objects available to everyone for use, but usage can be restricted or conditioned. Creative Commons (CC) licenses and the GNU General Public License (GPL), which is specialized for free software, are widely used. Nonetheless, using CC licenses can lead to conflicts with the rights of third parties. Publicity, personality, and privacy rights “not held by the licensor are not affected and may still affect your desired use of a licensed work.”[30] “If there are any third parties who may have publicity, privacy, or personality rights that apply, those rights are not affected by your application of a CC license, and a reuser must seek permission for relevant uses.”[30] This holds, for example, for pictures of persons. Also, the GNU GPL imposes transitive obligations, e.g., “derivative programs must also be subject to the same initial GPL conditions of ability to copy, modify, or redistribute.”[31] To mitigate the risk of unpermitted usage of external data, it is recommended to abide by the license terms. In general, an overview of the data and the related licenses can be developed in the DMP or within the framework of a data policy.
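Such an overview of external data and licenses could, for instance, be kept as a small structured inventory. The following Python sketch is an assumption-laden illustration, not a format prescribed by the article or by any funder: the class ExternalDataset, the function license_review, the field names, and the example dataset names are all hypothetical. It only flags entries that lack a recorded license or that depict persons, the two situations discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExternalDataset:
    name: str                       # hypothetical dataset label
    license: Optional[str]          # license as recorded in the DMP, if any
    depicts_persons: bool = False   # e.g., pictures of persons


def license_review(datasets: List[ExternalDataset]) -> List[Tuple[str, str]]:
    """Return (dataset name, note) pairs for entries that need attention."""
    notes = []
    for d in datasets:
        if d.license is None:
            # Without recorded terms of use, permitted usage is unclear.
            notes.append((d.name, "no license recorded"))
        if d.depicts_persons:
            # A CC license does not cover third-party publicity,
            # privacy, or personality rights.
            notes.append((d.name, "third-party rights may apply"))
    return notes


if __name__ == "__main__":
    inventory = [
        ExternalDataset("example-photo-collection", "CC BY 4.0", True),
        ExternalDataset("example-sensor-readings", None),
        ExternalDataset("example-map-layer", "CC0 1.0"),
    ]
    for name, note in license_review(inventory):
        print(f"{name}: {note}")
```

The notes are prompts for a human check of the license terms; the sketch does not attempt to decide whether a given use is permitted.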


Footnotes

  a. Geo sciences (12), biology (5), humanities (5), social and behavioral sciences (4), computer science, systems engineering and electrical engineering (2), and medicine (1)
  b. Biology (4), geo sciences (4), social and behavioral sciences (3), computer science, systems engineering and electrical engineering (1), and humanities (1)
  c. Parties to the Nagoya Protocol

References

  1. Michener, W.K. (2015). "Ten Simple Rules for Creating a Good Data Management Plan". PLoS Computational Biology 11 (10): e1004525. doi:10.1371/journal.pcbi.1004525. 
  2. Donnelly, M. (2012). "Chapter 5: Data management plans and planning". In Pryor, G.. Managing Research Data. Facet. pp. 83–104. doi:10.29085/9781856048910.006. ISBN 9781856048910. 
  3. Jones, S. (2011). "How to Develop a Data Management and Sharing Plan". Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/develop-data-plan. Retrieved 19 November 2019. 
  4. "ISO 31000:2018 Risk management — Guidelines". International Organization for Standardization. February 2018. https://www.iso.org/standard/65694.html. 
  5. "Data Management Maturity (DMM)". Information System Audit and Control Association, Inc. 2019. https://cmmiinstitute.com/data-management-maturity. Retrieved 22 November 2019. 
  6. Newman, D.; Logan, D. (23 December 2008). "Overview: Gartner Introduces the EIM Maturity Model". Gartner. https://www.gartner.com/en/documents/846312/overview-gartner-introduces-the-eim-maturity-model. 
  7. Jones, S.; Pryor, G.; Whyte, A. (25 March 2013). "How to Develop RDM Services - A guide for HEIs". Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/how-develop-rdm-services. Retrieved 19 November 2019. 
  8. UNESCO (2017). "Records of the General Conference, 39th session, Paris, 30 October-14 November 2017, v. 1: Resolutions". p. 116. https://unesdoc.unesco.org/ark:/48223/pf0000260889.page=116. Retrieved 30 November 2019. 
  9. Canhos, D.A.L. (2017). "Data Management Plan: Brazil's Virtual Herbarium". RIO 3: e14675. doi:10.3897/rio.3.e14675. 
  10. Fey, J.; Anderson, S. (2016). "Boulder Creek Critical Zone Observatory Data Management Plan". RIO 2: e9419. doi:10.3897/rio.2.e9419. 
  11. Fisher, J.; Nading, A.M. (2016). "A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social Policy". RIO 2: e8720. doi:10.3897/rio.2.e8720. 
  12. Gatto, L. (2017). "Data Management Plan for a Biotechnology and Biological Sciences Research Council (BBSRC) Tools and Resources Development Fund (TRDF) Grant". RIO 3: e11624. doi:10.3897/rio.3.e11624. 
  13. McWhorter, J.; Wright, D.; Thomas, J. (2016). "Coastal Data Information Program (CDIP)". RIO 2: e8827. doi:10.3897/rio.2.e8827. 
  14. Neylon, C. (2017). "Data Management Plan: IDRC Data Sharing Pilot Project". RIO 3: e14672. doi:10.3897/rio.3.e14672. 
  15. Nichols, H.; Stolze, S. (2016). "Migration of legacy data to new media formats for long-time storage and maximum visibility: Modern pollen data from the Canadian Arctic (1972/1973)". RIO 2: e10269. doi:10.3897/rio.2.e10269. 
  16. Pannell, J.L. (2016). "Data Management Plan for PhD Thesis "Climatic Limitation of Alien Weeds in New Zealand: Enhancing Species Distribution Models with Field Data"". RIO 2: e10600. doi:10.3897/rio.2.e10600. 
  17. Traynor, C. (2017). "Data Management Plan: Empowering Indigenous Peoples and Knowledge Systems Related to Climate Change and Intellectual Property Rights". RIO 3: e15111. doi:10.3897/rio.3.e15111. 
  18. Wael, R. (2017). "Data Management Plan: HarassMap". RIO 3: e15133. doi:10.3897/rio.3.e15133. 
  19. White, E.P. (2016). "Data Management Plan for Moore Investigator in Data Driven Discovery Grant". RIO 2: e10708. doi:10.3897/rio.2.e10708. 
  20. Woolfrey, L. (2017). "Data Management Plan: Opening access to economic data to prevent tobacco related diseases in Africa". RIO 3: e14837. doi:10.3897/rio.3.e14837. 
  21. Xu, H.; Ishida, M.; Wang, M. (2016). "A Data Management Plan for Effects of particle size on physical and chemical properties of mine wastes". RIO 2: e11065. doi:10.3897/rio.2.e11065. 
  22. Peisert, S.; Welch, V.; Adams, A. et al. (2017). "Open Science Cyber Risk Profile (OSCRP)". IUScholar Works. http://hdl.handle.net/2022/21259. Retrieved 19 November 2019. 
  23. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC PMC4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175. 
  24. Ferreira, F.; Coimbra, M.E.; Bairrão, R. et al. (2014). "Data Management in Metagenomics: A Risk Management Approach". International Journal of Digital Curation 9 (1): 41–56. doi:10.2218/ijdc.v9i1.299. 
  25. University Computing Centre (Rechenzentrum) (2019). "SynFo - Creating synergies on the operational level of research data management". Kiel University. https://www.rz.uni-kiel.de/en/projects/synfo-creating-synergies-on-the-operational-level-of-research-data-management. 
  26. German Federal Office for Information Security (22 December 2016). "IT-Grundschutz-catalogues 15th version - 2015 (Draft)". Archived from the original on 28 January 2020. https://web.archive.org/web/20200128211607/https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Grundschutz/International/GSK_15_EL_EN_Draft.html. Retrieved 19 November 2019. 
  27. Collins, S.; Genova, F.; Harrower, N. (26 November 2018). "Turning FAIR into reality". European Commission. doi:10.2777/1524. https://op.europa.eu/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed71a1/language-en. 
  28. "Regulation (EU) No 511/2014 of the European Parliament and of the Council of 16 April 2014 on compliance measures for users from the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization in the Union Text with EEA relevance". EUR-Lex. European Union. 16 April 2014. http://data.europa.eu/eli/reg/2014/511/oj. Retrieved 01 December 2019. 
  29. "Implementation of Nagoya Protocol: A comparison between The Netherlands, Belgium and Germany". vo.eu. 18 June 2018. https://publications.vo.eu/implementation-of-nagoya-protocol/. Retrieved 30 November 2019. 
  30. "Frequently Asked Questions". Creative Commons. https://creativecommons.org/faq/. Retrieved 08 December 2019. 
  31. Lipinski, T.A. (2012). Librarian's Legal Companion for Licensing Information Resources and Legal Services. Neal-Schuman Publishers. p. 312. ISBN 9781555706104. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; this version lists them in order of appearance, by design.