Full article title | Risk assessment for scientific data |
---|---|
Journal | Data Science Journal |
Author(s) | Mayernik, Matthew S.; Breseman, Kelsey; Downs, Robert R.; Duerr, Ruth; Garretson, Alexis; Hou, Chung-Yi; EDGI and ESIP Data Stewardship Committee[a] |
Author affiliation(s) | National Center for Atmospheric Research, Environmental Data & Governance Initiative, Columbia University, Ronin Institute for Independent Scholarship, George Mason University |
Primary contact | Email: mayernik at ucar dot edu |
Year published | 2020 |
Volume and issue | 19(1) |
Article # | 10 |
DOI | 10.5334/dsj-2020-010 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://datascience.codata.org/articles/10.5334/dsj-2020-010/ |
Download | https://datascience.codata.org/articles/10.5334/dsj-2020-010/galley/944/download/ (PDF) |
Abstract
Ongoing stewardship is required to keep data collections and archives in existence. Scientific data collections may face a range of risk factors that could hinder, constrain, or limit current or future data use. Identifying such risk factors to data use is a key step in preventing or minimizing data loss. This paper presents an analysis of data risk factors that scientific data collections may face, and a data risk assessment matrix that supports assessments aimed at ameliorating those risks. The goals of this work are to inform and enable effective data risk assessment by: a) individuals and organizations who manage data collections, and b) individuals and organizations who want to help to reduce the risks associated with data preservation and stewardship. The data risk assessment framework presented in this paper provides a platform from which risk assessments can begin, and a reference point for discussions of data stewardship resource allocations and priorities.
Keywords: risk assessment, data preservation, data stewardship, metadata
Introduction
At “The Rescue of Data At Risk” workshop, held in Boulder, Colorado, on September 8 and 9, 2016[b], participants were asked the following question: “How would you define ‘at-risk’ data?” Discussions on this point ranged widely and touched on several challenges, including lack of funding or personnel support for data management, natural and political disasters, and metadata loss. The definition of risk used by one participant’s organization, however, stood out: “data were considered to be at-risk unless they had a dedicated plan to not be at-risk.” This simple statement vividly conveys that data’s default state is one of risk. In other words, ongoing stewardship is required to keep data collections and archives in existence.
The risk factors that a given data collection or archive may face vary, depending on the data’s characteristics, the data’s current environment, and the priorities and resources available at the time. Many risks can be reduced or eliminated by following best practices codified as certifications and guidelines, such as the CoreTrustSeal Data Repository Certification[1] and the ISO 16363:2012 standard, which defines audit and certification procedures for trustworthy digital repositories.[2] Both the CoreTrustSeal certification and ISO 16363:2012 are based on the ISO 14721:2012 standard, which defines the reference model for an open archival information system (OAIS).[3] But these certifications can be large and complex. Additionally, many of the organizations that hold valuable scientific data collections may not be aware of these standards, even if those organizations may have the resources to tackle the challenge.[4] Further, attaining such certifications does not necessarily reduce risks to data that fall outside the scope of a particular certification instrument.
This paper presents an analysis of the data risk factors that stakeholders of scientific data collections and archives may face, and a matrix that supports data risk assessments aimed at ameliorating those risks. The three driving questions for this analysis are:
- How do stakeholders assess what data are at risk?
- How do stakeholders characterize what risk factors data collections and/or archives face?
- How do stakeholders make the associated risks more transparent, internally and/or externally?
The goals of this work are to inform and enable effective data risk assessment by: a) individuals and organizations who manage data collections, and b) individuals and organizations who want to help to reduce the risks associated with data preservation and stewardship. Stakeholders for these two activities include producers, stewards, sponsors, and users of data, as well as the management and staff of the institutions that employ them.
Background
This project was coordinated through the Data Stewardship Committee within the Earth Science Information Partners (ESIP), a non-profit organization that supports the collection, stewardship, and use of earth science data, information, and knowledge.[c] The immediate motivation for the project came from Data Stewardship Committee members’ engagement with groups that were undertaking grassroots “data rescue” initiatives after the 2016 U.S. presidential election. At that time, a number of loosely organized and coordinated efforts were launched to duplicate data from U.S. government organizations in order to guard against potential politically motivated data deletion or obfuscation.[5][6] In many cases, these initiatives focused specifically on duplicating government-hosted earth science data.
Footnotes
- ↑ We list EDGI and the ESIP Data Stewardship Committee as authors due to the contributions of many individuals from both organizations to the work described in this paper. The named authors are the individuals involved in each organization who contributed directly to the paper’s text.
- ↑ The workshop was organized under the auspices of the Research Data Alliance (RDA) and the Committee on Data (CODATA) within the International Science Council.
- ↑ See https://wiki.esipfed.org/Preservation_and_Stewardship.
References
- ↑ CoreTrustSeal Standards and Certification Board (2020). "CoreTrustSeal". https://www.coretrustseal.org/.
- ↑ "ISO 16363:2012 - Space data and information transfer systems — Audit and certification of trustworthy digital repositories". International Organization for Standardization. February 2012. https://www.iso.org/standard/56510.html.
- ↑ "ISO 14721:2012 - Space data and information transfer systems — Open archival information system (OAIS) — Reference model". International Organization for Standardization. September 2012. https://www.iso.org/standard/57284.html.
- ↑ Maemura, E.; Moles, N.; Becker, C. (2017). "Organizational assessment frameworks for digital preservation: A literature review and mapping". JASIST 68 (7): 1619–37. doi:10.1002/asi.23807.
- ↑ Dennis, B. (13 December 2016). "Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump". The Washington Post. https://www.washingtonpost.com/news/energy-environment/wp/2016/12/13/scientists-are-frantically-copying-u-s-climate-data-fearing-it-might-vanish-under-trump/.
- ↑ Varinsky, D. (11 February 2017). "Scientists across the US are scrambling to save government research in 'Data Rescue' events". Business Insider. https://www.businessinsider.com/data-rescue-government-data-preservation-efforts-2017-2.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.