Journal:Restricted data management: The current practice and the future

From LIMSWiki
Revision as of 20:43, 29 April 2024 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Restricted data management: The current practice and the future
Journal Journal of Privacy and Confidentiality
Author(s) Jang, Joy B.; Pienta, Amy; Levenstein, Margaret; Saul, Joe
Author affiliation(s) Inter-university Consortium for Political and Social Research (ICPSR) at University of Michigan
Primary contact Email: oyjang at umich dot edu
Year published 2023
Volume and issue 13(2)
Page(s) 1–9
DOI 10.29012/jpc.844
ISSN 2575-8527
Distribution license Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Website https://journalprivacyconfidentiality.org/index.php/jpc/article/view/844
Download https://journalprivacyconfidentiality.org/index.php/jpc/article/view/844/753 (PDF)

Abstract

Many organizations that manage restricted data around the world have adopted the Five Safes framework (i.e., safe data, projects, people, settings, and outputs) for managing restricted and confidential data. While the Five Safes have been well integrated throughout the data life cycle, organizations have encountered several unintended challenges in making those data FAIR (findable, accessible, interoperable, and reusable). In this study, we review the current practice of restricted data management and discuss challenges and future directions, focusing especially on data use agreements, disclosure risk review, and training. In the future, organizations managing restricted and confidential data may need to proactively consider reducing inequalities in access to scientific development, preventing unethical use of data, and managing various types of data.

Keywords: confidentiality, data governance, FAIR, training

Introduction

Since the introduction of the Five Safes in the mid-2010s [Desai, Ritchie, and Welpton, 2016; Ritchie, 2017], many organizations managing restricted data have adopted the framework for the management of restricted and confidential data. The Five Safes framework helps organizations set guidelines for safe data created by data providers, safe projects for public good, safe people who are authenticated data users, safe settings in which data are being used, and safe outputs from analyzing data. The Five Safes have been well integrated throughout the data life cycle and have led to good stewardship practices that make scientific data FAIR (findable, accessible, interoperable, and reusable). The framework also helps multiple stakeholders balance data utilization with protection of subject privacy and data confidentiality. Despite successful implementation of the Five Safes, organizations encounter unintended challenges. In this paper, we review the current practice of restricted data management and discuss challenges and future directions, focusing on data use agreements, disclosure risk review, and training for data users.

While organizations implement multiple modes of data access (e.g., virtual data enclaves [VDEs], physical data enclaves [PDEs], secure encrypted file downloads), our discussion applies mostly to VDEs and PDEs. Further, our discourse is centered on quantitative data, although we do not restrict the implications to that type of data alone. In other words, even though our discussion of current practices relies largely on our experience with quantitative data accessible via VDEs or PDEs, the implications of our study may extend to newly emerging data types such as research notes, video, and electroencephalography.

Data use agreements

Data use agreements (DUAs) are risk mitigation tools that clarify expectations among multiple stakeholders [O’Hara, 2020]. DUAs must be entered into before users access or use any data, and may require periodic updates. DUAs may contain all Five Safes components: safe data (description of how data have been and will be treated to protect against disclosure risks); safe people (data users’ credentials); safe projects (research proposals demonstrating the intended data use); safe settings (plans for safe data access and handling); and safe outputs (procedures or rules on output publication and release). For some organizations, DUAs are stand-alone documents containing all five components. Other organizations require quite short DUAs, accompanied by separate materials such as a detailed research proposal, approval or exemption from an Institutional Review Board (IRB), and CVs from participants in the research project. Because multiple stakeholders are involved, DUAs allow for negotiation and the pursuit of consensus among parties.

Many organizations are bound by federal, state, and local laws, regulations, or policies reflecting their capability to access direct identifiers in the datasets. DUAs specify terms and conditions for data access and use, and clarify liability issues in advance. This upfront emphasis on DUAs helps mitigate confusion regarding liability in case of data breaches or suspected security incidents. DUAs require data users’ authenticated credentials; some organizations additionally ask for the involvement of the researchers’ institutions in DUAs as leverage to enforce consequences for the institution [Levenstein, 2020]. Beyond providing legal leverage, the involvement of institutional representatives in DUAs also helps researchers use data safely. Research shows that many data users care more about personal penalties (loss of access and funding, opinions of colleagues) than about legal ones if an incident occurs [Green, et al., 2017]. Having multiple layers of liability may safeguard against data breaches or protocol violations by users. However, involving institutions in DUAs may impose a hurdle for research teams with collaborators from multiple institutions or from different countries. DUAs for research projects of this nature may have to account for heterogeneous requirements with regard to data privacy, confidentiality, and liability, which may cause significant delays in the process of data use.

Below, we discuss four distinct challenges that organizations encounter with regard to restricted data management: limited opportunities for data access by certain groups of individuals; DUAs for research projects involving multiple institutions; limitations on binding laws against DUA non-compliance; and costs to access data.

Limited opportunities for data access by certain groups

As described, institutional involvement may help enforce consequences for both the institution and individual researchers. Data users who are affiliated with so-called typical research institutions (e.g., universities, government agencies, research institutes) have an institutional representative involved in the DUA process, and work with organizations without substantial challenges. Most of the processes are seamless, unless stakeholders raise concerns. (Even with concerns, the most serious challenge may be a delay in the process.) However, a requirement of institutional involvement can impose an insurmountable hurdle for those without an institutional affiliation, such as freelance journalists, as well as for students without academic advisors or from institutions with no relevant experience. Researchers and institutions negotiate details in DUAs and pursue consensus with data managing organizations, which can be a tremendous burden for small institutions. While institutional involvement is meant to help keep safe people safer, it may have unintentionally excluded researchers without that leverage. An exemption for those who have been authorized, and have proven to be good users, at other organizations may need to be considered, and a template agreement that may mitigate the burdens should be available [Levenstein, et al., 2018; O’Hara, 2020]. Effective user training in the ethical and scientific use of data may help alleviate concerns regarding data misuse by those with limited experience.

DUAs (or other supplementary materials) require safe settings for accessing restricted data. A safe setting in a DUA designates a space in which no unauthorized views are possible, for instance, an office space with a door that lacks a window. Shared space is not accepted by some organizations as a secure setting. Again, this requirement may impose a barrier for those with limited resources, such as students who would access restricted data in a shared office or cubicle. Organizations may need to consider accommodating the needs of those who have limited resources (e.g., allowing a privacy screen for those who access data in a shared office).

DUAs for research projects involving multiple institutions

When researchers from multiple institutions collaborate on a single research project, each institution enters into the DUA. DUAs clarify expectations and responsibilities for each institution according to the research plan. The process is often complicated when institutions are located in different countries (e.g., legitimacy of credential authentication or IRB approval in different languages). O’Hara [2020] suggests considering other forms of documentation in multi-site research projects, such as a memorandum of understanding (MOU) and identification of conflicts of interest. In some cases, requiring identical DUAs with all participating institutions, although time-consuming to complete, may cause less confusion than DUAs that differ across institutions. Ultimately, to streamline the process of multi-site research projects, it may be helpful for organizations to consider incentives for good data users in different projects or even in different organizations. For example, the Research Passport of the Inter-university Consortium for Political and Social Research (ICPSR) expedites access to restricted data by giving researchers credit and visibility for “safe” actions in their past experiences with restricted data [Levenstein, et al., 2018]. This type of verification of users’ cumulative “safe” actions would greatly ease DUA procedures across multiple institutions.

Limitations on binding laws against DUA non-compliance

Failure to comply with a DUA may result in immediate termination of data access and further actions that depend on the severity of the failure. Organizations establish procedures to respond to data security and breach incidents; some funders require reporting within one hour, as well as procedures to minimize the damage of the data breach or confidentiality disclosure. In the United States, violation of the Health Insurance Portability and Accountability Act (HIPAA) privacy standards can result in a civil monetary penalty imposed on the individual by the Department of Health and Human Services. Organizations bound by specific laws such as HIPAA must stay within those legal boundaries. Nonetheless, most data security incidents are unintentional or inadvertent violations of the protocol. They may pose minimal risk for subjects in datasets, and thus may be better handled with effective user training. Organizations may do better to treat DUAs as a tool for all stakeholders to share responsibility for data confidentiality (e.g., a community model) rather than as a tool for policing or punishing one party (e.g., a policing model) [Green, et al., 2017].

Costs to access restricted data

Even marginal costs of accessing data can be burdensome to researchers, yet such costs also matter to organizations. Data access costs include staff efforts to set up the access and to create datasets for users. The costs could unintentionally exclude some groups of researchers, such as junior scholars without research funds. Organizations and funding agencies could proactively intervene by waiving the costs of data access for researchers with limited resources. Doing so would help achieve Open Science [OECD, 2015], which aims to share data with minimal barriers for all researchers from different backgrounds.

Disclosure review practices

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage were substantially updated for improved readability. In some cases important information was missing from the references, and that information was added.