LIMS Q&A: Why are the FAIR data principles increasingly important to research laboratories and their software?

From LIMSWiki
Revision as of 16:30, 8 May 2024 by Shawndouglas (talk | contribs) (Moved from sandbox to live.)

Title: Why are the FAIR data principles increasingly important to research laboratories and their software?

Author for citation: Shawn E. Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: May 2024

Introduction

Discussion of the FAIR data principles, which encourage research objects and software to be more findable, accessible, interoperable, and reusable (FAIR), continues to appear in both academic and industry circles. The principles were originally published to describe a potential path forward for improved knowledge discovery and greater research innovation, and research groups of all types have since turned to them to modify existing research objects and make new research objects and software more FAIR. But what are the FAIR principles, and why are they increasingly in the spotlight? What does FAIR mean for research laboratories?

This brief topical article will examine the FAIR principles and what they mean for research objects and software. It will briefly highlight the technologies and frameworks necessary for FAIR, and give examples of how the combination of FAIR research objects and software in research labs can increase the potential for greater innovation for society.

The growing importance of the FAIR principles to research laboratories

The FAIR data principles were published by Wilkinson et al. in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and information of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.[1] The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."[1] Since being published, other researchers have taken the somewhat broad set of principles and refined them to their own scientific disciplines, as well as to other types of research objects, including the research software being used by those researchers to generate research objects.[2][3][4][5][6][7]

But why are research laboratories increasingly pushing for more findable, accessible, interoperable, and reusable research objects and software? The short answer, as evidenced by the Wilkinson et al. quote above, is that greater innovation can be gained through improved knowledge discovery. The discovery process necessary for that greater innovation—whether through traditional research methods or artificial intelligence (AI)-driven methods—is enhanced when research objects and software are compatible with the core ideas of FAIR.[1][8][9]

A slightly longer answer, suitable for a Q&A topic, requires looking at a few more details of the FAIR principles as applied to both research objects and research software. Research laboratories exist to innovate. That innovation can come in the form of discovering new materials, developing a pharmaceutical to improve patient outcomes for a particular disease, or modifying an existing food or beverage recipe, among others. In academic research labs, this usually looks like advancement of theoretical knowledge and the publishing of research results, whereas in industry research labs, this typically looks like more practical applications of research concepts to new or existing products or services. In both cases, research software was likely involved at some point, whether it be something like a researcher-developed bioinformatics application or a commercial vendor-developed electronic laboratory notebook (ELN).

FAIR research objects

Regarding research objects themselves, the FAIR principles essentially say "vast amounts of data and information in largely heterogeneous formats spread across disparate sources both electronic and paper make modern research workflows difficult, tedious, and at times impossible. Further, repeatability, reproducibility, and replicability of openly published or secure internal research results is at risk, giving less confidence to academic peers in the published research or to critical stakeholders in the viability of a researched prototype." As such, research objects (which include not only their inherent data and information but also any metadata that describe features of that data and information) need to be[10]:

  • findable, with globally unique and persistent identifiers, rich metadata that link to the identifier of the data described, and an ability to be indexed as an effectively searchable resource;
  • accessible, being able to be retrieved (including metadata of data that is no longer available) by identifiers using secure standardized communication protocols that are open, free, and universally implementable with authentication and authorization mechanisms;
  • interoperable, represented using formal, accessible, shared, and relevant language models and vocabularies that abide by FAIR principles, as well as with qualified linkage to other metadata; and
  • reusable, being richly described by accurate and relevant metadata, released with a clear and accessible data usage license, associated with sufficiently detailed provenance information, and compliant with discipline-specific community standards.
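As a rough illustration of the four properties above, the following Python sketch checks a metadata record against a handful of simplified criteria. The `ResearchObjectMetadata` class, its field names (`identifier`, `license`, `provenance`, and so on), and the checks themselves are assumptions made for this example; they are not drawn from the text of the FAIR principles or from any particular tool, and a real FAIR assessment would be considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObjectMetadata:
    """Hypothetical, simplified metadata record for a research object."""
    identifier: str = ""    # e.g., a DOI or other persistent identifier
    description: str = ""   # descriptive text that helps indexing and search
    access_url: str = ""    # where the object (or its metadata) can be retrieved
    vocabulary_terms: list = field(default_factory=list)  # terms from a shared vocabulary
    license: str = ""       # data usage license
    provenance: str = ""    # how, when, and by whom the object was produced


def fair_gaps(record: ResearchObjectMetadata) -> dict:
    """Return, per FAIR facet, the illustrative fields that are still missing."""
    gaps = {"findable": [], "accessible": [], "interoperable": [], "reusable": []}
    if not record.identifier:
        gaps["findable"].append("identifier")
    if not record.description:
        gaps["findable"].append("description")
    # Crude stand-in for "secure standardized communication protocol".
    if not record.access_url.startswith("https://"):
        gaps["accessible"].append("access_url")
    if not record.vocabulary_terms:
        gaps["interoperable"].append("vocabulary_terms")
    if not record.license:
        gaps["reusable"].append("license")
    if not record.provenance:
        gaps["reusable"].append("provenance")
    return gaps


if __name__ == "__main__":
    record = ResearchObjectMetadata(
        identifier="doi:10.1234/example",  # placeholder, not a real DOI
        description="Tensile strength measurements for sample batch A",
        access_url="https://repository.example.org/objects/1",
        license="CC-BY-4.0",
    )
    for facet, missing in fair_gaps(record).items():
        print(facet, "->", missing or "no gaps found")
```

A check like this only confirms that certain fields are present; whether the identifier is truly persistent or the vocabulary is appropriate to the discipline still requires human judgment.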

All that talk of unique persistent identifiers, communication protocols, authentication mechanisms, language models (e.g., ontology languages), standardized vocabularies, provenance information, and more could make one's head spin. And, to be fair, it has been challenging for research groups to adopt FAIR, with few international efforts to translate the FAIR principles to broad research. The FAIR Cookbook represents one example of such an international collaborative effort, providing "a combination of guidance, technical, hands-on, background and review types to cover the operation steps of FAIR data management."[11] In fact, the FAIR Cookbook is illustrative of the challenges of implementing FAIR in research laboratories, particularly given the diverse array of vocabularies used across the wealth of scientific disciplines, such as biobanking, biomedical engineering, botany, food science, and materials science. The way a botanical research organization makes its research objects FAIR is going to require a different set of vocabularies and frameworks than the way a materials science research organization does. But all of them will turn to informatics software, data management plans, database tools, and more to not only transform existing non-FAIR research objects to be FAIR but also better ensure newly created research objects are FAIR.

FAIR research software

Discussion on research software and its FAIRness is more complicated. It is beyond the scope of this article to go into great detail about the concepts surrounding FAIR research software, but a brief overview will be attempted. When the FAIR principles were first published, the framework was largely being applied to research objects. However, researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts. This led to recognizing that digital research objects go beyond data and information, and that there is a "specific nature of software" used in research; that research software should not be considered "just data."[4] The end result has seen researchers begin to apply the core concepts of FAIR to research software, but slightly differently from research objects.[2][3][4][5][6][7]

Unsurprisingly, what researchers consider to be "research software" for purposes of FAIR has historically been interpreted in numerous ways. Does the commercial spreadsheet software used to perform calculations on research data deserve to be called research software in parallel with the lab-developed bioinformatics application used to generate that data? Given the difficulties of gaining a consensus definition of the term, a 2021 international initiative called FAIRsFAIR made a good-faith effort to define "research software" with the feedback of multiple stakeholders. The short version of their resulting definition is that, "[r]esearch software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose."[12] Of note is the last part, acknowledging that software clearly designed for research can be developed in the lab during the research process or developed beforehand by, for example, a commercial software developer. As such, Microsoft Excel may not be looked upon as research software, but an ELN or laboratory information management system (LIMS) thoughtfully developed with research activities in mind could be considered research software.

More often than not, research software is going to be developed in-house. A growing push for the FAIRification of that software, as well as commercial research solutions, has seen the emergence of "research software engineering" as a domain of practice.[13][14] In the past, researchers often cobbled together research software with less of a focus on quality and reproducibility and more of a focus on getting their research published. Today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research,"[14][15] and research software engineering efforts that focus on that concept are vital to future research outcomes. Cohen et al. add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."[15]

Hasselbring et al. state that "it is essential [for academic research groups] to publish research software in addition to research data," to increase trust in the peer review system, build new research on top of existing research, and ensure greater reproducibility of any published results.[3] As such, they extend FAIR data principles to FAIR research software, noting that[3]:

  • findable software acknowledges that "the first step in (re)using ... software is to find it";
  • accessible software acknowledges that once found, the researcher needs to know how to best access the software, recognizing authentication or authorization mechanisms may need to be in place;
  • interoperable software acknowledges that the software will need to eventually integrate with other research objects and software, demanding FAIR-driven methods and tools in the software's development; and
  • reusable software acknowledges that the software will need to not only produce research objects that can be reused, combined, and extended, but that the software itself needs to have metadata that helps make it retrievable and reusable.
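One way to picture the "metadata that helps make it retrievable and reusable" is a small machine-readable description shipped alongside the code. The sketch below builds such a description as JSON and reports which fields are absent. The key names, the `REQUIRED_KEYS` tuple, the helper functions, and the example values are all assumptions made for illustration; they are not requirements stated by Hasselbring et al., and a lab would normally follow whatever metadata scheme its community has adopted.

```python
import json

# Illustrative set of keys; which fields matter is an assumption of this sketch.
REQUIRED_KEYS = (
    "name",
    "version",
    "identifier",
    "license",
    "codeRepository",
    "description",
)


def missing_keys(metadata: dict) -> list:
    """List required keys that are absent or empty in a software metadata record."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]


def to_json(metadata: dict) -> str:
    """Serialize the record so that both people and machines can read it."""
    return json.dumps(metadata, indent=2, sort_keys=True)


if __name__ == "__main__":
    software = {
        "name": "example-analysis-tool",  # hypothetical project
        "version": "0.1.0",
        "license": "MIT",
        "codeRepository": "https://git.example.org/example-lab/example-analysis-tool",
        "description": "Scripts for processing instrument output.",
    }
    print(to_json(software))
    print("Missing:", missing_keys(software))
```

In this example the record lacks a persistent identifier, so the software would be harder to find and cite than the rest of its metadata suggests.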

The applicability of these principles is clear to academic research software developed in-house, with the concept of open science driving FAIR development and release of that software, including on platforms like GitHub.[3] It's less clear for commercial developers making research software. The growing prevalence of FAIR data and software practices in research laboratories doesn't mean commercial developers are going to suddenly take an open-source approach to their code, and it doesn't mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.[3] However, both research software development paradigms stand to gain from the shift to more FAIR data and software.[13] Additionally, if commercial vendors of research software want to continue to competitively market relevant and sustainable research software to research labs, they frankly have little choice but to commit extra resources to learning about the application of FAIR principles to their offerings tailored to FAIR-abiding research labs.

FAIRer research objects + better software = the potential for greater innovation

Greater research innovation can be gained through improved knowledge discovery, which is enabled by FAIR research objects and software. The FAIR principles say that when research objects and software are created, managed, updated, and developed such that they are more findable, accessible, interoperable, and reusable, then researchers and other stakeholders benefit. Published research results are more reputable, reproducible, and reusable, benefiting the overall research community. However, this extends beyond academic research. The provenance of industry research (e.g., in the pharmaceutical industry), performed with the help of and documented within ELNs and other research management software, is better maintained using FAIR principles. As a result, clinical and preclinical studies are more reproducible, ensuring proper funneling of research funding, limiting resource waste, and limiting potential suffering of research participants.[16] Finally, patients suffering from rare diseases may benefit from FAIRer data practices that help prevent the data silos of testing, medical device use, patient outcome, treatment history, and clinical trial data. If these types of data were made more FAIR, "new diagnostics, treatments, and health care policies to benefit patients" could be developed.[17] In all these cases, laboratories are involved, and their software's ability to effectively ensure FAIR research objects are created is vital. As such, the implications of FAIR research objects and software for modern research laboratories' operations are undeniable. Greater innovation and improved patient outcomes are only a few of the many benefits of the FAIR principles to society.

Conclusion

Eight years after their publication, the importance of the FAIR data principles to research groups is greater than ever, and this brief Q&A article sought to explain why. The short answer by Wilkinson et al. is that making research objects more FAIR means better knowledge discovery, which in turn means greater innovation. This gives research labs around the world incentive to continue making existing and new research objects FAIR, for many stakeholders' benefit. The longer answer looks at the specifics of FAIR and the importance of rich metadata, persistent identifiers, standardized vocabularies, common data models, and other ontology- and semantic-driven technologies and frameworks. By mindfully applying these open, standardized technologies and frameworks to existing and new research objects, heterogeneous and disparate data pools can be unified to the benefit of research stakeholders, whether they are based in academia or industry. Of course, the FAIR concepts have since extended to the research software used by research labs. This primarily affects the software developed in these labs, particularly as it pertains to academic labs and their published research. However, commercial vendors of software made specifically for research labs have incentives to learn about the FAIR principles and associated research software engineering practices, to make their solutions more compatible with FAIR-driven research labs. These labs, and society at large, can ultimately benefit from greater innovation, improved products, and better research and patient outcomes.

References

  1. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618.
  2. "fair data principles". PubMed Search. National Institutes of Health, National Library of Medicine. https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles. Retrieved 30 April 2024.
  3. Hasselbring, Wilhelm; Carr, Leslie; Hettrick, Simon; Packer, Heather; Tiropanis, Thanassis (25 February 2020). "From FAIR research data toward FAIR and open research software". it - Information Technology 62 (1): 39–47. doi:10.1515/itit-2019-0040. ISSN 2196-7032. https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html.
  4. Gruenpeter, M. (23 November 2020). "FAIR + Software: Decoding the principles" (PDF). FAIRsFAIR “Fostering FAIR Data Practices In Europe”. https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf. Retrieved 30 April 2024.
  5. Barker, Michelle; Chue Hong, Neil P.; Katz, Daniel S.; Lamprecht, Anna-Lena; Martinez-Ortiz, Carlos; Psomopoulos, Fotis; Harrow, Jennifer; Castro, Leyla Jael et al. (14 October 2022). "Introducing the FAIR Principles for research software". Scientific Data 9 (1): 622. doi:10.1038/s41597-022-01710-x. ISSN 2052-4463. PMC9562067. PMID 36241754. https://www.nature.com/articles/s41597-022-01710-x.
  6. Patel, Bhavesh; Soundarajan, Sanjay; Ménager, Hervé; Hu, Zicheng (23 August 2023). "Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool". Scientific Data 10 (1): 557. doi:10.1038/s41597-023-02463-x. ISSN 2052-4463. PMC10447492. PMID 37612312. https://www.nature.com/articles/s41597-023-02463-x.
  7. Du, Xinsong; Dastmalchi, Farhad; Ye, Hao; Garrett, Timothy J.; Diller, Matthew A.; Liu, Mei; Hogan, William R.; Brochhausen, Mathias et al. (6 February 2023). "Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software". Metabolomics 19 (2): 11. doi:10.1007/s11306-023-01974-3. ISSN 1573-3890. https://link.springer.com/10.1007/s11306-023-01974-3.
  8. Olsen, C. (1 September 2023). "Embracing FAIR Data on the Path to AI-Readiness". Pharma's Almanac. https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness. Retrieved 3 May 2024.
  9. Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali et al. (26 July 2023). "FAIR for AI: An interdisciplinary and international community building perspective". Scientific Data 10 (1): 487. doi:10.1038/s41597-023-02298-6. ISSN 2052-4463. PMC10372139. PMID 37495591. https://www.nature.com/articles/s41597-023-02298-6.
  10. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introducing the FAIR Principles". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564.
  11. Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "Introduction". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564.
  12. Gruenpeter, Morane; Katz, Daniel S.; Lamprecht, Anna-Lena; Honeyman, Tom; Garijo, Daniel; Struck, Alexander; Niehues, Anna; Martinez, Paula Andrea et al. (13 September 2021). "Defining Research Software: a controversial discussion". Zenodo. doi:10.5281/zenodo.5504016. https://zenodo.org/record/5504016.
  13. Moynihan, G. (7 July 2020). "The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE". Invenia Blog. Invenia Technical Computing Corporation. https://invenia.github.io/blog/2020/07/07/software-engineering/.
  14. Woolston, Chris (31 May 2022). "Why science needs more research software engineers". Nature: d41586-022-01516-2. doi:10.1038/d41586-022-01516-2. ISSN 0028-0836. https://www.nature.com/articles/d41586-022-01516-2.
  15. Cohen, Jeremy; Katz, Daniel S.; Barker, Michelle; Chue Hong, Neil; Haines, Robert; Jay, Caroline (1 January 2021). "The Four Pillars of Research Software Engineering". IEEE Software 38 (1): 97–105. doi:10.1109/MS.2020.2973362. ISSN 0740-7459. https://ieeexplore.ieee.org/document/8994167/.
  16. Sahoo, Satya S.; Valdez, Joshua; Kim, Matthew; Rueschman, Michael; Redline, Susan (1 January 2019). "ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata". International Journal of Medical Informatics 121: 10–18. doi:10.1016/j.ijmedinf.2018.10.009. PMC6343667. PMID 30545485. https://linkinghub.elsevier.com/retrieve/pii/S1386505618302697.
  17. van Lin, Nawel; Paliouras, Georgios; Vroom, Elizabeth; ’t Hoen, Peter A.C.; Roos, Marco (2 November 2021). "How Patient Organizations Can Drive FAIR Data Efforts to Facilitate Research and Health Care: A Report of the Virtual Second International Meeting on Duchenne Data Sharing, March 3, 2021". Journal of Neuromuscular Diseases 8 (6): 1097–1108. doi:10.3233/JND-210721. PMC8673524. PMID 34334415. https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/JND-210721.