| ==Sandbox begins below==
| <div class="nonumtoc">__TOC__</div>
| [[File:|right|520px]]
| |
| '''Title''': ''Why are the FAIR data principles increasingly important to research laboratories and their software?''
| |
|
| |
| '''Author for citation''': Shawn E. Douglas
| |
|
| |
| '''License for content''': [https://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International]
| |
|
| |
| '''Publication date''': May 2024
| |
|
| |
| ==Introduction==
| |
|
| |
| ==The growing importance of the FAIR principles to research laboratories==
| |
| The [[Journal:The FAIR Guiding Principles for scientific data management and stewardship|FAIR data principles]] were published by Wilkinson ''et al.'' in 2016. They were the product of a stakeholder collaboration that sought to make research "objects" (i.e., research data and [[information]] of all shapes and formats) more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.<ref name="WilkinsonTheFAIR16">{{Cite journal |last=Wilkinson |first=Mark D. |last2=Dumontier |first2=Michel |last3=Aalbersberg |first3=IJsbrand Jan |last4=Appleton |first4=Gabrielle |last5=Axton |first5=Myles |last6=Baak |first6=Arie |last7=Blomberg |first7=Niklas |last8=Boiten |first8=Jan-Willem |last9=da Silva Santos |first9=Luiz Bonino |last10=Bourne |first10=Philip E. |last11=Bouwman |first11=Jildau |date=2016-03-15 |title=The FAIR Guiding Principles for scientific data management and stewardship |url=https://www.nature.com/articles/sdata201618 |journal=Scientific Data |language=en |volume=3 |issue=1 |pages=160018 |doi=10.1038/sdata.2016.18 |issn=2052-4463 |pmc=PMC4792175 |pmid=26978244}}</ref> The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... 
is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."<ref name="WilkinsonTheFAIR16" /> Since their publication, other researchers have taken this somewhat broad set of principles and adapted them to their own scientific disciplines, as well as to other types of research objects, including the research software those researchers use to generate research objects.<ref name="NIHPubMedSearch">{{cite web |url=https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles |title=fair data principles |work=PubMed Search |publisher=National Institutes of Health, National Library of Medicine |accessdate=30 April 2024}}</ref><ref name="HasselbringFromFAIR20">{{Cite journal |last=Hasselbring |first=Wilhelm |last2=Carr |first2=Leslie |last3=Hettrick |first3=Simon |last4=Packer |first4=Heather |last5=Tiropanis |first5=Thanassis |date=2020-02-25 |title=From FAIR research data toward FAIR and open research software |url=https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html |journal=it - Information Technology |language=en |volume=62 |issue=1 |pages=39–47 |doi=10.1515/itit-2019-0040 |issn=2196-7032}}</ref><ref name="GruenpeterFAIRPlus20">{{Cite web |last=Gruenpeter, M. |date=23 November 2020 |title=FAIR + Software: Decoding the principles |url=https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf |format=PDF |publisher=FAIRsFAIR “Fostering FAIR Data Practices In Europe” |accessdate=30 April 2024}}</ref><ref name=":0">{{Cite journal |last=Barker |first=Michelle |last2=Chue Hong |first2=Neil P. |last3=Katz |first3=Daniel S. 
|last4=Lamprecht |first4=Anna-Lena |last5=Martinez-Ortiz |first5=Carlos |last6=Psomopoulos |first6=Fotis |last7=Harrow |first7=Jennifer |last8=Castro |first8=Leyla Jael |last9=Gruenpeter |first9=Morane |last10=Martinez |first10=Paula Andrea |last11=Honeyman |first11=Tom |date=2022-10-14 |title=Introducing the FAIR Principles for research software |url=https://www.nature.com/articles/s41597-022-01710-x |journal=Scientific Data |language=en |volume=9 |issue=1 |pages=622 |doi=10.1038/s41597-022-01710-x |issn=2052-4463 |pmc=PMC9562067 |pmid=36241754}}</ref><ref name=":1">{{Cite journal |last=Patel |first=Bhavesh |last2=Soundarajan |first2=Sanjay |last3=Ménager |first3=Hervé |last4=Hu |first4=Zicheng |date=2023-08-23 |title=Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool |url=https://www.nature.com/articles/s41597-023-02463-x |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=557 |doi=10.1038/s41597-023-02463-x |issn=2052-4463 |pmc=PMC10447492 |pmid=37612312}}</ref><ref name=":2">{{Cite journal |last=Du |first=Xinsong |last2=Dastmalchi |first2=Farhad |last3=Ye |first3=Hao |last4=Garrett |first4=Timothy J. |last5=Diller |first5=Matthew A. |last6=Liu |first6=Mei |last7=Hogan |first7=William R. |last8=Brochhausen |first8=Mathias |last9=Lemas |first9=Dominick J. |date=2023-02-06 |title=Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software |url=https://link.springer.com/10.1007/s11306-023-01974-3 |journal=Metabolomics |language=en |volume=19 |issue=2 |pages=11 |doi=10.1007/s11306-023-01974-3 |issn=1573-3890}}</ref>
| |
|
| |
| But why are research laboratories increasingly pushing for more findable, accessible, interoperable, and reusable research objects and software? The short answer, as evidenced by the Wilkinson ''et al.'' quote above, is that improved knowledge discovery can drive greater innovation. The discovery process behind that innovation, whether it relies on traditional research methods or [[artificial intelligence]] (AI)-driven methods, is enhanced when research objects and software are compatible with the core ideas of FAIR.<ref name="WilkinsonTheFAIR16" /><ref name="OlsenEmbracing23">{{cite web |url=https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness |title=Embracing FAIR Data on the Path to AI-Readiness |author=Olsen, C. |work=Pharma's Almanac |date=01 September 2023 |accessdate=03 May 2024}}</ref><ref name="HuertaFAIRForAI23">{{Cite journal |last=Huerta |first=E. A. |last2=Blaiszik |first2=Ben |last3=Brinson |first3=L. Catherine |last4=Bouchard |first4=Kristofer E. |last5=Diaz |first5=Daniel |last6=Doglioni |first6=Caterina |last7=Duarte |first7=Javier M. |last8=Emani |first8=Murali |last9=Foster |first9=Ian |last10=Fox |first10=Geoffrey |last11=Harris |first11=Philip |date=2023-07-26 |title=FAIR for AI: An interdisciplinary and international community building perspective |url=https://www.nature.com/articles/s41597-023-02298-6 |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=487 |doi=10.1038/s41597-023-02298-6 |issn=2052-4463 |pmc=PMC10372139 |pmid=37495591}}</ref>
| |
|
| |
| A slightly longer answer, suitable for a Q&A topic, requires looking at a few more details of the FAIR principles as applied to both research objects and research software. Research laboratories, whether located within an organization or contracted out as third parties, exist to innovate. That innovation can take the form of discovering new materials that may or may not have a future application, developing a pharmaceutical to improve patient outcomes for a particular disease, or improving an existing food or beverage recipe, among others. In academic research labs, this usually means knowledge advancement and the publishing of research results, whereas in industry research labs, this typically means more practical applications of research concepts to new or existing products or services. In both cases, research software was likely involved at some point, whether a researcher-developed [[bioinformatics]] application or a commercial vendor-developed [[electronic laboratory notebook]] (ELN).
| |
|
| |
| ===FAIR research objects===
| |
| Regarding research objects themselves, the FAIR principles essentially respond to a recognized problem: vast amounts of data and information in largely heterogeneous formats, spread across disparate electronic and paper sources, make modern research workflows difficult, tedious, and at times impossible. Further, the repeatability, reproducibility, and replicability of openly published or secure internal research results are at risk, reducing academic peers' confidence in published research, or critical stakeholders' confidence in the viability of a researched prototype. As such, research objects (which include not only their inherent data and information but also any [[metadata]] that describe features of that data and information) need to be<ref name="Rocca-SerraFAIRCook22">{{Cite book |last=Rocca-Serra, Philippe |last2=Sansone, Susanna-Assunta |last3=Gu, Wei |last4=Welter, Danielle |last5=Abbassi Daloii, Tooba |last6=Portell-Silva, Laura |date=2022-06-30 |title=D2.1 FAIR Cookbook |url=https://zenodo.org/record/6783564 |chapter=Introducing the FAIR Principles |doi=10.5281/ZENODO.6783564}}</ref>:
| |
|
| |
| *''findable'', with globally unique and persistent identifiers, rich metadata that link to the identifier of the data described, and an ability to be indexed as an effectively searchable resource;
| |
| *''accessible'', being retrievable by their identifiers using secure, standardized communication protocols that are open, free, and universally implementable and that allow for authentication and authorization where necessary, with metadata remaining accessible even when the data they describe are no longer available;
| |
| *''interoperable'', represented using formal, accessible, shared, and broadly applicable knowledge representation languages and vocabularies that themselves follow FAIR principles, and including qualified references to other metadata; and
| |
| *''reusable'', being richly described by accurate and relevant metadata, released with a clear and accessible data usage license, associated with sufficiently detailed provenance information, and compliant with discipline-specific community standards.
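| |
| The four criteria above can be made more concrete by applying a small checklist to a single metadata record. The following Python sketch is illustrative only: the `MetadataRecord` fields and the `fair_gaps()` function are hypothetical names chosen for this example, not part of any metadata standard or of the FAIR principles themselves, and a real FAIR assessment involves far more than checking that fields are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical, simplified metadata record for one research object."""
    identifier: str = ""          # globally unique, persistent identifier (e.g., a DOI URL)
    title: str = ""               # descriptive metadata that makes the object searchable
    access_url: str = ""          # where the object can be retrieved over a standard protocol
    vocabulary_terms: list[str] = field(default_factory=list)  # IRIs from shared vocabularies
    license: str = ""             # clear, accessible usage license
    provenance: str = ""          # how, when, and by whom the data were produced


def fair_gaps(record: MetadataRecord) -> list[str]:
    """Return human-readable gaps against a deliberately simplified FAIR checklist."""
    gaps = []
    # Findable: a persistent identifier plus descriptive metadata
    if not record.identifier:
        gaps.append("findable: no persistent identifier")
    if not record.title:
        gaps.append("findable: no descriptive metadata")
    # Accessible: retrievable over an open, standardized protocol (HTTPS in this sketch)
    if not record.access_url.startswith("https://"):
        gaps.append("accessible: no HTTPS retrieval URL")
    # Interoperable: at least one term, and every term expressed as an IRI
    if not record.vocabulary_terms or not all(
        term.startswith(("http://", "https://")) for term in record.vocabulary_terms
    ):
        gaps.append("interoperable: terms not drawn from a shared vocabulary")
    # Reusable: a usage license and provenance information
    if not record.license:
        gaps.append("reusable: no usage license")
    if not record.provenance:
        gaps.append("reusable: no provenance information")
    return gaps


record = MetadataRecord(
    identifier="https://doi.org/10.1234/example-dataset",  # hypothetical DOI
    title="Soil moisture readings, plot A",
    access_url="https://repository.example.org/records/42",  # hypothetical repository
    vocabulary_terms=["https://vocab.example.org/terms/soil-moisture"],  # placeholder IRI
    license="https://creativecommons.org/licenses/by/4.0/",
)
print(fair_gaps(record))  # ['reusable: no provenance information']
```

| Run on the example record, the sketch reports only the missing provenance. A lab applying this idea in practice would likely validate records against an established, discipline-specific metadata schema rather than hand-written checks like these.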
| |
|
| |
| All that talk of unique persistent identifiers, communication protocols, authentication mechanisms, knowledge representation languages (e.g., [[ontology]] languages), standardized vocabularies, provenance information, and more could make one's head spin. And, to be fair, research groups have found FAIR challenging to adopt, and there have been few widespread international efforts to translate the FAIR principles into practice across research broadly. The FAIR Cookbook represents one example of such an international collaborative effort, providing "a combination of guidance, technical, hands-on, background and review types to cover the operation steps of FAIR data management."<ref name="Rocca-SerraFAIRCook22-1">{{Cite book |last=Rocca-Serra, Philippe |last2=Sansone, Susanna-Assunta |last3=Gu, Wei |last4=Welter, Danielle |last5=Abbassi Daloii, Tooba |last6=Portell-Silva, Laura |date=2022-06-30 |title=D2.1 FAIR Cookbook |url=https://zenodo.org/record/6783564 |chapter=Introduction |doi=10.5281/ZENODO.6783564}}</ref> In fact, the Cookbook illustrates the challenges of implementing FAIR in research laboratories, particularly given the diverse array of vocabularies used across the wealth of scientific disciplines, such as [[biobanking]], [[biomedical engineering]], [[botany]], [[food science]], and [[materials science]]. A botanical research organization will likely need a different set of tools to make its research objects FAIR than a materials science research organization will. But all of them will turn to [[Informatics (academic field)|informatics]] tools, data management plans, database tools, and more, not only to bring existing research objects into line with FAIR but also to better ensure that newly created research objects are FAIR as well.
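| |
| As a rough illustration of what bringing existing research objects into line with shared vocabularies can involve, the following Python sketch re-keys a lab's local column names to shared vocabulary identifiers and reports any terms it cannot map. The mapping, the `harmonize()` function, and the `example.org` IRIs are hypothetical placeholders, not real ontology terms or a FAIR Cookbook recipe; the actual vocabularies would differ between, say, a botanical lab and a materials science lab.

```python
# Hypothetical mapping from one lab's local column names to shared vocabulary IRIs.
# The IRIs are placeholders under example.org, not real ontology terms.
LOCAL_TO_SHARED = {
    "sample": "https://vocab.example.org/terms/sample-identifier",
    "temp_c": "https://vocab.example.org/terms/temperature-celsius",
    "ph": "https://vocab.example.org/terms/ph",
}


def harmonize(
    row: dict[str, object], mapping: dict[str, str]
) -> tuple[dict[str, object], list[str]]:
    """Re-key one data row with shared vocabulary IRIs.

    Local keys are normalized (trimmed, lowercased) before lookup. Keys with no
    mapping are returned separately so a curator can extend the mapping instead
    of the data being silently dropped.
    """
    harmonized: dict[str, object] = {}
    unmapped: list[str] = []
    for key, value in row.items():
        shared_iri = mapping.get(key.strip().lower())
        if shared_iri is None:
            unmapped.append(key)
        else:
            harmonized[shared_iri] = value
    return harmonized, unmapped


row = {"Sample": "S-001", "temp_c": 21.5, "operator": "JD"}
harmonized, unmapped = harmonize(row, LOCAL_TO_SHARED)
print(unmapped)  # ['operator']
```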
| |
|
| |
| ===FAIR research software===
| |
| Discussion of research software and its FAIRness is more complicated. A detailed treatment of the concepts surrounding FAIR research software is beyond the scope of this article, but a brief overview follows. When the FAIR principles were first published, the framework was largely applied to research objects. However, researchers quickly recognized that any plans to update processes and systems to make research objects more FAIR would have to be tailored to specific research contexts. This led to the recognition that digital research objects go beyond data and information, that there is a "specific nature of software" used in research, and that research software therefore should not be considered "just data."<ref name="GruenpeterFAIRPlus20" /> As a result, researchers have begun to apply the core concepts of FAIR to research software, though somewhat differently than they apply them to research objects.<ref name="NIHPubMedSearch" /><ref name="HasselbringFromFAIR20" /><ref name="GruenpeterFAIRPlus20" /><ref name=":0" /><ref name=":1" /><ref name=":2" />
| |
|
| |
| Unsurprisingly, what researchers consider to be "research software" for purposes of FAIR has historically been interpreted in numerous ways. Does the commercial spreadsheet software used to perform calculations on research data deserve to be called research software alongside the lab-developed bioinformatics application used to generate that data? Given the difficulty of reaching a consensus definition of the term, in 2021 participants in the international FAIRsFAIR initiative made a good-faith effort to define "research software" with feedback from multiple stakeholders. The short version of their resulting definition is that "[r]esearch software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose."<ref name="GruenpeterDefining21">{{Cite journal |last=Gruenpeter, Morane |last2=Katz, Daniel S. |last3=Lamprecht, Anna-Lena |last4=Honeyman, Tom |last5=Garijo, Daniel |last6=Struck, Alexander |last7=Niehues, Anna |last8=Martinez, Paula Andrea |last9=Castro, Leyla Jael |last10=Rabemanantsoa, Tovo |last11=Chue Hong, Neil P. |date=2021-09-13 |title=Defining Research Software: a controversial discussion |url=https://zenodo.org/record/5504016 |journal=Zenodo |doi=10.5281/zenodo.5504016}}</ref> Of note is the last part, which acknowledges that research software can be developed in the lab during the research process or developed beforehand by, for example, a commercial software developer specifically for research use. As such, Microsoft Excel may not be regarded as research software, but an ELN or [[laboratory information management system]] (LIMS) thoughtfully developed with research activities in mind could be. More often than not, however, research software is going to be developed in-house. 
A growing push for the FAIRification of that software, as well as of commercial research solutions, has seen the emergence of "research software engineering" as a domain of practice.<ref name="MoynihanTheHitch20">{{cite web |url=https://invenia.github.io/blog/2020/07/07/software-engineering/ |title=The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE |author=Moynihan, G. |work=Invenia Blog |publisher=Invenia Technical Computing Corporation |date=07 July 2020}}</ref><ref name="WoolstonWhySci22">{{Cite journal |last=Woolston |first=Chris |date=2022-05-31 |title=Why science needs more research software engineers |url=https://www.nature.com/articles/d41586-022-01516-2 |journal=Nature |language=en |pages=d41586–022–01516-2 |doi=10.1038/d41586-022-01516-2 |issn=0028-0836}}</ref> Broadly speaking, researchers in the past often cobbled together research software with less focus on quality and reproducibility and more on getting their research published. Today, however, the push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed much greater emphasis on the concept of "better software, better research"<ref name="WoolstonWhySci22" /><ref name="CohenTheFour21">{{Cite journal |last=Cohen |first=Jeremy |last2=Katz |first2=Daniel S. |last3=Barker |first3=Michelle |last4=Chue Hong |first4=Neil |last5=Haines |first5=Robert |last6=Jay |first6=Caroline |date=2021-01 |title=The Four Pillars of Research Software Engineering |url=https://ieeexplore.ieee.org/document/8994167/ |journal=IEEE Software |volume=38 |issue=1 |pages=97–105 |doi=10.1109/MS.2020.2973362 |issn=0740-7459}}</ref>, with research software engineering efforts treating that concept as vital to future research outcomes. 
Cohen ''et al.'' add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."<ref name="CohenTheFour21" />
| |
|
| |
| Hasselbring ''et al.'' note that "it is essential [for academic research groups] to publish research software in addition to research data," to increase trust in the peer review system, build new research on top of existing research, and ensure greater reproducibility of any published results.<ref name="HasselbringFromFAIR20" /> They extend FAIR data principles to FAIR research software, noting that<ref name="HasselbringFromFAIR20" />:
| |
|
| |
| *''findable'' software acknowledges that "the first step in (re)using ... software is to find it";
| |
| *''accessible'' software acknowledges that once the software is found, the researcher needs to know how best to access it, recognizing that authentication or authorization mechanisms may need to be in place;
| |
| *''interoperable'' software acknowledges that the software will eventually need to integrate with other research objects and software, demanding FAIR-driven methods and tools in the software's development; and
| |
| *''reusable'' software acknowledges that the software will need to not only produce research objects that can be reused, combined, and extended, but that the software itself should have metadata that helps make it retrievable and reusable.
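| |
| One existing convention for giving research software the machine-readable metadata described above is a CodeMeta description: a JSON-LD document built largely on schema.org terms and typically saved as `codemeta.json` in a code repository. The following Python sketch assembles such a description. Every value (the tool name, repository URL, DOI, and author) is a hypothetical placeholder, and the `REQUIRED` tuple reflects this example's own choice of minimum fields, not a rule of the CodeMeta specification.

```python
import json

# Hypothetical CodeMeta-style description of a lab-developed tool.
# Property names follow CodeMeta/schema.org; all values are placeholders.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "peak-picker",
    "description": "Hypothetical tool for detecting peaks in instrument output.",
    "version": "1.2.0",
    "codeRepository": "https://git.example.org/lab/peak-picker",
    "identifier": "https://doi.org/10.1234/peak-picker",
    "license": "https://spdx.org/licenses/MIT",
    "programmingLanguage": "Python",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"},
    ],
}

# Fields this sketch treats as a minimum for findability and reuse
# (an assumption for illustration, not a CodeMeta requirement).
REQUIRED = ("name", "version", "codeRepository", "identifier", "license")
missing = [key for key in REQUIRED if not codemeta.get(key)]

# Serialize as it might be committed to the repository root as codemeta.json.
codemeta_json = json.dumps(codemeta, indent=2)
print(missing)  # []
```

| Because the description is plain JSON-LD, it can be read both by people browsing the repository and by services that index software metadata, which speaks to the findable and reusable criteria above.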
| |
|
| |
| The applicability of these principles is clear for academic research software developed in-house, with the concept of open science driving the FAIR development and release of that software, including on platforms like GitHub.<ref name="HasselbringFromFAIR20" /> It is less clear for commercial developers of research software. The growing prevalence of FAIR data and software practices in research laboratories doesn't mean commercial developers are going to suddenly take an open-source approach to their code, nor does it mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.<ref name="HasselbringFromFAIR20" /> However, both research software development paradigms stand to gain from the shift toward more FAIR data and software.<ref name="MoynihanTheHitch20" /> Additionally, if commercial vendors of research software want to keep competitively marketing relevant and sustainable research software to research labs, they frankly have little choice but to commit extra resources to learning how to apply the FAIR principles to offerings tailored to FAIR-abiding research labs.
| |
|
| |
| ===FAIRer research objects + better software = the potential for greater innovation===
| |
| As stated earlier, greater research innovation can be gained through improved knowledge discovery, which FAIR research objects and software enable. The premise of the FAIR principles is that when data and software are created, managed, updated, and developed such that they are more findable, accessible, interoperable, and reusable, researchers and other stakeholders benefit. Published research results become more reputable, reproducible, and reusable, benefiting the overall research community. And the benefits extend beyond academic research. The provenance of industry research (e.g., in the pharmaceutical industry) that is performed with the help of, and documented within, ELNs and other research management software is better maintained using FAIR principles. As a result, clinical and preclinical studies are more reproducible, helping ensure that research funding is properly directed, limiting resource waste, and limiting potential suffering of research participants.<ref>{{Cite journal |last=Sahoo |first=Satya S. |last2=Valdez |first2=Joshua |last3=Kim |first3=Matthew |last4=Rueschman |first4=Michael |last5=Redline |first5=Susan |date=2019-01 |title=ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata |url=https://linkinghub.elsevier.com/retrieve/pii/S1386505618302697 |journal=International Journal of Medical Informatics |language=en |volume=121 |pages=10–18 |doi=10.1016/j.ijmedinf.2018.10.009 |pmc=PMC6343667 |pmid=30545485}}</ref> Finally, patients suffering from rare diseases may benefit from FAIRer data practices that help prevent the siloing of testing, medical device use, patient outcome, treatment history, and clinical trial history data. 
If these types of data were made more FAIR, "new diagnostics, treatments, and health care policies to benefit patients" could be developed, while also empowering those patients to take their health care journey into their own hands.<ref>{{Cite journal |last=van Lin |first=Nawel |last2=Paliouras |first2=Georgios |last3=Vroom |first3=Elizabeth |last4=’t Hoen |first4=Peter A.C. |last5=Roos |first5=Marco |date=2021-11-02 |title=How Patient Organizations Can Drive FAIR Data Efforts to Facilitate Research and Health Care: A Report of the Virtual Second International Meeting on Duchenne Data Sharing, March 3, 2021 |url=https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/JND-210721 |journal=Journal of Neuromuscular Diseases |volume=8 |issue=6 |pages=1097–1108 |doi=10.3233/JND-210721 |pmc=PMC8673524 |pmid=34334415}}</ref> In all these cases, laboratories are involved, and the ability of their software to ensure that FAIR research objects are created is vital. As such, FAIR research objects and software clearly have significant implications for the operations of modern research laboratories, and greater innovation and improved patient outcomes are only part of the benefits to society.
| |
|
| |
| ==Conclusion==
| |
|
| |
|
| |
|
| |
| ==References==
| |
| {{Reflist|colwidth=30em}}
| |
|
| |
| <!---Place all category tags here-->
| |