User:Shawndouglas/sandbox/sublevel12



[Figure: FAIR resources graphic, Australian Research Data Commons, 2018]

Title: What are the potential implications of the FAIR data principles to laboratory informatics applications?

Author for citation: Shawn E. Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: May 2024

Introduction

This brief topical article will examine

The "FAIR-ification" of research objects and software

First discussed during a 2014 FORCE-11 workshop dedicated to "overcoming data discovery and reuse obstacles," the FAIR Guiding Principles were published by Wilkinson et al. in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and information of all shapes and formats) become more universally findable, accessible, interoperable and reusable (FAIR) by both machines and people.^[1] The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."^[1]

Since 2016, other research stakeholders have taken to publishing their thoughts about how the FAIR principles apply to their fields of study and practice^[2], including in ways beyond what perhaps was originally imagined by Wilkinson et al. For example, multiple authors have examined whether or not the software used in scientific endeavors itself can be considered a research object worth being developed and managed in tandem with the FAIR data principles.^[3]^[4]^[5]^[6]^[7] Researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts, recognize that digital research objects go beyond data and information, and recognize "the specific nature of software" and not consider it "just data."^[4] The end result has been that the core concepts of FAIR are applied to research software differently than they are to data: because research software is more than just data, making it FAIR requires more nuance and a different type of planning than applying FAIR to digital data and information.

A 2019 survey by Europe's FAIRsFAIR found that researchers seeking and re-using relevant research software on the internet faced multiple challenges, including understanding and/or maintaining the necessary software environment and its dependencies, finding sufficient documentation, struggling with accessibility and licensing issues, having the time and skills to install and/or use the software, finding quality control of the source code lacking, and having an insufficient (or non-existent) software sustainability and management plan.^[4] These challenges highlight the importance of software to researchers and other stakeholders, and the role FAIR has in better ensuring such software is findable, interoperable, and reusable, which in turn better ensures researchers' software-driven research is repeatable (by the same research team, with the same experimental setup), reproducible (by a different research team, with the same experimental setup), and replicable (by a different research team, with a different experimental setup).^[4]

At this point, the topic of what "research software" represents must be addressed further, and, unsurprisingly, it's not straightforward. Ask 20 researchers what "research software" is, and you may get 20 different opinions. Some definitions can be more objectively viewed as too narrow, while others may be viewed as too broad, with some level of controversy inherent in any mutual discussion.^[8]^[9]^[10] In 2021, as part of the FAIRsFAIR initiative, Gruenpeter et al. made a good-faith effort to define "research software" with the feedback of multiple stakeholders. Their efforts resulted in this definition:

Research software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose. Software components (e.g., operating systems, libraries, dependencies, packages, scripts, etc.) that are used for research but were not created during, or with a clear research intent, should be considered "software [used] in research" and not research software. This differentiation may vary between disciplines. The minimal requirement for achieving computational reproducibility is that all the computational components (i.e., research software, software used in research, documentation, and hardware) used during the research are identified, described, and made accessible to the extent that is possible.

Note that while the definition primarily recognizes software created during the research process, software created (whether by the research group, other open-source software developers outside the organization, or even commercial software developers) "for a research purpose" outside the actual research process is also recognized as research software. This notably can lead to disagreement about whether a proprietary, commercial spreadsheet or laboratory information management system (LIMS) offering that conducts analyses and visualizations of research data can genuinely be called research software, or simply classified as software used in research. van Nieuwpoort and Katz further elaborated on this concept, at least indirectly, by formally defining the roles of research software in 2023. Their definition of the various roles of research software—without using terms such as "open-source," "commercial," or "proprietary"—essentially further defined what research software is^[10]:

Research software is a component of our instruments.
Research software is the instrument.
Research software analyzes research data.
Research software presents research results.
Research software assembles or integrates existing components into a working whole.
Research software is infrastructure or an underlying tool.
Research software facilitates distinctively research-oriented collaboration.

When considering these definitions^[8]^[10] of research software and their adoption by other entities^[11], it would appear that at least in part some laboratory informatics software—whether open-source or commercially proprietary—fills these roles in academic, military, and industry research laboratories of many types. In particular, electronic laboratory notebooks (ELNs) like open-source Jupyter Notebook or proprietary ELNs from commercial software developers fill the role of analyzing and visualizing research data, including developing molecular models for new promising research routes.^[10] Even more advanced LIMS solutions that go beyond simply collating, auditing, securing, and reporting analytical results could conceivably fall under the umbrella of research software, particularly if many of the analytical, integration, and collaboration tools required in modern research facilities are included in the LIMS.

Ultimately, assuming that some laboratory informatics software can be considered research software and not just "software used in research," it's tough not to arrive at some deeper implications of research organizations' increasing need for FAIR data objects and software, particularly for laboratory informatics software and the developers of it.

Implications of the FAIR concept to laboratory informatics software

The global FAIR initiative affects, and even benefits, commercial laboratory informatics research software developers as much as it does academic and institutional ones

To be clear, there is undoubtedly a difference between the software development approach of academics and institutions building "homegrown" research software and the more streamlined and experienced approach of commercial software development houses as applied to research software. Moynihan of Invenia Technical Computing described the difference in software development approaches as follows in 2020, while discussing the concept of "research software engineering"^[12]:

Since the environment and incentives around building academic research software are very different to those of industry, the workflows around the former are, in general, not guided by the same engineering practices that are valued in the latter. That is to say: there is a difference between what is important in writing software for research, and for a user-focused software product. Academic research software prioritizes scientific correctness and flexibility to experiment above all else in pursuit of the researchers’ end product: published papers. Industry software, on the other hand, prioritizes maintainability, robustness, and testing, as the software (generally speaking) is the product. However, the two tracks share many common goals as well, such as catering to “users” [and] emphasizing performance and reproducibility, but most importantly both ventures are collaborative. Arguably then, both sets of principles are needed to write and maintain high-quality research software.

This brings us to our first point: applying small-scale, FAIR-driven academic research software engineering practices and elements to the larger development of commercial laboratory informatics software, and conversely applying commercial-scale development practices to small, FAIR-focused academic and institutional research software engineering efforts, has the potential to better support all research laboratories using independently developed and commercial research software.

The concept of the research software engineer (RSE) began to take full form in 2012, and since then universities and institutions of many types have formally developed their own RSE groups and academic programs.^[13]^[14]^[15] RSEs range from pure software developers with little knowledge of a given research discipline, to scientific researchers just beginning to learn how to develop software for their research project(s). While in the past, broadly speaking, researchers often cobbled together research software with less a focus on quality and reproducibility and more on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research."^[13]^[16] Elaborating on that concept, Cohen et al. add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."^[16]

The concept of software quality management (SQM) has traditionally not been lost on professional, commercial software development businesses. Good SQM practices have been less prevalent in homegrown research software development; however, the expanded adoption of FAIR data and FAIR software approaches has shifted the focus onto the repeatability, reproducibility, and interoperability of research results and data produced by more sustainable research software. The adoption of FAIR by academic and institutional research labs not only brings commercial SQM and other software development approaches into their workflow, but also gives commercial laboratory informatics software developers an opportunity to embrace many aspects of the FAIR approach to laboratory research practices, including lessons learned and development practices from the growing number of RSEs. This doesn't mean commercial developers are going to suddenly take an open-source approach to their code, and it doesn't mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.^[17] However, as Moynihan noted, both research software development paradigms stand to gain from the shift to more FAIR data and software. Additionally, if commercial laboratory informatics vendors want to continue to competitively market relevant and sustainable research software to research labs, they frankly have little choice but to commit extra resources to learning about the application of FAIR principles to their offerings tailored to those labs.

The focus on data types and metadata within the scope of FAIR is shifting how laboratory informatics software developers and RSEs make their research software and choose their database approaches

Non-relational Resource Description Framework (RDF) knowledge graph databases used in well-designed laboratory informatics software help make research objects more FAIR.

- https://labbit.com/resources/rdf-knowledge-graph-databases-a-better-choice-for-life-science-lab-software and https://21624527.fs1.hubspotusercontent-na1.net/hubfs/21624527/Resources/RDF%20Knowledge%20Graph%20Databases%20White%20Paper.pdf

- https://biss.pensoft.net/article/37412/

- https://link.springer.com/article/10.1007/s40192-024-00348-4

- https://www.nature.com/articles/s41597-022-01352-z

- https://www.degruyter.com/document/doi/10.1515/jib-2018-0023/html

- https://arxiv.org/abs/2404.12935

- https://direct.mit.edu/dint/article/4/4/867/112737/FAIR-Versus-Open-Data-A-Comparison-of-Objectives
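
To make the knowledge graph idea above more concrete, the following is a minimal sketch of how laboratory records might be expressed as subject-predicate-object triples, which is the data model RDF is built on. It uses only plain Python as a stand-in for a real RDF store. The namespace, sample and result identifiers, predicate names, and helper functions are all hypothetical and chosen for illustration; they are not drawn from any of the linked resources or from any particular laboratory informatics product.

```python
from typing import Iterator, List, Optional, Set, Tuple

# A triple is (subject, predicate, object), the basic unit of an RDF graph.
Triple = Tuple[str, str, str]

# Assumed namespace for illustration only; not a real vocabulary.
EX = "https://example.org/lab/"


def build_graph() -> Set[Triple]:
    """Return a tiny, hypothetical graph linking one sample to one result."""
    return {
        (EX + "sample/S-001", EX + "hasType", EX + "BloodPlasma"),
        (EX + "sample/S-001", EX + "collectedOn", "2024-05-01"),
        (EX + "result/R-100", EX + "measuredOn", EX + "sample/S-001"),
        (EX + "result/R-100", EX + "analyte", EX + "Glucose"),
        (EX + "result/R-100", EX + "value", "5.4"),
        (EX + "result/R-100", EX + "unit", "mmol/L"),
    }


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in graph:
        if all(q is None or q == v for q, v in zip((s, p, o), triple)):
            yield triple


def results_for_sample(graph: Set[Triple], sample_iri: str) -> List[str]:
    """Return the identifiers of all results measured on the given sample."""
    return sorted(
        subj for subj, _, _ in match(graph, p=EX + "measuredOn", o=sample_iri)
    )


if __name__ == "__main__":
    graph = build_graph()
    print(results_for_sample(graph, EX + "sample/S-001"))
```

Because every record, relationship, and unit is an explicit, globally identifiable statement rather than a column in an application-specific table, new kinds of statements can be added without a schema migration, and another system could in principle traverse the same links. A production system would more likely use a dedicated RDF library or triple store and a published vocabulary rather than this hand-rolled pattern matcher.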

Applying FAIR-driven metadata schemes to laboratory informatics software development gives data a FAIRer chance at being ready for machine learning and artificial intelligence applications

By developing laboratory informatics software with a focus on FAIR-driven metadata schemes, not only are data objects more FAIR but also "clean" and machine-ready for advanced analytical uses as with machine learning and artificial intelligence either built into the laboratory informatics software, or separate from it.

- https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness

- https://arxiv.org/abs/2404.05779

- https://www.nature.com/articles/s41597-023-02298-6

- https://repositories.lib.utexas.edu/items/a366780e-3d54-4aaa-8465-6da1e38ee38a
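
As a rough illustration of the metadata point above, the sketch below checks whether a data object's metadata record carries a handful of fields that loosely correspond to the four FAIR facets before the object is passed on to downstream analytics. The field names, the mapping of fields to facets, and the example record are assumptions made for this example; they do not come from a published metadata schema or from any specific laboratory informatics system.

```python
from typing import Dict, List

# Assumed, illustrative field names; not taken from any published schema.
REQUIRED_FIELDS: Dict[str, str] = {
    "identifier": "Findable: a persistent, unique identifier",
    "access_url": "Accessible: where the object can be retrieved",
    "format": "Interoperable: an open or documented format",
    "license": "Reusable: clear usage terms",
    "provenance": "Reusable: how and by what the data were produced",
}


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank, sorted by name."""
    return sorted(
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    )


def is_machine_ready(record: Dict[str, str]) -> bool:
    """A record passes this check only if no required field is missing."""
    return not missing_fields(record)


if __name__ == "__main__":
    # Hypothetical record with a blank provenance field and no license.
    record = {
        "identifier": "https://example.org/lab/result/R-100",
        "access_url": "https://example.org/lab/api/results/R-100",
        "format": "text/csv",
        "provenance": "   ",
    }
    for name in missing_fields(record):
        print(f"missing {name}: {REQUIRED_FIELDS[name]}")
```

A check like this only confirms that fields are present, not that their values are accurate or drawn from a controlled vocabulary, so it would be one small gate among several in any real pipeline preparing data for machine learning use.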

Resources

LIMS and FAIR: Journal:A roadmap for LIMS at NIST Material Measurement Laboratory
ELNs and FAIR: Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation
LIMS+ELN and FAIR: https://datascience.codata.org/articles/10.5334/dsj-2023-044
Biomedical software and FAIR: https://www.nature.com/articles/s41597-023-02463-x
Making software workflows FAIR: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10538699/
AWS and FAIR for healthcare and life sciences: https://aws.amazon.com/blogs/industries/implement-fair-scientific-data-principles-when-building-hcls-data-lakes/
APIs and FAIR data: https://www.labguru.com/blog/fair-data-principles-and-apis
Bioinformatics LIMS and FAIR: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425304/
Labbit: https://labbit.com/fair-data-lims and https://semaphoresolutions.com/applying-fair-principles-to-lab-data/

Zontal: https://8248491.fs1.hubspotusercontent-na1.net/hubfs/8248491/Marketing%20Material/White%20paper/FAIR%20Data/FAIR%20data%20-%20how%20data%20increases%20the%20value%20of%20biotechs.pdf

Extending FAIR to data graphics: https://www.nature.com/articles/s41597-022-01352-z

https://riojournal.com/article/96075/ Importance of metadata for FAIR data objects
Deep talk about metadata: Journal:Shared metadata for data-centric materials science
More metadata, for findability: "While descriptive metadata may not be available, support for generalized CRUD operations requires essential structural and administrative metadata to be captured, stored, and made available for requestors. Metadata capture must be highly automated and reliable, both in terms of technical reliability and ensured metadata quality." Journal:Making data and workflows findable for machines
More metadata, for reusability: "make recommendations for assigning identifiers and metadata that supports sample tracking, integration, and reuse. Our goal is to provide a practical approach to sample management, geared towards ecosystem scientists who contribute and reuse sample data." Journal:Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences

"The principles should be considered during development of informatics systems to further promote data discovery and reuse. In Table 1, we have correlated the various BRICS functional components to the FAIR principles to illustrate the extent to which each of the components contributes towards the principles." Journal:Development of an informatics system for accelerating biomedical research

Restricted or personal information while still being FAIR

Linking databases of data that haven't seen proper "FAIR-ification" and metadata handling won't be as useful.
Further discussion on data quality in the scope of FAIR: Journal:Towards a contextual approach to data quality

On data integrity and FAIR: https://arkivum.com/what-is-fair-data-and-can-life-science-organisations-ensure-data-is-compliant-whilst-adhering-to-these-principles/

More: https://www.europeanpharmaceuticalreview.com/article/157371/implementing-the-fair-data-principles-is-now-a-critical-endeavour/

More: https://www.lexjansen.com/phuse/2019/sa/SA04.pdf