==Sandbox begins below==
<div class="nonumtoc">__TOC__</div>
'''Title''': ''Laboratory Informatics: Information and Workflows''


'''Author for citation''': Joe Liscouski


'''License for content''': [https://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International]


'''Publication date''': April 2024
 
'''NOTE''': This content originally appeared in Liscouski's ''Computerized Systems in the Modern Laboratory: A Practical Guide'' as Chapter 3 - Laboratory Informatics / Departmental Systems, published in 2015 by PDA/DHI, ISBN 193372286X. It is reproduced here with the author's/copyright holder's permission. Some changes have been made to the original material; for example, some out-of-date screenshots have been replaced with vendor-neutral mock-ups. In addition, note that some specifications for network speeds are out-of-date, but the concerns with system performance remain realistic.


==Introduction==
[[Laboratory informatics]] refers to software systems that are usually accessible at the departmental level (they are often shared between users in the [[Laboratory|lab]]) and that focus on managing lab operations and lab-wide [[information]] rather than instrument management or [[Sample (material)|sample]] preparation (Figure 3-1).
 
 
[[File:Fig3-1 Liscouski LabInfo24.png|500px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-1.''' Laboratory informatics and departmental systems.</blockquote>
|-
|}
|}
 
Differing from standard office applications (e.g., word processing, spreadsheets, etc.), laboratory informatics software solutions include the:
 
*[[laboratory information management system]] (LIMS) and [[laboratory information system]] (LIS)
*[[electronic laboratory notebook]] (ELN)
*[[document management system]] (DMS)
*[[scientific data management system]] (SDMS)
*[[laboratory execution system]] (LES), and
*[[Inventory management|chemical inventory management]] (CIM).
 
Before we get into the details of what these technologies are, we must establish a framework for understanding laboratory operations so that we can see where products fit in the lab's [[workflow]]. The products you introduce into your lab are going to depend on the lab's needs, and given the complexity of vendor offerings and their potential interactions and overlapping capabilities, defining your requirements is going to take some thought.
 
These products are undergoing a rapid evolution driven by market pressures as vendors compete for your business, partly by trying to cover as much of a lab’s operations as they can. If you look at functional checkboxes in brochures, they often cover similar elements, but their strengths, weaknesses, and methods of operation are different, and those differences should be important to you.
 
We are going to start by comparing two different types of laboratory environments: [[research]] labs and service labs (laboratories whose role is to provide testing and assays; for example, [[quality control]] labs, clinical labs, etc.). That comparison and the subsequent material will be developed with the aid of an initially simple model for the development and flow of knowledge, information, and data within laboratories (Figure 3-2).
 
[[File:Fig1 Liscouski DirectLabSysOnePerPersp21.png|400px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="400px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-2.''' Basic K/I/D model. Databases for knowledge, information, and data (K/I/D) are represented as ovals, and the processes acting on them as arrows.</blockquote>
|-
|}
|}
 
The model consists of ovals and arrows; the ovals represent collections of files and databases supporting applications, and the arrows are processes for working with the materials in those collections. The “ES” abbreviation denotes processes that bring in elements from sources outside the lab. The first point we need to address is the definition of “knowledge,” “information,” and “data,” as used here. We are not talking about philosophical points, but about how these are represented and used in the digital world. “Knowledge” usually takes the form of reports, documents, etc. that may exist as individual files or be organized and accessed through application software. That software would include DMSs, databases for working with hazardous materials, reference databases, and access to published material stored locally or accessed over internal/external networks.
 
“Information” consists of elements that can be understood by themselves, such as [[pH Meter|pH measurements]], the results of an analysis, an object's temperature, an infrared spectrum, the name of a file, and so on. Information elements can reside as fields in databases and files, and information can be provided as meaningful answers to questions. Finally, “data” refers to measurements that by themselves may not have any meaning, or that require conversion, or combination with other data and analysis, before they become useful information. Examples include the twenty-seventh value in a digital [[Chromatography|chromatogram]] data stream, the area of a peak (it needs comparison to known quantities before it is useful), and millivolt readings from a pH meter (you need temperature information and other elements before you can convert them to pH).
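
As a small illustration of the data-to-information distinction, the following sketch converts a raw millivolt reading (data) into a pH value (information) using the temperature-dependent Nernst slope. It assumes an ideal glass electrode with zero offset at pH 7.00; a real meter would also apply calibration values obtained from buffer standardization.

<syntaxhighlight lang="python">
GAS_CONSTANT = 8.314462618   # J/(mol*K)
FARADAY = 96485.33212        # C/mol
LN10 = 2.302585093

def mv_to_ph(millivolts: float, temp_celsius: float) -> float:
    """Convert a raw electrode reading (data) into a pH value (information)."""
    kelvin = temp_celsius + 273.15
    # Ideal Nernst slope in mV per pH unit (~59.16 mV at 25 degrees C)
    slope_mv = 1000 * GAS_CONSTANT * kelvin * LN10 / FARADAY
    return 7.0 - millivolts / slope_mv

print(round(mv_to_ph(-120.0, 25.0), 2))  # 9.03: a usable pH value
</syntaxhighlight>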
 
There are grey areas and examples where these definitions may not hold up well, but the key point is how they are obtained, stored, and used. You may want to modify the definitions to suit your work, but the comments above are how they are used here.
 
The arrows represent processes that operate on material in one storage structure and put the results of the work in another. Processes will consist of one or more steps or stages and can include work done in the lab as well as outside the lab (outsourced processing, access to networked systems, for example). It is possible that operations will be performed on material in one storage structure and have those results placed in the same storage level. For example, several “information” elements may be processed to create a new information entity.
 
Initially, we are going to discuss the model as a two-dimensional structure, but those storage systems and processes have layers of software, hardware, and networked communications. In addition, the diagram as shown is a simplification of reality since it shows only one process for an experiment. In the real world, there would be a process line for each laboratory process in your lab, and the “data” oval would represent a collection of data storage elements from each of the data acquisition/storage/analysis systems in the lab. Each experimental process would have a link to the “knowledge” structures catalog of standard operating procedures (SOPs). As we start to build on this structure we can see how requirements for products and workflow can be derived.
 
The material covered in the Lab Bench chapter (Chapter 1) fits the model as shown in Figure 3-3 (the model used in the Lab Bench discussion is shown as well; it is represented by the light-/heavy-grey lines in the K/I/D model). The sample preparation and pre-instrumental analysis work is shown as the light-grey / heavy-line portion of the “Measurement & Experiments” process, with instrumental data acquisition, storage, and analysis displayed with the darker grey/heavy line (which may be entirely or partly completed by the software; the illustration shows the “partial” case). Not all experiment/instrument interactions result in the use of a storage system. Balances, for example, may hold one reading and then send it through to the information storage on command, where it may be analyzed with other information. The procedure description needed to execute a laboratory process—the measurement and experiment—would come from material in the “Knowledge” storage structure.
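
As a brief aside on what that single-reading, on-command capture can look like in software, the following sketch polls a serial-connected balance for one reading and parses the reply into a number. The port name, the "P" (print) command, and the reply format are assumptions that vary by balance vendor (consult the instrument's interface manual); the sketch requires the pyserial package.

<syntaxhighlight lang="python">
import serial  # pyserial

def read_balance(port: str = "/dev/ttyUSB0") -> float:
    """Request one reading from a serial-connected balance and parse it."""
    with serial.Serial(port, baudrate=9600, timeout=2) as conn:
        conn.write(b"P\r\n")              # assumed "print" command; varies by vendor
        reply = conn.readline().decode()  # assumed reply format, e.g., "   12.3456 g"
        return float(reply.strip().split()[0])

weight_g = read_balance()
print(f"captured weight: {weight_g} g")  # handed off to "information" storage
</syntaxhighlight>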
 
 
[[File:Fig3-3 Liscouski LabInfo24.png|750px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="750px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-3.''' The K/I/D model showing areas covered by the Lab Bench chapter.</blockquote>
|-
|}
|}
 
What we are concerned about in laboratory informatics is what happens to the data and information that results either from data analysis or information gained directly from experiments.
 
==Comparing two different laboratory operational environments==
By its nature, research work can be varied. SOPs may change weekly or more slowly, and the information collected can change from one experiment to another. The information collected can be in the form of tables, discrete readings, images, text, video, audio, or other types of information. The work may change depending on the course of the research project. Aside from product support on the lab bench, the primary need is for a means of recording the progress of projects and managing the documents connected with the work.
 
Until recently, that need was met through the use of [[Laboratory notebook|paper notebooks]], with entries dated, signed by the author, and counter-signed by a witness. The researcher would document the work by handwritten entries, and instrument output would be recorded similarly or have printouts taped to notebook pages. Anything that could not be printed and pasted in would have references entered. There are a few problems with this approach:
 
*Handwritten records can be difficult to decipher.
*Paper is subject to deterioration by a variety of methods.
*Taped/pasted entries can come loose and be lost.
*References to files, tapes, instrument recordings, etc. that are stored separately can become difficult to track if the material is moved.
*“Backup” can be a problem since you are working with physical media; the obvious step is to make copies of every page after it has been signed and witnessed.
*The notebook contents, particularly if the notebook has been archived, are only useful for as long as someone remembers that they exist and can provide information that helps in locating the entries. Most labs have stories that start “I remember someone doing that work, but…” and the matter is either dropped or the work repeated.
 
The intellectual property recorded in notebooks is of value only if someone knows it exists, and that it can be found and understood. There is another point that needs to be mentioned here that we will reference later: lab personnel consider entries in paper lab notebooks as “their work and their information.” To the extent that it represents their efforts, they are right, but when it comes to access control and ownership, the contents belong to whoever paid for the work.
 
Research work is sometimes done by a single individual, but it often involves several people working together in the same lab or collaborating with researchers in other facilities. Those cooperative programs may be based on:
 
*Researchers working on independent projects from a common database of information on biological materials, chemical compounds and their effect on bacteria or viruses, [[toxicology]], pharmacology, pharmacokinetics, reaction mechanisms, etc. in [[Life sciences industry|life sciences]], data gathered from particle collisions in physics, chemical structures in [[chemistry]], and so on.
*Researchers working in collaborative programs where the outcomes are co-authored reports, presentations, etc.
 
Whether we are looking at single individuals or cooperative work, each of those situations has an impact on the way knowledge, information, and data are collected, developed, shared, and used. For example, take a look at Figure 3-4 below.
 
 
[[File:Fig3-4 Liscouski LabInfo24.png|500px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-4.''' Research on commond datasets.</blockquote>
|-
|}
|}
 
The figure shows three researchers working on individual projects, contributing to, and working from, a common data set; this could in fact be one data system or a collection of data systems from different sources. The bottom arrow represents data coming in from some outside source.
 
Regardless of whether this is an electronic system or a paper-based process, there still is the following set of requirements:
 
*There has to be a catalog of what records are in the system, as people need the ability to read the catalog, search it, and add new material. Removing material from the catalog would be prohibited for two reasons. First, someone might be using the material and deletion would create problems. Second, if you are working in a regulated environment, deleting material is not permitted. Making this work means that someone is going to be tasked as the system administrator.
*Material cannot be edited or modified. If changes are needed, the original material remains as-is, and a new record with the changed material is created that contains a description of the changes, why they were made, who is responsible, and when the work was done (i.e., via an [[audit trail]]). This creates a parent-child relationship between data elements that can become a lengthy chain. One simple example can be found with imaging: an image of something is taken and stored, and any enhancements or modifications create a set of new records linked back to the original material. This is an audit trail, with all the requirements that come with it. (A minimal sketch of these record-keeping rules appears after this list.)
*The format for files containing material should be standardized. Having similar types of material in differing file structures can significantly increase the effort needed to work with the data (data in this case, with the same issues extending to information and knowledge). If you have two or more people working with similar instruments, having the data stored in the same format is preferable to having different formats from different vendors. Plate readers (for microplates) usually use CSV file formats; however, instruments such as chromatographs, [[Spectrometry|spectrometers]], etc. will use different file formats depending on the vendor. If this is the case, you may have to export material in a neutral format for shared access. At this point in time, the user community has not developed the standardized file format for instrumentation to make this possible, hence the use of “should” earlier in this bullet rather than a stronger statement. That could change with the finalization of the AnIML standard for analytical data.<ref name="ASTMWK23265">{{cite web |url=http://www.astm.org/DATABASE.CART/WORKITEMS/WK23265.htm |archiveurl=https://web.archive.org/web/20130910044322/http://www.astm.org/DATABASE.CART/WORKITEMS/WK23265.htm |title=ASTM WK23265 |publisher=ASTM International |archivedate=10 September 2013 |accessdate=05 April 2024}}</ref> Users can standardize the format within their organization by working within a vendor’s product family.
*The infrastructure for [[Backup|data backups]] should be instituted to protect the material and access to it. This point alone would shift the implementation toward electronic data management because of the ease with which this can be done. Data collection represents a significant investment in resources, with a corresponding value, and backup copies (local and remote) are just part of good planning.
*Policies have to be defined about when and how archiving takes place. Material may be old, but still referenced, and removing it from the system will create issues. If the implementation is electronic, storage systems are becoming inexpensive, so expanding storage should not be a problem.
*The system must have sufficient security mechanisms to protect from unauthorized access and electronic intrusion.
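
The following is a minimal sketch of the first two requirements above: an append-only catalog that can be searched but never edited, with changes recorded as new child records that chain back to their parent (the audit trail). All names and fields are illustrative assumptions, not any vendor's implementation.

<syntaxhighlight lang="python">
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Record:
    record_id: str
    content: bytes
    created_by: str
    created_at: str
    parent_id: str | None = None     # links a revision back to its original
    change_reason: str | None = None

class Repository:
    """Append-only catalog: records can be added and read, never edited or deleted."""
    def __init__(self) -> None:
        self._catalog: dict[str, Record] = {}

    def add(self, content: bytes, user: str,
            parent_id: str | None = None, reason: str | None = None) -> Record:
        if parent_id is not None and parent_id not in self._catalog:
            raise KeyError(f"unknown parent record {parent_id}")
        rec = Record(str(uuid.uuid4()), content, user,
                     datetime.now(timezone.utc).isoformat(), parent_id, reason)
        self._catalog[rec.record_id] = rec   # no update or delete methods exist
        return rec

    def history(self, record_id: str) -> list[Record]:
        """Walk the parent chain, i.e., the audit trail for one record."""
        chain, rec = [], self._catalog.get(record_id)
        while rec is not None:
            chain.append(rec)
            rec = self._catalog.get(rec.parent_id) if rec.parent_id else None
        return chain

repo = Repository()
image = repo.add(b"raw image bytes", user="asmith")
enhanced = repo.add(b"enhanced image bytes", user="asmith",
                    parent_id=image.record_id, reason="contrast enhancement")
print(len(repo.history(enhanced.record_id)))  # 2: the child and its parent
</syntaxhighlight>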
 
These points form a basic set of requirements, to be expanded later, for any shared knowledge/information/data repository. If we extend the model to having the three individuals combine their “knowledge” sets into one shared repository (they may be collaborating on a research project), the same issues apply. The standardization of file formats is simpler since most “knowledge” is recorded as text, and the processes for working with that kind of material have been addressed, at least well enough to be readily workable with existing tools.
 
Service laboratories—those that do work supporting other departments, labs, etc., such as quality control, clinical chemistry, and so on—have a different set of operational characteristics. The model for service labs is very similar to that of research, with one distinction: the “Synthesis” process is absent (Figure 3-5). That process is the one used to develop new knowledge. In the case of service labs, that knowledge is represented by test or assay procedures, which normally come from other sources and are not usually developed by service labs. One departure from this view is contract testing labs, which may, as part of their contract, be tasked with procedure development. Analytical research groups are another; while they may do some service lab work, they also do non-routine work that includes method development, and they function as specialized research-service laboratories.
 
 
[[File:Fig3-5 Liscouski LabInfo24.png|500px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-5.''' K/I/D model of the service laboratory.</blockquote>
|-
|}
|}
 
From the test submitter's point of view, the process is simple: they request test/assay work and wait for the results from the service lab. Those requests may be sporadic, depending on the environment the lab operates in, or regularly scheduled, as they would be in support of a production process, with requests going to quality control.
 
The perspective from inside the lab is a bit different. Test requests arrive that may or may not be accompanied by the material to be tested. In either case, the requests are logged in, prioritized, and scheduled for work. If the samples are not there, they are either collected (which may be part of the lab's work) or the work is put on hold until the samples arrive and can be added to the work queue. As part of lab operations, there is a need to:
 
*Generate worklists for different tests that include the priority of work, sample locations, and who is scheduled to carry out the analysis; analysts may be required to obtain the samples or submit requests to a (physical) sample management system. (A small sketch of worklist generation and status tracking appears after this list.)
*Support queries for sample status, i.e., the “where are my results” requests.
*Provide a means of entering test results.
*Track sample status through phases of work; some results may be completed faster than others, priorities may change after some testing is done, additional testing may be required, and some test processes may require several different tests to be performed, which is common in clinical work, product stability testing, and formulation work.
*Review results, which can cause some tests to be repeated; if results are approved, reports need to be issued.
*Maintain instrument calibrations.
*Ensure that lab personnel have their training up-to-date.
*Ensure that reagents are up-to-date and have their quality/assays reviewed/checked.
*Prepare samples and carry out testing programs, which could include testing on a series of samples, or a scheduled program of testing of one or more materials (e.g., stability testing, formulations, etc.).
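
The following is an illustrative sketch of the bookkeeping a LIMS automates for two of the needs above: generating a prioritized worklist for a given test and answering the “where are my results” question. The statuses, field names, and sample IDs are assumptions for demonstration, not any product's vocabulary.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field

STATUSES = ("logged_in", "scheduled", "in_progress", "complete", "approved")

@dataclass
class Sample:
    sample_id: str
    tests: dict[str, str] = field(default_factory=dict)  # test name -> status
    priority: int = 3                                    # 1 = most urgent

    def status(self) -> str:
        """A sample is only as far along as its least-advanced test."""
        return min(self.tests.values(), key=STATUSES.index)

def worklist(samples: list[Sample], test: str) -> list[str]:
    """Pending work for one test, most urgent samples first."""
    pending = [s for s in samples if s.tests.get(test) == "logged_in"]
    return [s.sample_id for s in sorted(pending, key=lambda s: s.priority)]

s1 = Sample("QC-0001", {"pH": "logged_in", "assay": "in_progress"}, priority=1)
s2 = Sample("QC-0002", {"pH": "logged_in"}, priority=2)
print(worklist([s1, s2], "pH"))  # ['QC-0001', 'QC-0002']
print(s1.status())               # 'logged_in': answers "where are my results?"
</syntaxhighlight>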
 
Aside from the use of scientific instruments, much of this behavior may seem to be typical of a variety of service-oriented operations. That point will be addressed later. The nature of service lab operations is highly structured and routine. You could move from one service to another, regardless of industry, and aside from the details of testing, feel comfortable with the routine.
 
The most time-consuming part of service lab operations is managing the workflow and keeping track of incoming and outgoing test requests and samples. It is bookkeeping work. Without the aid of computer systems, it is a matter of manually logging samples in—transcribing information from request forms into log books, using the same log books to find out what work needs to be done, and so on. Most of the management effort revolves around the logbook, including logging samples out, so time is wasted waiting for access to that book.
 
There is another characteristic service labs have that is not always shared with research environments: the need to communicate with other groups. Service labs don’t exist as independent facilities, and they have to be able to send documents to and receive documents from other groups. In a research organization, it is the research labs themselves, while in production environments it is raw material receipt (i.e., certificate of analysis approval and testing), process control (i.e., in-line testing, product quality), shipping (i.e., certificates of analysis, product release documents), and customer service (i.e., product lot specifications).
 
Not all lab situations are as cleanly divided as described, although many are. In small research operations, you may find both sets of characteristics in one operation, all under the heading of “research.” The laboratory workflow model for that situation is shown in Figure 3-6.
 
 
 
[[File:Fig3-6 Liscouski LabInfo24.png|600px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-6.''' Research with a dedicated testing group.</blockquote>
|-
|}
|}
 
 
The illustration shows a research group on the left with a common “data” resource, storing the results of their work in a common “knowledge” structure (common in collaborative work) that can consist of reports, SOPs, materials catalogs, and so on. Each researcher works on their experiments, and the dedicated testing group provides characterization/analytical services where needed (the testing group would get its SOPs from the research knowledge base, though it may also develop its own).
 
One thing that is important to note is that the words “information” and “data” appear on both sides of the diagram. A sample may be submitted to the testing group to determine its purity. Using instrumental techniques (UV spectrophotometry or chromatography, for example) results in the testing group's “data,” which is then analyzed to give a purity value that is recorded as “information.” When the results are transmitted back to research, they may reclassify the status of the value to “data” due to the nature of the experiments being done.
 
The point here is to contrast different operating characteristics of labs so that we can later show how informatics products fit differing laboratory workflows. In reality, multiple informatics products may be used by both types of labs, some with a stronger emphasis on one over another depending on the features the products have and how they help you work. Some are better suited to the inherent flexibility required in research, others fit the laboratory management requirements of the service lab, and still others will find useful applications in both. It is a matter of understanding what your needs are now and where they might go in the future. The next section will look at product capabilities and use so that you can match your needs to potential solutions, and be able to evaluate characteristics that can impact their implementation.
 
==Informatics and laboratory functions==
Figure 3-7 shows a set of five laboratory functions and four key informatics technologies, with the approximate dates that they came on the market. Since we are looking at technologies from multiple vendors with variations in functionality, the dates are guides. Over time, the technologies noted have evolved due to requests from customers and market pressures to remain competitive. Along the way, vendors have closed shop, held their place in the market, or merged with other companies, with products being merged, retired, or remaining active.
 
 
[[File:FigA3 Liscouski DirectLabSysOnePerPersp21.png|640px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="640px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-7.''' The starting point for major classes of lab informatics systems and the lab functions they addressed.</blockquote>
|-
|}
|}
 
One thing you’ll notice is that there are no product technologies in Figure 3-7 for “Document Management” and “Experiment & Test Execution.” Document management had been considered a text document management issue and was addressed by commercial software used for that application, the same type of software used in other departments. The initial product for experiment and test execution began with Velquest’s SmartLab (Velquest has been part of BIOVIA since 2014), which was offered in the early 2000s as a laboratory notebook: “VelQuest's electronic Laboratory Notebook using the SmartLab (electronic Process Management & Compliance) software, with a primary focus on the Procedure Execution module including data review and data reporting.”<ref name="VelQuestSmartLabArch07">{{cite web |url=http://velquest.com/notebooks/overview.asp |archiveurl=https://web.archive.org/web/20070101085646/http://www.velquest.com/notebooks/overview.asp |title=SmartLab System Overview |publisher=VelQuest |archivedate=01 January 2007 |accessdate=05 April 2024}}</ref> As the ELN market developed, the product was recast as a LES, first by Velquest, then by Accelrys.<ref name="AccelrysLab14">{{cite web |url=http://accelrys.com/products/process-management-and-compliance/lab-execution-system/index.html  |title=Lab Execution System |publisher=Accelrys |accessdate=15 October 2014}}{{Dead link}}</ref>
 
As you will see, the market has been undergoing rapid changes since the 1960s. Those changes have been due to:
 
*Computer hardware and software technology changes, which have seen computing moved from the data center to the lab desk and bench, and commercial software becoming available with increasing capability and ease-of-use.
*The market acceptance of computing and a shift in control of computing from management information systems (MIS, the precursor to today’s IT group) to individual groups with IT oversight.
*Laboratory professionals' acceptance of the role of computers in the lab, followed by developing needs for systems with more capability. To be fair, much of this was driven by vendors who saw opportunities, created products to meet needs, and educated those working in the lab about what they could accomplish using the products. The lab-professional-as-a-computing-technology-driver has yet to be realized; this is a needed development to help vendors understand what capabilities people want and give them a better footing to build their product development programs.
 
We’ll discuss the technologies in the order they came onto the market, which will help you understand how these products developed over time. Their appearance was spurred by people's experience with the earlier products, and vendors worked to fill gaps in offerings by upgrading existing products or creating new ones. One point we will address now, from the user's perspective, is the difference between the ELN and the LIMS.
 
In the case of an ELN, information is largely organized by experiment or research project/program. The content of an ELN is varied depending on the nature of the work being done, but it is typically heavily text-oriented. ELNs are widely applicable to most lab environments, with specialized variants for biology, chemistry, etc. ELNs are primarily a tool for recording descriptions, measurements, results, plans, experiments, and research projects. They can be viewed as a shareable (if desired) electronic diary.
 
With a LIMS, information is organized by sample (or samples within a test series, such as stability or formulations work). The contents are usually numerical values but may also contain brief text entries. LIMS use is normally limited to organizations that provide testing services; such systems are found in a wide range of industries and are heavily used in quality control. A LIMS is primarily a workflow/workload management tool with the ability to record the results of testing on samples. The LIS is a variation of the LIMS targeted at the clinical and healthcare markets.
 
You’ll also note that descriptions for research work (such as the one above) are less detailed than those for service labs. Research work is very flexible, and the structure behind the work depends on the industry and how the lead researcher chooses to organize programs and projects. Service labs have a consistent structure and behavior across industries and types of work (clinical, analytical, physical testing, etc.) and, as a result, are easier to describe. Figure 3-8 can be applied to almost any service lab in any industry. Trying to come up with a similar diagram for research with the same level of generalization would be difficult. This has an impact on vendors and the products they develop: the better characterized a process is, and the wider its application, the more likely they are to invest in product development.
 
 
[[File:Fig3-8 Liscouski LabInfo24.png|600px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-8.''' Laboratory informatics workflow model, reprinted with permission from ASTM's E1578-13 ''Standard Guide for Laboratory Informatics''.<ref name="ASTME1578-13">{{cite web |url=https://www.astm.org/e1578-13.html |title=ASTM E1578-13 Standard Guide for Laboratory Informatics |publisher=ASTM International |date=2013 |doi=10.1520/E1578-13 |accessdate=05 April 2024}}</ref></blockquote>
|-
|}
|}
 
The sequence of computer-based product development in scientific applications isn’t an accident. Vendors began with what they knew—instruments—and built products to improve their use. In informatics, LIMS came first, because service lab environments were well characterized and understood by the vendors. In sample preparation and the application of robotics to lab work, microplate support systems have seen wide and rapid development, while general-purpose applications are well behind that development curve. The development of ELNs has been less aggressive than LIMS, because of the variability of the underlying processes. If you want products developed, give the vendors a well-characterized, widely applicable process with enough potential sales opportunities to justify the work.
 
===Instrument data capture===
This was the subject of the Lab Bench chapter, and for the most part we’ll leave it at that. The only point we will make is that as instrument data systems (in some cases, most notably chromatography) took on the ability to support multiple instruments with a single computer, those systems took on rudimentary sample tracking capability. It wasn’t at the level of a commercial LIMS, but it was sufficient to manage multiple samples with multiple users and their worklists.
 
There is one point that we need to make that will apply to all the material in this book: questions that will be in the back of your mind are “How does that relate to my lab needs?” or “How can we make use of that?” Before you consider questions like that, you have to know how you want your lab to operate/function and what your requirements are. In order for you to be successful in applying products to lab work, you have to understand what problems you want to solve and how a given product's capabilities relate to them. There is always the possibility that some technology will come along and change the way you look at lab work. In order to appreciate that possibility, you have to understand your overall plan for technology usage.
 
===Sample tracking===
In the 1960s, instrument vendors and a few lab people were experimenting with interfacing computers and instruments. These systems were known as “data stations.” In the 1970s, Perkin-Elmer Corporation began the development of a computer system to address laboratory management problems and created the first commercial LIMS. The intent of the name was to put some distance between it and the data stations the company was selling and to give the impression of covering all of a lab's needs. To put things into temporal perspective, the ELN, a text-processing-heavy application, was still in the future; word processors were just coming on the market. LIMS was a poor choice for a product name, since users expected it to cover all “laboratory information,” including budgets, vacation schedules, documents, and any other information they might have in the lab. What it did do is help track samples.
 
Within quality control and analytical laboratories, managing sample flow was a significant problem consuming people's time, and addressing it was a worthwhile effort; Perkin-Elmer's (“P-E”) instruments were used in chemical analysis, and as a result these labs were natural marketing targets. (The problem was described in the service lab section above, and the workflow view is shown in Figure 3-8.) The P-E product was the first of a long list of products to address the same issue, some with overlays to handle specific markets. One list<ref name="LWLIMSVendor">{{cite web |url=http://www.limswiki.org/index.php/LIMS_vendor |archiveurl=https://web.archive.org/web/20150226154945/https://www.limswiki.org/index.php/LIMS_vendor |title=LIMS vendor |work=LIMSwiki.org |date=04 February 2015 |archivedate=26 February 2015 |accessdate=05 April 2024}}</ref> contains over one hundred products. Among the specific markets are the:
 
*wine industry, with its unique tests;
*food and beverage industry, with its product stability testing; and the
*pharmaceutical and biotech industries, with their formulation work (similar to stability work), concerned with environmental issues and the effectiveness of formulations over time and with different packaging/delivery systems.
 
Tailoring LIMS to address these activities can significantly reduce the amount of effort needed to manage testing and drastically reduce the amount of data entry required. One common application is product stability testing, as noted above. A project is created where a series of samples are placed in controlled environments (i.e., temperature, humidity, etc.) and tested according to a schedule that can run to hundreds of sample-test combinations; a stability overlay asks some basic questions and then generates the test regime and logins needed to support the work. Without this type of application overlay, each sample would have to be logged in individually, opening the possibility of data entry errors and requiring someone to check each entry and correct mistakes, a time-consuming, labor-intensive process.
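
As a rough illustration of what such an overlay automates, the following sketch expands a study definition (conditions, pull schedule, and tests) into the individual sample-test logins that would otherwise be entered by hand. The function name, field names, and study values are illustrative assumptions, not any product's API.

<syntaxhighlight lang="python">
from itertools import product

def stability_logins(study_id, conditions, pull_months, tests):
    """Yield one login record per condition/pull/test combination."""
    for cond, month, test in product(conditions, pull_months, tests):
        yield {
            "study": study_id,
            "condition": cond,     # e.g., "25C/60%RH"
            "pull_month": month,
            "test": test,
            "status": "scheduled",
        }

logins = list(stability_logins(
    "STB-2024-007",
    conditions=["25C/60%RH", "40C/75%RH"],
    pull_months=[0, 3, 6, 9, 12, 18, 24],
    tests=["assay", "dissolution", "moisture"],
))
print(len(logins))  # 2 x 7 x 3 = 42 logins generated from one study definition
</syntaxhighlight>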


The following illustrations and descriptive text, provided and used with permission by LabLite Systems LLC, show a data entry screen (Figure 3-9a) and a search screen (Figure 3-9b).
==The "FAIR-ification" of research objects and software==
First discussed during a 2014 FORCE-11 workshop dedicated to "overcoming data discovery and reuse obstacles," the [[Journal:The FAIR Guiding Principles for scientific data management and stewardship|FAIR data principles]] were published by Wilkinson ''et al.'' in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and [[information]] of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.<ref name="WilkinsonTheFAIR16">{{Cite journal |last=Wilkinson |first=Mark D. |last2=Dumontier |first2=Michel |last3=Aalbersberg |first3=IJsbrand Jan |last4=Appleton |first4=Gabrielle |last5=Axton |first5=Myles |last6=Baak |first6=Arie |last7=Blomberg |first7=Niklas |last8=Boiten |first8=Jan-Willem |last9=da Silva Santos |first9=Luiz Bonino |last10=Bourne |first10=Philip E. |last11=Bouwman |first11=Jildau |date=2016-03-15 |title=The FAIR Guiding Principles for scientific data management and stewardship |url=https://www.nature.com/articles/sdata201618 |journal=Scientific Data |language=en |volume=3 |issue=1 |pages=160018 |doi=10.1038/sdata.2016.18 |issn=2052-4463 |pmc=PMC4792175 |pmid=26978244}}</ref> The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."<ref name="WilkinsonTheFAIR16" />


Since 2016, other research stakeholders have taken to publishing their thoughts about how the FAIR principles apply to their fields of study and practice<ref name="NIHPubMedSearch">{{cite web |url=https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles |title=fair data principles |work=PubMed Search |publisher=National Institutes of Health, National Library of Medicine |accessdate=30 April 2024}}</ref>, including in ways beyond what perhaps was originally imagined by Wilkinson ''et al.''. For example, multiple authors have examined whether or not the software used in scientific endeavors itself can be considered a research object worth being developed and managed in tandem with the FAIR data principles.<ref>{{Cite journal |last=Hasselbring |first=Wilhelm |last2=Carr |first2=Leslie |last3=Hettrick |first3=Simon |last4=Packer |first4=Heather |last5=Tiropanis |first5=Thanassis |date=2020-02-25 |title=From FAIR research data toward FAIR and open research software |url=https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html |journal=it - Information Technology |language=en |volume=62 |issue=1 |pages=39–47 |doi=10.1515/itit-2019-0040 |issn=2196-7032}}</ref><ref name="GruenpeterFAIRPlus20">{{Cite web |last=Gruenpeter, M. |date=23 November 2020 |title=FAIR + Software: Decoding the principles |url=https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf |format=PDF |publisher=FAIRsFAIR “Fostering FAIR Data Practices In Europe” |accessdate=30 April 2024}}</ref><ref>{{Cite journal |last=Barker |first=Michelle |last2=Chue Hong |first2=Neil P. |last3=Katz |first3=Daniel S. |last4=Lamprecht |first4=Anna-Lena |last5=Martinez-Ortiz |first5=Carlos |last6=Psomopoulos |first6=Fotis |last7=Harrow |first7=Jennifer |last8=Castro |first8=Leyla Jael |last9=Gruenpeter |first9=Morane |last10=Martinez |first10=Paula Andrea |last11=Honeyman |first11=Tom |date=2022-10-14 |title=Introducing the FAIR Principles for research software |url=https://www.nature.com/articles/s41597-022-01710-x |journal=Scientific Data |language=en |volume=9 |issue=1 |pages=622 |doi=10.1038/s41597-022-01710-x |issn=2052-4463 |pmc=PMC9562067 |pmid=36241754}}</ref><ref>{{Cite journal |last=Patel |first=Bhavesh |last2=Soundarajan |first2=Sanjay |last3=Ménager |first3=Hervé |last4=Hu |first4=Zicheng |date=2023-08-23 |title=Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool |url=https://www.nature.com/articles/s41597-023-02463-x |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=557 |doi=10.1038/s41597-023-02463-x |issn=2052-4463 |pmc=PMC10447492 |pmid=37612312}}</ref><ref>{{Cite journal |last=Du |first=Xinsong |last2=Dastmalchi |first2=Farhad |last3=Ye |first3=Hao |last4=Garrett |first4=Timothy J. |last5=Diller |first5=Matthew A. |last6=Liu |first6=Mei |last7=Hogan |first7=William R. |last8=Brochhausen |first8=Mathias |last9=Lemas |first9=Dominick J. 
|date=2023-02-06 |title=Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software |url=https://link.springer.com/10.1007/s11306-023-01974-3 |journal=Metabolomics |language=en |volume=19 |issue=2 |pages=11 |doi=10.1007/s11306-023-01974-3 |issn=1573-3890}}</ref> Researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts, recognize that digital research objects go beyond data and information, and recognize "the specific nature of software" and not consider it "just data."<ref name="GruenpeterFAIRPlus20" /> The end result has been applying the core concepts of FAIR but differently from data, with the added context of research software being more than just data, requiring more nuance and a different type of planning from applying FAIR to digital data and information.


[[File:Fig3-9a Liscouski LabInfo24.png|500px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="500px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-9a.''' Stability Data Entry Screen. The Stability Entry screen allows your organization to standardize the way studies are created and submitted. Each of the dropdown fields on this screen pull data from tables that are pre-populated by LabLite or an administrator. Each study can have any number of pre-defined conditions. When researchers enter their study information into the program the pulls [tests] for each study condition populate based on predefined rules. The researcher can also go into the grid for each condition and open the calendar tool to schedule additional pulls.</blockquote>
|-
|}
|}


[[File:Fig3-9b Liscouski LabInfo24.png|600px]]
{{clear}}
{|  
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="600px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-9b.''' Stability Study Screen. After logging into the stability program users are brought to a search screen. From this screen you can use the dropdowns to search for study information in a variety of ways; including study start, end or pull [test] date; product name or number, study number, lot number, researcher, department, reference number, etc. You can also choose to show all items or only those that have been pulled or not pulled.</blockquote>
|-
|}
|}


Another specific market is the clinical industry. One point that needs to be addressed is the difference between a LIMS and an LIS. They are very similar systems in terms of function and capability. LIMS are used in a much broader market (e.g., life sciences, environmental, chemical, oil and gas, etc.), almost everywhere except clinical environments. LIS are rarely encountered outside clinical labs because of their close ties to patient records and healthcare. Some LIMS vendors will try to sell their products into clinical labs, claiming that their offerings are technically more sophisticated and use current technologies. That said, there are two basic but significant differences:


#LIS is patient-centered. It is still sample tracking (with test results), but everything is tied to the patient, providing a higher level of organization. This allows a doctor to examine a patient’s records over time and by test. A LIMS is sample-oriented; samples may be part of a project (the project name would be part of the sample description) or a stability (or similar) test program. This difference is significant because of laws controlling access to patient health records and the need to coordinate a patient’s clinical results with other health and personal data. This brings us to the second difference.
#LIS supports [[Health Level 7]] (HL7) communications. We’ll cover the details in the Standards Appendix, but the short version is this: a clinical LIS has to communicate with systems both within the lab and outside of it. HL7 provides the mechanism to do that, and it isn’t supported (broadly) by LIMS. It allows the LIS to "talk" to [[hospital]] administrative systems and instrument data systems, providing a foundational element of “total laboratory computing” systems. It helps things work together in ways that haven’t yet happened in the broader life sciences and industrial markets. This is a major point and has contributed to cost reductions in laboratory testing. (A simplified example message appears after this list.)
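
To make that communication concrete, the following sketch assembles a minimal HL7 v2.x result message (ORU^R01) of the kind an LIS might send to a hospital system. The segment contents, identifiers, and values are fabricated and simplified for illustration; production interfaces rely on conformant HL7 libraries and fuller segment sets rather than hand-assembled strings.

<syntaxhighlight lang="python">
def build_oru(patient_id, patient_name, test_code, test_name, value, units):
    """Assemble a minimal, illustrative ORU^R01 result message."""
    segments = [
        # MSH: sending/receiving systems, timestamp, message type, version
        "MSH|^~\\&|LIS|LAB|HIS|HOSPITAL|202405011200||ORU^R01|MSG0001|P|2.5.1",
        f"PID|1||{patient_id}||{patient_name}",        # patient identification
        f"OBR|1|||{test_code}^{test_name}",            # the order being reported
        f"OBX|1|NM|{test_code}^{test_name}||{value}|{units}|||||F",  # final result
    ]
    return "\r".join(segments)  # HL7 v2 segments are separated by carriage returns

msg = build_oru("123456", "DOE^JANE", "2345-7", "GLUCOSE", "5.4", "mmol/L")
print(msg.replace("\r", "\n"))
</syntaxhighlight>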


The following list is a set of functions that you should expect a LIMS to support; the information is from the ''Laboratory, Scientific, and Health Informatics Buyer's Guide''<ref name="LWLabSciHealth15">{{cite web |url=http://www.limswiki.org/index.php?title=Laboratory,_Scientific,_and_Health_Informatics_Buyer%27s_Guide |archiveurl=https://web.archive.org/web/20150214043219/http://www.limswiki.org/index.php?title=Laboratory,_Scientific,_and_Health_Informatics_Buyer%27s_Guide |title=Laboratory, Scientific, and Health Informatics Buyer's Guide |work=LIMSwiki |date=02 February 2015 |archivedate=14 February 2015 |accessdate=05 April 2024}}</ref>:
*audit trail
*barcoding
*batching
*chain of custody
*configurable setup
*customization capabilities
*data entry
*data warehousing and mining
*document management
*electronic data exchange
*event-driven actions
*fax and email integration
*formulas
*instrument interfacing, calibration, and maintenance
*inventory
*login and accessioning
*multiple location/department support
*pairing external files to LIMS samples
*regulatory compliance
*reporting
*review and approval
*sample management and tracking
*scheduling
*training tracking
*trending and control charting
*version control
*workload management
*workflow management


A more extensive set of functions can be found in the ASTM E1578-13 standard, including those that extend beyond LIMS and LIS.<ref name="ASTME1578-13" /> Both LIMS and LIS have functions that support the model in Figure 3-8. That said, there are a few additional points worth noting.


Systems provide functions for sample logins that include:
*Manual logins – reading information from forms and entering it into the system; the gains in productivity come from the ease of working with the information once it is entered. Logins also support the printing of [[barcode]]d labels with sample IDs and information you feel is relevant.
*Batch logins – being able to log in several samples with similar requirements as a group, each with its sample ID. Reduces information entry and speeds the process along.
*Remote logins – allowing submitters to log samples in.
*Scheduled logins – occurring automatically at a given frequency, producing a sample label at the remote sampling location.


Event-driven actions can be tied to the login process as well as other events. For example, if samples haven’t arrived in the lab, an “event-driven action” can notify lab personnel when the samples arrive and their status changes from “waiting” to “in the lab.” A minimal sketch of this mechanism follows.
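
In the sketch below, handlers are registered for named events and fired when a sample's status changes. The event name, payload fields, and statuses are illustrative assumptions rather than any product's implementation.

<syntaxhighlight lang="python">
from collections import defaultdict
from typing import Callable

class EventBus:
    """Registers handlers for named events and fires them on emit()."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, **payload) -> None:
        for handler in self._handlers[event]:
            handler(**payload)

def notify_on_arrival(sample_id: str, old: str, new: str) -> None:
    # The action: tell lab personnel a waiting sample is now in the lab.
    if (old, new) == ("waiting", "in the lab"):
        print(f"{sample_id} has arrived; notify the assigned analyst")

bus = EventBus()
bus.on("status_changed", notify_on_arrival)
bus.emit("status_changed", sample_id="QC-0001", old="waiting", new="in the lab")
</syntaxhighlight>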


The next few illustrations (Figure 3-10a-c) are mock-ups of LIMS screens. These replace the original out-of-date graphics. The data shown in the illustrations is made-up sample information used for demonstration; if it doesn’t seem “real,” it isn’t.
The concept of the research software engineer (RSE) began to take full form in 2012, and since then universities and institutions of many types have formally developed their own RSE groups and academic programs.<ref name="WoolstonWhySci22">{{Cite journal |last=Woolston |first=Chris |date=2022-05-31 |title=Why science needs more research software engineers |url=https://www.nature.com/articles/d41586-022-01516-2 |journal=Nature |language=en |pages=d41586–022–01516-2 |doi=10.1038/d41586-022-01516-2 |issn=0028-0836}}</ref><ref name="KITRSE@KIT24">{{cite web |url=https://www.rse-community.kit.edu/index.php |title=RSE@KIT |publisher=Karlsruhe Institute of Technology |date=20 February 2024 |accessdate=01 May 2024}}</ref><ref name="PUPurdueCenter">{{cite web |url=https://www.rcac.purdue.edu/rse |title=Purdue Center for Research Software Engineering |publisher=Purdue University |date=2024 |accessdate=01 May 2024}}</ref> RSEs range from pure software developers with little knowledge of a given research discipline, to scientific researchers just beginning to learn how to develop software for their research project(s). While in the past, broadly speaking, researchers often cobbled together research software with less a focus on quality and reproducibility and more on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research."<ref name="WoolstonWhySci22" /><ref name="CohenTheFour21">{{Cite journal |last=Cohen |first=Jeremy |last2=Katz |first2=Daniel S. |last3=Barker |first3=Michelle |last4=Chue Hong |first4=Neil |last5=Haines |first5=Robert |last6=Jay |first6=Caroline |date=2021-01 |title=The Four Pillars of Research Software Engineering |url=https://ieeexplore.ieee.org/document/8994167/ |journal=IEEE Software |volume=38 |issue=1 |pages=97–105 |doi=10.1109/MS.2020.2973362 |issn=0740-7459}}</ref> Elaborating on that concept, Cohen ''et al.'' add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."<ref name="CohenTheFour21" />


The concept of [[software quality management]] (SQM) has traditionally not been lost on professional, commercial software development businesses. Good SQM practices have been less prevalent in homegrown research software development; however, the expanded adoption of FAIR data and FAIR software approaches has shifted the focus on to the repeatability, reproducibility, and interoperability of research results and data produced by a more sustainable research software. The adoption of FAIR by academic and institutional research labs not only brings commercial SQM and other software development approaches into their workflow, but also gives commercial laboratory informatics software developers an opportunity to embrace many aspects of the FAIR approach to laboratory research practices, including lessons learned and development practices from the growing number of RSEs. This doesn't mean commercial developers are going to suddenly take an open-source approach to their code, and it doesn't mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.<ref>{{Cite journal |last=Hasselbring |first=Wilhelm |last2=Carr |first2=Leslie |last3=Hettrick |first3=Simon |last4=Packer |first4=Heather |last5=Tiropanis |first5=Thanassis |date=2020-02-25 |title=From FAIR research data toward FAIR and open research software |url=https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html |journal=it - Information Technology |language=en |volume=62 |issue=1 |pages=39–47 |doi=10.1515/itit-2019-0040 |issn=2196-7032}}</ref> However, as Moynihan noted, both research software development paradigms stand to gain from the shift to more FAIR data and software.<ref name="MoynihanTheHitch20" /> Additionally, if commercial laboratory informatics vendors want to continue to competitively market relevant and sustainable research software to research labs, they frankly have little choice but to commit extra resources to learning about the application of FAIR principles to their offerings tailored to those labs.


[[File:Fig3-10a Liscouski LabInfo24.png|700px]]
===The focus on data types and metadata within the scope of FAIR is shifting how laboratory informatics software developers and RSEs make their research software and choose their database approaches===
{{clear}}
Close to the core of any deep discussion of the FAIR data principles are the concepts of data models, data types, [[metadata]], and persistent unique identifiers (PIDs). Making research objects more findable, accessible, interoperable, and reusable is no easy task when data types and approaches to metadata assignment (if there even is such an approach) are widely differing and inconsistent. Metadata is a means for better storing and characterizing research objects for the purposes of ensuring provenance and reproducibility of those research objects.<ref name="GhiringhelliShared23">{{Cite journal |last=Ghiringhelli |first=Luca M. |last2=Baldauf |first2=Carsten |last3=Bereau |first3=Tristan |last4=Brockhauser |first4=Sandor |last5=Carbogno |first5=Christian |last6=Chamanara |first6=Javad |last7=Cozzini |first7=Stefano |last8=Curtarolo |first8=Stefano |last9=Draxl |first9=Claudia |last10=Dwaraknath |first10=Shyam |last11=Fekete |first11=Ádám |date=2023-09-14 |title=Shared metadata for data-centric materials science |url=https://www.nature.com/articles/s41597-023-02501-8 |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=626 |doi=10.1038/s41597-023-02501-8 |issn=2052-4463 |pmc=PMC10502089 |pmid=37709811}}</ref><ref name="FirschenAgile22">{{Cite journal |last=Fitschen |first=Timm |last2=tom Wörden |first2=Henrik |last3=Schlemmer |first3=Alexander |last4=Spreckelsen |first4=Florian |last5=Hornung |first5=Daniel |date=2022-10-12 |title=Agile Research Data Management with FDOs using LinkAhead |url=https://riojournal.com/article/96075/ |journal=Research Ideas and Outcomes |volume=8 |pages=e96075 |doi=10.3897/rio.8.e96075 |issn=2367-7163}}</ref> This means as early as possible implementing a software-based approach that is FAIR-driven, capturing FAIR metadata using flexible domain-driven [[Ontology (information science)|ontologies]] (i.e., controlled vocabularies) at the source and cleaning up old research objects that aren't FAIR-ready while also limiting hindrances to research processes as much as possible.<ref name="FirschenAgile22" /> And that approach must value the importance of metadata and PIDs. As Weigel ''et al.'' note in a discussion on making laboratory data and workflows more machine-findable: "Metadata capture must be highly automated and reliable, both in terms of technical reliability and ensured metadata quality. This requires an approach that may be very different from established procedures."<ref>{{Cite journal |last=Weigel |first=Tobias |last2=Schwardmann |first2=Ulrich |last3=Klump |first3=Jens |last4=Bendoukha |first4=Sofiane |last5=Quick |first5=Robert |date=2020-01 |title=Making Data and Workflows Findable for Machines |url=https://direct.mit.edu/dint/article/2/1-2/40-46/9994 |journal=Data Intelligence |language=en |volume=2 |issue=1-2 |pages=40–46 |doi=10.1162/dint_a_00026 |issn=2641-435X}}</ref> Enter non-relational RDF [[knowledge graph]] [[database]]s.
{|  
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-10a.''' Startup Screen – The buttons on the left give the viewer access to the major functions in the system, and the main screen is one view. Clicking on an element brings you to the detail page of that screen. What you see can be configured by application configuration settings or by editing the HTML image. The startup screen can be unique for each user; what a user sees can be limited to what they are qualified to work on.</blockquote>
|-  
|}
|}


[[File:Fig3-10b Liscouski LabInfo24.png|700px]]
This brings us to our second point: given the importance of metadata and PIDs to FAIRifying research objects (and even research software), established, more traditional research software development methods using common relational databases may not be enough, even for commercial laboratory informatics software developers. Non-relational [[Resource Description Framework]] (RDF) knowledge graph databases used in FAIR-driven, well-designed laboratory informatics software help make research objects more FAIR for all research labs.  
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-10b.''' Shows a set of samples selected within a date range. The information can be organized by several methods, and samples of a similar type with the same test requests can be used to look at variations in results to look for any outliers or systematic issues. The Folder is populated by a user-defined query to group samples or other LIMS objects by property. The result of the query can be sorted by any parameter associated with the LIMS object (e.g., priority, date logged, time in system) as well as grouped by parameter. The LIMS objects are shown in an expandable view with graphical images indicating their status and condition. The right pane is used to show various views of individual elements of the folder (e.g., sample, test, result) when selected or can show a summary of all data in the Folder (shown). The tabs in the right pane show different options available to the user and include the ability to act on the LIMS object (e.g., edit tests assigned).</blockquote>
|-
|}
|}


[[File:Fig3-10c Liscouski LabInfo24.png|700px]]
Research objects can take many forms (i.e., data types), making the storage and management of those objects challenging, particularly in research settings with great diversity of data, as with materials research. Some have approached this challenge by combining different database and systems technologies that are best suited for each data type.<ref name="AggourSemantics24">{{Cite journal |last=Aggour |first=Kareem S. |last2=Kumar |first2=Vijay S. |last3=Gupta |first3=Vipul K. |last4=Gabaldon |first4=Alfredo |last5=Cuddihy |first5=Paul |last6=Mulwad |first6=Varish |date=2024-04-09 |title=Semantics-Enabled Data Federation: Bringing Materials Scientists Closer to FAIR Data |url=https://link.springer.com/10.1007/s40192-024-00348-4 |journal=Integrating Materials and Manufacturing Innovation |language=en |doi=10.1007/s40192-024-00348-4 |issn=2193-9764}}</ref> However, while query performance and storage footprint improves with this approach, data across the different storage mechanisms typically remains unlinked and non-compliant with FAIR principles. Here, either a full RDF knowledge graph database or similar integration layer is required to better make the research objects more interoperable and reusable, whether it's materials records or specimen data.<ref name="AggourSemantics24" /><ref name="GrobeFromData19">{{Cite journal |last=Grobe |first=Peter |last2=Baum |first2=Roman |last3=Bhatty |first3=Philipp |last4=Köhler |first4=Christian |last5=Meid |first5=Sandra |last6=Quast |first6=Björn |last7=Vogt |first7=Lars |date=2019-06-26 |title=From Data to Knowledge: A semantic knowledge graph application for curating specimen data |url=https://biss.pensoft.net/article/37412/ |journal=Biodiversity Information Science and Standards |language=en |volume=3 |pages=e37412 |doi=10.3897/biss.3.37412 |issn=2535-0897}}</ref>
{{clear}}
{|  
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-10c.''' "KPI" refers to key performance indicators, and as you can see, provides management with information about how the lab is performing and if sample processing is keeping up with expectations.</blockquote>
|-  
|}
|}


In early systems, “data entry” was a manual process of reading information from a paper notebook or computer printout and retyping it into the LIMS via a data entry form. Even though systems have been improved to permit electronic transfer of data—particularly in LIS supporting HL7 communications protocols—you still find people manually entering data. This fails to take full advantage of the potential for increasing productivity by having people do work that could be done by systems, and, requiring additional work (a second person checking entries) to avoid transcription errors.  
It is beyond the scope of this Q&A article to discuss RDF knowledge graph databases at length. (For a deeper dive on this topic, see Rocca-Serra ''et al.'' and the FAIR Cookbook.<ref name="Rocca-SerraFAIRCook22">{{Cite book |last=Rocca-Serra, Philippe |last2=Sansone, Susanna-Assunta |last3=Gu, Wei |last4=Welter, Danielle |last5=Abbassi Daloii, Tooba |last6=Portell-Silva, Laura |date=2022-06-30 |title=D2.1 FAIR Cookbook |url=https://zenodo.org/record/6783564 |chapter=FAIR and Knowledge graphs |doi=10.5281/ZENODO.6783564}}</ref>) However, know that the primary strength of these databases to FAIRification of research objects is their ability to provide [[Semantics|semantic]] transparency (i.e., provide a framework for better understanding and reusing the greater research object through basic examination of the relationships of its associated metadata and their constituents), making these objects more easily accessible, interoperable, and machine-readable.<ref name="AggourSemantics24" /> The resulting knowledge graphs, with their "subject-property-object" syntax and PIDs or uniform resource identifiers (URIs) helping to link data, metadata, ontology classes, and more, can be interpreted, searched, and linked by machines, and made human-readable, resulting in better research through derivation of new knowledge from the existing research objects. The end result is a representation of heterogeneous data and metadata that complies with the FAIR guiding principles.<ref name="AggourSemantics24" /><ref name="GrobeFromData19" /><ref name="Rocca-SerraFAIRCook22" /><ref name="TomlinsonRDF23">{{cite web |url=https://21624527.fs1.hubspotusercontent-na1.net/hubfs/21624527/Resources/RDF%20Knowledge%20Graph%20Databases%20White%20Paper.pdf |format=PDF |title=RDF Knowledge Graph Databases: A Better Choice for Life Science Lab Software |author=Tomlinson, E. |publisher=Semaphore Solutions, Inc |date=28 July 2023 |accessdate=01 May 2024}}</ref><ref name="DeagenFAIRAnd22">{{Cite journal |last=Deagen |first=Michael E. |last2=McCusker |first2=Jamie P. |last3=Fateye |first3=Tolulomo |last4=Stouffer |first4=Samuel |last5=Brinson |first5=L. Cate |last6=McGuinness |first6=Deborah L. |last7=Schadler |first7=Linda S. |date=2022-05-27 |title=FAIR and Interactive Data Graphics from a Scientific Knowledge Graph |url=https://www.nature.com/articles/s41597-022-01352-z |journal=Scientific Data |language=en |volume=9 |issue=1 |pages=239 |doi=10.1038/s41597-022-01352-z |issn=2052-4463 |pmc=PMC9142568 |pmid=35624233}}</ref><ref>{{Cite journal |last=Brandizi |first=Marco |last2=Singh |first2=Ajit |last3=Rawlings |first3=Christopher |last4=Hassani-Pak |first4=Keywan |date=2018-09-25 |title=Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach |url=https://www.degruyter.com/document/doi/10.1515/jib-2018-0023/html |journal=Journal of Integrative Bioinformatics |language=en |volume=15 |issue=3 |pages=20180023 |doi=10.1515/jib-2018-0023 |issn=1613-4516 |pmc=PMC6340125 |pmid=30085931}}</ref> This concept can even be extended to ''post factum'' visualizations of the knowledge graph data<ref name="DeagenFAIRAnd22" />, as well as the FAIR management of computational laboratory [[workflow]]s.<ref>{{Cite journal |last=de Visser |first=Casper |last2=Johansson |first2=Lennart F. |last3=Kulkarni |first3=Purva |last4=Mei |first4=Hailiang |last5=Neerincx |first5=Pieter |last6=Joeri van der Velde |first6=K. |last7=Horvatovich |first7=Péter |last8=van Gool |first8=Alain J. |last9=Swertz |first9=Morris A. 
|last10=Hoen |first10=Peter A. C. ‘t |last11=Niehues |first11=Anna |date=2023-09-28 |editor-last=Palagi |editor-first=Patricia M. |title=Ten quick tips for building FAIR workflows |url=https://dx.plos.org/10.1371/journal.pcbi.1011369 |journal=PLOS Computational Biology |language=en |volume=19 |issue=9 |pages=e1011369 |doi=10.1371/journal.pcbi.1011369 |issn=1553-7358 |pmc=PMC10538699 |pmid=37768885}}</ref>


The reliance on manual data entry quickly generated a demand for electronic transfer of information between LIMS, instruments, and instrument data systems. Some of those connections were made by customers, and others were provided by vendors. The LIS and LIMS market approached the problem from different directions depending on how they viewed the marketplace: LIS were entirely focused on clinical and healthcare operations, LIMS vendors wanted to cover everything else (chemistry, oil and gas, pharmaceuticals, biotechnology, manufacturing QC, etc.) and clinical labs where they could. Although a working diagram (Figure 3-11) looks the same for both cases, the mindset is completely different.  
While rare, some commercial laboratory informatics vendors like Semaphore Solutions have already recognized the potential of RDF knowledge graph databases to FAIR-driven laboratory research, having implemented such structures into their offerings.<ref name="TomlinsonRDF23" /> (The use of knowledge graphs has already been demonstrated in academic research software, such as with the ELN tools developed by RSEs at the University of Rostock and University of Amsterdam.<ref>{{Cite journal |last=Schröder |first=Max |last2=Staehlke |first2=Susanne |last3=Groth |first3=Paul |last4=Nebe |first4=J. Barbara |last5=Spors |first5=Sascha |last6=Krüger |first6=Frank |date=2022-12 |title=Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation |url=https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-021-00257-x |journal=Journal of Biomedical Semantics |language=en |volume=13 |issue=1 |pages=4 |doi=10.1186/s13326-021-00257-x |issn=2041-1480 |pmc=PMC8802522 |pmid=35101121}}</ref>) As noted in the prior point, it is potentially advantageous to not only laboratory informatics vendors to provide but also research labs to use relevant and sustainable research software that has the FAIR principles embedded in the software's design. Turning to knowledge graph databases is another example of keeping such software relevant and FAIR to research labs.


===Applying FAIR-driven metadata schemes to laboratory informatics software development gives data a FAIRer chance at being ready for machine learning and artificial intelligence applications===
The third and final point for this Q&A article highlights another positive consequence of engineering laboratory informatics software with FAIR in mind: FAIRified research objects are much closer to being usable for the trending inclusion of [[machine learning]] (ML) and [[artificial intelligence]] (AI) tools in laboratory informatics platforms and other companion research software. By developing laboratory informatics software with a focus on FAIR-driven metadata and database schemes, not only are research objects more FAIR but also "cleaner" and more machine-ready for advanced analytical uses as with ML and AI.


[[File:Fig3-11 Liscouski LabInfo24.png|800px]]
To be sure, the FAIRness of any structured dataset alone is not enough to make it ready for ML and AI applications. Factors such as classification, completeness, context, correctness, duplicity, integrity, mislabeling, outliers, relevancy, sample size, and timeliness of the research object and its contents are also important to consider.<ref name="HinidumaDataRead24">{{Cite journal |last=Hiniduma |first=Kaveen |last2=Byna |first2=Suren |last3=Bez |first3=Jean Luca |date=2024 |title=Data Readiness for AI: A 360-Degree Survey |url=https://arxiv.org/abs/2404.05779 |journal=arXiv |doi=10.48550/ARXIV.2404.05779}}</ref><ref name="FletcherFAIRRe24">{{Cite journal |last=Fletcher |first=Lydia |date=2024-04-16 |others=The University Of Texas At Austin, The University Of Texas At Austin |title=FAIR Re-use: Implications for AI-Readiness |url=https://repositories.lib.utexas.edu/handle/2152/124873 |doi=10.26153/TSW/51475}}</ref> When those factors aren't appropriately addressed as part of a FAIRification effort towards AI readiness (as well as part of the development of research software of all types), research data and metadata have a higher likelihood of revealing themselves to be inconsistent. As such, searches and analytics using that data and metadata become muddled, and the ultimate ML or AI output will also be muddled (i.e., "garbage in, garbage out"). Whether retroactively updating existing research objects to a more FAIRified state or ensuring research objects (e.g., those originating in an ELN or LIMS) are more FAIR and AI-ready from the start, research software updating or generating those research objects has to address ontologies, data models, data types, identifiers, and more in a thorough yet flexible way.<ref name="OlsenEmbracing23">{{cite web |url=https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness |title=Embracing FAIR Data on the Path to AI-Readiness |author=Olsen, C. |work=Pharma's Almanac |date=01 September 2023 |accessdate=03 May 2024}}</ref>
{{clear}}
{|  
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="800px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-11.''' Lab operations with a LIMS in the general laboratory (left) vs. lab operations with an LIS in the clinical laboratory (right).</blockquote>
|-  
|}
|}


Let’s begin with the diagram on the right, a clinical system based on a LIS. On the lab bench there are a variety of instruments and data systems that communicate with the LIS, and the LIS in turn communicates with patient records and hospital administrative systems. That situation is repeated in most hospitals. The lab runs a standard suite of testing that may be expanding beyond traditional blood, urine analysis, and so on; Waters Corporation for example has introduced [[mass spectrometry]] systems for clinical laboratories.<ref name="WatersConq09">{{cite web |url=https://www.waters.com/waters/newsDetail.htm?locale=ko_KR&id=10112961 |title=Waters Conquers Mass Spectrometry Application Challenges with Eight New MassLynx Workflow Solutions |publisher=Waters Corporation |date=01 June 2009 |accessdate=05 April 2024}}</ref> This repeated model consists of a relatively limited range of procedures that is run in the labs (compared to the chemical industry, for example, which can use a wide range of procedures), with connections to a functionally common set of applications outside the lab, coupled by a key communications element created a unique ability to connect lab instruments to LIS in a manner that doesn’t exist outside of clinical systems. The ability to make those connections has changed the way lab work is done, and improved the economics of lab operations. For example<ref name=":0">{{Cite journal |last=Sarkozi |first=Laszlo |last2=Simson |first2=Elkin |last3=Ramanathan |first3=Lakshmi |date=2003-03 |title=The effects of total laboratory automation on the management of a clinical chemistry laboratory. Retrospective analysis of 36 years |url=https://linkinghub.elsevier.com/retrieve/pii/S0009898103000202 |journal=Clinica Chimica Acta |language=en |volume=329 |issue=1-2 |pages=89–94 |doi=10.1016/S0009-8981(03)00020-2}}</ref>:
Noting that Wilkinson ''et al.'' originally highlighted the importance of machine-readability of FAIR data, Huerta ''et al.'' add that that core principle of FAIRness "is synergistic with the rapid adoption and increased use of AI in research."<ref name="HuertaFAIRForAI23">{{Cite journal |last=Huerta |first=E. A. |last2=Blaiszik |first2=Ben |last3=Brinson |first3=L. Catherine |last4=Bouchard |first4=Kristofer E. |last5=Diaz |first5=Daniel |last6=Doglioni |first6=Caterina |last7=Duarte |first7=Javier M. |last8=Emani |first8=Murali |last9=Foster |first9=Ian |last10=Fox |first10=Geoffrey |last11=Harris |first11=Philip |date=2023-07-26 |title=FAIR for AI: An interdisciplinary and international community building perspective |url=https://www.nature.com/articles/s41597-023-02298-6 |journal=Scientific Data |language=en |volume=10 |issue=1 |pages=487 |doi=10.1038/s41597-023-02298-6 |issn=2052-4463 |pmc=PMC10372139 |pmid=37495591}}</ref> They go on to discuss the positive interactions of FAIR research objects with FAIR-driven, AI-based research. Among the benefits include<ref name="HuertaFAIRForAI23" />:


<blockquote>Between 1965 and 2000, the Consumer Price Index increased by a factor of 5.5 in the United States. During the same 36 years, at our institution's Chemistry Department the productivity (indicated as the number of reported test results/employee/year) increased from 10,600 to 104,558 (9.3-fold). When expressed in constant 1965 dollars, the total cost per test decreased from 0.79 dollars to 0.15 dollars.</blockquote>
*greater findability of FAIR research objects for further AI-driven scientific discovery;
*greater reproducibility of FAIR research objects and any AI models published with them;
*improved generalization of AI-driven medical research models when exposed to diverse and FAIR research objects;
*improved reporting of AI-driven research results using FAIRified research objects, lending further credibility to those results;
*more uniform comparison of AI models using well-defined hyperstructure and information training conditions from FAIRified research objects;
*more developed and interoperable "data e-infrastructure," which can further drive a more effective "AI services layer";
*reduced bias in AI-driven processes through the use of FAIR research objects and AI models; and
*improved surety of scientific correctness where reproducibility in AI-driven research can't be guaranteed.


That key communications element—the development of communications standards—was driven by the cost of lab operations and an industry-wide effort to address problems with lab economics. Clinical labs have the charges they can assess for work set by contracts. In the 1980s, it became clear that the costs of running a lab were going to exceed income. The problem was addressed by instituting successful “total laboratory automation” (TLA) programs. The programs were based not only on [[Laboratory automation|automation]] (which could have been done on a task/workstation level) but also upon integration of laboratory systems to achieve a higher level of performance that could not have been gained otherwise.<ref name=":0" /><ref>{{Cite journal |last=Young |first=D. S. |date=2000-05 |title=Laboratory automation: smart strategies and practical applications |url=https://pubmed.ncbi.nlm.nih.gov/10794771 |journal=Clinical Chemistry |volume=46 |issue=5 |pages=740–745 |issn=0009-9147 |pmid=10794771}}</ref><ref>{{Cite journal |last=Lam |first=Choong Weng |last2=Jacob |first2=Edward |date=2012-02 |title=Implementing a Laboratory Automation System: Experience of a Large Clinical Laboratory |url=https://linkinghub.elsevier.com/retrieve/pii/S2472630322016600 |journal=SLAS Technology |language=en |volume=17 |issue=1 |pages=16–23 |doi=10.1177/2211068211430186}}</ref><ref>{{Cite journal |last=Streitberg |first=George S. |last2=Bwititi |first2=Phillip T. |last3=Angel |first3=Lyndall |last4=Sikaris |first4=Kenneth A. |date=2009-04 |title=Automation and Expert Systems in a Core Clinical Chemistry Laboratory |url=http://journals.sagepub.com/doi/10.1016/j.jala.2008.12.001 |journal=Journal of the Association for Laboratory Automation |language=en |volume=14 |issue=2 |pages=94–105 |doi=10.1016/j.jala.2008.12.001}}</ref><ref>{{Cite journal |last=Zaninotto |first=Martina |last2=Plebani |first2=Mario |date=2010-07-01 |title=The “hospital central laboratory”: automation, integration and clinical usefulness |url=https://www.degruyter.com/document/doi/10.1515/CCLM.2010.192/html |journal=cclm |language=en |volume=48 |issue=7 |pages=911–917 |doi=10.1515/CCLM.2010.192 |issn=1437-4331}}</ref><ref>{{Cite journal |last=Clifford, L.J. |year=2013 |title=The ‘Smart’ LIS |url=https://www.elitelearning.com/resource-center/laboratory/the-smart-lis/ |journal=Advance / Laboratory |volume=22 |issue=10 |pages=20–25}}</ref><ref>{{Cite journal |last=Peck-Palmer |first=Octavia M. |date=2009-10 |title=Total lab automation takes teamwork |url=https://pubmed.ncbi.nlm.nih.gov/19891149 |journal=MLO: medical laboratory observer |volume=41 |issue=10 |pages=30, 32, 34 |issn=0580-7247 |pmid=19891149}}</ref>
In the end, developers of research software (whether discipline-specific research software or broader laboratory informatics solutions) would be advised to keep in mind the growing trends of FAIR research, FAIR software, and ML- and AI-driven research, especially in the [[life sciences]], but also a variety of other fields.<ref name="HuertaFAIRForAI23" />


Integration through communication standards like HL7, noted earlier, provides the key element in making automation capable of achieving its potential through error-free information interchange. Imagine the lab bench section of Figure 3-11 without the lines (thick black) connecting instrument systems to the communications service; all of the interactions between the instrument, instrument data systems, and LIS would be manual (i.e., read from here, type it in there). Integration through communications standards makes TLA work and provides instrument and computer systems vendors with a fundamental architectural infrastructure integration element that they can rely upon to make systems work together. This was an industry-driven initiative that was supported by vendors and provided a reliable mechanism for the electronic exchange of information.
===Restricted clinical data and its FAIRification for greater research innovation===
Broader discussion in the research community continues to occur in regards to how best to ethically make restricted or privacy-protected clinical data and information FAIR for greater innovation and, by extension, improved patient outcomes, particularly in the wake of the [[COVID-19]] [[pandemic]].<ref name="MaxwellFAIREthic23">{{Cite journal |last=Maxwell |first=Lauren |last2=Shreedhar |first2=Priya |last3=Dauga |first3=Delphine |last4=McQuilton |first4=Peter |last5=Terry |first5=Robert F |last6=Denisiuk |first6=Alisa |last7=Molnar-Gabor |first7=Fruzsina |last8=Saxena |first8=Abha |last9=Sansone |first9=Susanna-Assunta |date=2023-10 |title=FAIR, ethical, and coordinated data sharing for COVID-19 response: a scoping review and cross-sectional survey of COVID-19 data sharing platforms and registries |url=https://linkinghub.elsevier.com/retrieve/pii/S2589750023001292 |journal=The Lancet Digital Health |language=en |volume=5 |issue=10 |pages=e712–e736 |doi=10.1016/S2589-7500(23)00129-2 |pmc=PMC10552001 |pmid=37775189}}</ref><ref name="Queralt-RosinachApplying22">{{Cite journal |last=Queralt-Rosinach |first=Núria |last2=Kaliyaperumal |first2=Rajaram |last3=Bernabé |first3=César H. |last4=Long |first4=Qinqin |last5=Joosten |first5=Simone A. |last6=van der Wijk |first6=Henk Jan |last7=Flikkenschild |first7=Erik L.A. |last8=Burger |first8=Kees |last9=Jacobsen |first9=Annika |last10=Mons |first10=Barend |last11=Roos |first11=Marco |date=2022-12 |title=Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic |url=https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-022-00263-7 |journal=Journal of Biomedical Semantics |language=en |volume=13 |issue=1 |pages=12 |doi=10.1186/s13326-022-00263-7 |issn=2041-1480 |pmc=PMC9036506 |pmid=35468846}}</ref><ref>{{Cite journal |last=Martínez-García |first=Alicia |last2=Alvarez-Romero |first2=Celia |last3=Román-Villarán |first3=Esther |last4=Bernabeu-Wittel |first4=Máximo |last5=Luis Parra-Calderón |first5=Carlos |date=2023-05 |title=FAIR principles to improve the impact on health research management outcomes |url=https://linkinghub.elsevier.com/retrieve/pii/S2405844023029407 |journal=Heliyon |language=en |volume=9 |issue=5 |pages=e15733 |doi=10.1016/j.heliyon.2023.e15733 |pmc=PMC10189186 |pmid=37205991}}</ref> (Note that while there are other types of restricted and privacy-protected data, this section will focus largely on clinical data and research objects as the most obvious type.)


Note that when we talk about “communications,” we are covering more than the plug in the back of a device. For communications to occur, we need a means of:
These efforts have usually revolved around pulling reusable clinical patient or research data from [[hospital information system]]s (HIS), [[electronic medical record]]s (EMRs), [[clinical trial management system]]s (CTMSs), and research databases (often relational in nature) that either contain de-identified data or can de-identify aspects of data and information before access and extraction. Sometimes that clinical data or research object may have already in part been FAIRified, but often it may not be. In all cases, the concepts of privacy, security, and anonymization come up as part of any desire to gain access to that clinical material. However, any FAIRified clinical data isn't necessarily readily open for access. As Snoeijer ''et al.'' note: "The authors of the FAIR principles, however, clearly indicate that 'accessible' does not mean open. It means that clarity and transparency is required around the conditions governing access and reuse."<ref name="SnoeijerProcess19">{{cite book |url=https://phuse.s3.eu-central-1.amazonaws.com/Archive/2019/Connect/EU/Amsterdam/PAP_SA04.pdf |format=PDF |chapter=Paper SA04 - Processing big data from multiple sources |title=Proceedings of PHUSE Connect EU 2019 |author=Snoeijer, B.; Pasapula, V.; Covucci, A. et al. |publisher=PHUSE Limited |year=2019 |accessdate=03 May 2024}}</ref>


* Transferring messages – Cables (e.g., Ethernet, serial RS-232, USB), wi-fi, and so on are just the pathways for messages to travel, and aside from serial connections, the capability of managing multiple messages from different devices to each other with error detection and correction is also important. Serial communications, unless it is part of a networked protocol like TCP/IP, lack that capability without additional programming.
This is being mentioned in the context of laboratory informatics applications for a couple of reasons. First, a well-designed commercial LIMS that supports clinical research laboratory workflows is already going to address privacy and security aspects, as part of the developer recognizing the need for those labs to adhere to regulations such as the [[Health Insurance Portability and Accountability Act]] (HIPAA) and comply with standards such as [[ISO 15189]]. However, such a system may not have been developed with FAIR data principles in mind, and any built-in metadata and ontology schemes may be insufficient for full FAIRification of laboratory-based clinical trial research objects. As Queralt-Rosinach ''et al.'' note, however, "interestingly, ontologies may also be used to describe data access restrictions to complement FAIR metadata with information that supports data safety and patient privacy."<ref name="Queralt-RosinachApplying22" /> Essentially, the authors are suggesting that while a HIS or LIS may have built-in access management tools, setting up ontologies and metadata mechanisms that link privacy aspects of a research object (e.g., "has consent form for," "is de-identified," etc.) to the object's metadata allows for even more flexible, FAIR-driven approaches to privacy and security. Research software developers creating such information management tools for the regulated clinical research space may want to apply FAIR concepts such as this to how access control and privacy restrictions are managed. This will inevitably mean any research objects exported with machine-readable privacy-concerning metadata will be more reusable in a way that still "supports data safety and patient privacy."<ref name="Queralt-RosinachApplying22" />
* Understanding the structure of a message – What parts are message content (the material you want to communicate) and what parts are control elements to facilitate communications?


Think of it as part of the postal system. The postal system gets mail from one place to another using different transportation methods. The envelope holds the routing information, and inside is the actual message you want to transmit. You also have to have the ability to understand the message, the language used, the format and so on. (Different instruments of the same type will use different command structures and message formats, and a device receiving a message in the wrong format—akin to getting mail in a language you don’t understand—will lead at best to an error message return or an ignored message. At worst, it may interpret the message and do something you would have preferred it didn’t.) If any of those elements are lacking, communication doesn’t occur.  
Second, a well-designed research software solution working with clinical data will provide not only support for open, community-supported data models and vocabularies for clinical data, but also standardized community-driven ontologies that are specifically developed for access control and privacy. Queralt-Rosinach ''et al.'' continue<ref name="Queralt-RosinachApplying22" />:


Some devices cannot incorporate sophisticated communications protocols such as HL7; they have network connections but not higher-level industry-specific software. In those cases, “[[middleware]]” is used (Figure 3-12). Middleware is software (see Figure 3-10) that acts like a translator that interprets communication standard protocols/messages for devices that can’t handle them. Middleware is bi-directional so communications can occur in both directions (device to HL7 capable software and back).  
<blockquote>Also, very important for accessibility and data privacy is that the digital objects ''per se'' can accommodate the criteria and protocols necessary to comply with regulatory and governance frameworks. Ontologies can aid in opening and protecting patient data by exposing logical definitions of data use conditions. Indeed, there are ontologies to define access and reuse conditions for patient data such as the Informed Consent Ontology (ICO), the Global Alliance for Genomics and Health Data Use Ontology (DUO) standard, and the Open Digital Rights Language (ODRL) vocabulary recommended by W3C.</blockquote>


Also of note here is the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and its OHDSI standardized vocabularies. In all these cases, a developer-driven approach to research software that incorporates community-driven standards that support FAIR principles is welcome. However, as Maxwell ''et al.'' noted in their ''Lancet'' review article in late 2023, "few platforms or registries applied community-developed standards for participant-level data, further restricting the interoperability of ... data-sharing initiatives [like FAIR]."<ref name="MaxwellFAIREthic23" /> As the FAIR principles continue to gain ground in clinical research and diagnostics settings, software developers will need to be more attuned to translating old ways of development to ones that incorporate FAIR data and software principles. Demand for FAIR data will only continue to grow, and any efforts to improve interoperability and reusability while honoring (and enhancing) privacy and security aspects of restricted data will be appreciated by clinical researchers. However, just as FAIR is not an overall goal for researchers, software built with FAIR principles in mind is not the end point of research organizations managing restricted and privacy-protected research objects. Ultimately, those organizations will have make other considerations about restricted data in the scope of FAIR, including addressing data management plans, data use agreements, disclosure review practices, and training as it applies to their research software and generated research objects.<ref>{{Cite journal |last=Jang |first=Joy Bohyun |last2=Pienta |first2=Amy |last3=Levenstein |first3=Margaret |last4=Saul |first4=Joe |date=2023-12-06 |title=Restricted data management: the current practice and the future |url=https://journalprivacyconfidentiality.org/index.php/jpc/article/view/844 |journal=Journal of Privacy and Confidentiality |volume=13 |issue=2 |doi=10.29012/jpc.844 |issn=2575-8527 |pmc=PMC10956935 |pmid=38515607}}</ref>


[[File:Fig3-12 Liscouski LabInfo24.png|650px]]
==Conclusion==
{{clear}}
Laboratory informatics developers will also need to remember that FAIRification of research in itself is not a goal for research laboratories; it is a continual process that recognizes improved scientific research and greater innovation as a more likely outcome.<ref name="WilkinsonTheFAIR16" /><ref name="OlsenEmbracing23" /><ref name="HuertaFAIRForAI23" />
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="650px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-12.''' Middleware in LIS to instrument connections.</blockquote>
|-
|}
|}
 
Middleware serves the same function in LIMS to instrument connections; however, the LIMS-instrument communications protocol (layers above the network interface) is up to the vendor.
 
===Moving outside of healthcare===
Once you move from the HL7 support of the clinical laboratory to the industrial research and testing labs, things are a lot different. The left side of Figure 3-9 shows the organizational environment that a typical lab might encounter. The major difference between the left and right sides of the illustration is that while the right is well characterized, the left is generalized: products from the lab bench on up through the department and inter-department levels might be communicating with anything. The lab-bench-to-LIMS options are wide open (including in-house / contractor-developed systems), and the LIMS to the world outside the lab may be taken to the warehouse, accounting, process control, etc. systems from a variety of sources.
 
Initially, not knowing what a product might be asked to communicate with left the vendors with few options. The one taken from the instrument/data system level was to produce files plus a means of sending them and letting them go at that, i.e., let the user take it from there. Most users became acutely aware of instrument/data system communications when they wanted to install a LIMS and put pressure on the LIMS vendors to connect with lab equipment; it was either have the vendor do the work or use internal/contract resources. Lacking any drive toward industry standards for communication and integration, each vendor approached the issue in their own way. As a result, installing a LIMS without instrument connections provides one level of difficulty, while adding links to instrument data systems and robotics controllers substantially raises the level of complexity, cost, and effort.
 
Some LIMS vendors provide a suite of instrument/data system interfaces for their products, a list of devices and systems with software to support command/control and data interchange. The software is supported by the vendor, and they may also provide [[application programming interface]]s (APIs) for situations where you want to connect something that isn’t part of their offering. If you are interested in connecting a lower-cost device (e.g., balance, pH meter, etc.) and the vendor supports a model different than the one you have, it may be less expensive to buy a compatible one rather than trying to interface the one on your bench; we’ll get to why in a moment.
 
Other LIMS vendors provide little support for external devices. These are often low-cost systems that are attractive due to their low initial investment but may be costly when faced with support and customization work.
 
Interfacing instruments/data systems to LIMS (or any informatics software for that matter) is costly for the following reasons:
 
* It involves programming, and for each interfaced device that means functional requirements, planning the programming, carrying out the work, testing, and so on; each one is a project with all the baggage that comes with it, and it has to meet regulatory guidelines. This also applies to efforts to adapt interfaces for a similar device supported by the vendor to the one you are using.
* Upgrades happen. Don’t be surprised to find out that the interface you had constructed doesn’t work after the upgrade. The problems may be minor, or they may require a re-implementation of the work. You need to make sure that you have the support and maintenance documentation in place for the initial project so that you can take care of problems. Some programmers will do whatever they need to so that something “works.” There is a difference between that and properly engineered software: the latter is usually more expensive at first but pays for itself when upgrades or modifications are needed.
* The more devices you interface with, the more you are locking yourself into a particular vendor’s product set. If you’ve put a lot of effort into connecting the devices in your lab to a particular LIMS and a new/better product comes along, are you going to reinvest that effort to change products? This is one reason why extensive planning is needed for lab systems before you shop for them; they hold your knowledge, information, and data, the change is costly, and a poorly planned start can leave you at a dead end. You may hear claims that change is “just a small matter of software”; given the amount of work in making software changes, there is nothing “small” about it.
 
Communication standards have a significant impact on your lab's ability to work, and it is something the industry needs to address.
 
At a minimum, when installing a LIMS or LIS, you can expect a fair amount of work before it goes into operation. Information specific to your lab has to be entered: test information, methods of identifying samples, support for barcodes, personnel information, materials used, etc., (refer to the ASTM E1578-13 standard for details). The software is a blank slate and has to be initialized before it becomes useful. You will also have to face the full set of requirements for systems qualification and validation to ensure that the software is ready for use and meets regulatory requirements.
 
Those validation / regulatory requirements are there for your benefit. You are going to put a tool in place that is going to manage your lab's ability to work, to know what work has to be done, and to send and receive information from instruments and data systems. Being able to prove that it is ready for the work will give you the confidence needed to trust its ability to support your lab.
 
People also have to be educated in the systems used, and the need for education will be continual as new people come into the lab and changes are made to the system. You want to make sure that people are ready to use the system, and that they have confidence in themselves and their ability to work with the software. Fear of failure, and making mistakes that can’t be erased (regulatory considerations) is a serious concern when introducing new software into a laboratory. Classroom training is one thing, but there is nothing like hands-on experience to solidify what they’ve heard elsewhere. One recommended approach is to have available two versions of the system: one is the actual "live" system used for production work, and the other is a training system that people can learn on, make mistakes, try things out, etc.; it will remove the fear of working with something new by letting them practice on software that can absorb mistakes without any consequences. This educational system can also act as a test bed for new ideas, software, and instrument interfaces. Having two copies of the system may have an impact on license costs, so make this part of the system negotiations with the vendor. (Note: Some IT entities do not understand the need to have a practice system. It is necessary to ensure a smooth transition from one LIMS to another. It is also useful to run the old system and the new LIMS side by side to develop production history.{{Efn|Source: private communications with Charlotte Layton, American Refining Group, Inc., on December 2, 2014.}})
 
===Options for LIMS implementations===
====Enterprise resource planning systems as a LIMS replacement====
A few pages back we noted that from an operational point of view, service labs' operational functions were similar to operations in other service businesses. That point hasn’t been lost on [[enterprise resource planning]] (ERP) vendors; ERP software is widely used in integrating and managing business operations. Several packages have quality management (QM) modules that help companies keep track of quality issues, control charts, data management, instrument calibration, product inspection, audit management, etc. That said, the question that is raised is “Can this software be adapted to meet the functions provided by a LIMS?” You will not be too shocked to find that ERP vendors say “yes,” a view that is supported by several IT consultants; just search “LIMS ERP” in your favorite search engine. The champions of the ERP-as-a-LIMS view often come from corporate functions (often referred to as the C-suite). They see:
 
* Lab operations as matching the behavior of other service operations;
* An opportunity to reduce expenditures by avoiding the need to purchase licenses for new software when a “satisfactory” application can be constructed on existing platforms;
* Corporate IT organizations favoring it (particularly in non-science-driven companies) because it avoids having to learn and support another complex package, which would drive up the cost of their support operations that may already be stressing the budget;
* The ability to integrate lab operations and information with the rest of the company by having all operations dependent on a common software platform, avoiding duplication of information storage (the LIMS-ERP discussion is also a LIS-ERP discussion; see LIS-ERP below). This also raises the questions of whether you want a company dependent upon a single software vendor, and, whether you want your informatics solutions from a single vendor or an integrated collection of best-of-breed systems. However, labs are already dependent on a single-source vendor for their operating system (various versions of Windows), though some labs have introduced Linux, Unix, and MAC OS X as part of instrument packages and office products.
 
ERP vendors support the position, as do some stand-alone LES vendors who see LIMS as competition for their market space (the LES-ERP combination would replace the need for LIMS, as they see it). They do have a perfectly valid point: how it applies to you depends on how your lab operates and wants to operate. As we’ve noted before, the key to evaluating the discussion with respect to your lab depends on thorough technology planning and, if you are considering a LIMS, well-defined and researched product requirements documents.
 
The ERP-as-a-LIMS concept fits the capabilities of 1970s and early 1980s LIMS systems, as well as many of the low-cost modern offerings. However, several issues need to be evaluated, and some are diagrammed in Figure 3-13.
 
 
[[File:Fig3-13 Liscouski LabInfo24.png|631px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="631px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-13.''' LIMS functions to consider.</blockquote>
|-
|}
|}
 
The first is how sample information is entered into the system (i.e., descriptive information about samples; in LIS it has to be coordinated with patient information records, which may be referred to as sample meta-data). If it is a manual entry done directly in the lab or via remote access, one sample at a time, either ERP or traditional LIMS will suffice. A problem could occur with ERP systems if support was needed for stability or formulations testing, or other specialized analysis sets. LIMS can handle these, ERP systems may have them or they would have to be developed.
 
Handling the results of an analysis is another issue. Entering test information is not always as simple as doing the test, and entering the result; there are variations in the process that depend on how the lab operates and how tests/assays are performed:
 
* Some may come directly from an instrument data system in the form of a file that is sent to the LIMS; the file is parsed and the information is stored in the appropriate locations.
* Others may come in the form of worklists that were generated by the LIMS and sent to the data system. This requires bi-directional communication between the LIMS and the data systems. These are bulk transactions in which a file is generated, sent, and another file is received and processed.
* There are also interactive information-element-at-a-time transactions. Working with balances and pH meters are common examples where the LIMS would send commands to the device, get a response, and make a decision on what to do next. A balance used to count parts is a good illustration: it requires several steps to calibrate the measurements and then perform the count (the process involves weighing several parts to get an average weight per part, then weighing the test group; along the way containers have to be weighed and the balance tared).
 
These are tasks that LIMS are designed to do as a result of decades of experience. ERP-LIMS will require programming to accomplish the same work or use an intermediary package such as a LES or ELN.
 
Another area of concern is the interactions between the LIMS / EPR-LIMS and other informatics products/functions such as SDMS, ELN, LES, and laboratory inventory management. LIMS vendors, particularly those that provide one or more of these products, provide interfaces for most of these functions.
 
In order to make an ERP-LIMS work, some software development is going to have to be done, even if it is just a matter of bringing components together with a workable user interface on a manual transaction processing level. As we move toward more sophisticated laboratory operations, the extent of the work required increases substantially. This raises a simple question: is the group that is going to take on the work up to the task? This isn’t a challenge to competency; it would be the same question if the work was done by LAB-IT professionals. The issue comes from the demands on IT organizations by corporations looking for support. We are talking about building and supporting a sophisticated applications system that will require development/upgrade trouble-shooting support for the life of the product. Will competition for resources by other parts of the company push the development of the application further and further behind schedule? This is a consideration that needs to be addressed before the project begins. There have been suggestions that ELN, LES, and other third-party products could be used to solve the instrument connection problem<ref name="AdminBattle11">{{cite web |url=https://www.labnews.co.uk/features/battle-of-the-information-management-systems/ |archiveurl=https://web.archive.org/web/20111018192455/https://www.labnews.co.uk/features/battle-of-the-information-management-systems/ |title=Battle of the Information Management Systems |author=Admin |work=Laboratory News |date=24 March 2011 |archivedate=18 October 2011 |accessdate=05 April 2024}}</ref>, and that would work, but at the expense of adding the additional products. If they were already being considered for purchase, it is a moot point.
 
There are two additional items to note:
 
#While the EPR-based LIMS / LIS is being implemented, commercial systems developers are going to be improving their products; is your internal project going to be play a game of catch-up or just falling farther behind other products?
#What are you going to be using to meet your lab's needs while development is underway? Will the cost of manual lab operations overshadow any cost savings derived from an internal program?
 
The topic of LIS-ERP brings up an interesting point: using a common underlying software structure would improve information integration across the enterprise, and it would. However, we do have to be careful that using that common software basis, and giving easier programming access to laboratory data, doesn’t subvert the data review and approval process that is part of service laboratory work.
 
The availability of an ERP-based LIMS does provide another avenue for obtaining the systems support service laboratories need. Information integration can be solved by common underlying products and by effective information interchange. Evaluating the ERP-base LIMS option, as well as evaluating traditional LIMS, has to be based upon objectively developed product requirements, and those should include short- and long-term needs as well as support requirements.
 
====Software as a service (SaaS) LIMS====
(Note: When this piece was first written in 2014, [[Software as a service|SaaS]] was beginning to make serious inroads into laboratory computing. At that point, there were concerns about the viability of the infrastructure which has since been put to rest. Today in 2024, SaaS definitely has an established role in laboratory informatics. However, the issues of climate change and security have become a reality. The technology works, but planning for failures in some geographical regions subject to severe weather is needed.)
 
In September of 2013, David Pogue<ref name="PogueAdobe13">{{cite web |url=https://www.scientificamerican.com/article/adobe-software-subscription-model-means-you-cant-own-your-software/ |title=Adobe’s Software Subscription Model Means You Can’t Own Your Software |author=Pogue, D. |work=Scientific American |date=01 October 2013 |accessdate=05 April 2024}}</ref><ref name="PogueIHave13">{{cite web |url=https://www.scientificamerican.com/article/pogue-i-have-to-rent-my-software-now-how-does-that-work/ |title=I Have to Rent My Software Now? How Does That Work? |author=Pogue, D. |work=Scientific American |date=01 October 2013 |accessdate=05 April 2024}}</ref> wrote an article about a shift in the funding model for some well-known software packages. Both Microsoft and Adobe have adopted a subscription-based funding model resulting in your renting software (Adobe’s Creative Cloud, and Microsoft Office) instead of paying a one-time license fee (one time for that version, upgrades may be an additional charge). While the monthly fee isn’t significant compared to the cost of licenses for add-on software, the fees aren’t fixed, and over time can become an issue. There are some additional points. First, the consumer can become a live beta-tester for software without being aware of it as vendors update products at will instead of monthly or annually. Secondly, if you stop the subscriptions, you can lose access to the files you developed with those applications; the files will still be there, but you won’t be able to work with them.
 
The reasons behind this change in policy are about revenue streams. When desktop computing was developing and sales of systems were moving at a rapid pace, there was a substantial income stream from new users and those eager to keep software up-to-date because of new features. The downturn in the economy, a drop in people upgrading systems software, and a shift from desktop to tablet computing have moved vendors to look for new ways of supporting software development. The same issues are affecting vendors of laboratory informatics systems, who also see customers reluctant to pay the up-front cost of hardware and software, and long-term IT support costs. If they could lower the cost of access to the software, eliminate the costs of hardware and installation, and relieve the stress on IT support, more labs would take advantage of what they had to offer. SaaS could provide a workable solution.<ref name="MullinLIMS10">{{cite web |url=https://cen.acs.org/articles/88/i21/LIMS-Cloud.html |title=LIMS in the Cloud |author=Mullin, R. |work=Chemical & Engineering News |date=24 May 2010 |accessdate=05 April 2024}}</ref>
 
The basic software offering is the same as an on-site offering. There is no hardware cost for hosting the LIMS, no installation cost since the vendor is hosting the software on their remote system, no upgrade fees or effort, and as long as we are focusing on LIMS transactions alone, no IT support requirements. The time frame from the start to the working system depends upon the work involved in initializing the system (e.g., entering tests, samples, user information) and educating people on the products use. You have access to a fully functional LIMS, at least as far as lab operations and workflow are concerned. From the standpoint of the models we’ve been using, the situation looks like this:
 
 
[[File:Fig3-14 Liscouski LabInfo24.png|666px]]
{{clear}}
{|
| style="vertical-align:top;" |
{| border="0" cellpadding="5" cellspacing="0" width="666px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3-14.''' Service lab with SaaS LIMS..</blockquote>
|-
|}
|}
==Footnotes==
{{reflist|group=lower-alpha}}
 
==About the author==
Initially educated as a chemist, author Joe Liscouski (joe dot liscouski at gmail dot com) is an experienced laboratory automation/computing professional with over forty years of experience in the field, including the design and development of automation systems (both custom and commercial systems), LIMS, robotics and data interchange standards. He also consults on the use of computing in laboratory work. He has held symposia on validation and presented technical material and short courses on laboratory automation and computing in the U.S., Europe, and Japan. He has worked/consulted in pharmaceutical, biotech, polymer, medical, and government laboratories. His current work centers on working with companies to establish planning programs for lab systems, developing effective support groups, and helping people with the application of automation and information technologies in research and quality control environments.


==References==
==References==
{{Reflist|colwidth=30em}}
{{Reflist|colwidth=30em}}
<!---Place all category tags here-->
<!---Place all category tags here-->

Latest revision as of 13:29, 13 May 2024

Sandbox begins below

FAIRResourcesGraphic AustralianResearchDataCommons 2018.png

Title: What are the potential implications of the FAIR data principles to laboratory informatics applications?

Author for citation: Shawn E. Douglas

License for content: Creative Commons Attribution-ShareAlike 4.0 International

Publication date: May 2024

Introduction

https://www.limswiki.org/index.php/Journal:Infrastructure_tools_to_support_an_effective_radiation_oncology_learning_health_system

This brief topical article will examine

The "FAIR-ification" of research objects and software

First discussed during a 2014 FORCE-11 workshop dedicated to "overcoming data discovery and reuse obstacles," the FAIR data principles were published by Wilkinson et al. in 2016 as a stakeholder collaboration driven to see research "objects" (i.e., research data and information of all shapes and formats) become more universally findable, accessible, interoperable, and reusable (FAIR) by both machines and people.[1] The authors released the FAIR principles while recognizing that "one of the grand challenges of data-intensive science ... is to improve knowledge discovery through assisting both humans and their computational agents in the discovery of, access to, and integration and analysis of task-appropriate scientific data and other scholarly digital objects."[1]

Since 2016, other research stakeholders have taken to publishing their thoughts about how the FAIR principles apply to their fields of study and practice[2], including in ways beyond what perhaps was originally imagined by Wilkinson et al.. For example, multiple authors have examined whether or not the software used in scientific endeavors itself can be considered a research object worth being developed and managed in tandem with the FAIR data principles.[3][4][5][6][7] Researchers quickly recognized that any planning around updating processes and systems to make research objects more FAIR would have to be tailored to specific research contexts, recognize that digital research objects go beyond data and information, and recognize "the specific nature of software" and not consider it "just data."[4] The end result has been applying the core concepts of FAIR but differently from data, with the added context of research software being more than just data, requiring more nuance and a different type of planning from applying FAIR to digital data and information.

A 2019 survey by Europe's FAIRsFAIR found that researchers seeking and re-using relevant research software on the internet faced multiple challenges, including understanding and/or maintaining the necessary software environment and its dependencies, finding sufficient documentation, struggling with accessibility and licensing issues, having the time and skills to install and/or use the software, finding quality control of the source code lacking, and having an insufficient (or non-existent) software sustainability and management plan.[4] These challenges highlight the importance of software to researchers and other stakeholders, and the roll FAIR has in better ensuring such software is findable, interoperable, and reusable, which in turn better ensures researchers' software-driven research is repeatable (by the same research team, with the same experimental setup), reproducible (by a different research team, with the same experimental setup), and replicable (by a different research team, with a different experimental setup).[4]

At this point, the topic of what "research software" represents must be addressed further, and, unsurprisingly, it's not straightforward. Ask 20 researchers what "research software" is, and you may get 20 different opinions. Some definitions can be more objectively viewed as too narrow, while others may be viewed as too broad, with some level of controversy inherent in any mutual discussion.[8][9][10] In 2021, as part of the FAIRsFAIR initiative, Gruenpeter et al. made a good-faith effort to define "research software" with the feedback of multiple stakeholders. Their efforts resulted in this definition[8]:

Research software includes source code files, algorithms, scripts, computational workflows, and executables that were created during the research process, or for a research purpose. Software components (e.g., operating systems, libraries, dependencies, packages, scripts, etc.) that are used for research but were not created during, or with a clear research intent, should be considered "software [used] in research" and not research software. This differentiation may vary between disciplines. The minimal requirement for achieving computational reproducibility is that all the computational components (i.e., research software, software used in research, documentation, and hardware) used during the research are identified, described, and made accessible to the extent that is possible.

Note that while the definition primarily recognizes software created during the research process, software created (whether by the research group, other open-source software developers outside the organization, or even commercial software developers) "for a research purpose" outside the actual research process is also recognized as research software. This notably can lead to disagreement about whether a proprietary, commercial spreadsheet or laboratory information management system (LIMS) offering that conducts analyses and visualizations of research data can genuinely be called research software, or simply classified as software used in research. van Nieuwpoort and Katz further elaborated on this concept, at least indirectly, by formally defining the roles of research software in 2023. Their definition of the various roles of research software—without using terms such as "open-source," "commercial," or "proprietary"—essentially further defined what research software is[10]:

  • Research software is a component of our instruments.
  • Research software is the instrument.
  • Research software analyzes research data.
  • Research software presents research results.
  • Research software assembles or integrates existing components into a working whole.
  • Research software is infrastructure or an underlying tool.
  • Research software facilitates distinctively research-oriented collaboration.

When considering these definitions[8][10] of research software and their adoption by other entities[11], it would appear that at least in part some laboratory informatics software—whether open-source or commercially proprietary—fills these roles in academic, military, and industry research laboratories of many types. In particular, electronic laboratory notebooks (ELNs) like open-source Jupyter Notebook or proprietary ELNs from commercial software developers fill the role of analyzing and visualizing research data, including developing molecular models for new promising research routes.[10] Even more advanced LIMS solutions that go beyond simply collating, auditing, securing, and reporting analytical results could conceivably fall under the umbrella of research software, particularly if many of the analytical, integration, and collaboration tools required in modern research facilities are included in the LIMS.

Ultimately, assuming that some laboratory informatics software can be considered research software and not just "software used in research," it's tough not to arrive at some deeper implications of research organizations' increasing need for FAIR data objects and software, particularly for laboratory informatics software and the developers of it.

Implications of the FAIR concept to laboratory informatics software

The global FAIR initiative affects, and even benefits, commercial laboratory informatics research software developers as much as it does academic and institutional ones

To be clear, there is undoubtedly a difference in the software development approach of "homegrown" research software by academics and institutions, and the more streamlined and experienced approach of commercial software development houses as applied to research software. Moynihan of Invenia Technical Computing described the difference in software development approaches thusly in 2020, while discussing the concept of "research software engineering"[12]:

Since the environment and incentives around building academic research software are very different to those of industry, the workflows around the former are, in general, not guided by the same engineering practices that are valued in the latter. That is to say: there is a difference between what is important in writing software for research, and for a user-focused software product. Academic research software prioritizes scientific correctness and flexibility to experiment above all else in pursuit of the researchers’ end product: published papers. Industry software, on the other hand, prioritizes maintainability, robustness, and testing, as the software (generally speaking) is the product. However, the two tracks share many common goals as well, such as catering to “users” [and] emphasizing performance and reproducibility, but most importantly both ventures are collaborative. Arguably then, both sets of principles are needed to write and maintain high-quality research software.

This brings us to our first point: the application of small-scale, FAIR-driven academic research software engineering practices and elements to the larger development of more commercial laboratory informatics software, and vice versa with the application of commercial-scale development practices to small FAIR-focused academic and institutional research software engineering efforts, has the potential to help better support all research laboratories using both independently-developed and commercial research software.

The concept of the research software engineer (RSE) began to take full form in 2012, and since then universities and institutions of many types have formally developed their own RSE groups and academic programs.[13][14][15] RSEs range from pure software developers with little knowledge of a given research discipline, to scientific researchers just beginning to learn how to develop software for their research project(s). While in the past, broadly speaking, researchers often cobbled together research software with less a focus on quality and reproducibility and more on getting their research published, today's push for FAIR data and software by academic journals, institutions, and other researchers seeking to collaborate has placed a much greater focus on the concept of "better software, better research."[13][16] Elaborating on that concept, Cohen et al. add that "ultimately, good research software can make the difference between valid, sustainable, reproducible research outputs and short-lived, potentially unreliable or erroneous outputs."[16]

The concept of software quality management (SQM) has traditionally not been lost on professional, commercial software development businesses. Good SQM practices have been less prevalent in homegrown research software development; however, the expanded adoption of FAIR data and FAIR software approaches has shifted the focus on to the repeatability, reproducibility, and interoperability of research results and data produced by a more sustainable research software. The adoption of FAIR by academic and institutional research labs not only brings commercial SQM and other software development approaches into their workflow, but also gives commercial laboratory informatics software developers an opportunity to embrace many aspects of the FAIR approach to laboratory research practices, including lessons learned and development practices from the growing number of RSEs. This doesn't mean commercial developers are going to suddenly take an open-source approach to their code, and it doesn't mean academic and institutional research labs are going to give up the benefits of the open-source paradigm as applied to research software.[17] However, as Moynihan noted, both research software development paradigms stand to gain from the shift to more FAIR data and software.[12] Additionally, if commercial laboratory informatics vendors want to continue to competitively market relevant and sustainable research software to research labs, they frankly have little choice but to commit extra resources to learning about the application of FAIR principles to their offerings tailored to those labs.

The focus on data types and metadata within the scope of FAIR is shifting how laboratory informatics software developers and RSEs make their research software and choose their database approaches

Close to the core of any deep discussion of the FAIR data principles are the concepts of data models, data types, metadata, and persistent unique identifiers (PIDs). Making research objects more findable, accessible, interoperable, and reusable is no easy task when data types and approaches to metadata assignment (if there even is such an approach) are widely differing and inconsistent. Metadata is a means for better storing and characterizing research objects for the purposes of ensuring provenance and reproducibility of those research objects.[18][19] This means as early as possible implementing a software-based approach that is FAIR-driven, capturing FAIR metadata using flexible domain-driven ontologies (i.e., controlled vocabularies) at the source and cleaning up old research objects that aren't FAIR-ready while also limiting hindrances to research processes as much as possible.[19] And that approach must value the importance of metadata and PIDs. As Weigel et al. note in a discussion on making laboratory data and workflows more machine-findable: "Metadata capture must be highly automated and reliable, both in terms of technical reliability and ensured metadata quality. This requires an approach that may be very different from established procedures."[20] Enter non-relational RDF knowledge graph databases.

This brings us to our second point: given the importance of metadata and PIDs to FAIRifying research objects (and even research software), established, more traditional research software development methods using common relational databases may not be enough, even for commercial laboratory informatics software developers. Non-relational Resource Description Framework (RDF) knowledge graph databases used in FAIR-driven, well-designed laboratory informatics software help make research objects more FAIR for all research labs.

Research objects can take many forms (i.e., data types), making the storage and management of those objects challenging, particularly in research settings with great diversity of data, as with materials research. Some have approached this challenge by combining different database and systems technologies that are best suited for each data type.[21] However, while query performance and storage footprint improves with this approach, data across the different storage mechanisms typically remains unlinked and non-compliant with FAIR principles. Here, either a full RDF knowledge graph database or similar integration layer is required to better make the research objects more interoperable and reusable, whether it's materials records or specimen data.[21][22]

It is beyond the scope of this Q&A article to discuss RDF knowledge graph databases at length. (For a deeper dive on this topic, see Rocca-Serra et al. and the FAIR Cookbook.[23]) However, know that the primary strength of these databases to FAIRification of research objects is their ability to provide semantic transparency (i.e., provide a framework for better understanding and reusing the greater research object through basic examination of the relationships of its associated metadata and their constituents), making these objects more easily accessible, interoperable, and machine-readable.[21] The resulting knowledge graphs, with their "subject-property-object" syntax and PIDs or uniform resource identifiers (URIs) helping to link data, metadata, ontology classes, and more, can be interpreted, searched, and linked by machines, and made human-readable, resulting in better research through derivation of new knowledge from the existing research objects. The end result is a representation of heterogeneous data and metadata that complies with the FAIR guiding principles.[21][22][23][24][25][26] This concept can even be extended to post factum visualizations of the knowledge graph data[25], as well as the FAIR management of computational laboratory workflows.[27]

While rare, some commercial laboratory informatics vendors like Semaphore Solutions have already recognized the potential of RDF knowledge graph databases to FAIR-driven laboratory research, having implemented such structures into their offerings.[24] (The use of knowledge graphs has already been demonstrated in academic research software, such as with the ELN tools developed by RSEs at the University of Rostock and University of Amsterdam.[28]) As noted in the prior point, it is potentially advantageous to not only laboratory informatics vendors to provide but also research labs to use relevant and sustainable research software that has the FAIR principles embedded in the software's design. Turning to knowledge graph databases is another example of keeping such software relevant and FAIR to research labs.

Applying FAIR-driven metadata schemes to laboratory informatics software development gives data a FAIRer chance at being ready for machine learning and artificial intelligence applications

The third and final point for this Q&A article highlights another positive consequence of engineering laboratory informatics software with FAIR in mind: FAIRified research objects are much closer to being usable for the trending inclusion of machine learning (ML) and artificial intelligence (AI) tools in laboratory informatics platforms and other companion research software. By developing laboratory informatics software with a focus on FAIR-driven metadata and database schemes, not only are research objects more FAIR but also "cleaner" and more machine-ready for advanced analytical uses as with ML and AI.

To be sure, the FAIRness of any structured dataset alone is not enough to make it ready for ML and AI applications. Factors such as classification, completeness, context, correctness, duplicity, integrity, mislabeling, outliers, relevancy, sample size, and timeliness of the research object and its contents are also important to consider.[29][30] When those factors aren't appropriately addressed as part of a FAIRification effort towards AI readiness (as well as part of the development of research software of all types), research data and metadata have a higher likelihood of revealing themselves to be inconsistent. As such, searches and analytics using that data and metadata become muddled, and the ultimate ML or AI output will also be muddled (i.e., "garbage in, garbage out"). Whether retroactively updating existing research objects to a more FAIRified state or ensuring research objects (e.g., those originating in an ELN or LIMS) are more FAIR and AI-ready from the start, research software updating or generating those research objects has to address ontologies, data models, data types, identifiers, and more in a thorough yet flexible way.[31]

Noting that Wilkinson et al. originally highlighted the importance of machine-readability of FAIR data, Huerta et al. add that that core principle of FAIRness "is synergistic with the rapid adoption and increased use of AI in research."[32] They go on to discuss the positive interactions of FAIR research objects with FAIR-driven, AI-based research. Among the benefits include[32]:

  • greater findability of FAIR research objects for further AI-driven scientific discovery;
  • greater reproducibility of FAIR research objects and any AI models published with them;
  • improved generalization of AI-driven medical research models when exposed to diverse and FAIR research objects;
  • improved reporting of AI-driven research results using FAIRified research objects, lending further credibility to those results;
  • more uniform comparison of AI models using well-defined hyperstructure and information training conditions from FAIRified research objects;
  • more developed and interoperable "data e-infrastructure," which can further drive a more effective "AI services layer";
  • reduced bias in AI-driven processes through the use of FAIR research objects and AI models; and
  • improved surety of scientific correctness where reproducibility in AI-driven research can't be guaranteed.

In the end, developers of research software (whether discipline-specific research software or broader laboratory informatics solutions) would be advised to keep in mind the growing trends of FAIR research, FAIR software, and ML- and AI-driven research, especially in the life sciences, but also a variety of other fields.[32]

Restricted clinical data and its FAIRification for greater research innovation

Broader discussion in the research community continues to occur in regards to how best to ethically make restricted or privacy-protected clinical data and information FAIR for greater innovation and, by extension, improved patient outcomes, particularly in the wake of the COVID-19 pandemic.[33][34][35] (Note that while there are other types of restricted and privacy-protected data, this section will focus largely on clinical data and research objects as the most obvious type.)

These efforts have usually revolved around pulling reusable clinical patient or research data from hospital information systems (HIS), electronic medical records (EMRs), clinical trial management systems (CTMSs), and research databases (often relational in nature) that either contain de-identified data or can de-identify aspects of data and information before access and extraction. Sometimes that clinical data or research object may have already in part been FAIRified, but often it may not be. In all cases, the concepts of privacy, security, and anonymization come up as part of any desire to gain access to that clinical material. However, any FAIRified clinical data isn't necessarily readily open for access. As Snoeijer et al. note: "The authors of the FAIR principles, however, clearly indicate that 'accessible' does not mean open. It means that clarity and transparency is required around the conditions governing access and reuse."[36]

This is being mentioned in the context of laboratory informatics applications for a couple of reasons. First, a well-designed commercial LIMS that supports clinical research laboratory workflows is already going to address privacy and security aspects, as part of the developer recognizing the need for those labs to adhere to regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and comply with standards such as ISO 15189. However, such a system may not have been developed with FAIR data principles in mind, and any built-in metadata and ontology schemes may be insufficient for full FAIRification of laboratory-based clinical trial research objects. As Queralt-Rosinach et al. note, however, "interestingly, ontologies may also be used to describe data access restrictions to complement FAIR metadata with information that supports data safety and patient privacy."[34] Essentially, the authors are suggesting that while a HIS or LIS may have built-in access management tools, setting up ontologies and metadata mechanisms that link privacy aspects of a research object (e.g., "has consent form for," "is de-identified," etc.) to the object's metadata allows for even more flexible, FAIR-driven approaches to privacy and security. Research software developers creating such information management tools for the regulated clinical research space may want to apply FAIR concepts such as this to how access control and privacy restrictions are managed. This will inevitably mean any research objects exported with machine-readable privacy-concerning metadata will be more reusable in a way that still "supports data safety and patient privacy."[34]

Second, a well-designed research software solution working with clinical data will provide not only support for open, community-supported data models and vocabularies for clinical data, but also standardized community-driven ontologies that are specifically developed for access control and privacy. Queralt-Rosinach et al. continue[34]:

Also, very important for accessibility and data privacy is that the digital objects per se can accommodate the criteria and protocols necessary to comply with regulatory and governance frameworks. Ontologies can aid in opening and protecting patient data by exposing logical definitions of data use conditions. Indeed, there are ontologies to define access and reuse conditions for patient data such as the Informed Consent Ontology (ICO), the Global Alliance for Genomics and Health Data Use Ontology (DUO) standard, and the Open Digital Rights Language (ODRL) vocabulary recommended by W3C.

Also of note here is the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) and its OHDSI standardized vocabularies. In all these cases, a developer-driven approach to research software that incorporates community-driven standards that support FAIR principles is welcome. However, as Maxwell et al. noted in their Lancet review article in late 2023, "few platforms or registries applied community-developed standards for participant-level data, further restricting the interoperability of ... data-sharing initiatives [like FAIR]."[33] As the FAIR principles continue to gain ground in clinical research and diagnostics settings, software developers will need to be more attuned to translating old ways of development to ones that incorporate FAIR data and software principles. Demand for FAIR data will only continue to grow, and any efforts to improve interoperability and reusability while honoring (and enhancing) privacy and security aspects of restricted data will be appreciated by clinical researchers. However, just as FAIR is not an overall goal for researchers, software built with FAIR principles in mind is not the end point of research organizations managing restricted and privacy-protected research objects. Ultimately, those organizations will have make other considerations about restricted data in the scope of FAIR, including addressing data management plans, data use agreements, disclosure review practices, and training as it applies to their research software and generated research objects.[37]

Conclusion

Laboratory informatics developers will also need to remember that FAIRification of research in itself is not a goal for research laboratories; it is a continual process that recognizes improved scientific research and greater innovation as a more likely outcome.[1][31][32]

References

  1. 1.0 1.1 1.2 Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618. 
  2. "fair data principles". PubMed Search. National Institutes of Health, National Library of Medicine. https://pubmed.ncbi.nlm.nih.gov/?term=fair+data+principles. Retrieved 30 April 2024. 
  3. Hasselbring, Wilhelm; Carr, Leslie; Hettrick, Simon; Packer, Heather; Tiropanis, Thanassis (25 February 2020). "From FAIR research data toward FAIR and open research software" (in en). it - Information Technology 62 (1): 39–47. doi:10.1515/itit-2019-0040. ISSN 2196-7032. https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html. 
  4. 4.0 4.1 4.2 4.3 Gruenpeter, M. (23 November 2020). "FAIR + Software: Decoding the principles" (PDF). FAIRsFAIR “Fostering FAIR Data Practices In Europe”. https://www.fairsfair.eu/sites/default/files/FAIR%20%2B%20software.pdf. Retrieved 30 April 2024. 
  5. Barker, Michelle; Chue Hong, Neil P.; Katz, Daniel S.; Lamprecht, Anna-Lena; Martinez-Ortiz, Carlos; Psomopoulos, Fotis; Harrow, Jennifer; Castro, Leyla Jael et al. (14 October 2022). "Introducing the FAIR Principles for research software" (in en). Scientific Data 9 (1): 622. doi:10.1038/s41597-022-01710-x. ISSN 2052-4463. PMC PMC9562067. PMID 36241754. https://www.nature.com/articles/s41597-022-01710-x. 
  6. Patel, Bhavesh; Soundarajan, Sanjay; Ménager, Hervé; Hu, Zicheng (23 August 2023). "Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool" (in en). Scientific Data 10 (1): 557. doi:10.1038/s41597-023-02463-x. ISSN 2052-4463. PMC PMC10447492. PMID 37612312. https://www.nature.com/articles/s41597-023-02463-x. 
  7. Du, Xinsong; Dastmalchi, Farhad; Ye, Hao; Garrett, Timothy J.; Diller, Matthew A.; Liu, Mei; Hogan, William R.; Brochhausen, Mathias et al. (6 February 2023). "Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software" (in en). Metabolomics 19 (2): 11. doi:10.1007/s11306-023-01974-3. ISSN 1573-3890. https://link.springer.com/10.1007/s11306-023-01974-3. 
  8. 8.0 8.1 8.2 Gruenpeter, Morane; Katz, Daniel S.; Lamprecht, Anna-Lena; Honeyman, Tom; Garijo, Daniel; Struck, Alexander; Niehues, Anna; Martinez, Paula Andrea et al. (13 September 2021). "Defining Research Software: a controversial discussion". Zenodo. doi:10.5281/zenodo.5504016. https://zenodo.org/record/5504016. 
  9. "What is Research Software?". JuRSE, the Community of Practice for Research Software Engineering. Forschungszentrum Jülich. 13 February 2024. https://www.fz-juelich.de/en/rse/about-rse/what-is-research-software. Retrieved 30 April 2024. 
  10. 10.0 10.1 10.2 10.3 van Nieuwpoort, Rob; Katz, Daniel S. (14 March 2023) (in en). Defining the roles of research software. doi:10.54900/9akm9y5-5ject5y. https://upstream.force11.org/defining-the-roles-of-research-software. 
  11. "Open source software and code". F1000 Research Ltd. 2024. https://www.f1000.com/resources-for-researchers/open-research/open-source-software-code/. Retrieved 30 April 2024. 
  12. 12.0 12.1 Moynihan, G. (7 July 2020). "The Hitchhiker’s Guide to Research Software Engineering: From PhD to RSE". Invenia Blog. Invenia Technical Computing Corporation. https://invenia.github.io/blog/2020/07/07/software-engineering/. 
  13. 13.0 13.1 Woolston, Chris (31 May 2022). "Why science needs more research software engineers" (in en). Nature: d41586–022–01516-2. doi:10.1038/d41586-022-01516-2. ISSN 0028-0836. https://www.nature.com/articles/d41586-022-01516-2. 
  14. "RSE@KIT". Karlsruhe Institute of Technology. 20 February 2024. https://www.rse-community.kit.edu/index.php. Retrieved 01 May 2024. 
  15. "Purdue Center for Research Software Engineering". Purdue University. 2024. https://www.rcac.purdue.edu/rse. Retrieved 01 May 2024. 
  16. 16.0 16.1 Cohen, Jeremy; Katz, Daniel S.; Barker, Michelle; Chue Hong, Neil; Haines, Robert; Jay, Caroline (1 January 2021). "The Four Pillars of Research Software Engineering". IEEE Software 38 (1): 97–105. doi:10.1109/MS.2020.2973362. ISSN 0740-7459. https://ieeexplore.ieee.org/document/8994167/. 
  17. Hasselbring, Wilhelm; Carr, Leslie; Hettrick, Simon; Packer, Heather; Tiropanis, Thanassis (25 February 2020). "From FAIR research data toward FAIR and open research software" (in en). it - Information Technology 62 (1): 39–47. doi:10.1515/itit-2019-0040. ISSN 2196-7032. https://www.degruyter.com/document/doi/10.1515/itit-2019-0040/html. 
  18. Ghiringhelli, Luca M.; Baldauf, Carsten; Bereau, Tristan; Brockhauser, Sandor; Carbogno, Christian; Chamanara, Javad; Cozzini, Stefano; Curtarolo, Stefano et al. (14 September 2023). "Shared metadata for data-centric materials science" (in en). Scientific Data 10 (1): 626. doi:10.1038/s41597-023-02501-8. ISSN 2052-4463. PMC PMC10502089. PMID 37709811. https://www.nature.com/articles/s41597-023-02501-8. 
  19. 19.0 19.1 Fitschen, Timm; tom Wörden, Henrik; Schlemmer, Alexander; Spreckelsen, Florian; Hornung, Daniel (12 October 2022). "Agile Research Data Management with FDOs using LinkAhead". Research Ideas and Outcomes 8: e96075. doi:10.3897/rio.8.e96075. ISSN 2367-7163. https://riojournal.com/article/96075/. 
  20. Weigel, Tobias; Schwardmann, Ulrich; Klump, Jens; Bendoukha, Sofiane; Quick, Robert (1 January 2020). "Making Data and Workflows Findable for Machines" (in en). Data Intelligence 2 (1-2): 40–46. doi:10.1162/dint_a_00026. ISSN 2641-435X. https://direct.mit.edu/dint/article/2/1-2/40-46/9994. 
  21. 21.0 21.1 21.2 21.3 Aggour, Kareem S.; Kumar, Vijay S.; Gupta, Vipul K.; Gabaldon, Alfredo; Cuddihy, Paul; Mulwad, Varish (9 April 2024). "Semantics-Enabled Data Federation: Bringing Materials Scientists Closer to FAIR Data" (in en). Integrating Materials and Manufacturing Innovation. doi:10.1007/s40192-024-00348-4. ISSN 2193-9764. https://link.springer.com/10.1007/s40192-024-00348-4. 
  22. 22.0 22.1 Grobe, Peter; Baum, Roman; Bhatty, Philipp; Köhler, Christian; Meid, Sandra; Quast, Björn; Vogt, Lars (26 June 2019). "From Data to Knowledge: A semantic knowledge graph application for curating specimen data" (in en). Biodiversity Information Science and Standards 3: e37412. doi:10.3897/biss.3.37412. ISSN 2535-0897. https://biss.pensoft.net/article/37412/. 
  23. 23.0 23.1 Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Gu, Wei; Welter, Danielle; Abbassi Daloii, Tooba; Portell-Silva, Laura (30 June 2022). "FAIR and Knowledge graphs". D2.1 FAIR Cookbook. doi:10.5281/ZENODO.6783564. https://zenodo.org/record/6783564. 
  24. 24.0 24.1 Tomlinson, E. (28 July 2023). "RDF Knowledge Graph Databases: A Better Choice for Life Science Lab Software" (PDF). Semaphore Solutions, Inc. https://21624527.fs1.hubspotusercontent-na1.net/hubfs/21624527/Resources/RDF%20Knowledge%20Graph%20Databases%20White%20Paper.pdf. Retrieved 01 May 2024. 
  25. 25.0 25.1 Deagen, Michael E.; McCusker, Jamie P.; Fateye, Tolulomo; Stouffer, Samuel; Brinson, L. Cate; McGuinness, Deborah L.; Schadler, Linda S. (27 May 2022). "FAIR and Interactive Data Graphics from a Scientific Knowledge Graph" (in en). Scientific Data 9 (1): 239. doi:10.1038/s41597-022-01352-z. ISSN 2052-4463. PMC PMC9142568. PMID 35624233. https://www.nature.com/articles/s41597-022-01352-z. 
  26. Brandizi, Marco; Singh, Ajit; Rawlings, Christopher; Hassani-Pak, Keywan (25 September 2018). "Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach" (in en). Journal of Integrative Bioinformatics 15 (3): 20180023. doi:10.1515/jib-2018-0023. ISSN 1613-4516. PMC PMC6340125. PMID 30085931. https://www.degruyter.com/document/doi/10.1515/jib-2018-0023/html. 
  27. de Visser, Casper; Johansson, Lennart F.; Kulkarni, Purva; Mei, Hailiang; Neerincx, Pieter; Joeri van der Velde, K.; Horvatovich, Péter; van Gool, Alain J. et al. (28 September 2023). Palagi, Patricia M.. ed. "Ten quick tips for building FAIR workflows" (in en). PLOS Computational Biology 19 (9): e1011369. doi:10.1371/journal.pcbi.1011369. ISSN 1553-7358. PMC PMC10538699. PMID 37768885. https://dx.plos.org/10.1371/journal.pcbi.1011369. 
  28. Schröder, Max; Staehlke, Susanne; Groth, Paul; Nebe, J. Barbara; Spors, Sascha; Krüger, Frank (1 December 2022). "Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation" (in en). Journal of Biomedical Semantics 13 (1): 4. doi:10.1186/s13326-021-00257-x. ISSN 2041-1480. PMC PMC8802522. PMID 35101121. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-021-00257-x. 
  29. Hiniduma, Kaveen; Byna, Suren; Bez, Jean Luca (2024). "Data Readiness for AI: A 360-Degree Survey". arXiv. doi:10.48550/ARXIV.2404.05779. https://arxiv.org/abs/2404.05779. 
  30. Fletcher, Lydia (16 April 2024). FAIR Re-use: Implications for AI-Readiness. The University Of Texas At Austin, The University Of Texas At Austin. doi:10.26153/TSW/51475. https://repositories.lib.utexas.edu/handle/2152/124873. 
  31. 31.0 31.1 Olsen, C. (1 September 2023). "Embracing FAIR Data on the Path to AI-Readiness". Pharma's Almanac. https://www.pharmasalmanac.com/articles/embracing-fair-data-on-the-path-to-ai-readiness. Retrieved 03 May 2024. 
  32. 32.0 32.1 32.2 32.3 Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali et al. (26 July 2023). "FAIR for AI: An interdisciplinary and international community building perspective" (in en). Scientific Data 10 (1): 487. doi:10.1038/s41597-023-02298-6. ISSN 2052-4463. PMC PMC10372139. PMID 37495591. https://www.nature.com/articles/s41597-023-02298-6. 
  33. 33.0 33.1 Maxwell, Lauren; Shreedhar, Priya; Dauga, Delphine; McQuilton, Peter; Terry, Robert F; Denisiuk, Alisa; Molnar-Gabor, Fruzsina; Saxena, Abha et al. (1 October 2023). "FAIR, ethical, and coordinated data sharing for COVID-19 response: a scoping review and cross-sectional survey of COVID-19 data sharing platforms and registries" (in en). The Lancet Digital Health 5 (10): e712–e736. doi:10.1016/S2589-7500(23)00129-2. PMC PMC10552001. PMID 37775189. https://linkinghub.elsevier.com/retrieve/pii/S2589750023001292. 
  34. 34.0 34.1 34.2 34.3 Queralt-Rosinach, Núria; Kaliyaperumal, Rajaram; Bernabé, César H.; Long, Qinqin; Joosten, Simone A.; van der Wijk, Henk Jan; Flikkenschild, Erik L.A.; Burger, Kees et al. (1 December 2022). "Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic" (in en). Journal of Biomedical Semantics 13 (1): 12. doi:10.1186/s13326-022-00263-7. ISSN 2041-1480. PMC PMC9036506. PMID 35468846. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-022-00263-7. 
  35. Martínez-García, Alicia; Alvarez-Romero, Celia; Román-Villarán, Esther; Bernabeu-Wittel, Máximo; Luis Parra-Calderón, Carlos (1 May 2023). "FAIR principles to improve the impact on health research management outcomes" (in en). Heliyon 9 (5): e15733. doi:10.1016/j.heliyon.2023.e15733. PMC PMC10189186. PMID 37205991. https://linkinghub.elsevier.com/retrieve/pii/S2405844023029407. 
  36. Snoeijer, B.; Pasapula, V.; Covucci, A. et al. (2019). "Paper SA04 - Processing big data from multiple sources" (PDF). Proceedings of PHUSE Connect EU 2019. PHUSE Limited. https://phuse.s3.eu-central-1.amazonaws.com/Archive/2019/Connect/EU/Amsterdam/PAP_SA04.pdf. Retrieved 03 May 2024. 
  37. Jang, Joy Bohyun; Pienta, Amy; Levenstein, Margaret; Saul, Joe (6 December 2023). "Restricted data management: the current practice and the future". Journal of Privacy and Confidentiality 13 (2). doi:10.29012/jpc.844. ISSN 2575-8527. PMC PMC10956935. PMID 38515607. https://journalprivacyconfidentiality.org/index.php/jpc/article/view/844.