Journal:AI4Green: An open-source ELN for green and sustainable chemistry

From LIMSWiki
Full article title AI4Green: An open-source ELN for green and sustainable chemistry
Journal Journal of Chemical Information and Modeling
Author(s) Boobier, Samuel; Davies, Joseph C.; Derbenev, Ivan N.; Handley, Christopher M.; Hirst, Jonathan D.
Author affiliation(s) University of Nottingham
Year published 2023
Volume and issue 63(10)
Page(s) 2895–2901
DOI 10.1021/acs.jcim.3c00306
ISSN 1549-960X
Distribution license Creative Commons Attribution 4.0 International
Website https://pubs.acs.org/doi/10.1021/acs.jcim.3c00306
Download https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.3c00306

Abstract

This paper presents the free and open-source, web-based electronic laboratory notebook (ELN) AI4Green, which combines features such as data archiving, collaboration tools, and green and sustainability metrics for organic chemistry. AI4Green offers the core functionality of an ELN, namely, the ability to store reactions securely and share them among different members of a research team. As users plan their reactions and record them in the ELN, green and sustainable chemistry is encouraged by automatically calculating green metrics and color-coding hazards, solvents, and reaction conditions. The interface is linked to a database constructed from data extracted from PubChem, enabling the automatic collation of information for reactions. The application’s design facilitates the development of auxiliary sustainability applications, such as our Solvent Guide module. As more reaction data are captured, subsequent work will focus on providing “intelligent” sustainability suggestions to the user.

Keywords: electronic laboratory notebook, ELN, green and sustainable chemistry, open-source, data sharing

Introduction

For researchers to communicate their findings within their team and to the wider scientific community, data must be shared and stored. Paper-based laboratory notebooks have traditionally been used to record experiments, and their use has changed little over the last few decades, despite the ubiquity of digital technology. Over the past 20 years, electronic laboratory notebooks (ELNs) have become more prevalent as the benefits of digitization are realized. [1] Despite this, significant barriers to the uptake of ELNs remain, especially in the academic community. [2] In 2017, a survey at a BioSistemika webinar revealed that only 7% of respondents used an ELN in their daily laboratory routine. [2] Another survey from the same study showed that the main barriers were the cost of implementing an ELN and the system’s usability.

Recently, a comprehensive comparison of commercial and open-source ELNs found that the majority of the 96 currently active ELNs are commercial. [3] It also noted that open-source codebases have the advantage that users can contribute more directly to the development of new features and have more control over the underlying software. However, a greater onus falls on the institution to install, host, and maintain the infrastructure. Chemotion is an open-source ELN designed for synthetic chemistry, with a growing user base and a strong focus on data sharing and integrity [4,5], but without a particular emphasis on green and sustainable chemistry. Another open-source solution is eLabFTW, an ELN suitable for storing data from various scientific disciplines. [6]

Research data management is fundamental to scientific research. [7] Transitioning from paper-based laboratory notebooks to ELNs is crucial for adhering to data standards when reporting and publishing studies. [8] ELNs are also vital in making data FAIR (findable, accessible, interoperable, and reusable). [9] ELNs allow data sharing among colleagues and institutions, while also facilitating public access. [10] ELNs can also enable open science [11,12]: when data are curated in a standard format, searching them and preparing them for machine learning (ML) become faster, which matters because large data sets are often required to train insightful models. Recent examples of such databases include the Open Reaction Database [13] and the Chemotion Repository. [14]

Sustainability and waste reduction are vital considerations in laboratory-based projects. "Sustainable" refers to both the environmental and socio-economic impacts of a process. [15] Making processes more sustainable is not just a requirement of government regulations; it also brings cost reductions, improved worker health and safety, and a reduced impact on the environment. [16,17] Current software tools for green and sustainable chemistry have recently been reviewed. [18] ELNs also offer the opportunity to collect data that can be used to monitor sustainability targets (such as the reduction of hazardous solvents) and to share knowledge among colleagues. [19]

In this work, we present AI4Green, an ELN designed to fulfill the core functionality required for synthetic organic chemistry in academic and industry settings, while also encouraging green and sustainable chemistry. The software automatically presents the hazards and sustainability of an inputted reaction by calculating sustainability metrics and providing a color-coded assessment of solvents and reaction conditions. The web application is open-source, has a low barrier to installation and hosting, offers a user-friendly interface, and is easily customizable. As the number of users grows, the captured reaction data will be leveraged using ML to provide “intelligent” suggestions to users on improving the sustainability of their reactions.

Implementation

AI4Green is a web application written in Python, JavaScript, HTML, and CSS (Figure 1). The application is hosted in the cloud and available for general use at https://ai4green.app. Alternatively, the GitHub page provides simple instructions for installing and hosting AI4Green via Docker, either locally or on an organization’s own server.



Figure 1. Web application architecture showing programming languages, databases, and how users interact with the application.

The backend server is built with Python and Flask and is linked to a Postgres relational database. Using Python for the backend gives access to many popular standard chemistry libraries, such as RDKit [20], and presents a low barrier for developers adding new features. Flask is a well-documented and easy-to-use web application framework [21], and its blueprints give the code a clear structure and facilitate expansion. Flask separates the Python backend from the frontend, which is controlled by JavaScript, HTML, and CSS; there, users input their data via the Marvin JS chemical editor [22] or in a number of formats, including SMILES. [23] JavaScript AJAX requests update pages dynamically, e.g., automatically calculating green and sustainable metrics as users input data. A summary and a sustainability report are presented back to the user and can be exported as a PDF or CSV file. To provide chemical information automatically, a single Postgres database was constructed from compound data extracted from PubChem [24] and CHEM21 sustainability data [25]; it is separated into tables for users, workgroups, workbooks, solvents, hazards, compounds (reagents), and reactions.
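The table layout described above can be pictured with a minimal sketch. It assumes a simplified schema: the column names, the `workbook_id` link on compounds (used here to mimic workbook-scoped novel compounds), and the `UNIQUE (workbook_id, name)` constraint on reactions are illustrative assumptions, not AI4Green's actual Postgres schema. SQLite from the Python standard library stands in for Postgres so the example runs without a database server.

```python
import sqlite3

# Hypothetical, simplified schema mirroring the tables named in the text.
# Column names are illustrative; AI4Green's real schema lives in Postgres.
SCHEMA = """
CREATE TABLE users      (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
CREATE TABLE workgroups (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE workbooks  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         workgroup_id INTEGER NOT NULL REFERENCES workgroups(id));
CREATE TABLE hazards    (code TEXT PRIMARY KEY, phrase TEXT NOT NULL);
CREATE TABLE solvents   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         chem21_rank TEXT NOT NULL);
-- workbook_id is NULL for PubChem-derived compounds and set for novel ones
CREATE TABLE compounds  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, smiles TEXT,
                         molecular_weight REAL, density REAL, hazard_codes TEXT,
                         workbook_id INTEGER REFERENCES workbooks(id));
-- a reaction name may appear only once per workbook
CREATE TABLE reactions  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         reaction_smiles TEXT,
                         creator_id INTEGER NOT NULL REFERENCES users(id),
                         workbook_id INTEGER NOT NULL REFERENCES workbooks(id),
                         UNIQUE (workbook_id, name));
"""


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Under this assumed design, a second reaction with the same name in the same workbook is rejected by the database itself with `sqlite3.IntegrityError`, rather than relying only on checks in the web layer.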

Results

Workgroups, workbooks, and user types

AI4Green installations may have one or more admin users. These users, typically system administrators for an institution, review requests to create new workgroups and can monitor the number of users, compounds, and reactions on the server. Users must register for an account to use AI4Green, at which point those in principal investigator (PI) or equivalent roles are prompted to create a workgroup as a space for their research group; other users are directed to join the workgroup their PI has created.

Within workgroups, there are workbooks, each designed to contain the reactions for a specific project (Figure 2). Workgroups have three roles with different permission levels. PIs are the workgroup owners and have full permission to create workbooks and to add or remove users from the workgroup and from any workbook within it. A workgroup may have two or more PIs. The senior researcher role, suitable for postdoctoral researchers or equivalent, can create new workbooks and add users to or remove users from these workbooks, but has no such rights at the workgroup level. The standard member role, suitable for postgraduate researchers or equivalent, has no editing rights but can request to be added to workbooks.
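The permission rules above can be summarized as a small lookup. This is a hypothetical sketch: the `Role` enum and the function names are invented for illustration, and it assumes (from the wording "these workbooks") that senior researchers manage the membership only of workbooks they created.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical workgroup roles, mirroring the three described above."""
    PI = "principal investigator"
    SENIOR_RESEARCHER = "senior researcher"
    STANDARD_MEMBER = "standard member"


def can_create_workbook(role: Role) -> bool:
    # PIs and senior researchers may create workbooks.
    return role in (Role.PI, Role.SENIOR_RESEARCHER)


def can_manage_workgroup_members(role: Role) -> bool:
    # Only PIs may add or remove users at the workgroup level.
    return role is Role.PI


def can_manage_workbook_members(role: Role, created_workbook: bool) -> bool:
    # PIs manage any workbook in their workgroup; senior researchers only
    # the workbooks they created (an assumption drawn from "these workbooks").
    if role is Role.PI:
        return True
    return role is Role.SENIOR_RESEARCHER and created_workbook
```

Standard members fail every check here; in this sketch their only route into a workbook would be a request that a PI or senior researcher approves.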

With this flexible approach, a user can belong to multiple workgroups in different roles, e.g., as a PI in one workgroup and a senior researcher in another. Reactions are shared only within the same workbook, as are any novel compounds added to the database. Users own the reactions they create, and these reactions are also available as read-only entries to all members of the workbook, enabling data sharing between team members. This is especially useful for teams spread over multiple locations, while data privacy is still preserved.
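A minimal sketch of the sharing rule, assuming hypothetical `Workbook` and `Reaction` classes with membership tracked per workbook: any member can read a reaction, but only its creator can edit it. Roles are keyed here by a (user, workgroup) pair, which is one way, assumed for illustration, to let the same person be a PI in one workgroup and a senior researcher in another.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Roles are stored per (user, workgroup) pair, so one person can hold
# different roles in different workgroups (hypothetical data).
ROLES: Dict[Tuple[str, str], str] = {
    ("alice", "Group A"): "PI",
    ("alice", "Group B"): "senior researcher",
}


def role_in(user: str, workgroup: str) -> Optional[str]:
    """Return the user's role in a workgroup, or None if not a member."""
    return ROLES.get((user, workgroup))


@dataclass
class Workbook:
    name: str
    workgroup: str
    members: set = field(default_factory=set)


@dataclass
class Reaction:
    name: str
    creator: str
    workbook: Workbook


def can_view(user: str, reaction: Reaction) -> bool:
    # Any member of the reaction's workbook may read it; nobody outside may.
    return user in reaction.workbook.members


def can_edit(user: str, reaction: Reaction) -> bool:
    # Only the creator may edit; other members see a read-only entry.
    return user == reaction.creator and can_view(user, reaction)
```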



Figure 2. Organization of workgroups and workbooks: each workgroup contains workbooks, and each workbook contains the reactions for a specific project.

Reaction Builder

The core functionality of AI4Green is the Reaction Builder module; the following sections give further details on its components. Reactions can be created by navigating to a workgroup, choosing a workbook, and selecting “New Reaction.” The user is prompted to enter a name for the reaction, which must be unique within the workbook, and a unique code is also assigned to every new reaction. Users draw their reaction in the Marvin JS reaction sketcher. Next, the user is prompted to fill in the Reaction Table, for example, by inputting the amount of each reaction component. At this stage, further solvents and reagents can be added. Finally, the Summary Table is generated, which contains several automatically calculated green and sustainable metrics and detailed health and safety information. The reaction is saved automatically whenever changes are made and can be reloaded and edited at a later date. The Summary Table can be exported to PDF or printed for use as a risk assessment.
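Enforcing unique reaction names within a workbook and issuing a code for each new reaction could look like the following hypothetical sketch. The `WorkbookRegistry` class and the `<workbook code>-<number>` format are assumptions for illustration; the text does not specify how AI4Green builds its reaction codes. Names are compared exactly here; whether case or whitespace should be normalized is left as a design choice.

```python
import itertools


class WorkbookRegistry:
    """Hypothetical in-memory record of the reactions in one workbook."""

    def __init__(self, workbook_code: str) -> None:
        self.workbook_code = workbook_code
        self._names: set = set()
        self._counter = itertools.count(1)

    def new_reaction(self, name: str) -> str:
        """Register a reaction name and return its assigned code."""
        if name in self._names:
            # Reaction names must be unique within the workbook.
            raise ValueError(f"a reaction named {name!r} already exists in this workbook")
        self._names.add(name)
        # Sequential, zero-padded code (illustrative format only).
        return f"{self.workbook_code}-{next(self._counter):03d}"
```

Because the duplicate check runs before the counter advances, a rejected name does not consume a code.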

Marvin JS

Users must first input their reaction with the Marvin JS reaction sketcher (Figure 3). This sketcher is easy to use and well-documented. [22] Users familiar with other sketchers can import structures in several formats, e.g., SMILES [23], which are easily exported from other sketchers. Reagents or solvents placed above or below the arrow are not currently accepted; however, these can be added directly to the Reaction Table. When the reaction is submitted, the reaction SMILES (RXSMILES) is exported from the sketcher to the Reaction Table. The database, containing information from PubChem, is then queried for the reactants and products in the RXSMILES to obtain their density, molecular weight, and hazard codes automatically. All compound data have been collected from PubChem Laboratory Chemical Safety Summaries (LCSS). The hazard data are presented as Globally Harmonized System of Classification and Labelling of Chemicals (GHS) hazard codes [1] and are collected only from references provided by the European Chemicals Agency (ECHA). [26]
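The hand-off from sketcher to database can be illustrated by splitting a reaction SMILES string, whose three fields are separated by `>` (reactants, agents, products), and looking each component up by its SMILES. This is a simplified sketch under stated assumptions: `COMPOUND_DB` is a hypothetical stand-in for the PubChem-derived table holding only names and molecular weights (a real record would also carry density and GHS hazard codes); components are matched by exact string rather than by canonical SMILES (a production system would canonicalize first, e.g., with RDKit); extended CXSMILES annotations are not handled; and splitting on `.` would also separate the ions of a salt written as disconnected fragments.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the PubChem-derived compound table.
COMPOUND_DB: Dict[str, dict] = {
    "CC(=O)O": {"name": "acetic acid", "mw": 60.05},
    "CCO": {"name": "ethanol", "mw": 46.07},
}


def split_reaction_smiles(rxn_smiles: str) -> Tuple[List[str], List[str]]:
    """Return (reactants, products) from a 'reactants>agents>products' string."""
    fields = rxn_smiles.strip().split(">")
    if len(fields) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn_smiles!r}")
    reactants, agents, products = fields
    if agents:
        # Mirrors the restriction above: nothing above or below the arrow.
        raise ValueError("reagents or solvents on the arrow are not accepted")

    def components(side: str) -> List[str]:
        # Components are dot-separated; drop empties from an empty side.
        return [s for s in side.split(".") if s]

    return components(reactants), components(products)


def look_up_components(rxn_smiles: str) -> Tuple[Dict[str, dict], List[str]]:
    """Split a reaction and separate known compounds from novel ones."""
    reactants, products = split_reaction_smiles(rxn_smiles)
    known: Dict[str, dict] = {}
    novel: List[str] = []
    for smiles in reactants + products:
        record = COMPOUND_DB.get(smiles)
        if record is None:
            novel.append(smiles)  # the user would be asked for its data
        else:
            known[smiles] = record
    return known, novel
```

For `CC(=O)O.CCO>>CCOC(C)=O`, acetic acid and ethanol are found, and ethyl acetate is returned as a novel compound because it is absent from this toy table.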



Figure 3. Users draw their reaction using the Marvin JS reaction sketcher. Reactions can also be imported in a variety of formats, including SMILES.

Reaction Table

The user is prompted to populate the Reaction Table (Figure 4) and to provide extra information if any reactant or product is not in the database. Such a “novel compound” is saved to the database and can be reused, but only within the same workbook. Reagents can be added from the PubChem compound database by searching by name or CAS number; they can also be added to the database in the same way as a novel compound. Solvents can be selected from a predefined list or found by searching by name or CAS number, and “novel solvents” can be added in the same way as “novel compounds.” Solvents are color-coded according to the four-tier CHEM21 classification (recommended, problematic, hazardous, and highly hazardous) [25], giving the user immediate feedback on the sustainability of their solvent choice. Users then input the details of their reaction into the Reaction Table. The physical forms of all reactants, reagents, solvents, and products must be provided to assess the reaction’s risk. Additionally, the mass of the limiting reagent and the number of equivalents of all other reactants and reagents are required to proceed. Any suspected incorrect data from the database can be reported to system administrators for review at any point in the procedure. There is also space to describe the experimental procedure and any observations made during the reaction.
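A minimal sketch of the four-tier solvent color-coding. The tier-to-color mapping (including a separate shade for highly hazardous solvents) and the handful of solvent entries are illustrative assumptions; in AI4Green the rankings come from the CHEM21 data in its database, and the published CHEM21 guide should be consulted for actual rankings.

```python
from typing import Optional

# Assumed display colors for the four CHEM21 tiers.
TIER_COLORS = {
    "recommended": "green",
    "problematic": "yellow",
    "hazardous": "red",
    "highly hazardous": "dark red",
}

# Illustrative entries only; real rankings come from the CHEM21 data.
SOLVENT_TIERS = {
    "water": "recommended",
    "toluene": "problematic",
    "dichloromethane": "hazardous",
    "benzene": "highly hazardous",
}


def solvent_color(name: str) -> Optional[str]:
    """Return the display color for a solvent, or None if it is not listed."""
    tier = SOLVENT_TIERS.get(name.strip().lower())
    if tier is None:
        return None  # a "novel solvent": the user must supply its details
    return TIER_COLORS[tier]
```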



Figure 4. A partially complete Reaction Table. Users are directed to provide information about the reaction (highlighted in red). Reagents, solvents, and novel compounds can be added or removed. Some information, such as molecular weight and hazard codes, is populated automatically from the PubChem database.
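Requiring the limiting reagent mass and the equivalents of everything else reflects a simple stoichiometric calculation: moles of limiting reagent = mass / molecular weight, each other component's moles = equivalents × that amount, and, for liquids, volume = mass / density. The sketch below assumes grams, g/mol, g/mL, and millimoles; the function name and the example compound values are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Component(NamedTuple):
    name: str
    mw: float                 # molecular weight, g/mol
    equivalents: float        # relative to the limiting reagent
    density: Optional[float]  # g/mL for liquids, None for solids


class Amount(NamedTuple):
    name: str
    mmol: float
    mass_g: float
    volume_ml: Optional[float]


def reaction_amounts(limiting_mass_g: float, limiting_mw: float,
                     components: List[Component]) -> List[Amount]:
    """Derive each component's amount from the limiting reagent mass."""
    limiting_mmol = limiting_mass_g * 1000.0 / limiting_mw
    amounts = []
    for c in components:
        mmol = limiting_mmol * c.equivalents
        mass_g = mmol * c.mw / 1000.0
        # Only liquids (with a known density) get a volume.
        volume_ml = mass_g / c.density if c.density else None
        amounts.append(Amount(c.name, mmol, mass_g, volume_ml))
    return amounts
```

For example, 2.00 g of a hypothetical limiting reagent with a molecular weight of 100.0 g/mol is 20.0 mmol; 1.5 equivalents of a liquid reagent with a molecular weight of 50.0 g/mol and a density of 0.80 g/mL then corresponds to 30.0 mmol, 1.50 g, and 1.875 mL.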

Summary Table

With the Reaction Table complete, users are directed to the Summary Table (Figure 5). Information is passed automatically from the Reaction Table to the Summary Table, and visual assessments of the greenness and sustainability of the reaction are displayed to the user. Each assessment is either flagged as unsustainable (red) or rated on a traffic light system (red = not recommended/hazardous; yellow = problematic; green = recommended). The specific colors and shades for these ratings can be altered on the accessibility page. An overall hazard rating is generated from the hazard codes, denoted as Low (L), Medium (M), Hazardous (H), or Very Hazardous (VH). The thresholds for the sustainability levels of the following metrics follow the CHEM21 project. [27] Several of these metrics are calculated automatically, such as the sustainability of the chemical elements used in the reaction and the atom efficiency. Other metrics must be input by the user, such as the reaction temperature, batch or flow conditions, the isolation method, the use of a catalyst, and whether that catalyst was recovered. A risk assessment section follows, in which users identify standard protocols, disposal of waste materials, spillage procedures, and any other risks associated with the reaction. An overall risk score can then be computed from a self-assessment of the reaction’s hazards, risks, and consequences.

Typically, the reaction would be performed at this point. Afterwards, the user can return to the Summary Table and input the unreacted starting material mass and the actual product mass. From these inputs, four more metrics are computed: mass efficiency, yield, conversion, and selectivity. At this stage, the reaction can be marked as complete and locked against further editing. For increased data integrity, reaction modifications are time-stamped in the database. Reactions can currently be listed alphabetically or by most recently created.
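The calculated metrics can be made concrete with a sketch of common textbook definitions. These definitions are assumptions: atom efficiency is computed here as atom economy (product molecular weight divided by the summed reactant molecular weights), mass efficiency as reaction mass efficiency (product mass divided by total reactant mass), and selectivity as yield divided by conversion, in the style of the CHEM21 metrics; AI4Green's exact formulas may differ. The product is also assumed to form in a 1:1 ratio with the limiting reactant.

```python
from typing import Sequence


def atom_economy(product_mw: float, reactant_mws: Sequence[float]) -> float:
    """% of reactant molecular weight that ends up in the desired product."""
    return 100.0 * product_mw / sum(reactant_mws)


def reaction_mass_efficiency(product_mass_g: float,
                             reactant_masses_g: Sequence[float]) -> float:
    """% of the total reactant mass recovered as isolated product."""
    return 100.0 * product_mass_g / sum(reactant_masses_g)


def percent_yield(product_mass_g: float, product_mw: float,
                  limiting_mmol: float) -> float:
    """Isolated product moles relative to the theoretical (1:1) amount."""
    product_mmol = product_mass_g * 1000.0 / product_mw
    return 100.0 * product_mmol / limiting_mmol


def conversion(limiting_mass_g: float, unreacted_mass_g: float) -> float:
    """% of the limiting reactant consumed."""
    return 100.0 * (limiting_mass_g - unreacted_mass_g) / limiting_mass_g


def selectivity(yield_pct: float, conversion_pct: float) -> float:
    """% of the consumed limiting reactant that became the desired product."""
    return 100.0 * yield_pct / conversion_pct
```

As a worked illustration, for the esterification of acetic acid (60.05 g/mol) with ethanol (46.07 g/mol) to ethyl acetate (88.11 g/mol), the atom economy is 88.11 / 106.12 ≈ 83.0%. Starting from 6.005 g (100 mmol) of acetic acid, recovering 0.6005 g unreacted and isolating 7.0488 g (80 mmol) of ethyl acetate gives a conversion of 90%, a yield of 80%, and a selectivity of about 88.9%.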



Figure 5. Part of the Summary Table, showing information about the hazards of the reaction and various green and sustainability metrics and considerations.
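Locking a completed reaction, time-stamping each modification, and listing reactions alphabetically or newest first can be sketched as follows. The `ReactionRecord` class, its fields, and the in-memory history list are hypothetical; in AI4Green these records are stored in the Postgres database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Tuple


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ReactionRecord:
    name: str
    created: datetime = field(default_factory=_now)
    data: dict = field(default_factory=dict)
    complete: bool = False
    # Each entry: (UTC timestamp, field changed, new value).
    history: List[Tuple[datetime, str, Any]] = field(default_factory=list)

    def update(self, key: str, value: Any) -> None:
        """Apply an edit, refusing it once the reaction is locked."""
        if self.complete:
            raise PermissionError(f"reaction {self.name!r} is complete and locked")
        self.data[key] = value
        self.history.append((_now(), key, value))

    def mark_complete(self) -> None:
        """Lock the reaction and record when it was completed."""
        self.complete = True
        self.history.append((_now(), "complete", True))


def list_reactions(reactions: List[ReactionRecord],
                   order: str = "recent") -> List[ReactionRecord]:
    """Return reactions alphabetically or most recently created first."""
    if order == "alphabetical":
        return sorted(reactions, key=lambda r: r.name.lower())
    if order == "recent":
        return sorted(reactions, key=lambda r: r.created, reverse=True)
    raise ValueError(f"unknown order: {order!r}")
```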

References

  1. "About the GHS". United Nations Economic Commission for Europe. 2021. https://unece.org/about-ghs. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added. This version adds a reference to the GHS.