Journal:GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration

From LIMSWiki
Revision as of 20:35, 3 April 2024 by Shawndouglas
Full article title GitHub as an open electronic laboratory notebook for real-time sharing of knowledge and collaboration
Journal Digital Discovery
Author(s) Scroggie, Kymberley R.; Burell-Sander, Klementine J.; Rutledge, Peter J.; Motion, Alice
Author affiliation(s) University of Sydney
Primary contact Email: alice dot motion at sydney dot edu dot au
Year published 2023
Volume and issue 2
Page(s) 1188-1196
DOI 10.1039/D3DD00032J
ISSN 2635-098X
Distribution license Creative Commons Attribution-NonCommercial 3.0 Unported
Website https://pubs.rsc.org/en/content/articlehtml/2023/dd/d3dd00032j
Download https://pubs.rsc.org/en/content/articlepdf/2023/dd/d3dd00032j (PDF)

Abstract

Electronic laboratory notebooks (ELNs) have expanded the utility of the paper laboratory notebook beyond that of a simple record keeping tool. Open ELNs offer additional benefits to the scientific community, including increased transparency, reproducibility, and integrity. A key element underpinning these benefits is facile and expedient knowledge sharing which aids communication and collaboration. In previous projects, we have used LabTrove and LabArchives as open ELNs, in partnership with GitHub (an open-source web-based platform originally developed for collaborative coding) for communication and discussion. Here we present our personal experiences using GitHub as the central platform for many aspects of the scientific process, including version-controlled recording of experiments, results and interpretation, data storage, project management, workflows, communication, and collaboration. We report on the utility of GitHub as an open ELN for chemistry research, and we discuss our experiences employing it with the Open Source Mycetoma and Open Source Tuberculosis consortia. By outlining its features and shortcomings through their implementation in our work, we demonstrate how using GitHub as a central platform can aid the real-time sharing of knowledge and collaboration, and further democratize scientific research within both open and traditional research models.

Keywords: electronic laboratory notebook, ELN, GitHub, data sharing, knowledge sharing, chemistry, open research

Introduction

Technological advances have allowed scientists to move beyond the primitive utility of the paper laboratory notebook as a record-keeping tool. In 1994, Borman noted that electronic laboratory notebooks (ELNs) “could revolutionize how scientists record their research, manage their data, and share their information with others."[1] ELNs have indeed been integrated into laboratory information management systems (LIMS) and electronic laboratory environments (ELEs), but they have also revolutionized the way in which scientists disseminate knowledge, particularly through the internet.

ELNs enable knowledge sharing, facilitating faster transfer of knowledge and collaboration, which in turn expedites future knowledge generation and improves research efficiency.[2][3] Digital storage further improves efficiency, since electronic records offer greater longevity, readability, and searchability. Despite these benefits, the shift from paper to electronic notebooks has been an evolutionary process rather than a revolutionary one, and scientists, particularly those in academia, have been slow to accept and adopt ELNs.[4]

The ability of scientists to move to electronic documentation of their work with minimal disruption has been identified as the key factor for broader acceptance of ELNs in an academic setting.[5] However, the highly diverse nature of different disciplines within academia leads to a broad range of specific needs that require highly specialized or custom ELNs to effect a seamless transition. While some commercial ELNs can support many specialized requirements, their licensing and maintenance costs often put them out of reach for individual academic research groups.[6][7] Instead, many have made use of generic, freely available platforms such as OneNote[8], Evernote[9][10], or Google Docs[11], with others developing their own ELNs to reap the specific benefits they require.[12][13][14]

We at the University of Sydney have successfully used several different ELNs for our own work as part of different open-source drug discovery consortia, including Open Source Malaria[15], Open Source Mycetoma[16], and Open Source Tuberculosis. Open-source drug discovery is a new approach to drug discovery in which all aspects of research are shared publicly and in real-time (i.e., immediately as it is produced) to facilitate collaboration and knowledge sharing.[17] These consortia follow the principles of open science, in which scientific knowledge is developed collaboratively and made freely accessible to any interested parties[18], and more specifically Todd's Six Laws of Open Science.[19]

In line with openly sharing our research, we have hosted ELNs on the open-source software platform LabTrove[20] and the commercial ELN LabArchives, while simultaneously using GitHub to support discussion and collaboration. To bring together the sharing of knowledge and collaboration into a single open and central location, we have now explored the use of GitHub itself as the ELN (Fig. 1). Using GitHub as both an ELN and a hub for instant communication elevates it to the status of a “collaboratory” as envisioned by Wulf, as a “centre without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries.”[21]

This article draws on the experiences of two of the authors using GitHub as an ELN for various synthetic chemistry projects and provides preliminary findings into its usability. We report on the utility of GitHub as an open ELN, detail its features in this dimension, and discuss its implementation for open-source drug discovery. We also share an ELN template GitHub repository for those considering alternative ELNs. While we have used GitHub as an open ELN, and repositories are open by default, we note that for projects that require confidentiality or follow a traditional research methodology, information and data can be held within closed repositories with access limited to only invited users.



Figure 1. How we share scientific data and knowledge with the community in real-time.

GitHub

GitHub is a web-based platform that hosts Git repositories and provides a graphical interface for Git, an open-source version control system. It was originally designed for software developers to work collaboratively on open-source code; however, in recent years the GitHub community has expanded. After software development, education, and data, science now represents the fourth largest category of users.[22] Examples range from machine learning programs like the tuberculosis and lung cancer screening initiative AiAi.care Project, to organic chemistry applications, including reaction visualizers, spectroscopic databases, and chemistry learning tools. Open Source Malaria, Open Source Mycetoma, Open Source Tuberculosis, and Open Source Antibiotics represent four examples of open-source drug discovery hosted on the platform, making use of GitHub's forum-like structure to facilitate open, real-time collaboration and discussion among teams of scientists all around the world.

The version control enabled by Git transfers directly to ELNs. Importantly for the validity and verifiability of scientific research, Git enables users to keep track of the who, what, when, and even why of every change: when saving changes (committing, in Git terms), GitHub offers the option to provide a short description of what was changed and why the change was made. This record-keeping enables greater transparency, making it easy to see whether an edit was made to fix typos, add information, or alter data, and it is crucial in maintaining data integrity and preventing misunderstanding or misuse of data.[23] Furthermore, all activities are attributed to the user via their display name, bestowing a level of accountability and responsibility, while also ensuring that contributors receive attribution for their work.
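For files in the Code tab (and for the Wiki, which GitHub stores as a separate Git repository), the who/when/why record described above can be read back from the commit history. The sketch below is a minimal illustration, assuming a local clone of an ELN repository and the `git` command-line tool on the PATH; `Change`, `parse_log`, and `history` are hypothetical names introduced here, not part of any GitHub tooling. Issue comments are not stored as Git commits, so this applies only to repository and wiki content.

```python
import subprocess
from dataclasses import dataclass

# git's %x1f placeholder emits the ASCII unit separator, which is unlikely to occur in a message.
SEP = "\x1f"
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"  # hash, author name, ISO 8601 author date, subject line


@dataclass
class Change:
    commit: str   # hash identifying the recorded change
    author: str   # who made it
    date: str     # when it was made
    message: str  # why: the short description entered when the change was saved


def parse_log(text: str) -> list[Change]:
    """Turn `git log` output written in LOG_FORMAT into Change records (newest first)."""
    changes = []
    for line in text.split("\n"):
        if line.strip():
            commit, author, date, message = line.split(SEP, 3)
            changes.append(Change(commit, author, date, message))
    return changes


def history(repo_path: str, path: str) -> list[Change]:
    """Return the change history of one file in a local clone of the ELN repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

For example, `history("ELN-repo", "README.md")` (a hypothetical clone path) would list every saved change to the README with its author, timestamp, and message.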

A number of user interfaces (UIs) for Git exist, including GitHub, GitLab, and Gitea, each offering slightly different user experiences. Each can be used as an ELN, as described in this article; however, GitHub is more openly accessible and offers additional UI features (e.g., a Discussions tab for public discourse), making it more suitable for hosting open-source and collaborative projects.

GitHub's accessibility is also important to the open science ethos. No account or subscription is required to view work within a public GitHub repository, allowing people to access data without concerns of cost or association with institutions. Through a standard internet browser, anyone can view content as soon as it is published without the researcher needing to “share” their work, or the reader having to access any proprietary products. In contrast, GitLab requires users to have an account and be signed in to view content, while Gitea is a self-hosted UI.
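Because public repositories need no account, their content can also be read programmatically. The following sketch, using only Python's standard library, converts an issue's web address (such as the example notebook page linked in Fig. 4) into the corresponding GitHub REST API address and fetches it anonymously; `api_url_for_issue` and `fetch_issue` are illustrative names. Anonymous API requests are rate-limited (60 per hour per IP address at the time of writing), so this suits occasional reads rather than bulk harvesting.

```python
import json
import re
import urllib.request

# Matches public issue pages of the form https://github.com/<owner>/<repo>/issues/<number>
ISSUE_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/issues/(\d+)/?$")


def api_url_for_issue(web_url: str) -> str:
    """Map a public issue page URL to the corresponding GitHub REST API URL."""
    match = ISSUE_URL.match(web_url)
    if match is None:
        raise ValueError(f"not a GitHub issue URL: {web_url}")
    owner, repo, number = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"


def fetch_issue(web_url: str) -> dict:
    """Fetch an issue as JSON without any account or token (public repositories only)."""
    request = urllib.request.Request(
        api_url_for_issue(web_url),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

The returned dictionary includes fields such as `title`, `state`, `body`, and `labels`, so an issue's current content can be inspected without opening a browser.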

Not only is the content on GitHub openly accessible, but users can connect to content on GitHub in different ways: from the web-based site, desktop app, or mobile app. The mobile app is available for Android and iOS and is easy to use on a standard smartphone or tablet. Many popular ELNs are primarily laptop-based[23], and while no research has yet specifically examined the use of mobile apps for ELNs, we envision that this mode of access will improve record-keeping in laboratory settings due to the ease of access, portability, and ubiquity of mobile devices. Most, if not all, researchers are able to access the GitHub app on their device to swiftly read through past methods, add details and observations in the moment, or snap a photo for the ELN. A similar sentiment has been expressed by others who suggest that many researchers are likely to prefer mobile-based ELNs for their portability and extra features, like the built-in camera and option to annotate images using a stylus.[24][25]

We have used GitHub repositories as an ELN for both laboratory-based synthetic projects and computer-based social science projects, and we describe our experiences using it in the synthetic chemistry laboratory as a case study below.

ELN structure and utility

At the top layer, GitHub uses repositories to organize and store data and information. Each repository has Code, Issues, Discussions, Projects and Wiki tabs, all of which contribute to the ELN workflow (Fig. 2). Repositories also contain Pull Requests, Actions, and Insights tabs, which are currently not used in our ELN workflow, along with Security and Settings tabs which are not discussed here.



Figure 2. Overview of a GitHub ELN repository and its structure and utility.

Repositories can be set up either by an individual or an organization (e.g., research group) and assigned to individuals. Within Open Source Mycetoma and Open Source Tuberculosis there are topic-specific repositories which support discussion and collaboration, while ELN repositories are created by individuals and linked to the relevant organization's repositories. This gives researchers the freedom to organize ELN repositories in a way that suits their individual needs. For example, while multiple projects can be contained in a single repository, a researcher may choose to have multiple repositories, one for each project they are involved in. Alternatively, a research group could set up repositories for each project with all researchers working on the project contributing to the single repository. Either way, an overview of all repositories can be viewed on both the individual's and organization's profile. This interconnectivity of related work and segregation of distinct topics makes GitHub a useful tool, not only as an ELN, but also as a platform for the presentation of research and collaboration.

Notebook pages

The Issues tab is used to house notebook pages: each represents an individual experiment and contains all the essential information, including the title, aim, quantity of reagents, methods, results, discussion, and conclusions, along with linked references. In creating a new issue, the title, hyperlink to the risk assessment, reaction scheme (uploaded as an image), and table of reagents are posted. Plain text is formatted using Markdown, creating headings, tables, and hyperlinks to aid clarity and readability. Making use of the forum-like structure, each subsequent addition to the notebook page is posted as a comment and conveniently time stamped. All experimental, observational, and analytical data are also uploaded to the relevant issue. Once an experiment is completed, the issue is "closed." This keeps the Issues landing page free of clutter and "open" issues (active experiments) easily accessible, as open and closed issues are segregated. Examples of a typical Issues landing page and notebook page are shown in Fig. 3 and 4, respectively.
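The notebook-page layout described above can be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical `Reagent` record and `notebook_page_body` helper (neither is part of GitHub); it produces a Markdown issue body with a risk-assessment link, a reaction-scheme image, and a reagent table whose mass column is computed from molecular weight and amount (g/mol × mmol = mg).

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mw: float     # molecular weight, g/mol
    mmol: float   # amount used, mmol
    equiv: float  # equivalents relative to the limiting reagent


def notebook_page_body(risk_url: str, scheme_url: str, reagents: list[Reagent]) -> str:
    """Markdown body for a new notebook-page issue (the title goes in the issue's title field)."""
    lines = [
        f"**Risk assessment:** [view]({risk_url})",
        "",
        "## Reaction scheme",
        "",
        f"![Reaction scheme]({scheme_url})",
        "",
        "## Reagents",
        "",
        "| Reagent | MW (g/mol) | Amount (mmol) | Mass (mg) | Equiv. |",
        "|---|---|---|---|---|",
    ]
    for r in reagents:
        mass_mg = r.mw * r.mmol  # g/mol x mmol = mg
        lines.append(f"| {r.name} | {r.mw:.2f} | {r.mmol:.2f} | {mass_mg:.1f} | {r.equiv:g} |")
    return "\n".join(lines) + "\n"
```

The returned string can be pasted into the body of a new issue; later observations and data are then added as comments, as described above.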



Figure 3. An example of the Issues landing page showing four recent experiments.


Figure 4. Example of a new issue as ELN entry, including the title, hyperlink to the risk assessment, reaction scheme and table of reagents formatted using Markdown. A completed experiment example can be found on https://github.com/TheBreakingGoodProject/ELN-Kymberley-Scroggie/issues/57.

Data management

GitHub supports numerous common file types, including Microsoft Office files, PDFs, image and video files, and ZIP files. The ability to upload ZIP files to an issue is particularly advantageous as it allows both processed and raw data to be included, promoting best practice in data storage and facilitating reuse.[26][27][28] Data files are easily added via the web browser version using a simple drag-and-drop method that should be intuitive to most computer users. Files can be uploaded to a relevant issue, or under the Code tab for centralized data storage. Files stored centrally on this tab can then be hyperlinked to relevant issues, so that users can rapidly access data related to the topic at hand. Furthermore, GitHub offers a desktop application, which can be used to add and curate files within the Code tab in a system analogous to typical file management operating systems.
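Assembling a single archive that keeps raw and processed data side by side can be scripted. This sketch uses only Python's standard library; `bundle_data`, `fits_browser_upload`, and the `raw/` and `processed/` folder layout are assumptions for illustration, and the size check uses a conservative (decimal) reading of the 25 MB browser attachment limit discussed under Data storage limitation.

```python
import zipfile
from pathlib import Path

# Conservative reading of the 25 MB browser attachment limit (decimal megabytes).
BROWSER_LIMIT_BYTES = 25 * 1000 * 1000


def bundle_data(raw_dir, processed_dir, out_zip) -> Path:
    """Pack raw and processed data into one ZIP with 'raw/' and 'processed/' top-level folders."""
    out_zip = Path(out_zip)
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for label, folder in (("raw", Path(raw_dir)), ("processed", Path(processed_dir))):
            for f in sorted(folder.rglob("*")):
                if f.is_file():
                    # Keep each file's path relative to its source folder inside the archive.
                    zf.write(f, arcname=f"{label}/{f.relative_to(folder).as_posix()}")
    return out_zip


def fits_browser_upload(path) -> bool:
    """True if the file is small enough to drag and drop into an issue."""
    return Path(path).stat().st_size <= BROWSER_LIMIT_BYTES
```

Keeping the two folders separate inside one archive means a reader can reuse the raw instrument files without confusing them with processed outputs.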

Metadata and curation

The use of metadata and curation aids the organization and accessibility of the ELN to those beyond the individual ELN user.[29] GitHub contains numerous forms of metadata, including timestamps; indication of contributors; specific, customizable descriptors in the form of Labels; and categorization according to projects and status. Labels, analogous to the hashtags used ubiquitously on social media sites, are short descriptive terms such as “new experiment,” “upscale,” or “help needed.” GitHub automatically suggests labels, drawing either from a default list (in the case of new users) or from previously used ones (added by a user). These labels are added to each issue and used to filter experiments by their various attributes. As issues are created in the order experiments are planned or performed and appear on the landing page in this order, labels make it easier to find relevant issues.
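Label filtering can also be done outside the web interface. The sketch below assumes the GitHub REST endpoint for listing a repository's issues, which accepts `state` and comma-separated `labels` query parameters; `issues_query` and `with_all_labels` are hypothetical helper names, and the second applies an explicit all-labels match locally to issue JSON that has already been downloaded.

```python
from urllib.parse import urlencode


def issues_query(owner: str, repo: str, labels=(), state: str = "open") -> str:
    """REST API URL listing a repository's issues, filtered by label names and state."""
    if state not in {"open", "closed", "all"}:
        raise ValueError("state must be 'open', 'closed' or 'all'")
    params = {"state": state}
    if labels:
        params["labels"] = ",".join(labels)  # the API takes a comma-separated list of label names
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{urlencode(params)}"


def with_all_labels(issues: list[dict], *wanted: str) -> list[dict]:
    """Keep only issues (as returned by the API) that carry every wanted label."""
    return [
        issue for issue in issues
        if set(wanted) <= {label["name"] for label in issue.get("labels", [])}
    ]
```

For example, `issues_query("owner", "ELN-repo", ["upscale"], "closed")` (placeholder names) builds a URL for completed scale-up experiments.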

Along with labels, issues are sorted into projects, consolidating all experimental work relevant to a given branch of investigation in one central location. Each project has its own landing page, accessible from the Projects tab, which contains links to all issues assigned to it as cards. Cards can be further sorted into columns and categorized. The authors favor the division into "To Do," "In Progress," and "Done" categories. Using this system to organize their work, it is easy to keep track of planned, ongoing, and completed experiments and assess progress. The process can also be automated, so that performing specific actions automatically shifts cards into a new column within its assigned project. For example, one author has a workflow whereby assigning a newly created issue to a project adds the respective card to "To Do," and closing an issue moves it to "Done." As with issues, projects are "closed" and archived once the line of investigation is completed. This capacity for curation is an important workflow tool, as it prevents landing pages from being cluttered with obsolete links or information.
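The column automation can be thought of as a small state machine. The following is a conceptual model only, based on the author's workflow described above (adding an issue to a project places its card in "To Do", closing the issue moves it to "Done"); the event names and the `Board` class are hypothetical and do not correspond to GitHub's built-in workflow identifiers, and the move to "In Progress" is modeled as a manual step.

```python
from enum import Enum


class Column(Enum):
    TODO = "To Do"
    IN_PROGRESS = "In Progress"
    DONE = "Done"


# Hypothetical event names modeling the workflow in the text; not GitHub identifiers.
TRANSITIONS = {
    "added_to_project": Column.TODO,     # automated: new issue assigned to the project
    "work_started": Column.IN_PROGRESS,  # manual: researcher moves the card
    "issue_closed": Column.DONE,         # automated: experiment completed
}


class Board:
    def __init__(self) -> None:
        self.cards: dict[int, Column] = {}  # issue number -> current column

    def handle(self, issue: int, event: str) -> Column:
        """Apply one event to an issue's card and return its new column."""
        if event not in TRANSITIONS:
            raise ValueError(f"unknown event: {event}")
        if event != "added_to_project" and issue not in self.cards:
            raise KeyError(f"issue #{issue} is not on this board")
        self.cards[issue] = TRANSITIONS[event]
        return self.cards[issue]

    def in_column(self, column: Column) -> list[int]:
        """Issue numbers currently in a column, in ascending order."""
        return sorted(n for n, c in self.cards.items() if c is column)
```

Framing the workflow this way makes explicit which moves are automatic and which still depend on the researcher.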

This method of automated workflow integrated into the capture of metadata at the source (the initial creation of a new issue) helps reduce the burden of curation.[30] Previous work has noted the “blank canvas effect,” whereby researchers fail to add metadata due to unfamiliarity, rather than unwillingness.[31] GitHub actively encourages the assignment of metadata through labels and project categorization, and the capture of metadata at the source. Upon creation of a new issue, GitHub prompts users to add labels. This comparatively strong metadata support and active encouragement may be more effective than expecting users to create and curate their own labels without prompting.[32] We suggest that the project, status, and label features offered by GitHub facilitate individual project management, thus making researchers more likely to incorporate them into their ELNs for strategic reasons rather than because the system requires it.

Another important aspect of curation which is especially useful for making open-source work accessible to those not already involved in a project is the Wiki tool. This provides a place for a formal presentation of the work contained within the ELN, with pages organized according to topic. In this synthetic chemistry case study, a page in the Wiki has been dedicated to every different reaction, with this reaction landing page housing links to the notebook page for each attempt at the reaction, both successful and unsuccessful, alongside optimized methods, exemplar characterization data, and other relevant notes. These pages provide meta context and information that is often not present within the notebook pages of a typical ELN and are easily cross-linked, so that each page refers to multiple other relevant pages within the Wiki. This makes it easier for other researchers to find useful methods and data, while maintaining a high level of transparency, which improves both research integrity and reproducibility of results.
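A reaction landing page of this kind can be kept consistent by generating its table of attempts from a list of issue numbers. The sketch assumes a hypothetical `reaction_wiki_page` helper and full issue URLs (so the links work from the Wiki); successful and unsuccessful attempts are listed together, in issue order.

```python
def reaction_wiki_page(reaction: str, repo_url: str, attempts: list[tuple[int, str, str]]) -> str:
    """Markdown for a reaction landing page linking every attempt, successful or not.

    attempts: (issue number, outcome, short note) for each notebook page.
    """
    lines = [
        f"# {reaction}",
        "",
        "## All attempts",
        "",
        "| Notebook page | Outcome | Notes |",
        "|---|---|---|",
    ]
    for number, outcome, note in sorted(attempts):  # tuples sort by issue number first
        lines.append(f"| [#{number}]({repo_url}/issues/{number}) | {outcome} | {note} |")
    return "\n".join(lines) + "\n"
```

Optimized methods and exemplar characterization data would then be added below the table by hand.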

It is important to note that this method of organizing data means that all attempts at a reaction are made publicly available, not just the successful experiments. This not only allows other researchers to come to their own conclusions about the results obtained, it also prevents duplication of unsuccessful efforts, as researchers can easily see what has and hasn't worked. While such complete transparency about the scientific process can appear intimidating to some researchers, it is a valuable asset of open ELNs and a powerful tool to support research integrity.[29][33][34]

Real-time sharing of knowledge and collaboration

Arguably GitHub's biggest advantage over other currently available ELNs is its capacity for rapid, real-time sharing of knowledge and collaboration. In the traditional scientific model, new work is typically shared only after significant positive results are found and through avenues that entail considerable delays, including conference presentations that may occur infrequently or the famously lengthy peer review process. In this context, real-time sharing means making experimental data available as soon as it is produced. Openly sharing knowledge in this way increases transparency, reproducibility, integrity[35][36], collaboration[37][38], and impact[39], and groups that collaborate on their research recognize the benefits of sharing and discussing their work.[40] In our use of GitHub, the Issues and Wiki tabs are used for the real-time sharing of knowledge, while collaboration is facilitated through comments on specific issues. Publications in preparation are maintained through third-party platforms that allow real-time collaborative editing, such as Google Drive, with links posted to GitHub. This simplifies the process of keeping track of successive versions of manuscripts. These functions allow researchers to communicate in real-time, discussing their work in a format that is rapid, direct, and accessible.

The immediate publication of additions to notebook or wiki pages, along with the ease of sharing these updates via email or social media sites, allows new ideas and experiments to be made available to the broader public almost immediately, and avoids the delays typically seen when research is shared through formal channels, as with research papers and conference presentations. Indeed, the real-time sharing of research through wiki pages and specific issues was particularly useful for work conducted as part of the Open Source Tuberculosis project, which involved using Twitter to seek advice and suggestions on synthetic procedures. Both specific attempts at a reaction and a proposed synthetic scheme could be easily shared online, making it straightforward for interested parties to read more and make informed recommendations based on the experimental work already completed. Sharing this work resulted in a number of useful suggestions on alternative reagents and reaction conditions for experiments, providing further evidence that openly sharing knowledge in real-time is a powerful collaboration tool.

Unlike other UIs for Git, GitHub also facilitates public collaboration in a forum-like structure through the Discussions tab. Discussions is a relatively new feature; before its introduction, the authors conducted discussions through dedicated issues. As with any online forum, we propose that its major contribution to collaboration lies in its ease of access for GitHub users. Having separate, dedicated spaces for the ELN and discussion within a single central system will conceivably facilitate conversations that are not related to a specific experiment, such as conversations about a project's overall direction, organization of meetings, or general brainstorming and sharing of ideas. Each Discussion can be organized into appropriate categories, so that users can quickly find the conversations they are interested in without having to sift through irrelevant topics, with links to relevant issues and wiki pages as appropriate.

Additionally, the customizability of GitHub allows repositories to be set up in a way that makes it easy for unfamiliar readers to quickly acquaint themselves with both the overall project and any recent updates, aiding the collaborative process. When accessing a repository, visitors are initially directed to the Code tab, making this the ideal place for introducing the ELN's owner(s) or curator(s), and the project(s) it relates to, through a README.md file. Hyperlinks guide visitors to other relevant sites, such as researchers' websites and social media profiles, and other GitHub repositories related to the project.

Shortcomings

In using GitHub as an ELN, the authors experienced several benefits as discussed above, but we also faced challenges. In the sections below we address these challenges and provide recommendations and workarounds we have found useful.

Markdown learning curve

Unlike many ELNs currently available to researchers, GitHub is not specifically designed for scientists. While this is part of what makes it such a versatile ELN and useful for diverse types of research, it also creates some hurdles for researchers wishing to move an established workflow to GitHub. For us, as individuals not familiar with computer science or programming languages, the biggest hurdle has been the need to use Markdown to format text. Markdown is a lightweight markup language for formatting plain text, whereby text and images are formatted by inserting extra characters or commands (e.g., to bold text, the user inserts two asterisks before and two asterisks after that text). Unless a user is already familiar with this syntax, it presents a steep learning curve, as the new syntax must be learned and applied for users to create clear, easy-to-read entries in their ELN and fully benefit from GitHub's potential.
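To make the "extra characters" idea concrete, the toy renderer below handles only two inline rules, `**bold**` and `*italic*`. It is an illustrative subset written for this example, assuming simple, non-nested markup; it is not how GitHub renders Markdown, which follows the much fuller GitHub Flavored Markdown specification.

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")  # **text**
ITALIC = re.compile(r"\*(.+?)\*")    # *text*


def render_inline(text: str) -> str:
    """Render **bold** and *italic* to HTML; a toy subset, not a Markdown parser."""
    text = BOLD.sub(r"<strong>\1</strong>", text)  # double asterisks first, so they are not read as italics
    return ITALIC.sub(r"<em>\1</em>", text)
```

GitHub's Preview tab performs the real version of this conversion, which is why it is a useful way to learn the syntax.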

Fortunately, several factors and workarounds make Markdown a less imposing challenge than it first seems. First, GitHub provides users with a truncated menu of “clickable” formatting options which insert the characters or commands required to render common format styles. In addition, there is a Preview tab available for all text entries, which shows the user how the rendered and formatted text will appear, and so speeds the learning process. Users can start by clicking a desired formatting option, and gradually learn the necessary text entry, much the same as learning a hotkey for a formatting option in Microsoft Office. Second, because Markdown is a commonly used markup language, many guides to using it are available online (e.g., https://www.markdownguide.org/ and https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github). There are also apps that enable users to write and format as they normally would in programs like Microsoft Word, with the text automatically converted to a Markdown format, which can then be copied across to GitHub.

Another way to circumvent the challenges inherent in learning Markdown is to build and share templates. GitHub offers the option of creating issue templates, which facilitates quicker creation of new issues. This works particularly well for synthetic experiments, with the same basic template being followed when writing up most experiments. The chosen template contains the desired formatting with space for researchers to add their own methods, results, and data in the appropriate sections. A template may be created by researchers themselves, sourced from other group members, or from online resources; we have created and shared a number of templates to be used in different settings. Templates are already used in other ELNs to expedite the creation of lab entries with similar or near-identical information, as occurs with repeated or parallel reactions and procedures.[13][27][41] Although previous work suggests templates are not always widely adopted[20], GitHub's requirement that users work with Markdown to create posts makes them more attractive.
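GitHub reads Markdown issue templates from the `.github/ISSUE_TEMPLATE/` folder of a repository, with YAML front matter (`name` and `about`, plus optional `title`, `labels`, and `assignees`) that sets defaults for new issues. The script below is a sketch that writes such a template; `write_issue_template`, the section headings, and the default "new experiment" label are assumptions modeled on the notebook pages described above, not a copy of the authors' published template.

```python
from pathlib import Path

# Illustrative section layout for a synthetic-chemistry notebook page.
TEMPLATE_BODY = """\
**Risk assessment:** <!-- link -->

## Aim

## Reaction scheme

<!-- drag and drop the scheme image here -->

## Reagents

| Reagent | MW (g/mol) | Amount (mmol) | Mass (mg) | Equiv. |
|---|---|---|---|---|
|  |  |  |  |  |

## Method

## Results and discussion

## Conclusion
"""


def write_issue_template(repo_root, filename="experiment.md", name="New experiment",
                         about="Notebook page for a single experiment",
                         labels=("new experiment",)) -> Path:
    """Write a Markdown issue template with YAML front matter into a local repository."""
    front_matter = "\n".join([
        "---",
        f"name: {name}",
        f"about: {about}",
        "title: ''",
        f"labels: {', '.join(labels)}",
        "assignees: ''",
        "---",
        "",
    ])
    path = Path(repo_root) / ".github" / "ISSUE_TEMPLATE" / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(front_matter + TEMPLATE_BODY, encoding="utf-8")
    return path
```

Once the file is committed and pushed, the template is offered when a new issue is opened, with the label already applied, which also supports capturing metadata at the source.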

Data storage limitation

GitHub supports file sizes up to 25 MB when attaching files via a browser, and files up to 100 MB when uploaded using the desktop application. Files larger than 100 MB can be uploaded if they are first split into multiple smaller files, which can be reassembled after download, but this requires both uploader and downloader to have the relevant programs and technical knowledge to perform these conversions and reassemblies. This is a significant issue for certain disciplines in which data files are often larger than this threshold, such as X-ray spectroscopy.[20] However, this problem of data storage is not unique to GitHub. Many ELNs have limits on file size[7]; for instance, LabArchives has a cumulative limit of 100 GB per user. Nor is the limit static: as computing progresses and large files become more prevalent, it is likely that GitHub and other cloud-based ELN providers will update their capacity and increase data size limits. A workaround for research involving large data files is to upload data to data repositories such as Zenodo or the Open Science Framework and then share the appropriate links on GitHub.
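The split-and-reassemble workaround can be done with a short standard-library script, so that uploader and downloader need only Python rather than a specific archiving program. This is a sketch under stated assumptions: `split_file`, `join_files`, and `sha256` are illustrative names, the 95 MB default chunk size is chosen simply to stay under the 100 MB desktop limit, and posting the SHA-256 checksum alongside the parts lets the downloader confirm that the reassembled file is identical.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 95 * 1000 * 1000  # assumption: a margin below the 100 MB desktop upload limit


def split_file(path, chunk_size: int = CHUNK_SIZE) -> list[Path]:
    """Split a file into numbered parts: name.part000, name.part001, ..."""
    path = Path(path)
    parts = []
    with path.open("rb") as src:
        index = 0
        while block := src.read(chunk_size):
            part = path.with_name(f"{path.name}.part{index:03d}")
            part.write_bytes(block)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, out_path) -> Path:
    """Reassemble parts in name order (the zero-padded index keeps the order correct)."""
    out_path = Path(out_path)
    with out_path.open("wb") as dst:
        for part in sorted(map(Path, parts), key=lambda p: p.name):
            dst.write(part.read_bytes())
    return out_path


def sha256(path) -> str:
    """Checksum for confirming the reassembled file matches the original."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

For very large datasets, the data-repository route mentioned above is usually simpler than maintaining many parts.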

Integration of discipline-specific applications

Many ELNs are tailored to particular disciplines and include integration of applications commonly used within those disciplines. For instance, Chemotion ELN is a free, open-source ELN designed for organic chemists, and it features chemical drawing tools, mole calculators, and integration with SciFinder and PubChem. GitHub does allow integration with many applications, but as yet does not offer chemistry-specific tools. Instead, reaction schemes can be uploaded as images, and Excel files take the place of reaction tables. Raw and processed data from chemical drawing programs and chemical analysis software can also be uploaded, allowing users to access original files. Finally, while these discipline-specific inclusions make Chemotion and similar ELNs attractive options for organic chemists, they are not well suited to any other field of research. This shortcoming is further compounded by the lack of a dedicated space for discussion, making such ELNs less appropriate for multidisciplinary knowledge sharing and collaboration.

Outlook

Perhaps the most immediately applicable future use of GitHub is to connect with new collaborators. GitHub's keyword tagging system extends beyond intra-repository linking, as each repository itself can be tagged with relevant terms, which are discoverable to the broader user base. This means that anybody interested in a given topic can quickly find others working on the same subject by following these links. Furthermore, GitHub-based ELNs are very easily shared online, encouraging participation from an audience beyond fellow GitHub users. Our work often involves sharing experimental work on Twitter and other social networks to solicit advice and publicize the project. Our GitHub ELN enables expedient sharing of both individual experiments (issues) and overviews of general synthetic approaches (wikis) with a simple URL. More broadly, GitHub's extensive array of options for communication and discussion, as well as the minimal barrier to using the site, make it straightforward for new collaborators to get involved at whatever level they wish. We also believe that GitHub has potential to extend beyond the functionality of an ELN that "seamlessly integrates the process of data collection, data processing, and data publication with minimal overheads for the researcher," as recently envisioned and outlined by Jablonka et al.[42]
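Repository topics can be browsed on the web (for example, https://github.com/topics/open-science) or queried through the GitHub search API using the `topic:` qualifier. The helper below is a sketch that only builds the search URL; `topic_search_url` is a hypothetical name, and the topic names in the usage example are assumptions rather than tags the authors report using.

```python
from urllib.parse import urlencode

SORT_OPTIONS = {"stars", "forks", "help-wanted-issues", "updated"}


def topic_search_url(*topics: str, sort: str = "updated") -> str:
    """Search API URL for repositories tagged with all of the given topics."""
    if not topics:
        raise ValueError("give at least one topic")
    if sort not in SORT_OPTIONS:
        raise ValueError(f"sort must be one of {sorted(SORT_OPTIONS)}")
    query = " ".join(f"topic:{t}" for t in topics)
    return "https://api.github.com/search/repositories?" + urlencode({"q": query, "sort": sort})
```

For example, `topic_search_url("open-science", "electronic-lab-notebook")` lists recently updated repositories carrying both tags, which is one way to find potential collaborators working on the same subject.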

Many discussions of ELNs also envisage an extension of online programs to encompass the broader experimental environment, in so-called ELEs.[43] These may include integration of certain workflows into an ELE, so that certain experimental parameters, conditions, and results can be automatically updated in and between connected ELNs.[41] Future applications in this space could expand capabilities to include functionalities of specific use for researchers; for example, integration of commonly used chemistry programs like molecular structure drawing tools would make it easier to add relevant data to GitHub. More ambitious proposals include incorporating a LIMS or existing browser-based sites like Reaxys and SciFinder, allowing researchers to quickly scan the web for information on specific substances or reactions from within an interlinked ELE. While these converging functionalities are currently beyond our capacity and in some part contingent on GitHub becoming more established as a site for hosting ELNs and other aspects of scientific research, GitHub does currently offer integration with many applications and actions to automate workflows.[44] Furthermore, GitHub is home to software developers and thus an ideal location to recruit collaborators with the requisite knowledge to develop code-based solutions.

Conclusions

Many ELNs have been developed by first studying how a given discipline uses paper notebooks or other record-keeping tools, then using this information to design a fit-for-purpose ELN.[40] Conversely, GitHub was designed for a different purpose, and we have appropriated it for use as an ELN. Our experience to date demonstrates the versatility of GitHub for use as an ELN, and showcases the practical implications of its features in a synthetic chemistry context, along with the flexibility it offers researchers seeking to share their work and collaborate with others in real time.

Importantly, GitHub offers version control, encourages and enables the inclusion of metadata and curation, and expedites the sharing of knowledge through real-time updates and very low barriers to discussion and collaboration between interested parties. Because GitHub was not designed with a single field of research in mind, it is also ideal for cross-disciplinary collaboration: each discipline can adapt different elements of GitHub's functionality for its own use while maintaining the same core infrastructure. Instead of having to familiarize themselves with new ELN software and layouts, or swap between multiple ELN providers, researchers working on multidisciplinary projects can use a single, centralized service with consistent controls and a familiar structure.

While some features are undeniably oriented towards coders, such as the Actions tab in which users can set up workflows using code, these features do not detract from GitHub's usefulness as an ELN, which lies mainly in its adaptability and capacity for knowledge sharing and collaboration. To lower the barrier that Markdown may present to newcomers, we offer guides for those looking to trial GitHub for scientific research, as well as a template repository containing an issue template with appropriate labels and a wiki template with suggested headings and formatting. Additionally, although this work features GitHub's application in the context of open science, repositories can also be made private. Such repositories are accessible only by invitation and are thus appropriate for settings in which confidentiality is required.
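
To show what an issue template involves, the following sketch generates one. GitHub recognizes Markdown issue templates placed in `.github/ISSUE_TEMPLATE/`, whose YAML front matter sets the template's name, description, default title, labels, and assignees. The headings, label names, and function names here are hypothetical illustrations, not the contents of the authors' ELN-Templates repository.

```python
from pathlib import Path

# Hypothetical section headings and labels for an experiment record.
HEADINGS = ["Aim", "Reaction scheme", "Procedure", "Results", "Interpretation"]
LABELS = ["experiment", "in progress"]


def render_issue_template(name: str, about: str,
                          headings: list[str], labels: list[str]) -> str:
    """Return Markdown for an issue template with YAML front matter."""
    front_matter = [
        "---",
        f"name: {name}",
        f"about: {about}",
        "title: ''",
        f"labels: {', '.join(labels)}",  # comma-separated label names
        "assignees: ''",
        "---",
        "",
    ]
    # One second-level heading per section, each with a placeholder comment.
    body = [f"## {heading}\n\n<!-- Fill in {heading.lower()} here -->\n"
            for heading in headings]
    return "\n".join(front_matter + body)


def write_template(repo_root: Path, filename: str = "experiment.md") -> Path:
    """Write the template into .github/ISSUE_TEMPLATE/ under repo_root."""
    target = repo_root / ".github" / "ISSUE_TEMPLATE" / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        render_issue_template("Experiment", "Record a single experiment",
                              HEADINGS, LABELS),
        encoding="utf-8",
    )
    return target
```

Calling `write_template(Path("."))` from the root of a local clone would create `.github/ISSUE_TEMPLATE/experiment.md`; once committed and pushed, new issues can be started from the template with the labels and headings already in place.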

GitHub's practical features and free, open nature make it an attractive alternative not just to paper-based laboratory notebooks, but also to other ELNs, which can be expensive, inflexible, exclusive, and unsuitable for openly accessible research. We therefore encourage researchers in all disciplines to trial GitHub as an ELN and to share their experiences in using it for their own projects (https://github.com/TheBreakingGoodProject/ELN-Templates/discussions/2).

Acknowledgements

We acknowledge and pay respect to the Gadigal people of the Eora Nation, the traditional owners of the land on which we research, teach, and collaborate at the University of Sydney. This work was supported by funding from the Westpac Research Fellowship (Motion), Google Impact Challenge, and the University of Sydney Drug Discovery Initiative. KJB-S gratefully acknowledges the support of the John A. Lamberton Research Scholarship. The authors acknowledge the facilities and the scientific and technical assistance of staff within the Sydney Analytical Core Research Facility and the School of Chemistry at the University of Sydney. Additionally, we acknowledge the broader members of the Open Source Malaria, Open Source Mycetoma, and Open Source Tuberculosis communities of which we are part and with whom we collaborate using GitHub.

Author contributions

K. R. S. and K. J. B.-S. contributed equally. All authors: conceptualisation. A. M.: funding acquisition. K. R. S. and K. J. B.-S.: investigation. K. R. S. and A. M.: project administration. K. R. S.: visualisation. K. R. S. and K. J. B.-S.: writing – original draft. All authors: writing – review and editing.

Data availability

This paper describes open source notebooks that use GitHub as an ELN. All data are those shared on GitHub and can be found at https://github.com/TheBreakingGoodProject and at the links therein.

Conflict of interest

There are no conflicts to declare.

References

  1. Borman, Stu (23 May 1994). "Electronic Laboratory Notebooks May Revolutionize Research Record Keeping: Programs that help scientists record observations, capture and organize data, and share information with work groups may soon be more widely used" (in en). Chemical & Engineering News Archive 72 (21): 10–20. doi:10.1021/cen-v072n021.p010. ISSN 0009-2347. https://pubs.acs.org/doi/abs/10.1021/cen-v072n021.p010. 
  2. Jeschke, Jonathan M.; Lokatis, Sophie; Bartram, Isabelle; Tockner, Klement (1 June 2019). Klenk, Nicole L., ed. "Knowledge in the dark: scientific challenges and ways forward" (in en). FACETS 4 (1): 423–441. doi:10.1139/facets-2019-0007. ISSN 2371-1671. http://www.facetsjournal.com/doi/10.1139/facets-2019-0007.
  3. Woelfle, Michael; Olliaro, Piero; Todd, Matthew H. (1 October 2011). "Open science is a research accelerator" (in en). Nature Chemistry 3 (10): 745–748. doi:10.1038/nchem.1149. ISSN 1755-4330. https://www.nature.com/articles/nchem.1149. 
  4. Kloeckner, Frederik; Farkas, Robert; Franken, Tobias; Schmitz-Rode, Thomas (1 January 2014). "Development of a prediction model on the acceptance of electronic laboratory notebooks in academic environments". Biomedical Engineering / Biomedizinische Technik 59 (2). doi:10.1515/bmt-2013-0023. ISSN 1862-278X. https://www.degruyter.com/document/doi/10.1515/bmt-2013-0023/html. 
  5. Kanza, Samantha; Willoughby, Cerys; Gibbins, Nicholas; Whitby, Richard; Frey, Jeremy Graham; Erjavec, Jana; Zupančič, Klemen; Hren, Matjaž et al. (1 December 2017). "Electronic lab notebooks: can they replace paper?" (in en). Journal of Cheminformatics 9 (1): 31. doi:10.1186/s13321-017-0221-3. ISSN 1758-2946. PMC5443717. PMID 29086051. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0221-3.
  6. Rudolphi, Felix; Goossen, Lukas J. (27 February 2012). "Electronic Laboratory Notebook: The Academic Point of View" (in en). Journal of Chemical Information and Modeling 52 (2): 293–301. doi:10.1021/ci2003895. ISSN 1549-9596. https://pubs.acs.org/doi/10.1021/ci2003895. 
  7. Higgins, Stuart G.; Nogiwa-Valdez, Akemi A.; Stevens, Molly M. (1 February 2022). "Considerations for implementing electronic laboratory notebooks in an academic research environment" (in en). Nature Protocols 17 (2): 179–189. doi:10.1038/s41596-021-00645-8. ISSN 1754-2189. https://www.nature.com/articles/s41596-021-00645-8.
  8. Guerrero, Santiago; López-Cortés, Andrés; García-Cárdenas, Jennyfer M.; Saa, Pablo; Indacochea, Alberto; Armendáriz-Castillo, Isaac; Zambrano, Ana Karina; Yumiceba, Verónica et al. (9 May 2019). Ouellette, Francis, ed. "A quick guide for using Microsoft OneNote as an electronic laboratory notebook" (in en). PLOS Computational Biology 15 (5): e1006918. doi:10.1371/journal.pcbi.1006918. ISSN 1553-7358. PMC6508581. PMID 31071077. https://dx.plos.org/10.1371/journal.pcbi.1006918.
  9. Van Dyke, Aaron R.; Smith-Carpenter, Jillian (9 May 2017). "Bring Your Own Device: A Digital Notebook for Undergraduate Biochemistry Laboratory Using a Free, Cross-Platform Application" (in en). Journal of Chemical Education 94 (5): 656–661. doi:10.1021/acs.jchemed.6b00622. ISSN 0021-9584. https://pubs.acs.org/doi/10.1021/acs.jchemed.6b00622. 
  10. Walsh, Emily; Cho, Ilseung (1 June 2013). "Using Evernote as an Electronic Lab Notebook in a Translational Science Laboratory" (in en). SLAS Technology 18 (3): 229–234. doi:10.1177/2211068212471834. https://linkinghub.elsevier.com/retrieve/pii/S2472630322016181. 
  11. Bromfield Lee, Deborah (10 July 2018). "Implementation and Student Perceptions on Google Docs as an Electronic Laboratory Notebook in Organic Chemistry" (in en). Journal of Chemical Education 95 (7): 1102–1111. doi:10.1021/acs.jchemed.7b00518. ISSN 0021-9584. https://pubs.acs.org/doi/10.1021/acs.jchemed.7b00518. 
  12. Tremouilhac, Pierre; Nguyen, An; Huang, Yu-Chieh; Kotov, Serhii; Lütjohann, Dominic Sebastian; Hübsch, Florian; Jung, Nicole; Bräse, Stefan (1 December 2017). "Chemotion ELN: an Open Source electronic lab notebook for chemists in academia" (in en). Journal of Cheminformatics 9 (1): 54. doi:10.1186/s13321-017-0240-0. ISSN 1758-2946. PMC5612905. PMID 29086216. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0240-0.
  13. Milsted, Andrew J.; Hale, Jennifer R.; Frey, Jeremy G.; Neylon, Cameron (23 July 2013). Smalheiser, Neil R., ed. "LabTrove: A Lightweight, Web Based, Laboratory “Blog” as a Route towards a Marked Up Record of Work in a Bioscience Research Laboratory" (in en). PLoS ONE 8 (7): e67460. doi:10.1371/journal.pone.0067460. ISSN 1932-6203. PMC3720848. PMID 23935832. https://dx.plos.org/10.1371/journal.pone.0067460.
  14. Patiny, Luc; Zasso, Michaël; Kostro, Daniel; Bernal, Andrés; Castillo, Andrés M.; Bolaños, Alejandro; Asencio, Miguel A.; Pellet, Norman et al. (1 June 2018). "The C6H6 NMR repository: An integral solution to control the flow of your data from the magnet to the public" (in en). Magnetic Resonance in Chemistry 56 (6): 520–528. doi:10.1002/mrc.4669. ISSN 0749-1581. https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/mrc.4669. 
  15. Williamson, Alice E.; Ylioja, Paul M.; Robertson, Murray N.; Antonova-Koch, Yevgeniya; Avery, Vicky; Baell, Jonathan B.; Batchu, Harikrishna; Batra, Sanjay et al. (26 October 2016). "Open Source Drug Discovery: Highly Potent Antimalarial Compounds Derived from the Tres Cantos Arylpyrroles" (in en). ACS Central Science 2 (10): 687–701. doi:10.1021/acscentsci.6b00086. ISSN 2374-7943. PMC5084075. PMID 27800551. https://pubs.acs.org/doi/10.1021/acscentsci.6b00086.
  16. Lim, Wilson; Melse, Youri; Konings, Mickey; Phat Duong, Hung; Eadie, Kimberly; Laleu, Benoît; Perry, Benjamin; Todd, Matthew H. et al. (26 April 2018). Reynolds, Todd B., ed. "Addressing the most neglected diseases through an open research model: The discovery of fenarimols as novel drug candidates for eumycetoma" (in en). PLOS Neglected Tropical Diseases 12 (4): e0006437. doi:10.1371/journal.pntd.0006437. ISSN 1935-2735. PMC5940239. PMID 29698504. https://dx.plos.org/10.1371/journal.pntd.0006437.
  17. Robertson, Murray N.; Ylioja, Paul M.; Williamson, Alice E.; Woelfle, Michael; Robins, Michael; Badiola, Katrina A.; Willis, Paul; Olliaro, Piero et al. (1 January 2014). "Open source drug discovery – A limited tutorial" (in en). Parasitology 141 (1): 148–157. doi:10.1017/S0031182013001121. ISSN 0031-1820. PMC3884843. PMID 23985301. https://www.cambridge.org/core/product/identifier/S0031182013001121/type/journal_article.
  18. Vicente-Saez, Ruben; Martinez-Fuentes, Clara (1 July 2018). "Open Science now: A systematic literature review for an integrated definition" (in en). Journal of Business Research 88: 428–436. doi:10.1016/j.jbusres.2017.12.043. https://linkinghub.elsevier.com/retrieve/pii/S0148296317305441. 
  19. Todd, Matthew H. (6 November 2019). "Six Laws of Open Source Drug Discovery" (in en). ChemMedChem 14 (21): 1804–1809. doi:10.1002/cmdc.201900565. ISSN 1860-7179. PMC6899868. PMID 31612602. https://chemistry-europe.onlinelibrary.wiley.com/doi/10.1002/cmdc.201900565.
  20. Badiola, Katrina A.; Bird, Colin; Brocklesby, William S.; Casson, John; Chapman, Richard T.; Coles, Simon J.; Cronshaw, James R.; Fisher, Adam et al. (2015). "Experiences with a researcher-centric ELN" (in en). Chemical Science 6 (3): 1614–1629. doi:10.1039/C4SC02128B. ISSN 2041-6520. PMC5639792. PMID 29308130. http://xlink.rsc.org/?DOI=C4SC02128B.
  21. Kouzes, R.T.; Myers, J.D.; Wulf, W.A. (August 1996). "Collaboratories: doing science on the Internet". Computer 29 (8): 40–46. doi:10.1109/2.532044. http://ieeexplore.ieee.org/document/532044/.
  22. GitHub (2020). "The 2020 State of the Octoverse". GitHub. https://arxiv.org/ftp/arxiv/papers/2110/2110.10248.pdf. 
  23. Bird, Colin L.; Frey, Jeremy G. (2013). "Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences" (in en). Chemical Society Reviews 42 (16): 6754. doi:10.1039/c3cs60050e. ISSN 0306-0012. http://xlink.rsc.org/?DOI=c3cs60050e.
  24. Kwok, Roberta (1 August 2018). "How to pick an electronic laboratory notebook" (in en). Nature 560 (7717): 269–270. doi:10.1038/d41586-018-05895-3. ISSN 0028-0836. https://www.nature.com/articles/d41586-018-05895-3. 
  25. Colabroy, Keri; Bell, Jessica K. (1 January 2019), Bussey, Thomas J.; Linenberger Cortes, Kimberly; Austin, Rodney C., eds., "Lab eNotebooks" (in en), ACS Symposium Series (Washington, DC: American Chemical Society) 1337: 173–195, doi:10.1021/bk-2019-1337.ch008, ISBN 978-0-8412-3633-2, https://pubs.acs.org/doi/abs/10.1021/bk-2019-1337.ch008. Retrieved 2024-01-10 
  26. Hart, Edmund M.; Barmby, Pauline; LeBauer, David; Michonneau, François; Mount, Sarah; Mulrooney, Patrick; Poisot, Timothée; Woo, Kara H. et al. (20 October 2016). Markel, Scott, ed. "Ten Simple Rules for Digital Data Storage" (in en). PLOS Computational Biology 12 (10): e1005097. doi:10.1371/journal.pcbi.1005097. ISSN 1553-7358. PMC5072699. PMID 27764088. https://dx.plos.org/10.1371/journal.pcbi.1005097.
  27. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618.
  28. Bechhofer, Sean; Buchan, Iain; De Roure, David; Missier, Paolo; Ainsworth, John; Bhagat, Jiten; Couch, Philip; Cruickshank, Don et al. (1 February 2013). "Why linked data is not enough for scientists" (in en). Future Generation Computer Systems 29 (2): 599–611. doi:10.1016/j.future.2011.08.004. https://linkinghub.elsevier.com/retrieve/pii/S0167739X11001439. 
  29. Solle, Dörte (1 July 2020). "Be FAIR to your data" (in en). Analytical and Bioanalytical Chemistry 412 (17): 3961–3965. doi:10.1007/s00216-020-02526-7. ISSN 1618-2642. PMC7320032. PMID 32300841. https://link.springer.com/10.1007/s00216-020-02526-7.
  30. Frey, Jeremy (2 December 2008). "Curation of Laboratory Experimental Data as Part of the Overall Data Lifecycle". International Journal of Digital Curation 3 (1): 44–62. doi:10.2218/ijdc.v3i1.41. ISSN 1746-8256. http://ijdc.net/article/view/62. 
  31. Willoughby, Cerys; Bird, Colin L.; Coles, Simon J.; Frey, Jeremy G. (22 December 2014). "Creating Context for the Experiment Record. User-Defined Metadata: Investigations into Metadata Usage in the LabTrove ELN" (in en). Journal of Chemical Information and Modeling 54 (12): 3268–3283. doi:10.1021/ci500469f. ISSN 1549-9596. https://pubs.acs.org/doi/10.1021/ci500469f. 
  32. Willoughby, Cerys; Logothetis, Thomas A.; Frey, Jeremy G. (1 December 2016). "Effects of using structured templates for recalling chemistry experiments" (in en). Journal of Cheminformatics 8 (1): 9. doi:10.1186/s13321-016-0118-6. ISSN 1758-2946. PMC4759737. PMID 26900406. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-016-0118-6.
  33. Buck, Stuart (26 June 2015). "Solving reproducibility" (in en). Science 348 (6242): 1403–1403. doi:10.1126/science.aac8041. ISSN 0036-8075. https://www.science.org/doi/10.1126/science.aac8041. 
  34. Resnik, David B.; Shamoo, Adil E. (17 February 2017). "Reproducibility and Research Integrity" (in en). Accountability in Research 24 (2): 116–123. doi:10.1080/08989621.2016.1257387. ISSN 0898-9621. PMC5244822. PMID 27820655. https://www.tandfonline.com/doi/full/10.1080/08989621.2016.1257387.
  35. Schapira, Matthieu; The Open Lab Notebook Consortium; Harding, Rachel J. (2 April 2019). "Open laboratory notebooks: good for science, good for society, good for scientists" (in en). F1000Research 8: 87. doi:10.12688/f1000research.17710.2. ISSN 2046-1402. PMC6694453. PMID 31448096. https://f1000research.com/articles/8-87/v2.
  36. Munafò, Marcus R.; Nosek, Brian A.; Bishop, Dorothy V. M.; Button, Katherine S.; Chambers, Christopher D.; Percie du Sert, Nathalie; Simonsohn, Uri; Wagenmakers, Eric-Jan et al. (10 January 2017). "A manifesto for reproducible science" (in en). Nature Human Behaviour 1 (1): 0021. doi:10.1038/s41562-016-0021. ISSN 2397-3374. PMC7610724. PMID 33954258. https://www.nature.com/articles/s41562-016-0021.
  37. Speicher, D.; Cremers, A. B. (2020). "Computational Notebooks in Public Repositories". IPSI Transactions on Internet Research 16 (1): 38–44. http://tir.ipsitransactions.org/indexTIR_all.php. 
  38. Wang, Aiwu; Kang, Fengwen; Wang, Zhigang; Shao, Qingguo; Li, Zhe; Zhu, Guangyu; Lu, Jian; Li, Yang Yang (1 March 2019). "Facile Synthesis of Nitrogen‐Rich Carbon Dots as Fertilizers for Mung Bean Sprouts" (in en). Advanced Sustainable Systems 3 (3): 1800132. doi:10.1002/adsu.201800132. ISSN 2366-7486. https://onlinelibrary.wiley.com/doi/10.1002/adsu.201800132. 
  39. Harding, Rachel J. (28 January 2019). "Open notebook science can maximize impact for rare disease projects" (in en). PLOS Biology 17 (1): e3000120. doi:10.1371/journal.pbio.3000120. ISSN 1545-7885. PMC6366684. PMID 30689629. https://dx.plos.org/10.1371/journal.pbio.3000120.
  40. Bird, Colin L.; Willoughby, Cerys; Frey, Jeremy G. (2013). "Laboratory notebooks in the digital era: the role of ELNs in record keeping for chemistry and other sciences" (in en). Chemical Society Reviews 42 (20): 8157. doi:10.1039/c3cs60122f. ISSN 0306-0012. http://xlink.rsc.org/?DOI=c3cs60122f.
  41. Piccione, Patrick M. (1 April 2020). "Systematizing scientific laboratory work by a workflow and template for electronic laboratory notebooks" (in en). Education for Chemical Engineers 31: 42–53. doi:10.1016/j.ece.2020.03.004. https://linkinghub.elsevier.com/retrieve/pii/S174977282030021X.
  42. Jablonka, Kevin Maik; Patiny, Luc; Smit, Berend (1 April 2022). "Making the collective knowledge of chemistry open and machine actionable" (in en). Nature Chemistry 14 (4): 365–376. doi:10.1038/s41557-022-00910-7. ISSN 1755-4330. https://www.nature.com/articles/s41557-022-00910-7. 
  43. Taylor, Keith T. (16 May 2011), Ekins, Sean; Hupcey, Maggie A. Z.; Williams, Antony J., eds., "Evolution of Electronic Laboratory Notebooks" (in en), Collaborative Computational Technologies for Biomedical Research (Wiley): 301–320, doi:10.1002/9781118026038.ch19, ISBN 978-0-470-63803-3, https://onlinelibrary.wiley.com/doi/10.1002/9781118026038.ch19. Retrieved 2024-01-10 
  44. "Extend GitHub". GitHub, Inc. 2022. https://github.com/marketplace. Retrieved 2022-04-01.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. In some cases important information was missing from the references, and that information was added. The footnote at the end of the original version was turned into a formal citation for this version.