Difference between revisions of "Journal:ROBOT: A Tool for Automating Ontology Workflows"
Shawndouglas (talk | contribs) (Saving and adding more.) |
Shawndouglas (talk | contribs) (Saving and adding more.) |
||
Line 52: | Line 52: | ||
Slimmer<ref name="GitHubSlimmer">{{cite web |url=https://github.com/enanomapper/slimmer/ |title=enanomapper / slimmer |work=GitHub |accessdate=21 May 2019}}</ref>, created as part of the eNanoMapper ontology project<ref name="HastingsEnano15">{{cite journal |title=eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment |journal=Journal of Biomedical Semantics |author=Hastings, J.; Jeliazkova, N.; Owen, G. et al. |volume=6 |at=10 |year=2015 |doi=10.1186/s13326-015-0005-5 |pmid=25815161 |pmc=PMC4374589}}</ref>, uses OWL API to create ontology subsets (also known as “slims”). A configuration file allows the user to specify which terms to include and which annotations to include on those terms. OntoPilot<ref name="StuckyOnto17">{{cite journal |title=OntoPilot: New Software to Simplify and Accelerate Ontology Development and Deployment |journal=Biodiversity Information Science and Standards |author=Stucky, B.J.; Luc, A. |volume=1 |at=e20192 |year=2017 |doi=10.3897/tdwgproceedings.1.20192}}</ref>, developed for the Plant Phenotype Ontology, uses OWL API via Jython (a version of Python that runs on the Java Virtual Machine) to provide an integrated ontology development framework, including term imports, term creation, releases, and documentation. | Slimmer<ref name="GitHubSlimmer">{{cite web |url=https://github.com/enanomapper/slimmer/ |title=enanomapper / slimmer |work=GitHub |accessdate=21 May 2019}}</ref>, created as part of the eNanoMapper ontology project<ref name="HastingsEnano15">{{cite journal |title=eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment |journal=Journal of Biomedical Semantics |author=Hastings, J.; Jeliazkova, N.; Owen, G. et al. |volume=6 |at=10 |year=2015 |doi=10.1186/s13326-015-0005-5 |pmid=25815161 |pmc=PMC4374589}}</ref>, uses OWL API to create ontology subsets (also known as “slims”). A configuration file allows the user to specify which terms to include and which annotations to include on those terms. OntoPilot<ref name="StuckyOnto17">{{cite journal |title=OntoPilot: New Software to Simplify and Accelerate Ontology Development and Deployment |journal=Biodiversity Information Science and Standards |author=Stucky, B.J.; Luc, A. |volume=1 |at=e20192 |year=2017 |doi=10.3897/tdwgproceedings.1.20192}}</ref>, developed for the Plant Phenotype Ontology, uses OWL API via Jython (a version of Python that runs on the Java Virtual Machine) to provide an integrated ontology development framework, including term imports, term creation, releases, and documentation. | ||
The lack of automation seen circa 2010 led directly to a lack of standardization, with each ontology editor or group adopting a slightly different approach to manual editing in Protégé. This diversity of practices, even within the OBO community, made it a challenge to develop tools to serve multiple ontology projects. OWLTools<ref name="GitHubOwlTools">{{cite web |url=https://github.com/owlcollab/owltools |title=owlcollab / owltools |work=GitHub |accessdate=21 May 2019}}</ref> was designed for use by multiple OBO ontology projects, providing convenience methods on top of the OWL API. OWLTools includes the OBO Ontology Release Tool (OORT)<ref name="GitHubOwlTools">{{cite web |url=https://github.com/owlcollab/owltools/wiki/Oort-Intro |title=owlcollab / owltools : Oort Intro |work=GitHub |accessdate=21 May 2019}}</ref>, a command-line tool to release OWL- and OBO-format ontologies. OORT provides a series of basic commands to create a release pipeline for an ontology, including module extraction with MIREOT, support for multiple input ontologies, reasoning, and creation of "main" and "simple" release products. | The lack of automation seen circa 2010 led directly to a lack of standardization, with each ontology editor or group adopting a slightly different approach to manual editing in Protégé. This diversity of practices, even within the OBO community, made it a challenge to develop tools to serve multiple ontology projects. OWLTools<ref name="GitHubOwlTools">{{cite web |url=https://github.com/owlcollab/owltools |title=owlcollab / owltools |work=GitHub |accessdate=21 May 2019}}</ref> was designed for use by multiple OBO ontology projects, providing convenience methods on top of the OWL API. OWLTools includes the OBO Ontology Release Tool (OORT)<ref name="GitHubOwlTools-Oort">{{cite web |url=https://github.com/owlcollab/owltools/wiki/Oort-Intro |title=owlcollab / owltools : Oort Intro |work=GitHub |accessdate=21 May 2019}}</ref>, a command-line tool to release OWL- and OBO-format ontologies. OORT provides a series of basic commands to create a release pipeline for an ontology, including module extraction with MIREOT, support for multiple input ontologies, reasoning, and creation of "main" and "simple" release products. | ||
ROBOT (a recursive acronym for “ROBOT is an OBO Tool”) was developed to replace OWLTools and OORT with a more modular and maintainable code base. It builds on previous experience to include a comprehensive set of automation capabilities to support an even wider range of OBO projects. Development began in 2015 and continues today, with more than 1000 commits from a dozen contributors. ROBOT is freely available open-source software. Although we do not track our users, a recent GitHub search shows that at least 26 ontology projects in the OBO community have adopted ROBOT. | ROBOT (a recursive acronym for “ROBOT is an OBO Tool”) was developed to replace OWLTools and OORT with a more modular and maintainable code base. It builds on previous experience to include a comprehensive set of automation capabilities to support an even wider range of OBO projects. Development began in 2015 and continues today, with more than 1000 commits from a dozen contributors. ROBOT is freely available open-source software. Although we do not track our users, a recent GitHub search shows that at least 26 ontology projects in the OBO community have adopted ROBOT. | ||
Line 58: | Line 58: | ||
==Implementation== | ==Implementation== | ||
===Overview=== | ===Overview=== | ||
ROBOT provides a standardized yet configurable way to support the ontology development lifecycle via a library of common high-level functionality and a command-line interface. ROBOT builds on OWL API and is compatible with all ontology syntaxes that OWL API supports: RDF/XML, OWL/XML, Turtle, OWL Functional Syntax, OWL Manchester Syntax, and OBO format. The source code is written in Java and is available from our GitHub repository<ref name="GitHubROBOT">{{cite web |url=https://github.com/ontodev/robot |title=ontodev / robot |work=GitHub |accessdate=09 October 2018}}</ref> under an open source (BSD 3) license. It is also released as a Java library on Maven Central. ROBOT code can be used from any programming language that runs on the Java Virtual Machine (JVM). The command-line tool is packaged as a JAR file that can be run on Unix (including macOS and Linux), Windows, and other platforms supported by the JVM. This JAR file is available for download from the ROBOT GitHub site<ref name="GitHubRobot" />, along with platform-specific scripts for using ROBOT from the command line. Installation instructions and documentation are available from the [http://robot.obolibrary.org/ OBO Library]. | |||
===Architecture=== | |||
We previously described the basic architecture of the tool<ref name="OvertonROBOT15">{{cite journal |title=ROBOT: A command-line tool for ontology development |journal=Proceedings of the 2015 International Conference on Biomedical Ontology |author=Overton, J.A.; Dietze, H.; Essaid, S. et al. |pages=1–2 |year=2015 |url=http://ceur-ws.org/Vol-1515/}}</ref>, which we summarize here. | |||
The ROBOT source code consists of two parts: ‘robot-core’ and ‘robot-command.’ ‘robot-core’ is a library supporting common ontology development tasks, which we call “operations.” ‘robot-command’ provides a command-line interface divided into “commands,” each of which wraps a ‘robot-core’ operation. | |||
Most ROBOT operations package low-level functionality provided by OWL API into high-level functionality common to ontology development workflows in the OBO community. For best compatibility, we aim to match the exact version of OWL API used by ROBOT with the exact version used by the latest Protégé release. Some operations use Apache Jena.<ref name="CarrollJena04">{{cite journal |title=Jena: Implementing the semantic web recommendations |journal=WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters |author=Carroll, J.J.; Dickinson. I.; Dollin, C. et al. |pages=74–83 |year=2004 |doi=10.1145/1013367.1013381}}</ref> Each operation works with Java objects that represent OWL ontologies, OWL reasoners, OWL classes, etc., while each command works with command-line option strings and files. The commands also perform various conversion and validation steps. The command-line interface uses the Apache Commons CLI library<ref name="ApacheCommons">{{cite web |url=https://commons.apache.org/proper/commons-cli/ |title=Commons CLI |work=Apache Commons |accessdate=23 May 2019}}</ref> for parsing commands. | |||
==References== | ==References== |
Revision as of 19:05, 17 February 2020
Full article title | ROBOT: A Tool for Automating Ontology Workflows |
---|---|
Journal | BMC Bioinformatics |
Author(s) | Jackson, Rebecca C.; Balhoff, James, P.; Douglass, Eric; Harris, Nomi L.; Mungall, Christopher J.; Overton, James A. |
Author affiliation(s) | Knocean, Inc.; University of North Carolina; Lawrence Berkeley National Laboratory |
Primary contact | Email: via SpringerLink |
Year published | 2019 |
Volume and issue | 20 |
Page(s) | 407 |
DOI | 10.1186/s12859-019-3002-3 |
ISSN | 1471-2105 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://link.springer.com/article/10.1186/s12859-019-3002-3 |
Download | https://link.springer.com/content/pdf/10.1186/s12859-019-3002-3.pdf (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
Background: Ontologies are invaluable in the life sciences, but building and maintaining ontologies often requires a challenging number of distinct tasks such as running automated reasoners and quality control checks, extracting dependencies and application-specific subsets, generating standard reports, and generating release files in multiple formats. Similar to more general software development, automation is the key to executing and managing these tasks effectively and to releasing more robust products in standard forms.
For ontologies using the Web Ontology Language (OWL), the OWL API (application programming interface) Java library is the foundation for a range of software tools, including the Protégé ontology editor. In the Open Biological and Biomedical Ontologies (OBO) community, we recognized the need to package a wide range of low-level OWL API functionality into a library of common higher-level operations and to make those operations available as a command-line tool.
Results: ROBOT (a recursive acronym for “ROBOT is an OBO Tool”) is an open-source library and command-line tool for automating ontology development tasks. The library can be called from any programming language that runs on the Java Virtual Machine (JVM). Most usage is through the command-line tool, which runs on macOS, Linux, and Windows. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. These commands can be combined into larger workflows using a separate task execution system such as GNU Make, and workflows can be automatically executed within continuous integration systems.
Conclusions: ROBOT supports automation of a wide range of ontology development tasks, focusing on OBO conventions. It packages common high-level ontology development functionality into a convenient library and makes it easy to configure, combine, and execute individual tasks in comprehensive, automated workflows. This helps ontology developers to efficiently create, maintain, and release high-quality ontologies so they can spend more time focusing on development tasks. It also helps guarantee released ontologies are free of certain types of logical errors and conform to standard quality control checks, increasing the overall robustness and efficiency of the ontology development lifecycle.
Keywords: ontology development, automation, ontology release, reasoning, workflows, quality control, import management
Background
Ontologies are vital parts of the informatics ecosystem, supporting life science research, enabling analysis of high-throughput datasets, data standardization and integration, search, and discovery. However, there is a lack of tools supporting the complete ontology development lifecycle, especially when compared with the software development lifecycle. This has resulted in many groups developing their own ad hoc ontology development workflows, often with time-consuming and inefficient manual steps. In some cases, groups release ontologies without any kind of systematic workflow or quality control process, which can result in errors or problems with downstream applications or analyses.
Noy et al. (2010) describe a general ontology lifecycle, with a focus on bio-ontologies.[1] First, requirements for the ontology are gathered. Then, the ontology is collaboratively developed in an ontology editor such as Protégé.[2] Once the requirements have been fulfilled, the ontology is published and feedback is solicited from the community. Feedback is integrated back into development, and the ontology is continuously updated and released. At any point after the initial publication, the ontology may be deployed in other applications.
In broad strokes, this ontology development lifecycle reflects much of our experience of ontology development in the Open Biological and Biomedical Ontologies (OBO) community[3], circa 2010. A wide range of Semantic Web-based software exists to support these steps, including many tools for Web Ontology Language (OWL) ontology development. In practice, though, the OBO community has relied predominantly on the free and open-source Protégé OWL editor for manual editing and conversion, and on a small set of other tools supporting OBO conventions.
Other than Protégé, the most prominent suite of tools used by the OBO community has been the Onto-animal suite, developed by the He Group[4], which includes Ontobee[5], Ontofox[6], and Ontorat.[7] These tools are free web services backed by a Virtuoso triplestore loaded with the latest version of all available OBO community ontologies, as well as some other ontologies. Ontobee is an ontology term browser. Ontofox implements the MIREOT term extraction method.[8] Ontorat implements template-based ontology term creation. Together with a few other tools, these support an extensible ontology development strategy[9] covering a range of ontology development tasks, many of which can combined and automated using a sequence of web-based application programming interface (API) calls.
The core operations of the Onto-Animal suite are driven by SPARQL queries against the centralized triplestore. This results in a number of limitations. First, only the specific version of each ontology loaded into that triplestore can be used. This is a particularly severe limitation during ontology development. Second, processing is done on the centralized server, limiting the processing power available to any user. Third, SPARQL has limited utility when working with OWL logical axioms.
These limitations are mitigated by running software locally, loading the desired versions of the desired ontologies, and using OWL API[10] for OWL-native processing. A number of tools used in the OBO community have done precisely this. We have seen a spectrum of development, from tools that are focused on a single project, to tools used by a dozen related projects, to the current push for tools that are shared across the OBO community.
Slimmer[11], created as part of the eNanoMapper ontology project[12], uses OWL API to create ontology subsets (also known as “slims”). A configuration file allows the user to specify which terms to include and which annotations to include on those terms. OntoPilot[13], developed for the Plant Phenotype Ontology, uses OWL API via Jython (a version of Python that runs on the Java Virtual Machine) to provide an integrated ontology development framework, including term imports, term creation, releases, and documentation.
The lack of automation seen circa 2010 led directly to a lack of standardization, with each ontology editor or group adopting a slightly different approach to manual editing in Protégé. This diversity of practices, even within the OBO community, made it a challenge to develop tools to serve multiple ontology projects. OWLTools[14] was designed for use by multiple OBO ontology projects, providing convenience methods on top of the OWL API. OWLTools includes the OBO Ontology Release Tool (OORT)[15], a command-line tool to release OWL- and OBO-format ontologies. OORT provides a series of basic commands to create a release pipeline for an ontology, including module extraction with MIREOT, support for multiple input ontologies, reasoning, and creation of "main" and "simple" release products.
ROBOT (a recursive acronym for “ROBOT is an OBO Tool”) was developed to replace OWLTools and OORT with a more modular and maintainable code base. It builds on previous experience to include a comprehensive set of automation capabilities to support an even wider range of OBO projects. Development began in 2015 and continues today, with more than 1000 commits from a dozen contributors. ROBOT is freely available open-source software. Although we do not track our users, a recent GitHub search shows that at least 26 ontology projects in the OBO community have adopted ROBOT.
Implementation
Overview
ROBOT provides a standardized yet configurable way to support the ontology development lifecycle via a library of common high-level functionality and a command-line interface. ROBOT builds on OWL API and is compatible with all ontology syntaxes that OWL API supports: RDF/XML, OWL/XML, Turtle, OWL Functional Syntax, OWL Manchester Syntax, and OBO format. The source code is written in Java and is available from our GitHub repository[16] under an open source (BSD 3) license. It is also released as a Java library on Maven Central. ROBOT code can be used from any programming language that runs on the Java Virtual Machine (JVM). The command-line tool is packaged as a JAR file that can be run on Unix (including macOS and Linux), Windows, and other platforms supported by the JVM. This JAR file is available for download from the ROBOT GitHub site[17], along with platform-specific scripts for using ROBOT from the command line. Installation instructions and documentation are available from the OBO Library.
Architecture
We previously described the basic architecture of the tool[18], which we summarize here.
The ROBOT source code consists of two parts: ‘robot-core’ and ‘robot-command.’ ‘robot-core’ is a library supporting common ontology development tasks, which we call “operations.” ‘robot-command’ provides a command-line interface divided into “commands,” each of which wraps a ‘robot-core’ operation.
Most ROBOT operations package low-level functionality provided by OWL API into high-level functionality common to ontology development workflows in the OBO community. For best compatibility, we aim to match the exact version of OWL API used by ROBOT with the exact version used by the latest Protégé release. Some operations use Apache Jena.[19] Each operation works with Java objects that represent OWL ontologies, OWL reasoners, OWL classes, etc., while each command works with command-line option strings and files. The commands also perform various conversion and validation steps. The command-line interface uses the Apache Commons CLI library[20] for parsing commands.
References
- ↑ Noy, N.; Tudorache, T.; Nyulas, C. et al. (2010). "The ontology life cycle: Integrated tools for editing, publishing, peer review, and evolution of ontologies". AMIA Annual Symposium Proceedings 2010: 552–6. PMC PMC3041389. PMID 21347039. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041389.
- ↑ Horridge, M.; Tsarkov, D. (2006). "Supporting early adoption of OWL 1.1 with Protégé-OWL and FaCT++". OWL: Experiences and Directions 2006: 1–7. http://webont.org/owled/2006/accepted06.html.
- ↑ Smith, B.; Ashburner, M.; Rosse, C. et al. (2007). "The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology 25 (11): 1251–5. doi:10.1038/nbt1346. PMC PMC2814061. PMID 17989687. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2814061.
- ↑ He, Y.; Zheng, J.; Lin, Y. (2015). "Onto-animal tools for reusing ontologies, generating and editing ontology terms, and dereferencing ontology terms". Proceedings of the 2015 International Conference on Biomedical Ontology: 1–2. http://ceur-ws.org/Vol-1515/.
- ↑ Xiang, Z.; Mungall, C.; Ruttenberg, A. et al. (2012). "Ontobee: A Linked Data Server and Browser for Ontology Terms". Proceedings of the Second International Conference on Biomedical Ontology: 279–81. http://ceur-ws.org/Vol-833/.
- ↑ Xiang, Z.; Courtot, M.; Brinkman, R.R. et al. (2010). "OntoFox: Web-based support for ontology reuse". BMC Research Notes 3: 175. doi:10.1186/1756-0500-3-175. PMC PMC2911465. PMID 20569493. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2911465.
- ↑ Xiang, Z.; Zheng, J.; Lin. Y. et al. (2015). "Ontorat: Automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns". Journal of Biomedical Semantics 6: 4. doi:10.1186/2041-1480-6-4. PMC PMC4362828. PMID 25785185. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4362828.
- ↑ Courtot, M.; Gibson, F.; Lister, A. L. et al. (2011). "MIREOT: The minimum information to reference an external ontology term". Applied Ontology 6 (1): 23–33. doi:10.3233/AO-2011-0087.
- ↑ He, Y.; Xiang, Z.; Zheng, J. et al. (2018). "The eXtensible ontology development (XOD) principles and tool implementation to support ontology interoperabilitye". Journal of Biomedical Semantics 9: 3. doi:10.1186/s13326-017-0169-2.
- ↑ Horridge, M.; Bechhofer, S.; Noppens, O. (2007). "Igniting the OWL 1.1 Touch Paper: The OWL API". OWL: Experiences and Directions 2007: 1–9. http://webont.org/owled/2007/Proceedings.html.
- ↑ "enanomapper / slimmer". GitHub. https://github.com/enanomapper/slimmer/. Retrieved 21 May 2019.
- ↑ Hastings, J.; Jeliazkova, N.; Owen, G. et al. (2015). "eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment". Journal of Biomedical Semantics 6: 10. doi:10.1186/s13326-015-0005-5. PMC PMC4374589. PMID 25815161. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4374589.
- ↑ Stucky, B.J.; Luc, A. (2017). "OntoPilot: New Software to Simplify and Accelerate Ontology Development and Deployment". Biodiversity Information Science and Standards 1: e20192. doi:10.3897/tdwgproceedings.1.20192.
- ↑ "owlcollab / owltools". GitHub. https://github.com/owlcollab/owltools. Retrieved 21 May 2019.
- ↑ "owlcollab / owltools : Oort Intro". GitHub. https://github.com/owlcollab/owltools/wiki/Oort-Intro. Retrieved 21 May 2019.
- ↑ "ontodev / robot". GitHub. https://github.com/ontodev/robot. Retrieved 09 October 2018.
- ↑ Cite error: Invalid
<ref>
tag; no text was provided for refs namedGitHubRobot
- ↑ Overton, J.A.; Dietze, H.; Essaid, S. et al. (2015). "ROBOT: A command-line tool for ontology development". Proceedings of the 2015 International Conference on Biomedical Ontology: 1–2. http://ceur-ws.org/Vol-1515/.
- ↑ Carroll, J.J.; Dickinson. I.; Dollin, C. et al. (2004). "Jena: Implementing the semantic web recommendations". WWW Alt. '04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters: 74–83. doi:10.1145/1013367.1013381.
- ↑ "Commons CLI". Apache Commons. https://commons.apache.org/proper/commons-cli/. Retrieved 23 May 2019.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference.