Journal:FAIR and interactive data graphics from a scientific knowledge graph
Full article title | FAIR and interactive data graphics from a scientific knowledge graph |
---|---|
Journal | Scientific Data |
Author(s) | Deagen, Michael E.; McCusker, Jamie P.; Fateye, Tolulomo; Stouffer, Samuel; Brinson, L. Cate; McGuinness, Deborah L.; Schadler, Linda S. |
Author affiliation(s) | University of Vermont, Rensselaer Polytechnic Institute, Duke University |
Primary contact | Email: mdeagen at mit dot edu |
Year published | 2022 |
Volume and issue | 9 |
Article # | 239 |
DOI | 10.1038/s41597-022-01352-z |
ISSN | 2052-4463 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.nature.com/articles/s41597-022-01352-z |
Download | https://www.nature.com/articles/s41597-022-01352-z.pdf (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
Graph databases capture richly linked domain knowledge by integrating heterogeneous data and metadata into a unified representation. Here, we present the use of bespoke, interactive data graphics (e.g., bar charts, scatter plots, etc.) for visual exploration of a knowledge graph. By modeling a chart as a set of metadata that describes semantic context (SPARQL query) separately from visual context (Vega-Lite specification), we leverage the high-level, declarative nature of the SPARQL and Vega-Lite grammars to concisely specify web-based, interactive data graphics synchronized to a knowledge graph. Resources with dereferenceable uniform resource identifiers (URIs) can employ the hyperlink encoding channel or image marks in Vega-Lite to amplify the information content of a given data graphic, and published charts populate a browsable gallery of the database. We discuss design considerations that arise in relation to portability, persistence, and performance. Altogether, this pairing of SPARQL and Vega-Lite—demonstrated here in the domain of polymer nanocomposite materials science—offers an extensible approach to FAIR (findable, accessible, interoperable, reusable) scientific data visualization within a knowledge graph framework.
Keywords: FAIR, graph database, knowledge graph, materials science, research management
Introduction
From early cartography to modern digital interfaces, data visualization—the display of abstract information in graphical form—has helped humans navigate unknown and complex spaces with a history of conceptual advancements alongside innovations in printing and reproduction. [1] Today, the widespread availability of digitized information, and the ability to process and display it with computers and web browsers, has brought interaction to the fore as a facilitator of higher-level cognitive processing on multidimensional datasets. [2] Interactive data visualization supports human reasoning and understanding through iterative exploration and investigation. [3] Given the deluge of data in many scientific domains, human-interpretable means for managing, troubleshooting, and disseminating information—particularly those that preserve machine-interpretability—remain essential in scientific research. This article illustrates such an approach, on a knowledge graph database, through the combination of a robust visualization grammar (Vega-Lite) and the query language for the semantic web (SPARQL) (Fig. 1).
|
In response to challenges around the reuse of scholarly data [4], scientific communities have mobilized around a set of four guiding principles for data management: ensuring that data is findable, accessible, interoperable, and reusable . [5] Known by the acronym FAIR, these principles aim to preserve the value of digital assets through machine-interpretable metadata standards and schema. In the materials science domain, the FAIR guiding principles have been embraced by numerous data resources and repositories, ushering in the development of modern data infrastructures for materials research. [6,7,8,9,10] The backbone and nervous system for these and other scientific data infrastructures build upon the foundation of the World Wide Web (WWW).
Since the early vision of the semantic web to make data on the internet machine-interpretable [11], the WWW has evolved from a repository of linked documents to an omnipresent medium for information exchange. The Resource Description Framework (RDF), a metadata model for the semantic web, captures knowledge through expressions known as triples, each comprising two nodes and a directional edge, that form a directed graph-based data representation inside a database, or triple store. SPARQL, a query language for RDF, uses graph-based expressions to retrieve sets of matches, or bindings, of variables in a graph pattern to content in a triple store. In the case of SELECT queries in SPARQL, sets of bindings take on a tabular form. The RDF model achieves interoperability through shared ontologies, or structured vocabularies that form the basis for capturing and reasoning over domain knowledge. Graph databases, such as knowledge graphs [12], can build on the infrastructure of the internet by using uniform resource identifiers (URIs) that follow the well-established hypertext transfer protocol (HTTP) to ensure global uniqueness. Contrary to digital object identifiers (DOIs), which represent digital resources, URIs can represent anything (e.g., physical objects, abstract concepts, etc.). However, similar to the way a DOI is accessible via redirection when “https://dx.doi.org/” is placed in front, URIs can serve representations in a process known as dereferencing, offering a way to capture information stored elsewhere on the web. Despite challenges around the implementation of truly distributed knowledge representations [13], this extensible data and metadata format shows promise as a FAIR mechanism for storing and linking scientific data.
References
Notes
This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage was substantially updated for improved readability. In some cases important information was missing from the references, and that information was added.