Journal:FAIR and interactive data graphics from a scientific knowledge graph

Full article title	FAIR and interactive data graphics from a scientific knowledge graph
Journal	Scientific Data
Author(s)	Deagen, Michael E.; McCusker, Jamie P.; Fateye, Tolulomo; Stouffer, Samuel; Brinson, L. Cate; McGuinness, Deborah L.; Schadler, Linda S.
Author affiliation(s)	University of Vermont, Rensselaer Polytechnic Institute, Duke University
Primary contact	Email: mdeagen at mit dot edu
Year published	2022
Volume and issue	9
Article #	239
DOI	10.1038/s41597-022-01352-z
ISSN	2052-4463
Distribution license	Creative Commons Attribution 4.0 International
Website	https://www.nature.com/articles/s41597-022-01352-z
Download	https://www.nature.com/articles/s41597-022-01352-z.pdf (PDF)

This article is a work in progress; consider it incomplete until this notice is removed.

Abstract

Graph databases capture richly linked domain knowledge by integrating heterogeneous data and metadata into a unified representation. Here, we present the use of bespoke, interactive data graphics (e.g., bar charts and scatter plots) for visual exploration of a knowledge graph. By modeling a chart as a set of metadata that describes semantic context (SPARQL query) separately from visual context (Vega-Lite specification), we leverage the high-level, declarative nature of the SPARQL and Vega-Lite grammars to concisely specify web-based, interactive data graphics synchronized to a knowledge graph. Resources with dereferenceable uniform resource identifiers (URIs) can employ the hyperlink encoding channel or image marks in Vega-Lite to amplify the information content of a given data graphic, and published charts populate a browsable gallery of the database. We discuss design considerations that arise in relation to portability, persistence, and performance. Altogether, this pairing of SPARQL and Vega-Lite, demonstrated here in the domain of polymer nanocomposite materials science, offers an extensible approach to FAIR (findable, accessible, interoperable, reusable) scientific data visualization within a knowledge graph framework.

Keywords: FAIR, graph database, knowledge graph, materials science, research management

Introduction

From early cartography to modern digital interfaces, data visualization (the display of abstract information in graphical form) has helped humans navigate unknown and complex spaces, with a history of conceptual advances that parallels innovations in printing and reproduction. [1] Today, the widespread availability of digitized information, together with the ability to process and display it with computers and web browsers, has brought interaction to the fore as a facilitator of higher-level cognitive processing on multidimensional datasets. [2] Interactive data visualization supports human reasoning and understanding through iterative exploration and investigation. [3] Given the deluge of data in many scientific domains, human-interpretable means for managing, troubleshooting, and disseminating information, particularly those that preserve machine-interpretability, remain essential in scientific research. This article illustrates such an approach, on a knowledge graph database, through the combination of a robust visualization grammar (Vega-Lite) and the query language for the semantic web (SPARQL) (Fig. 1).

Figure 1. Extending FAIR to data graphics. In the paradigm of charts as metadata, a chart object is modeled as a set of metadata that includes semantic context (SPARQL query) and visual context (Vega-Lite chart specification). With the SPARQL query language and the Vega-Lite grammar of interactive graphics, one can specify interactive charts (e.g., bar charts, scatter plots, heat maps, etc.) that remain synchronized to the content of the knowledge graph and whose data marks can link to dereferenceable URIs (e.g., DOIs, images, other charts, etc.) through hyperlink encoding channels. Combined, these tools offer a human- and machine-interpretable way to explore and share scientific data.
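The "charts as metadata" idea in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `ex:` vocabulary, the field names, the helper functions `make_chart` and `render_spec`, and the two data rows are all hypothetical placeholders. The only parts taken from Vega-Lite itself are the general spec layout and the `href` encoding channel, which turns each mark into a hyperlink.

```python
import json

# Semantic context: a SPARQL SELECT query. The prefix, class, and property
# names below are hypothetical placeholders, not terms from the article.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/ns#>
SELECT ?sample ?filler_fraction ?modulus ?doi WHERE {
  ?sample a ex:Sample ;
          ex:fillerFraction ?filler_fraction ;
          ex:modulus ?modulus ;
          ex:reportedIn ?doi .
}
"""

# Visual context: a Vega-Lite specification without any data attached.
# Field names must match the variable names selected by the query.
VEGA_LITE_SPEC = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "mark": "point",
    "encoding": {
        "x": {"field": "filler_fraction", "type": "quantitative"},
        "y": {"field": "modulus", "type": "quantitative"},
        # The href channel makes each point link to a dereferenceable URI.
        "href": {"field": "doi", "type": "nominal"},
    },
}


def make_chart(title, query, spec):
    """Bundle semantic and visual context into one chart metadata record."""
    return {"title": title, "query": query, "spec": spec}


def render_spec(chart, rows):
    """Return a copy of the chart's spec with query results inlined as data."""
    spec = dict(chart["spec"])  # shallow copy; the stored spec stays data-free
    spec["data"] = {"values": rows}
    return spec


chart = make_chart("Modulus vs. filler fraction", SPARQL_QUERY, VEGA_LITE_SPEC)

# Placeholder rows standing in for the bindings a SPARQL endpoint would return.
rows = [
    {"sample": "s1", "filler_fraction": 0.01, "modulus": 1.0,
     "doi": "https://doi.org/10.0000/example-1"},
    {"sample": "s2", "filler_fraction": 0.05, "modulus": 2.0,
     "doi": "https://doi.org/10.0000/example-2"},
]

print(json.dumps(render_spec(chart, rows), indent=2))
```

Because the stored chart record holds only the query and the data-free spec, re-running the query and calling `render_spec` again is enough to keep the rendered graphic in step with the current content of the graph.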

In response to challenges around the reuse of scholarly data [4], scientific communities have mobilized around a set of four guiding principles for data management: ensuring that data is findable, accessible, interoperable, and reusable. [5] Known by the acronym FAIR, these principles aim to preserve the value of digital assets through machine-interpretable metadata standards and schemas. In the materials science domain, the FAIR guiding principles have been embraced by numerous data resources and repositories, ushering in the development of modern data infrastructures for materials research. [6,7,8,9,10] The backbone and nervous system for these and other scientific data infrastructures build upon the foundation of the World Wide Web (WWW).

Since the early vision of the semantic web to make data on the internet machine-interpretable [11], the WWW has evolved from a repository of linked documents to an omnipresent medium for information exchange. The Resource Description Framework (RDF), a metadata model for the semantic web, captures knowledge through expressions known as triples, each comprising two nodes and a directed edge, that form a directed graph-based data representation inside a database, or triple store. SPARQL, a query language for RDF, uses graph-based expressions to retrieve sets of matches, or bindings, of variables in a graph pattern to content in a triple store. In the case of SELECT queries in SPARQL, sets of bindings take on a tabular form. The RDF model achieves interoperability through shared ontologies, or structured vocabularies that form the basis for capturing and reasoning over domain knowledge. Graph databases, such as knowledge graphs [12], can build on the infrastructure of the internet by using uniform resource identifiers (URIs) that follow the well-established hypertext transfer protocol (HTTP) to ensure global uniqueness. Unlike digital object identifiers (DOIs), which represent digital resources, URIs can represent anything (e.g., physical objects or abstract concepts). However, similar to the way a DOI is accessible via redirection when “https://dx.doi.org/” is placed in front, URIs can serve representations in a process known as dereferencing, offering a way to capture information stored elsewhere on the web. Despite challenges around the implementation of truly distributed knowledge representations [13], this extensible data and metadata format shows promise as a FAIR mechanism for storing and linking scientific data.
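To make the relationship between triples, graph patterns, and tabular bindings concrete, the following toy sketch matches a two-pattern query against a handful of triples in plain Python. It is a deliberately naive stand-in for a SPARQL engine, not how a triple store works internally, and every identifier in it (the `ex:` terms, the example DOIs, the `select` helper) is a hypothetical placeholder.

```python
# A tiny "triple store": each triple is (subject, predicate, object).
# All identifiers are hypothetical placeholders.
TRIPLES = [
    ("ex:sample1", "rdf:type", "ex:Sample"),
    ("ex:sample1", "ex:reportedIn", "https://doi.org/10.0000/example-1"),
    ("ex:sample2", "rdf:type", "ex:Sample"),
    ("ex:sample2", "ex:reportedIn", "https://doi.org/10.0000/example-2"),
    ("ex:paper9", "rdf:type", "ex:Article"),
]


def is_variable(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def select(variables, patterns, triples):
    """Return one row per solution, with columns ordered as in `variables`."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return [tuple(binding[v] for v in variables) for binding in bindings]


# Roughly: SELECT ?s ?doi WHERE { ?s rdf:type ex:Sample . ?s ex:reportedIn ?doi }
result = select(
    ["?s", "?doi"],
    [("?s", "rdf:type", "ex:Sample"), ("?s", "ex:reportedIn", "?doi")],
    TRIPLES,
)
for row in result:
    print(row)
```

Each returned tuple is one binding of the selected variables, so the full result reads as a table with one column per variable, which is the shape a chart specification can consume directly.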

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage were substantially updated for improved readability. In some cases, important information was missing from the references, and that information was added.