Difference between revisions of "Journal:FAIR and interactive data graphics from a scientific knowledge graph"

From LIMSWiki
(Saving and adding more.)
(Finished adding rest of content)
 
Line 18: Line 18:
|website      = [https://www.nature.com/articles/s41597-022-01352-z https://www.nature.com/articles/s41597-022-01352-z]
|download    = [https://www.nature.com/articles/s41597-022-01352-z.pdf https://www.nature.com/articles/s41597-022-01352-z.pdf] (PDF)
}}
{{ombox
| type      = notice
| image    = [[Image:Emblem-important-yellow.svg|40px]]
| style    = width: 500px;
| text      = This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed.
}}
}}
==Abstract==
Line 31: Line 25:


==Introduction==
From early cartography to modern digital interfaces, [[data visualization]]—the display of abstract information in graphical form—has helped humans navigate unknown and complex spaces with a history of conceptual advancements alongside innovations in printing and reproduction.<ref>{{Citation |last=Friendly |first=Michael |date=2008 |title=A Brief History of Data Visualization |url=http://link.springer.com/10.1007/978-3-540-33037-0_2 |work=Handbook of Data Visualization |language=en |publisher=Springer Berlin Heidelberg |place=Berlin, Heidelberg |pages=15–56 |doi=10.1007/978-3-540-33037-0_2 |isbn=978-3-540-33036-3 |accessdate=2024-06-16}}</ref> Today, the widespread availability of digitized [[information]], and the ability to process and display it with computers and web browsers, has brought interaction to the fore as a facilitator of higher-level cognitive processing on multidimensional datasets.<ref>{{Cite journal |last=Yi |first=Ji Soo |last2=Kang |first2=Youn ah |last3=Stasko |first3=John |last4=Jacko |first4=J.A. 
|date=2007-11 |title=Toward a Deeper Understanding of the Role of Interaction in Information Visualization |url=https://ieeexplore.ieee.org/document/4376144/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=13 |issue=6 |pages=1224–1231 |doi=10.1109/TVCG.2007.70515 |issn=1077-2626}}</ref> Interactive data visualization supports human reasoning and understanding through iterative exploration and investigation.<ref>{{Cite journal |last=Heer |first=Jeffrey |last2=Shneiderman |first2=Ben |date=2012-04 |title=Interactive dynamics for visual analysis |url=https://dl.acm.org/doi/10.1145/2133806.2133821 |journal=Communications of the ACM |language=en |volume=55 |issue=4 |pages=45–54 |doi=10.1145/2133806.2133821 |issn=0001-0782}}</ref> Given the deluge of data in many scientific domains, human-interpretable means for [[Information management|managing]], troubleshooting, and disseminating information—particularly those that preserve machine-interpretability—remain essential in scientific [[research]]. This article illustrates such an approach, on a knowledge [[graph database]], through the combination of a robust visualization grammar (Vega-Lite) and the query language for the semantic web (SPARQL) (Fig. 1).




Line 45: Line 39:
|}
|}


In response to challenges around the reuse of scholarly data<ref>{{Cite journal |last=Borgman |first=Christine L. |date=2012-06 |title=The conundrum of sharing research data |url=https://onlinelibrary.wiley.com/doi/10.1002/asi.22634 |journal=Journal of the American Society for Information Science and Technology |language=en |volume=63 |issue=6 |pages=1059–1078 |doi=10.1002/asi.22634 |issn=1532-2882}}</ref>, scientific communities have mobilized around a set of four guiding principles for data management: ensuring that data is findable, accessible, interoperable, and reusable.<ref name=":0">{{Cite journal |last=Wilkinson |first=Mark D. |last2=Dumontier |first2=Michel |last3=Aalbersberg |first3=IJsbrand Jan |last4=Appleton |first4=Gabrielle |last5=Axton |first5=Myles |last6=Baak |first6=Arie |last7=Blomberg |first7=Niklas |last8=Boiten |first8=Jan-Willem |last9=da Silva Santos |first9=Luiz Bonino |last10=Bourne |first10=Philip E. |last11=Bouwman |first11=Jildau |date=2016-03-15 |title=The FAIR Guiding Principles for scientific data management and stewardship |url=https://www.nature.com/articles/sdata201618 |journal=Scientific Data |language=en |volume=3 |issue=1 |pages=160018 |doi=10.1038/sdata.2016.18 |issn=2052-4463 |pmc=PMC4792175 |pmid=26978244}}</ref> Known by the acronym [[Journal:The FAIR Guiding Principles for scientific data management and stewardship|FAIR]], these principles aim to preserve the value of digital assets through machine-interpretable [[metadata]] standards and schema. 
In the [[materials science]] domain, the FAIR guiding principles have been embraced by numerous data resources and repositories, ushering in the development of [[Materials informatics|modern data infrastructures]] for materials research.<ref>{{Cite journal |last=Draxl |first=Claudia |last2=Scheffler |first2=Matthias |date=2018-09 |title=NOMAD: The FAIR concept for big data-driven materials science |url=http://link.springer.com/10.1557/mrs.2018.208 |journal=MRS Bulletin |language=en |volume=43 |issue=9 |pages=676–682 |doi=10.1557/mrs.2018.208 |issn=0883-7694}}</ref><ref>{{Cite journal |last=Himanen |first=Lauri |last2=Geurts |first2=Amber |last3=Foster |first3=Adam Stuart |last4=Rinke |first4=Patrick |date=2019-11 |title=Data‐Driven Materials Science: Status, Challenges, and Perspectives |url=https://onlinelibrary.wiley.com/doi/10.1002/advs.201900808 |journal=Advanced Science |language=en |volume=6 |issue=21 |pages=1900808 |doi=10.1002/advs.201900808 |issn=2198-3844 |pmc=PMC6839624 |pmid=31728276}}</ref><ref name=":1">{{Cite journal |last=Brinson |first=L. Catherine |last2=Deagen |first2=Michael |last3=Chen |first3=Wei |last4=McCusker |first4=James |last5=McGuinness |first5=Deborah L. |last6=Schadler |first6=Linda S. |last7=Palmeri |first7=Marc |last8=Ghumman |first8=Umar |last9=Lin |first9=Anqi |last10=Hu |first10=Bingyin |date=2020-08-18 |title=Polymer Nanocomposite Data: Curation, Frameworks, Access, and Potential for Discovery and Design |url=https://pubs.acs.org/doi/10.1021/acsmacrolett.0c00264 |journal=ACS Macro Letters |language=en |volume=9 |issue=8 |pages=1086–1094 |doi=10.1021/acsmacrolett.0c00264 |issn=2161-1653}}</ref><ref>{{Cite journal |last=Horton |first=M. K. |last2=Dwaraknath |first2=S. |last3=Persson |first3=K. A. 
|date=2021-01-14 |title=Promises and perils of computational materials databases |url=https://www.nature.com/articles/s43588-020-00016-5 |journal=Nature Computational Science |language=en |volume=1 |issue=1 |pages=3–5 |doi=10.1038/s43588-020-00016-5 |issn=2662-8457}}</ref><ref>{{Cite journal |last=Warren |first=James A. |last2=Ward |first2=Charles H. |date=2018-09 |title=Evolution of a Materials Data Infrastructure |url=http://link.springer.com/10.1007/s11837-018-2968-z |journal=JOM |language=en |volume=70 |issue=9 |pages=1652–1658 |doi=10.1007/s11837-018-2968-z |issn=1047-4838}}</ref> The backbone and nervous system for these and other scientific data infrastructures build upon the foundation of the World Wide Web (WWW).


Since the early vision of the [[Semantics|semantic]] web to make data on the internet machine-interpretable<ref>{{Cite journal |last=Berners-Lee |first=Tim |last2=Hendler |first2=James |last3=Lassila |first3=Ora |date=2001-05 |title=The Semantic Web |url=https://www.scientificamerican.com/article/the-semantic-web |journal=Scientific American |volume=284 |issue=5 |pages=34–43 |doi=10.1038/scientificamerican0501-34 |issn=0036-8733}}</ref>, the WWW has evolved from a repository of linked documents to an omnipresent medium for information exchange. The [[Resource Description Framework]] (RDF), a metadata model for the semantic web, captures knowledge through expressions known as triples, each comprising two nodes and a directional edge, that form a directed graph-based data representation inside a database, or triple store. SPARQL, a query language for RDF, uses graph-based expressions to retrieve sets of matches, or bindings, of variables in a graph pattern to content in a triple store. In the case of SELECT queries in SPARQL, sets of bindings take on a tabular form. The RDF model achieves interoperability through shared [[Ontology (information science)|ontologies]], or structured vocabularies that form the basis for capturing and reasoning over domain knowledge. 
Graph databases, such as [[knowledge graph]]s<ref>{{Cite journal |last=Hogan |first=Aidan |last2=Blomqvist |first2=Eva |last3=Cochez |first3=Michael |last4=D’amato |first4=Claudia |last5=Melo |first5=Gerard De |last6=Gutierrez |first6=Claudio |last7=Kirrane |first7=Sabrina |last8=Gayo |first8=José Emilio Labra |last9=Navigli |first9=Roberto |last10=Neumaier |first10=Sebastian |last11=Ngomo |first11=Axel-Cyrille Ngonga |date=2022-05-31 |title=Knowledge Graphs |url=https://dl.acm.org/doi/10.1145/3447772 |journal=ACM Computing Surveys |language=en |volume=54 |issue=4 |pages=1–37 |doi=10.1145/3447772 |issn=0360-0300}}</ref>, can build on the infrastructure of the internet by using uniform resource identifiers (URIs) that follow the well-established hypertext transfer protocol (HTTP) to ensure global uniqueness. Contrary to digital object identifiers (DOIs), which represent digital resources, URIs can represent anything (e.g., physical objects, abstract concepts, etc.). However, similar to the way a DOI is accessible via redirection when “https://dx.doi.org/” is placed in front, URIs can serve representations in a process known as dereferencing, offering a way to capture information stored elsewhere on the web. Despite challenges around the implementation of truly distributed knowledge representations<ref>{{Cite journal |last=Polleres |first=Axel |last2=Kamdar |first2=Maulik Rajendra |last3=Fernández |first3=Javier David |last4=Tudorache |first4=Tania |last5=Musen |first5=Mark Alan |date=2020-01-31 |editor-last=Hitzler |editor-first=Pascal |editor2-last=Janowicz |editor2-first=Krzysztof |title=A more decentralized vision for Linked Data |url=https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/SW-190380 |journal=Semantic Web |volume=11 |issue=1 |pages=101–113 |doi=10.3233/SW-190380}}</ref>, this extensible data and metadata format shows promise as a FAIR mechanism for storing and linking scientific data.
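
As a rough illustration of how a graph pattern yields a table of bindings, the following Python sketch stores triples as plain tuples and matches a single triple pattern against them. It is not a SPARQL engine: the <code>ex:</code> identifiers and the <code>match_pattern</code> helper are hypothetical and chosen only for this example.

```python
# Toy illustration only: triples as (subject, predicate, object) tuples.
# The ex: names below are invented for this sketch.
TRIPLES = [
    ("ex:sample1", "ex:hasMatrix", "ex:epoxy"),
    ("ex:sample1", "ex:hasFiller", "ex:silica"),
    ("ex:sample2", "ex:hasMatrix", "ex:PMMA"),
]


def match_pattern(pattern, triples):
    """Return one binding (a dict) per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    rows = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch was found, so this triple contributes one row.
            rows.append(binding)
    return rows


# Loosely analogous to:
#   SELECT ?sample ?matrix WHERE { ?sample ex:hasMatrix ?matrix }
bindings = match_pattern(("?sample", "ex:hasMatrix", "?matrix"), TRIPLES)
```

Each dictionary in <code>bindings</code> corresponds to one row of the tabular result, with the variable names acting as column headers.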


Several tools and platforms have been developed for exploring and visualizing RDF and linked data<ref>{{Citation |last=Skjæveland |first=Martin G. |date=2015 |editor-last=Simperl |editor-first=Elena |editor2-last=Norton |editor2-first=Barry |editor3-last=Mladenic |editor3-first=Dunja |editor4-last=Della Valle |editor4-first=Emanuele |editor5-last=Fundulaki |editor5-first=Irini |title=Sgvizler: A JavaScript Wrapper for Easy Visualization of SPARQL Result Sets |url=http://link.springer.com/10.1007/978-3-662-46641-4_27 |work=The Semantic Web: ESWC 2012 Satellite Events |language=en |publisher=Springer Berlin Heidelberg |place=Berlin, Heidelberg |volume=7540 |pages=361–365 |doi=10.1007/978-3-662-46641-4_27 |isbn=978-3-662-46640-7 |accessdate=2024-06-16}}</ref><ref>{{Citation |last=Alonen |first=Miika |last2=Kauppinen |first2=Tomi |last3=Suominen |first3=Osma |last4=Hyvönen |first4=Eero |date=2013 |editor-last=Salinesi |editor-first=Camille |editor2-last=Norrie |editor2-first=Moira C. |editor3-last=Pastor |editor3-first=Óscar |title=Exploring the Linked University Data with Visualization Tools |url=http://link.springer.com/10.1007/978-3-642-41242-4_25 |work=Advanced Information Systems Engineering |publisher=Springer Berlin Heidelberg |place=Berlin, Heidelberg |volume=7908 |pages=204–208 |doi=10.1007/978-3-642-41242-4_25 |isbn=978-3-642-38708-1 |accessdate=2024-06-16}}</ref><ref>{{Cite journal |last=Graves |first=Alvaro |date=2013-06-12 |title=Creation of visualizations based on linked data |url=https://dl.acm.org/doi/10.1145/2479787.2479828 |journal=Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics |language=en |publisher=ACM |place=Madrid Spain |pages=1–12 |doi=10.1145/2479787.2479828 |isbn=978-1-4503-1850-1}}</ref><ref>{{Citation |last=Thellmann |first=Klaudia |last2=Galkin |first2=Michael |last3=Orlandi |first3=Fabrizio |last4=Auer |first4=Sören |date=2015 |editor-last=Arenas |editor-first=Marcelo |editor2-last=Corcho 
|editor2-first=Oscar |editor3-last=Simperl |editor3-first=Elena |editor4-last=Strohmaier |editor4-first=Markus |editor5-last=d'Aquin |editor5-first=Mathieu |title=LinkDaViz – Automatic Binding of Linked Data to Visualizations |url=http://link.springer.com/10.1007/978-3-319-25007-6_9 |work=The Semantic Web - ISWC 2015 |language=en |publisher=Springer International Publishing |place=Cham |volume=9366 |pages=147–162 |doi=10.1007/978-3-319-25007-6_9 |isbn=978-3-319-25006-9 |accessdate=2024-06-16}}</ref><ref>{{Cite journal |last=Krommyda |first=Maria |last2=Kantere |first2=Verena |date=2019-09 |title=Understanding SPARQL Endpoints through Targeted Exploration and Visualization |url=https://ieeexplore.ieee.org/document/9030980/ |journal=2019 First International Conference on Graph Computing (GC) |publisher=IEEE |place=Laguna Hills, CA, USA |pages=21–28 |doi=10.1109/GC46384.2019.00012 |isbn=978-1-7281-4129-9}}</ref><ref>{{Citation |last=De Donato |first=Renato |last2=Garofalo |first2=Martina |last3=Malandrino |first3=Delfina |last4=Pellegrino |first4=Maria Angela |last5=Petta |first5=Andrea |last6=Scarano |first6=Vittorio |date=2020 |editor-last=Blomqvist |editor-first=Eva |editor2-last=Groth |editor2-first=Paul |editor3-last=de Boer |editor3-first=Victor |editor4-last=Pellegrini |editor4-first=Tassilo |editor5-last=Alam |editor5-first=Mehwish |title=QueDI: From Knowledge Graph Querying to Data Visualization |url=http://link.springer.com/10.1007/978-3-030-59833-4_5 |work=Semantic Systems. 
In the Era of Knowledge Graphs |language=en |publisher=Springer International Publishing |place=Cham |volume=12378 |pages=70–86 |doi=10.1007/978-3-030-59833-4_5 |isbn=978-3-030-59832-7 |pmc=PMC7586436 |accessdate=2024-06-16}}</ref><ref>{{Cite journal |last=Li |first=Haotian |last2=Wang |first2=Yong |last3=Zhang |first3=Songheng |last4=Song |first4=Yangqiu |last5=Qu |first5=Huamin |date=2022-01 |title=KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation |url=https://ieeexplore.ieee.org/document/9552844/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=28 |issue=1 |pages=195–205 |doi=10.1109/TVCG.2021.3114863 |issn=1077-2626}}</ref><ref>{{Cite journal |last=Papadaki |first=Maria-Evangelia |last2=Spyratos |first2=Nicolas |last3=Tzitzikas |first3=Yannis |date=2021-01-25 |title=Towards Interactive Analytics over RDF Graphs |url=https://www.mdpi.com/1999-4893/14/2/34 |journal=Algorithms |language=en |volume=14 |issue=2 |pages=34 |doi=10.3390/a14020034 |issn=1999-4893}}</ref>, but a common thread in these systems is the use of a typology to define charts (e.g., bar charts, pie charts, scatter plots). Extensive research in data visualization has illuminated the deeper structure underlying most data graphics wherein graphical primitives known as data marks (e.g., point, line, area, text) have properties that can be encoded through channels (e.g., position, color, size, opacity) by mapping data attributes along discrete or continuous scales.<ref>{{Citation |last=Wilkinson |first=Leland |date=2012 |editor-last=Gentle |editor-first=James E. 
|editor2-last=Härdle |editor2-first=Wolfgang Karl |editor3-last=Mori |editor3-first=Yuichi |title=The Grammar of Graphics |url=http://link.springer.com/10.1007/978-3-642-21551-3_13 |work=Handbook of Computational Statistics |language=en |publisher=Springer Berlin Heidelberg |place=Berlin, Heidelberg |pages=375–414 |doi=10.1007/978-3-642-21551-3_13 |isbn=978-3-642-21550-6 |accessdate=2024-06-16}}</ref><ref>{{Cite journal |last=Bostock |first=M. |last2=Heer |first2=J. |date=2009-11 |title=Protovis: A Graphical Toolkit for Visualization |url=http://ieeexplore.ieee.org/document/5290720/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=15 |issue=6 |pages=1121–1128 |doi=10.1109/TVCG.2009.174 |issn=1077-2626}}</ref> This grammar of graphics forms the basis for highly-cited and widely-adopted visualization libraries.<ref>{{Cite journal |last=Bostock |first=M. |last2=Ogievetsky |first2=V. |last3=Heer |first3=J. |date=2011-12 |title=D³ Data-Driven Documents |url=http://ieeexplore.ieee.org/document/6064996/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=17 |issue=12 |pages=2301–2309 |doi=10.1109/TVCG.2011.185 |issn=1077-2626}}</ref><ref>{{Cite journal |last=Wickham |first=Hadley |date=2011-03 |title=ggplot2 |url=https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.147 |journal=WIREs Computational Statistics |language=en |volume=3 |issue=2 |pages=180–185 |doi=10.1002/wics.147 |issn=1939-5108}}</ref> Reactive Vega<ref>{{Cite journal |last=Satyanarayan |first=Arvind |last2=Russell |first2=Ryan |last3=Hoffswell |first3=Jane |last4=Heer |first4=Jeffrey |date=2016-01-31 |title=Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization |url=http://ieeexplore.ieee.org/document/7192704/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=22 |issue=1 |pages=659–668 |doi=10.1109/TVCG.2015.2467091 |issn=1077-2626}}</ref>, and later Vega-Lite<ref>{{Cite journal |last=Satyanarayan 
|first=Arvind |last2=Moritz |first2=Dominik |last3=Wongsuphasawat |first3=Kanit |last4=Heer |first4=Jeffrey |date=2017-01 |title=Vega-Lite: A Grammar of Interactive Graphics |url=http://ieeexplore.ieee.org/document/7539624/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=23 |issue=1 |pages=341–350 |doi=10.1109/TVCG.2016.2599030 |issn=1077-2626}}</ref>, extended this grammar to interaction. In the Vega-Lite grammar for interactive graphics, a chart specification (written in JSON syntax) defines the visual representation of a tabular dataset (e.g., marks, encodings, selection parameters), while lower-level details (e.g., color schemes, legends, axis scales, event handlers) compile with default values unless overridden in the specification. The result is a concise, declarative specification of an interactive view of a dataset, built and customized incrementally.
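
To make the shape of such a specification concrete, here is a minimal sketch that assembles a Vega-Lite (version 5) specification as a Python dictionary and serializes it to JSON. The data rows and field names are invented for illustration and do not come from the article; only the mark, encodings, and one selection parameter are stated, and everything else (axes, legend, color scheme) is left to Vega-Lite's defaults.

```python
import json

# Invented rows standing in for a tabular dataset.
rows = [
    {"filler_fraction": 0.01, "modulus_gpa": 2.9, "matrix": "epoxy"},
    {"filler_fraction": 0.05, "modulus_gpa": 3.4, "matrix": "epoxy"},
    {"filler_fraction": 0.05, "modulus_gpa": 3.1, "matrix": "PMMA"},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    # Graphical primitive (data mark) used for each row.
    "mark": "point",
    # Channels that map data fields onto visual properties.
    "encoding": {
        "x": {"field": "filler_fraction", "type": "quantitative"},
        "y": {"field": "modulus_gpa", "type": "quantitative"},
        "color": {"field": "matrix", "type": "nominal"},
    },
    # An interval selection bound to the scales provides pan and zoom.
    "params": [{"name": "grid", "select": "interval", "bind": "scales"}],
}

spec_json = json.dumps(spec, indent=2)
```

The resulting <code>spec_json</code> string could be pasted into the online Vega Editor or passed to a Vega-Lite renderer; overriding a default amounts to adding a key (for example, an <code>axis</code> or <code>scale</code> entry) to the relevant encoding.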


Interactive methods for querying databases, such as Polaris and later VizQL (Tableau)<ref>{{Cite journal |last=Stolte |first=C. |last2=Tang |first2=D. |last3=Hanrahan |first3=P. |date=Jan.-March/2002 |title=Polaris: a system for query, analysis, and visualization of multidimensional relational databases |url=http://ieeexplore.ieee.org/document/981851/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=8 |issue=1 |pages=52–65 |doi=10.1109/2945.981851}}</ref><ref>{{Cite journal |last=Hanrahan |first=Pat |date=2006-06-27 |title=VizQL: a language for query, analysis and visualization |url=https://dl.acm.org/doi/10.1145/1142473.1142560 |journal=Proceedings of the 2006 ACM SIGMOD international conference on Management of data |language=en |publisher=ACM |place=Chicago IL USA |pages=721–721 |doi=10.1145/1142473.1142560 |isbn=978-1-59593-434-5}}</ref>, offer platforms for authoring interactive charts and dashboards through drag-and-drop interfaces. These systems have provided significant value to business analytics with their ease of use and suitability for many common tasks, but they are restrictive in terms of their proprietary nature, limited expressivity, and lack of support for graph-based data sources. To counter these drawbacks and provide a means for FAIR scientific data visualization, we focus our efforts on use of available open-source tools, a high degree of expressivity, and compatibility with knowledge graphs.


In this article, we describe a paradigm wherein charts defined through metadata provide a mechanism for exploring and documenting the contents of a knowledge graph of materials science data. Building on the concept of a visualization as a function of a data storage medium and a user specification<ref>{{Cite journal |last=Tang |first=Nan |last2=Wu |first2=Eugene |last3=Li |first3=Guoliang |date=2019-06-25 |title=Towards Democratizing Relational Data Visualization |url=https://dl.acm.org/doi/10.1145/3299869.3314029 |journal=Proceedings of the 2019 International Conference on Management of Data |language=en |publisher=ACM |place=Amsterdam Netherlands |pages=2025–2030 |doi=10.1145/3299869.3314029 |isbn=978-1-4503-5643-5}}</ref>, we model a chart as a combination of query (SPARQL) and chart specification (Vega-Lite) stored in the knowledge graph and processed on demand. This approach for bespoke, interactive data graphics is made possible by the high-level, declarative nature of SPARQL and Vega-Lite. Storing charts as metadata enables them to display the most up-to-date information in the knowledge graph, and charts themselves can be queried and analyzed. We find that dereferenceable URIs—HTTP identifiers that serve human-readable representations when opened in a web browser—embody the complementarity of SPARQL and Vega-Lite. Examples presented here draw from a knowledge graph in the materials science domain, but the paradigm applies to other domains as a mechanism for FAIR scientific data visualization and interaction.
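
One way to picture the "query plus specification" model is the sketch below, which takes a SELECT result in the W3C SPARQL 1.1 Query Results JSON Format, flattens it into rows, and attaches those rows to a stored chart specification at display time. This is an illustrative assumption about how the two pieces might be combined, not the MaterialsMine implementation; the function names and the sample result are hypothetical.

```python
import copy

XSD = "http://www.w3.org/2001/XMLSchema#"
# Typed literals that should become numbers rather than strings.
NUMERIC_TYPES = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "float": float,
}


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of plain dictionaries."""
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var, term in binding.items():
            cast = NUMERIC_TYPES.get(term.get("datatype"), str)
            row[var] = cast(term["value"])
        rows.append(row)
    return rows


def render_spec(chart_spec, results):
    """Return a copy of the stored spec with fresh query results as data."""
    spec = copy.deepcopy(chart_spec)  # leave the stored spec untouched
    spec["data"] = {"values": bindings_to_rows(results)}
    return spec


# Hypothetical stored chart specification and query result.
stored_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"field": "count", "type": "quantitative"},
    },
}
sample_results = {
    "head": {"vars": ["year", "count"]},
    "results": {
        "bindings": [
            {
                "year": {"type": "literal", "value": "2019"},
                "count": {
                    "type": "literal",
                    "datatype": XSD + "integer",
                    "value": "12",
                },
            }
        ]
    },
}

chart = render_spec(stored_spec, sample_results)
```

Because the data are attached only when the chart is rendered, rerunning the query against an updated knowledge graph changes the chart without any edit to the stored specification.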


==Results==
Line 61: Line 55:
To address the trade-off between usability and expressivity, we opt for maximal expressivity in terms of content creation, taking usability into account by making all examples open source and readily available for reuse. For example, domain experts without fluency in query or visualization languages (e.g., SPARQL, Vega-Lite) can interact with data in the knowledge graph by browsing a gallery of interactive charts, and those interested in creating their own charts have the code behind each chart as a starting point to adapt or modify for their own purposes. In this way, the collection of example queries and chart specifications provides a form of reusable documentation for accessing and viewing data in the knowledge graph.


To demonstrate the concept of charts as metadata, we extended the visualization capabilities of the open-source [https://materialsmine.org/ MaterialsMine] repository to accommodate the saving and processing of these bespoke data graphics. The knowledge graph at MaterialsMine, previously NanoMine<ref name=":1" /><ref>{{Cite journal |last=Zhao |first=He |last2=Wang |first2=Yixing |last3=Lin |first3=Anqi |last4=Hu |first4=Bingyin |last5=Yan |first5=Rui |last6=McCusker |first6=James |last7=Chen |first7=Wei |last8=McGuinness |first8=Deborah L. |last9=Schadler |first9=Linda |last10=Brinson |first10=L. Catherine |date=2018-11-01 |title=NanoMine schema: An extensible data representation for polymer nanocomposites |url=https://pubs.aip.org/apm/article/6/11/111108/121743/NanoMine-schema-An-extensible-data-representation |journal=APL Materials |language=en |volume=6 |issue=11 |pages=111108 |doi=10.1063/1.5046839 |issn=2166-532X}}</ref>, contains curated data from research articles on polymer-matrix nanocomposite materials in the scholarly literature along with metadata describing the materials, processing, characterization, and bibliographic information from those articles. Structured as linked data conforming to semantic web ontologies and vocabularies<ref>{{Citation |last=McCusker |first=Jamie P. |last2=Keshan |first2=Neha |last3=Rashid |first3=Sabbir |last4=Deagen |first4=Michael |last5=Brinson |first5=Cate |last6=McGuinness |first6=Deborah L. |date=2020 |editor-last=Pan |editor-first=Jeff Z. 
|editor2-last=Tamma |editor2-first=Valentina |editor3-last=d’Amato |editor3-first=Claudia |editor4-last=Janowicz |editor4-first=Krzysztof |editor5-last=Fu |editor5-first=Bo |title=NanoMine: A Knowledge Graph for Nanocomposite Materials Science |url=https://link.springer.com/10.1007/978-3-030-62466-8_10 |work=The Semantic Web – ISWC 2020 |language=en |publisher=Springer International Publishing |place=Cham |volume=12507 |pages=144–159 |doi=10.1007/978-3-030-62466-8_10 |isbn=978-3-030-62465-1 |accessdate=2024-06-16}}</ref>, data and metadata are made accessible through a SPARQL endpoint on the web.


Tailored interactive charts containing data from the knowledge graph range in purpose and complexity. Depending on the SPARQL query, datasets vary from individual sample data linked to a research article to meta-analyses of all articles curated into the knowledge graph (Fig. 1). All examples shown here use some combination of layered and concatenated views combined with selections in Vega-Lite to provide explorable, interactive views of data. Following the mantra of overview first, zoom and filter, then details-on-demand<ref name=":2">{{Cite journal |last=Shneiderman |first=B. |date=1996 |title=The eyes have it: a task by data type taxonomy for information visualizations |url=http://ieeexplore.ieee.org/document/545307/ |journal=Proceedings 1996 IEEE Symposium on Visual Languages |publisher=IEEE Comput. Soc. Press |place=Boulder, CO, USA |pages=336–343 |doi=10.1109/VL.1996.545307 |isbn=978-0-8186-7508-9}}</ref>, these data graphics use elements of interactivity to display aspects of a dataset that exceed the capability of a static representation. Common modes of interaction include tooltips, conditional display on hover interactions or selections, cross-filtered views, and pan and zoom.


Offering the full expressivity of SPARQL and Vega-Lite for specifying charts resulted in a number of interesting and often unanticipated interactive views of data in the knowledge graph. For example, rule marks with conditional opacity enable the overlaying of derived mechanical properties (e.g., tensile modulus, tensile strength, elongation at break) over representative curves showing raw tensile test data (Fig. 2a). Using Vega-Lite transforms and layered rule marks permits the custom scaling and plotting of linearized Weibull distributions for real-time calculation of dielectric breakdown strength (Fig. 2b). A query of articles and the material systems studied within them offers an interactive view of trends in polymer nanocomposite materials research (Fig. 2c). Another meta-analysis demonstrates the results of entity resolution with the ChemProps API (Fig. 2d).<ref name=":3">{{Cite journal |last=Hu |first=Bingyin |last2=Lin |first2=Anqi |last3=Brinson |first3=L. Catherine |date=2021-12 |title=ChemProps: A RESTful API enabled database for composite polymer name standardization |url=https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00502-6 |journal=Journal of Cheminformatics |language=en |volume=13 |issue=1 |pages=22 |doi=10.1186/s13321-021-00502-6 |issn=1758-2946 |pmc=PMC7955638 |pmid=33712066}}</ref> Concatenated sub-views and text formatting parameters result in a stylized infographic demonstrating some of the ways to enhance data exploration by adding interactive elements (Fig. 2e). In addition to concatenated sub-views, sequence generators and Vega-Lite transforms make possible an embedded explanation of dynamic mechanical analysis for viscoelastic material properties atop experimental data (Fig. 2f). These and over 150 other examples currently populate the gallery of charts in the MaterialsMine knowledge graph.
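As a rough illustration of the linearization behind a Weibull plot such as the one in Fig. 2b, the following Python sketch fits a two-parameter Weibull distribution by ordinary least squares. It is not the Vega-Lite specification used in MaterialsMine. The function name <code>weibull_fit</code> and the median-rank plotting positions are assumptions made for this example. The sketch relies on the standard identity ln(−ln(1 − F)) = β ln E − β ln α, where the scale parameter α is commonly reported as the characteristic breakdown strength.

```python
import math


def weibull_fit(breakdown_fields):
    """Estimate Weibull scale (alpha) and shape (beta) from breakdown data.

    Uses the linearized form ln(-ln(1 - F)) = beta * ln(E) - beta * ln(alpha)
    and an ordinary least-squares line through the transformed points.
    """
    fields = sorted(breakdown_fields)
    n = len(fields)
    # Assumed plotting positions: median ranks (Bernard's approximation).
    probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

    # Transform both axes so that Weibull-distributed data fall on a line.
    x = [math.log(e) for e in fields]
    y = [math.log(-math.log(1.0 - p)) for p in probs]

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    beta = sxy / sxx                      # slope of the fitted line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(alpha)
    alpha = math.exp(-intercept / beta)   # field at ~63.2% failure probability
    return alpha, beta
```

In a Vega-Lite chart, the same two axis transformations would be expressed with calculate transforms, and the line fit with the regression transform mentioned in the Fig. 2 caption.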




{| border="0" cellpadding="5" cellspacing="0" width="900px"
  |-
   | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 2.''' Interactive views of sample data, meta-analyses, and stylized infographics. Charts shown here are specified by a SPARQL query (semantic context) as well as Vega-Lite specification (visual context). The snapshots of interactive data graphics shown here display '''a)''' mechanical tensile testing data curated from Bandyopadhyay ''et al.'' (2005)<ref>{{Cite journal |last=Bandyopadhyay |first=A. |last2=De Sarkar |first2=M. |last3=Bhowmick |first3=A. K. |date=2005-10 |title=Poly(vinyl alcohol)/silica hybrid nanocomposites by sol-gel technique: Synthesis and properties |url=http://link.springer.com/10.1007/s10853-005-4417-y |journal=Journal of Materials Science |language=en |volume=40 |issue=19 |pages=5233–5241 |doi=10.1007/s10853-005-4417-y |issn=0022-2461}}</ref>, transformed into a layered composite view; '''b)''' a Weibull plot of dielectric testing data using custom y-axis scaling and the regression transform to estimate dielectric breakdown strength (DBS); '''c)''' a meta-analysis of nanocomposite filler materials in curated research articles per year of publication, highlighted to show the trend for graphene; '''d)''' a meta-analysis of entity-resolved compound names (computed by the ChemProps API<ref name=":3" />) versus curator-provided strings; '''e)''' an infographic showing a dataset with increasingly interactive views; and '''f)''' an explanatory graphic for viscoelastic data. These examples created for the materials science domain represent a small subset of the variety of datasets and visualizations made possible by using SPARQL queries and Vega-Lite specifications to capture interactive views of content from a knowledge graph database.</blockquote>
  |- 
|}
To avoid naming collisions, knowledge graphs employ URIs to globally identify resources without ambiguity. Using well-established internet protocols (e.g., HTTP) helps to ensure global uniqueness among distributed systems on the semantic web. A helpful practice for documenting resources involves the owner of a domain having a representation delivered by a server (e.g., HTML page) when a URI is requested through internet protocols. URIs can exist solely as identifiers, but those with available representations on the web are known as dereferenceable URIs.


URIs can be returned in the results of a SPARQL query, but a column of URIs in a table may be less useful than an interactive visualization that allows a user to sort and refine the results of interest. Overview first, zoom and filter, then details-on-demand.<ref name=":2" /> We identify two encoding channels in Vega-Lite that make the language well-suited to knowledge graphs: the url encoding channel for image marks (Fig. 3a), and the href (hyperlink reference) encoding channel for other data marks such as text (Fig. 3b) or point marks (Fig. 3c). First, images serve as useful visual representations in many scientific domains, and rendering them on-demand via dereferenceable URIs avoids the need to download or cache a full set of images. Second, the practice of hyperlinking to primary sources or representations leverages the notion of linked data by directing to additional information about resources outside the confines of a given chart.
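The two encoding channels can be sketched as Vega-Lite specifications built from Python dictionaries. This is a minimal illustration rather than a chart from the MaterialsMine gallery: the function names, the field names (<code>year</code>, <code>value</code>, <code>uri</code>, <code>image_uri</code>), and the choice of the version 5 schema are assumptions made for the example.

```python
import json

# Assumed schema version for this sketch.
VEGA_LITE_SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"


def linked_point_chart(rows):
    """Point marks whose href channel opens each row's URI when selected."""
    return {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"values": rows},
        "mark": {"type": "point", "cursor": "pointer"},
        "encoding": {
            "x": {"field": "year", "type": "quantitative"},
            "y": {"field": "value", "type": "quantitative"},
            # href turns each mark into a hyperlink to a dereferenceable URI.
            "href": {"field": "uri", "type": "nominal"},
            "tooltip": {"field": "uri", "type": "nominal"},
        },
    }


def image_chart(rows):
    """Image marks rendered on demand from URIs via the url channel."""
    return {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"values": rows},
        "mark": {"type": "image", "width": 50, "height": 50},
        "encoding": {
            "x": {"field": "year", "type": "quantitative"},
            "y": {"field": "value", "type": "quantitative"},
            # url points each mark at an image file to fetch and display.
            "url": {"field": "image_uri", "type": "nominal"},
        },
    }


if __name__ == "__main__":
    # Hypothetical row; a real chart would receive rows from a SPARQL query.
    rows = [{"year": 2013, "value": 1.0, "uri": "https://example.org/sample/1"}]
    print(json.dumps(linked_point_chart(rows), indent=2))
```

Because each specification is plain JSON, it can be stored alongside the query that supplies its rows and handed unchanged to any Vega-Lite renderer.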




{| border="0" cellpadding="5" cellspacing="0" width="600px"
  |-
   | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 3.''' Direct linking to representations of resources in the knowledge graph. These charts make use of dereferenceable URIs in the knowledge graph to display or link to resources. '''a)''' Image marks with accompanying URL encoding channels are used to display curated sample images from Natarajan ''et al.'' (2013)<ref>{{Cite journal |last=Natarajan |first=Bharath |last2=Li |first2=Yang |last3=Deng |first3=Hua |last4=Brinson |first4=L. Catherine |last5=Schadler |first5=Linda S. |date=2013-04-09 |title=Effect of Interfacial Energetics on Dispersion and Glass Transition Temperature in Polymer Nanocomposites |url=https://pubs.acs.org/doi/10.1021/ma302281b |journal=Macromolecules |language=en |volume=46 |issue=7 |pages=2833–2841 |doi=10.1021/ma302281b |issn=0024-9297}}</ref> corresponding to the selected points on the adjacent scatter plot. '''b)''' Text marks with a hyperlink encoding channel open the URL of a journal article DOI when selected. '''c)''' A scatter plot displays charts published to the knowledge graph, arranged by the character length of their Vega-Lite specification and description. Point marks with the hyperlink encoding channel link to a chart page when selected. This final chart is self-referential; the highlighted point mark represents the chart itself.</blockquote>
  |- 
|}
|}


Interactive data visualization offers myriad ways to explore a dataset, and we describe how knowledge graphs with dereferenceable URIs can expand the reach of these graphics to the entire web through hyperlinks. By combining the strengths of knowledge graphs for storing knowledge and interactive visualizations for accessing knowledge, this approach provides a means for communicating data in a way that builds trust and makes data analysis more transparent, building on the idea that sharing the graphic should equate to sharing the data.<ref>{{Cite web |last=Lebo, T.; Graves, A.; McGuinness, D.L. |date=01 October 2013 |title=Content-Preserving Graphics |work=Tetherless World Publications |url=https://hdl.handle.net/20.500.13015/4523 |publisher=Rensselaer Polytechnic Institute}}</ref>


===Interoperability with other web platforms===
The semantic web facilitates data exchange in a distributed manner by building on the infrastructure of the internet and encouraging the use of common vocabularies and ontologies. One demonstration of interoperability enabled by SPARQL is the extension for federated querying. Federated queries aggregate data from multiple sources by running sub-queries across distributed SPARQL endpoints on the internet. Furthermore, the ability to send a query to a public SPARQL endpoint via HTTP GET request and receive machine-readable results (e.g., JSON) enables other web platforms to query and process data from a knowledge graph.
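The request-and-response pattern described above can be sketched in Python using only the standard library. The function names are assumptions made for this example, and any endpoint URL passed in is the caller's choice. The sketch follows the SPARQL 1.1 Protocol convention of a <code>query</code> URL parameter and the SPARQL JSON results layout (<code>head.vars</code> and <code>results.bindings</code>). It builds the request without sending it, so it runs offline.

```python
import urllib.parse
import urllib.request


def build_sparql_get(endpoint, query):
    """Build (but do not send) an HTTP GET request for a SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask the endpoint for results in the SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of plain dictionaries.

    Each row maps a variable name to the string value of its binding;
    variables left unbound in a given solution are simply omitted.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows
```

The flattened rows have the shape that a Vega-Lite specification accepts as inline data values, which is what allows an external platform to go from a query string to a rendered chart.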


Here, we demonstrate a two-fold example of interoperability by showing an example chart from MaterialsMine, with federated querying of DBpedia<ref name=":4">{{Cite journal |last=Lehmann |first=Jens |last2=Isele |first2=Robert |last3=Jakob |first3=Max |last4=Jentzsch |first4=Anja |last5=Kontokostas |first5=Dimitris |last6=Mendes |first6=Pablo N. |last7=Hellmann |first7=Sebastian |last8=Morsey |first8=Mohamed |last9=van Kleef |first9=Patrick |last10=Auer |first10=Sören |last11=Bizer |first11=Christian |date=2015 |title=DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia |url=https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/SW-140134 |journal=Semantic Web |volume=6 |issue=2 |pages=167–195 |doi=10.3233/SW-140134}}</ref>, all within a reactive computational notebook on Observable (Fig. 4). Platforms such as [https://observablehq.com/ Observable], which natively supports Vega-Lite, can fetch a chart’s metadata, parse the query and chart specification, run the query for the chart’s data (in this case, at the same endpoint), then render those data as an interactive Vega-Lite chart. In this example, the query contains a SERVICE clause to the DBpedia SPARQL endpoint to return the English-text abstract for the material compound “Silicon dioxide” from Wikipedia, and the Vega-Lite specification displays this abstract as a text mark on the chart (Fig. 4, red dotted lines). At present, federated querying adds several seconds to the query runtime, so such queries need to be optimized during development.
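The query behind Fig. 4 is not reproduced in the text, so the following Python sketch only illustrates the general shape of a SERVICE clause that requests an English abstract from DBpedia. The function name and its parameters are assumptions made for this example. The <code>dbo:abstract</code> property and the <code>dbr:</code> resource namespace are DBpedia's own vocabulary.

```python
def federated_abstract_query(resource, lang="en",
                             service="https://dbpedia.org/sparql"):
    """Return a SPARQL query string that fetches an abstract via SERVICE.

    The SERVICE block is evaluated at the remote endpoint; in a real chart
    query it would sit alongside patterns matched in the local graph.
    """
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {{
  SERVICE <{service}> {{
    dbr:{resource} dbo:abstract ?abstract .
    FILTER (lang(?abstract) = "{lang}")
  }}
}}"""


if __name__ == "__main__":
    print(federated_abstract_query("Silicon_dioxide"))
```

Keeping the remote pattern as narrow as possible, here a single resource filtered to one language, is one way to limit the extra runtime that federation introduces.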




{| border="0" cellpadding="5" cellspacing="0" width="800px"
  |-
   | style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 4.''' Interoperability with other web platforms and FAIR data sources. The ability of public SPARQL endpoints to send queries and receive data through internet protocols enables interoperability within a query (e.g., federated querying from DBpedia<ref name=":4" />) as well as displaying and processing information from the knowledge graph using external web-based platforms, such as an Observable notebook (https://observablehq.com/@mdeagen/figure-4-notebook).</blockquote>
  |- 
|}


==Discussion==
In the paradigm of charts as metadata, the data instances that populate a chart are absent from the chart specification. This may seem counter-intuitive, but the resulting specification describes what data to retrieve (i.e., semantic context) and how to display it (i.e., visual context). As a result, these metadata-defined charts represent interactive lenses, each with a particular vantage view of the knowledge graph, that display the most up-to-date instances from the knowledge graph at the time of rendering. Many approaches to designing static visualizations no longer apply when visualizations become interactive and subject to changing data.<ref>{{Cite journal |last=Walny |first=Jagoda |last2=Frisson |first2=Christian |last3=West |first3=Mieka |last4=Kosminsky |first4=Doris |last5=Knudsen |first5=Soren |last6=Carpendale |first6=Sheelagh |last7=Willett |first7=Wesley |date=2020-01 |title=Data Changes Everything: Challenges and Opportunities in Data Visualization Design Handoff |url=https://ieeexplore.ieee.org/document/8816695/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=26 |issue=1 |pages=12–22 |doi=10.1109/TVCG.2019.2934538 |issn=1077-2626}}</ref> We organize these design considerations into three broad categories: portability, persistence, and performance.
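The separation of semantic and visual context can be sketched as a small Python record in which the chart stores a query and a data-free specification, and rows are attached only at render time. The record layout, the field names, the example predicates, and the <code>run_query</code> callback are hypothetical and do not reflect the MaterialsMine data model.

```python
import copy

# Hypothetical chart record: a query plus a specification, with no data rows.
chart_record = {
    "title": "Example chart",
    "query": (
        "SELECT ?x ?y WHERE { "
        "?sample <http://example.org/x> ?x ; <http://example.org/y> ?y }"
    ),
    "spec": {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": "point",
        "encoding": {
            "x": {"field": "x", "type": "quantitative"},
            "y": {"field": "y", "type": "quantitative"},
        },
    },
}


def render_ready_spec(chart, run_query):
    """Combine a stored chart with freshly queried rows.

    run_query is any callable that takes a SPARQL string and returns a list
    of row dictionaries; the stored record itself is left unmodified.
    """
    rows = run_query(chart["query"])
    spec = copy.deepcopy(chart["spec"])
    spec["data"] = {"values": rows}  # rows are injected only at render time
    return spec
```

Since the rows are fetched on every render, the same stored record yields an updated chart whenever the knowledge graph changes.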


Portability poses a key challenge for web-based charts and interactive charts in general. The wide adoption of a given approach or toolset hinges on its reliability and compatibility with a diverse set of platforms and devices. To serve the intended use as a means for analyzing or disseminating data, an interactive data graphic must retain its ability to respond to user input when embedded in some other format (e.g., offline document, presentation slide), or offer a pre-recorded animation displaying its contents. Two recent projects, Chameleon and Loom, have begun to tackle some of these challenges around portability of interactive data graphics.<ref>{{Cite journal |last=Masson |first=Damien |last2=Malacria |first2=Sylvain |last3=Lank |first3=Edward |last4=Casiez |first4=Géry |date=2020-04-21 |title=Chameleon: Bringing Interactivity to Static Digital Documents |url=https://dl.acm.org/doi/10.1145/3313831.3376559 |journal=Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems |language=en |publisher=ACM |place=Honolulu HI USA |pages=1–13 |doi=10.1145/3313831.3376559 |isbn=978-1-4503-6708-0}}</ref><ref>{{Cite journal |last=Raji |first=Mohammad |last2=Duncan |first2=Jeremiah |last3=Hobson |first3=Tanner |last4=Huang |first4=Jian |date=2021-09-01 |title=Dataless Sharing of Interactive Visualization |url=https://ieeexplore.ieee.org/document/9056539/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=27 |issue=9 |pages=3656–3669 |doi=10.1109/TVCG.2020.2984708 |issn=1077-2626}}</ref>


Persistence, or the ability to continue existing as a useful data graphic, largely depends on the stability of the underlying data representations and their ability to be interpreted in the future. In a Vega-Lite chart specification, one may specify the schema used as a form of [[version control]] against future software changes in Vega-Lite. On the data query side, if vocabulary URIs or the way data are modeled in the knowledge graph change, SPARQL queries may cease to function as originally intended. We experienced this challenge when converting terms in the MaterialsMine ontology from their prior namespace (http://nanomine.org/ns/) to a new namespace (http://materialsmine.org/ns/). Updates to charts typically only involved a one-line change in the SPARQL PREFIX header of the query, but the issue highlighted the effect of upstream changes involving URIs on downstream resources such as charts. To mitigate these issues, communities should invest requisite resources to ensure robust ontologies and stable SPARQL endpoints that provide reliable access to data and a consistent semantic representation.
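A namespace migration of this kind can be sketched in Python as a rewrite restricted to the PREFIX declarations of a stored query. The function name and the <code>nm:</code> prefix label in the usage example are assumptions made for this illustration; the two namespace URIs are the ones named above.

```python
def migrate_prefix(query, old_ns, new_ns):
    """Rewrite PREFIX declarations that bind old_ns so they bind new_ns.

    Only lines beginning with PREFIX are touched, so occurrences of the old
    namespace elsewhere in the query (e.g., full URIs) are left unchanged.
    """
    migrated = []
    for line in query.splitlines():
        is_prefix = line.strip().upper().startswith("PREFIX")
        if is_prefix and f"<{old_ns}>" in line:
            line = line.replace(f"<{old_ns}>", f"<{new_ns}>")
        migrated.append(line)
    return "\n".join(migrated)


if __name__ == "__main__":
    old_query = (
        "PREFIX nm: <http://nanomine.org/ns/>\n"
        "SELECT ?s WHERE { ?s a nm:Sample }"
    )
    print(migrate_prefix(old_query,
                         "http://nanomine.org/ns/",
                         "http://materialsmine.org/ns/"))
```

Queries that spell out full URIs instead of prefixed names would need a broader rewrite, which is one reason to prefer prefixed names in stored chart queries.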


Performance of these charts may involve technological or data limitations. Query runtimes and chart rendering are necessarily impacted by the quantity of data available in the knowledge graph and how much data can be stored in memory. Moreover, responsiveness of public SPARQL endpoints, particularly with respect to federated querying, remains an ongoing challenge. For visualization and interaction design, accounting for future data involves considering how new data may impact scale extents, latency, or occlusion of data marks. Consideration of the scope of the data graphic becomes important, for example separating a large dataset into separate views that show a high-level view of the dataset with access to instances through interaction. Overview first, zoom and filter, then details-on-demand.<ref name=":2" /> On a more technical note, rendering images in Vega-Lite requires the use of image URIs within the same domain or ensuring that images from external domains have the appropriate HTTP header enabling cross-origin resource sharing (CORS). Finally, scalability of a gallery of charts from a knowledge graph involves considerations of the ease with which domain experts can search and navigate the collection of charts available.


Relational and non-relational databases (e.g., SQL, NoSQL) provide limited account of the relationships between individual data objects, simplifying initial development of limited-scope data resources but hindering the later integration and interoperability with other data resources as these models and applications scale in complexity. Knowledge graph databases, on the other hand, use a graph data model upfront to capture these abstract relationships and semantics. Backend database performance still remains a concern when metadata employ a graph data model, but the use of shared ontologies mitigates the scaling issues around interoperability. When graph databases build upon the infrastructure of the WWW and employ globally unique and dereferenceable identifiers (URIs), they lower the barriers for distributed data exchange and can benefit from a web-based interactive visualization grammar such as Vega-Lite.


Defining charts as metadata in a knowledge graph captures semantic context and visual context while providing interactive, human-interpretable documentation of the contents of a knowledge graph. These chart representations may also be considered a form of “visualization data,” an emerging data format relevant to the application of [[artificial intelligence]] (AI) to visualization generation, enhancement, and analysis.<ref>{{Cite journal |last=Wu |first=Aoyu |last2=Wang |first2=Yun |last3=Shu |first3=Xinhuan |last4=Moritz |first4=Dominik |last5=Cui |first5=Weiwei |last6=Zhang |first6=Haidong |last7=Zhang |first7=Dongmei |last8=Qu |first8=Huamin |date=2022-12-01 |title=AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization |url=https://ieeexplore.ieee.org/document/9495259/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=28 |issue=12 |pages=5049–5070 |doi=10.1109/TVCG.2021.3099002 |issn=1077-2626}}</ref> The complementarity of SPARQL and Vega-Lite makes this approach to scientific data visualization well aligned with the FAIR principles: it preserves the machine-interpretability of the underlying data while giving domain experts an interactive means to explore the contents of a knowledge graph.


==Methods==


===Metadata for a chart===
Expressing data queries and chart specifications as text allows them to be stored as string literals in the knowledge graph. We assign each chart URI to the class <tt>sio:Chart</tt> from the Semanticscience Integrated Ontology<ref name=":5">{{Cite journal |last=Dumontier |first=Michel |last2=Baker |first2=Christopher JO |last3=Baran |first3=Joachim |last4=Callahan |first4=Alison |last5=Chepelev |first5=Leonid |last6=Cruz-Toledo |first6=José |last7=Del Rio |first7=Nicholas R |last8=Duck |first8=Geraint |last9=Furlong |first9=Laura I |last10=Keath |first10=Nichealla |last11=Klassen |first11=Dana |date=2014-12 |title=The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery |url=https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-14 |journal=Journal of Biomedical Semantics |language=en |volume=5 |issue=1 |pages=14 |doi=10.1186/2041-1480-5-14 |issn=2041-1480 |pmc=PMC4015691 |pmid=24602174}}</ref>, along with metadata corresponding to the widely-adopted Dublin Core, Schema.org, and FOAF vocabularies. In addition to the SPARQL query (i.e., semantic context) and Vega-Lite specification (i.e., visual context), we include a title, description, and thumbnail depiction of each chart (Fig. 5). When published to the knowledge graph, provenance metadata (when a chart was created and by which logged-in user) are captured as extensions of a named graph using the nanopublication framework.<ref name=":6">{{Cite journal |last=Kuhn |first=Tobias |last2=Meroño-Peñuela |first2=Albert |last3=Malic |first3=Alexander |last4=Poelen |first4=Jorrit H. |last5=Hurlbert |first5=Allen H. |last6=Ortiz |first6=Emilio Centeno |last7=Furlong |first7=Laura I. |last8=Queralt-Rosinach |first8=Núria |last9=Chichester |first9=Christine |last10=Banda |first10=Juan M. 
|last11=Willighagen |first11=Egon |date=2018 |title=Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data |url=https://arxiv.org/abs/1809.06532 |journal=arXiv |doi=10.48550/ARXIV.1809.06532}}</ref>
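
The chart-as-metadata model described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the MaterialsMine implementation: the article names the vocabularies but not the exact properties, so the property choices (<tt>dcterms:title</tt>, <tt>dcterms:description</tt>, <tt>schema:query</tt>, <tt>foaf:depiction</tt>), the <tt>ex:hasChartSpecification</tt> predicate, the expansion of <tt>sio:Chart</tt>, and the example URIs are all hypothetical.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIO = "http://semanticscience.org/resource/"  # assumed expansion of the sio: prefix
DCTERMS = "http://purl.org/dc/terms/"
SCHEMA = "http://schema.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab/"  # hypothetical namespace, for illustration only


def literal(text):
    """Quote a string as an N-Triples literal (JSON string escaping is used here)."""
    return json.dumps(text, ensure_ascii=False)


def chart_triples(chart_uri, title, description, query, spec, thumbnail_uri):
    """Return (subject, predicate, object) triples describing one chart."""
    return [
        (chart_uri, RDF_TYPE, f"<{SIO}Chart>"),
        (chart_uri, DCTERMS + "title", literal(title)),
        (chart_uri, DCTERMS + "description", literal(description)),
        # Semantic context: the SPARQL query, stored as a string literal.
        (chart_uri, SCHEMA + "query", literal(query)),
        # Visual context: the Vega-Lite specification, stored as JSON text.
        (chart_uri, EX + "hasChartSpecification",
         literal(json.dumps(spec, sort_keys=True))),
        (chart_uri, FOAF + "depiction", f"<{thumbnail_uri}>"),
    ]


def to_ntriples(triples):
    """Serialize the triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)
```

Because both the query and the specification are plain text, they fit into ordinary string literals and can themselves be retrieved with a SPARQL query.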






===Browsing and creating charts===
We use Whyis<ref name=":7">{{Cite web |last=McCusker, J.; Rashid, S.; Agu, N. et al. |date=01 October 2018 |title=The Whyis Knowledge Graph Framework in Action |work=Tetherless World Publications |url=https://hdl.handle.net/20.500.13015/4443 |publisher=Rensselaer Polytechnic Institute}}</ref>, a Python Flask application for knowledge graphs, to upload and manage charts in the knowledge graph. All instances of <tt>sio:Chart</tt> currently populate a paginated gallery featuring the thumbnail depiction, chart title, preview of the description, and link to the chart URI. By clicking on a chart, a user is directed to a chart instance view which queries the knowledge graph, displays the chart title and description, and renders the Vega-Lite chart. Icons above the chart allow the user to view the SPARQL query and Vega-Lite chart specification. Given the many possible ways to visualize a tabular dataset, we also enable the user to explore the raw data returned by the query inside an instance of Data Voyager<ref>{{Cite journal |last=Wongsuphasawat |first=Kanit |last2=Moritz |first2=Dominik |last3=Anand |first3=Anushka |last4=Mackinlay |first4=Jock |last5=Howe |first5=Bill |last6=Heer |first6=Jeffrey |date=2016-01-31 |title=Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations |url=http://ieeexplore.ieee.org/document/7192728/ |journal=IEEE Transactions on Visualization and Computer Graphics |volume=22 |issue=1 |pages=649–658 |doi=10.1109/TVCG.2015.2467191 |issn=1077-2626}}</ref><ref>{{Cite journal |last=Wongsuphasawat |first=Kanit |last2=Qu |first2=Zening |last3=Moritz |first3=Dominik |last4=Chang |first4=Riley |last5=Ouk |first5=Felix |last6=Anand |first6=Anushka |last7=Mackinlay |first7=Jock |last8=Howe |first8=Bill |last9=Heer |first9=Jeffrey |date=2017-05-02 |title=Voyager 2: Augmenting Visual Analysis with Partial View Specifications |url=https://dl.acm.org/doi/10.1145/3025453.3025768 |journal=Proceedings of the 2017 CHI Conference on Human 
Factors in Computing Systems |language=en |publisher=ACM |place=Denver Colorado USA |pages=2648–2659 |doi=10.1145/3025453.3025768 |isbn=978-1-4503-4655-9}}</ref>, which provides a drag-and-drop interface for defining chart encodings and exploring recommended views.


To add a chart to the knowledge graph, a user enters the SPARQL query, Vega-Lite specification, title, and description into the custom chart editor interface in Whyis. SPARQL query syntax highlighting is enabled by embedding a YASGUI interface.<ref>{{Citation |last=Rietveld |first=Laurens |last2=Hoekstra |first2=Rinke |date=2013 |editor-last=Salinesi |editor-first=Camille |editor2-last=Norrie |editor2-first=Moira C. |editor3-last=Pastor |editor3-first=Óscar |title=YASGUI: Not Just Another SPARQL Client |url=http://link.springer.com/10.1007/978-3-642-41242-4_7 |work=Advanced Information Systems Engineering |publisher=Springer Berlin Heidelberg |place=Berlin, Heidelberg |volume=7908 |pages=78–86 |doi=10.1007/978-3-642-41242-4_7 |isbn=978-3-642-38708-1 |accessdate=2024-06-16}}</ref> On the opposite panel of this interface, the user can toggle between views of the raw data as a table or Vega-Lite chart. When a chart is saved, a nanopublication is published to the knowledge graph, a backup of the chart metadata is created in MongoDB, and the chart joins the gallery of charts. At present, all charts published to the knowledge graph are publicly available. Future deployments may consider whether to offer tiered access or intermediate publication, for example saving progress on a chart under development or publishing with a limited scope (e.g., to a research team). In the meantime, development of charts can occur by using an offline platform such as Visual Studio Code with Vega-Lite plug-in, or by using an online platform such as Observable.


===Framing the FAIR guiding principles for data graphics===
Here, we highlight several of the guiding principles of FAIR laid out by Wilkinson ''et al.''<ref name=":0" /> and relate them to design decisions around the combined approach of SPARQL and Vega-Lite for scientific data visualization:


*“F1. (Meta)data are assigned a globally unique and persistent identifier”: Chart objects, modeled as the combination of a SPARQL query and Vega-Lite specification among other metadata, are assigned a globally unique URI.
*“A1. (Meta)data are retrievable by their identifier using a standardized communications protocol”: A chart object is dereferenceable through its HTTP URI, and data objects within a chart that have their own dereferenceable URIs can use image marks or the hyperlink encoding channel in the Vega-Lite specification.
*“A1.1. The protocol is open, free, and universally implementable”: A public SPARQL endpoint provides access to the data, and the free, open-source software developed by the Vega-Lite community enables the rendering of valid chart specifications.
*“A1.2. The protocol allows for an authentication and authorization procedure, where necessary”: To ensure provenance of chart objects, posting to the knowledge graph is limited to authenticated users who are logged into the web application.
*“I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation”: The RDF (meta)data model has been employed to capture the semantic relationships between a chart object, its associated metadata, and its provenance.
*“I2. (Meta)data use vocabularies that follow FAIR principles”: We use the Semanticscience Integrated Ontology<ref name=":5" /> along with Dublin Core, Schema.org, and FOAF vocabularies.
*“R1.2. (Meta)data are associated with detailed provenance”: The Whyis knowledge graph framework<ref name=":7" />, using the concept of nanopublications<ref name=":6" />, captures provenance metadata for charts posted to the knowledge graph, such as the creator and time of publication.
*“R1.3. (Meta)data meet domain-relevant community standards”: Vega-Lite provides a high degree of expressivity, of which the figures in this article provide a small sample. The approach could also apply to scientific domains that utilize geospatial visualizations using Vega-Lite’s geographic projection abilities. Scientific visualizations outside of the scope of Vega-Lite (e.g., 3D models, molecular structures, annotated schematics, force-directed graphs, animations) can instead use static depictions with image marks and a hyperlink encoding channel to link to a more advanced representation.


==Abbreviations, acronyms, and initialisms==
*'''AI''': artificial intelligence
*'''CORS''': cross-origin resource sharing
*'''DOI''': digital object identifier
*'''FAIR''': findable, accessible, interoperable, reusable
*'''HTTP''': hypertext transfer protocol
*'''RDF''': Resource Description Framework
*'''URI''': uniform resource identifier
*'''WWW''': World Wide Web


==Acknowledgements==


===Data availability===
The “living” versions of these interactive charts, which require an operational SPARQL endpoint and Whyis application, can be found in the MaterialsMine Gallery of Interactive Charts at https://materialsmine.org/wi/gallery. As a backup, archival versions of each interactive chart featured in this article (using a static snapshot of queried data) are available on Observable (https://observablehq.com/@mdeagen/archival-interactive-charts). A zipped folder with the query and chart specification for each chart featured in this article, as well as a snapshot of the data retrieved by the query, is available on Figshare (https://doi.org/10.6084/m9.figshare.19352258).<ref>{{Citation |last=Deagen, Michael |date=2022 |title=Chart metadata and snapshots of data from March 13, 2022 |url=https://figshare.com/articles/dataset/Chart_metadata_and_snapshots_of_data_from_March_13_2022/19352258/1 |work=Figshare |publisher= |doi=10.6084/m9.figshare.19352258.v1 |accessdate=}}</ref>


===Code availability===

Latest revision as of 16:21, 16 June 2024

Full article title: FAIR and interactive data graphics from a scientific knowledge graph
Journal: Scientific Data
Author(s): Deagen, Michael E.; McCusker, Jamie P.; Fateye, Tolulomo; Stouffer, Samuel; Brinson, L. Cate; McGuinness, Deborah L.; Schadler, Linda S.
Author affiliation(s): University of Vermont, Rensselaer Polytechnic Institute, Duke University
Primary contact (email): mdeagen at mit dot edu
Year published: 2022
Volume and issue: 9
Article #: 239
DOI: 10.1038/s41597-022-01352-z
ISSN: 2052-4463
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.nature.com/articles/s41597-022-01352-z
Download: https://www.nature.com/articles/s41597-022-01352-z.pdf (PDF)

Abstract

Graph databases capture richly linked domain knowledge by integrating heterogeneous data and metadata into a unified representation. Here, we present the use of bespoke, interactive data graphics (e.g., bar charts, scatter plots, etc.) for visual exploration of a knowledge graph. By modeling a chart as a set of metadata that describes semantic context (SPARQL query) separately from visual context (Vega-Lite specification), we leverage the high-level, declarative nature of the SPARQL and Vega-Lite grammars to concisely specify web-based, interactive data graphics synchronized to a knowledge graph. Resources with dereferenceable uniform resource identifiers (URIs) can employ the hyperlink encoding channel or image marks in Vega-Lite to amplify the information content of a given data graphic, and published charts populate a browsable gallery of the database. We discuss design considerations that arise in relation to portability, persistence, and performance. Altogether, this pairing of SPARQL and Vega-Lite—demonstrated here in the domain of polymer nanocomposite materials science—offers an extensible approach to FAIR (findable, accessible, interoperable, reusable) scientific data visualization within a knowledge graph framework.

Keywords: FAIR, graph database, knowledge graph, materials science, research management

Introduction

From early cartography to modern digital interfaces, data visualization—the display of abstract information in graphical form—has helped humans navigate unknown and complex spaces with a history of conceptual advancements alongside innovations in printing and reproduction.[1] Today, the widespread availability of digitized information, and the ability to process and display it with computers and web browsers, has brought interaction to the fore as a facilitator of higher-level cognitive processing on multidimensional datasets.[2] Interactive data visualization supports human reasoning and understanding through iterative exploration and investigation.[3] Given the deluge of data in many scientific domains, human-interpretable means for managing, troubleshooting, and disseminating information—particularly those that preserve machine-interpretability—remain essential in scientific research. This article illustrates such an approach, on a knowledge graph database, through the combination of a robust visualization grammar (Vega-Lite) and the query language for the semantic web (SPARQL) (Fig. 1).



Figure 1. Extending FAIR to data graphics. In the paradigm of charts as metadata, a chart object is modeled as a set of metadata that includes semantic context (SPARQL query) and visual context (Vega-Lite chart specification). With the SPARQL query language and the Vega-Lite grammar of interactive graphics, one can specify interactive charts (e.g., bar charts, scatter plots, heat maps, etc.) that remain synchronized to the content of the knowledge graph and whose data marks can link to dereferenceable URIs (e.g., DOIs, images, other charts, etc.) through hyperlink encoding channels. Combined, these tools offer a human- and machine-interpretable way to explore and share scientific data.

In response to challenges around the reuse of scholarly data[4], scientific communities have mobilized around a set of four guiding principles for data management: ensuring that data is findable, accessible, interoperable, and reusable.[5] Known by the acronym FAIR, these principles aim to preserve the value of digital assets through machine-interpretable metadata standards and schema. In the materials science domain, the FAIR guiding principles have been embraced by numerous data resources and repositories, ushering in the development of modern data infrastructures for materials research.[6][7][8][9][10] The backbone and nervous system for these and other scientific data infrastructures build upon the foundation of the World Wide Web (WWW).

Since the early vision of the semantic web to make data on the internet machine-interpretable[11], the WWW has evolved from a repository of linked documents to an omnipresent medium for information exchange. The Resource Description Framework (RDF), a metadata model for the semantic web, captures knowledge through expressions known as triples, each comprising two nodes and a directional edge, that form a directed graph-based data representation inside a database, or triple store. SPARQL, a query language for RDF, uses graph-based expressions to retrieve sets of matches, or bindings, of variables in a graph pattern to content in a triple store. In the case of SELECT queries in SPARQL, sets of bindings take on a tabular form. The RDF model achieves interoperability through shared ontologies, or structured vocabularies that form the basis for capturing and reasoning over domain knowledge. Graph databases, such as knowledge graphs[12], can build on the infrastructure of the internet by using uniform resource identifiers (URIs) that follow the well-established hypertext transfer protocol (HTTP) to ensure global uniqueness. Contrary to digital object identifiers (DOIs), which represent digital resources, URIs can represent anything (e.g., physical objects, abstract concepts, etc.). However, similar to the way a DOI is accessible via redirection when “https://dx.doi.org/” is placed in front, URIs can serve representations in a process known as dereferencing, offering a way to capture information stored elsewhere on the web. Despite challenges around the implementation of truly distributed knowledge representations[13], this extensible data and metadata format shows promise as a FAIR mechanism for storing and linking scientific data.
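
To illustrate how the bindings of a SELECT query take on a tabular form, the sketch below flattens a result document in the W3C SPARQL 1.1 Query Results JSON Format into a list of row dictionaries. The function name bindings_to_rows and the example variable names are assumptions made for illustration.

```python
def bindings_to_rows(results):
    """Flatten SPARQL 1.1 JSON results into a list of row dictionaries."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable left unbound by OPTIONAL is simply absent from the
        # binding, so it becomes None to keep every row the same shape.
        rows.append({
            var: binding[var]["value"] if var in binding else None
            for var in variables
        })
    return rows
```

Each variable in the query becomes a column and each solution becomes a row. Values arrive as strings, so numeric columns still need to be converted before use.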

Several tools and platforms have been developed for exploring and visualizing RDF and linked data[14][15][16][17][18][19][20][21], but a common thread in these systems is the use of a typology to define charts (e.g., bar charts, pie charts, scatter plots). Extensive research in data visualization has illuminated the deeper structure underlying most data graphics wherein graphical primitives known as data marks (e.g., point, line, area, text) have properties that can be encoded through channels (e.g., position, color, size, opacity) by mapping data attributes along discrete or continuous scales.[22][23] This grammar of graphics forms the basis for highly-cited and widely-adopted visualization libraries.[24][25] Reactive Vega[26], and later Vega-Lite[27], extended this grammar to interaction. In the Vega-Lite grammar for interactive graphics, a chart specification (written in JSON syntax) defines the visual representation of a tabular dataset (e.g., marks, encodings, selection parameters), while lower-level details (e.g., color schemes, legends, axis scales, event handlers) compile with default values unless overridden in the specification. The result is a concise, declarative specification of an interactive view of a dataset, built and customized incrementally.
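
As a concrete illustration of the grammar described above, the following Python sketch assembles a small Vega-Lite specification as a dictionary and serializes it to JSON. The function name scatter_spec and the field names in the usage note are hypothetical; the keys themselves (mark, encoding, field, type, scale) belong to the Vega-Lite grammar.

```python
import json


def scatter_spec(rows, x_field, y_field, x_scale=None):
    """Build a minimal Vega-Lite scatter plot over a list of row dictionaries."""
    x_encoding = {"field": x_field, "type": "quantitative"}
    if x_scale is not None:
        # Lower-level details use defaults unless the specification overrides them.
        x_encoding["scale"] = x_scale
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            "x": x_encoding,
            "y": {"field": y_field, "type": "quantitative"},
            "tooltip": [
                {"field": x_field, "type": "quantitative"},
                {"field": y_field, "type": "quantitative"},
            ],
        },
    }


def to_json(spec):
    """Serialize a specification so that it can be stored or rendered as text."""
    return json.dumps(spec, indent=2)
```

Calling scatter_spec(rows, "loading", "modulus") relies on the default linear axis, while passing x_scale={"type": "log"} overrides only that detail. This is one way to build and customize a chart incrementally.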

Interactive methods for querying databases, such as Polaris and later VizQL (Tableau)[28][29], offer platforms for authoring interactive charts and dashboards through drag-and-drop interfaces. These systems have provided significant value to business analytics with their ease of use and suitability for many common tasks, but they are restrictive in terms of their proprietary nature, limited expressivity, and lack of support for graph-based data sources. To counter these drawbacks and provide a means for FAIR scientific data visualization, we focus our efforts on use of available open-source tools, a high degree of expressivity, and compatibility with knowledge graphs.

In this article, we describe a paradigm wherein charts defined through metadata provide a mechanism for exploring and documenting the contents of a knowledge graph of materials science data. Building on the concept of a visualization as a function of a data storage medium and a user specification[30], we model a chart as a combination of query (SPARQL) and chart specification (Vega-Lite) stored in the knowledge graph and processed on demand. This approach for bespoke, interactive data graphics is made possible by the high-level, declarative nature of SPARQL and Vega-Lite. Storing charts as metadata enables them to display the most up-to-date information in the knowledge graph, and charts themselves can be queried and analyzed. We find that dereferenceable URIs—HTTP identifiers that serve human-readable representations when opened in a web browser—embody the complementarity of SPARQL and Vega-Lite. Examples presented here draw from a knowledge graph in the materials science domain, but the paradigm applies to other domains as a mechanism for FAIR scientific data visualization and interaction.
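
The idea of a chart stored as a query plus a specification and processed on demand can be sketched as follows. This is a simplified illustration rather than the Whyis implementation: the dictionary keys "query" and "spec" and the run_query callable, which stands in for a request to a SPARQL endpoint, are assumptions.

```python
import copy


def materialize_chart(chart, run_query):
    """Combine a stored chart with fresh query results at view time."""
    # The query is executed when the chart is viewed, not when it was saved,
    # so the rendered view reflects the current content of the knowledge graph.
    rows = run_query(chart["query"])
    # Copy the stored specification so that the saved metadata stays unchanged.
    spec = copy.deepcopy(chart["spec"])
    spec["data"] = {"values": rows}
    return spec
```

Because only the query and the specification are stored, the same chart object yields an updated view whenever new data enters the graph.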

Results

By exploring the notion of charts as metadata, we find that the variety of bespoke data graphics offers a useful, interoperable platform for exploratory visualization of a knowledge graph.

Sandbox for exploratory visualization, infographics, and meta-analyses

To address the trade-off between usability and expressivity, we opt for maximal expressivity in terms of content creation, taking usability into account by making all examples open-source and readily available for re-use. For example, domain experts without fluency in query or visualization languages (e.g., SPARQL, Vega-Lite) can interact with data in the knowledge graph by browsing a gallery of interactive charts, and those interested in creating their own charts have the code behind each chart as a precursor to adapt or modify for their own purposes. In this way, the collection of example queries and chart specifications provides a form of reusable documentation for accessing and viewing data in the knowledge graph.

To demonstrate the concept of charts as metadata, we extended the visualization capabilities of the open-source MaterialsMine repository to accommodate the saving and processing of these bespoke data graphics. The knowledge graph at MaterialsMine, previously NanoMine[8][31], contains curated data from research articles on polymer-matrix nanocomposite materials in the scholarly literature along with metadata describing the materials, processing, characterization, and bibliographic information from those articles. Structured as linked data conforming to semantic web ontologies and vocabularies[32], data and metadata are made accessible through a SPARQL endpoint on the web.

Tailored interactive charts containing data from the knowledge graph range in purpose and complexity. Depending on the SPARQL query, datasets vary from individual sample data linked to a research article to meta-analyses of all articles curated into the knowledge graph (Fig. 1). All examples shown here use some combination of layered and concatenated views combined with selections in Vega-Lite to provide explorable, interactive views of data. Following the mantra of overview first, zoom and filter, then details-on-demand[33], these data graphics use elements of interactivity to display aspects of a dataset that exceed the capability of a static representation. Common modes of interaction include tooltips, conditional display on hover interactions or selections, cross-filtered views, and pan and zoom.

Offering the full expressivity of SPARQL and Vega-Lite for specifying charts resulted in a number of interesting and often unanticipated interactive views of data in the knowledge graph. For example, rule marks with conditional opacity enable the overlaying of derived mechanical properties (e.g., tensile modulus, tensile strength, elongation at break) over representative curves showing raw tensile test data (Fig. 2a). Using Vega-Lite transforms and layered rule marks permits the custom scaling and plotting of linearized Weibull distributions for real-time calculation of dielectric breakdown strength (Fig. 2b). A query of articles and the material systems studied within them offers an interactive view of trends in polymer nanocomposite materials research (Fig. 2c). Another meta-analysis demonstrates the results of entity resolution with the ChemProps API (Fig. 2d).[34] Concatenated sub-views and text formatting parameters result in a stylized infographic demonstrating some of the ways to enhance data exploration by adding interactive elements (Fig. 2e). In addition to concatenated sub-views, sequence generators and Vega-Lite transforms make possible an embedded explanation of dynamic mechanical analysis for viscoelastic material properties atop experimental data (Fig. 2f). These and over 150 other examples currently populate the gallery of charts in the MaterialsMine knowledge graph.
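
The linearized Weibull plot mentioned above rests on the identity ln(−ln(1 − F)) = β ln E − β ln α, where F is the cumulative failure probability at field E, β is the shape parameter, and α is the scale parameter, which is commonly reported as the breakdown strength. The Python sketch below fits that line by least squares. It is an illustration and not the chart's own Vega-Lite regression transform, and the plotting position used to estimate F (Bernard's median-rank approximation) is an assumption.

```python
import math


def weibull_fit(breakdown_fields):
    """Fit the linearized Weibull CDF by least squares; return (scale, shape)."""
    values = sorted(breakdown_fields)
    n = len(values)
    xs, ys = [], []
    for rank, field in enumerate(values, start=1):
        # Assumed plotting position: Bernard's median-rank approximation.
        probability = (rank - 0.3) / (n + 0.4)
        xs.append(math.log(field))
        ys.append(math.log(-math.log(1.0 - probability)))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # The slope is the shape parameter; intercept = -shape * ln(scale).
    shape = slope
    scale = math.exp(-intercept / slope)
    return scale, shape
```

The returned scale is the field at which the fitted cumulative failure probability equals 1 − 1/e, or about 63.2%.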



Figure 2. Interactive views of sample data, meta-analyses, and stylized infographics. Charts shown here are specified by a SPARQL query (semantic context) as well as Vega-Lite specification (visual context). The snapshots of interactive data graphics shown here display a) mechanical tensile testing data curated from Bandyopadhyay et al. (2005)[35], transformed into a layered composite view; b) a Weibull plot of dielectric testing data using custom y-axis scaling and the regression transform to estimate dielectric breakdown strength (DBS); c) a meta-analysis of nanocomposite filler materials in curated research articles per year of publication, highlighted to show the trend for graphene; d) a meta-analysis of entity-resolved compound names (computed by the ChemProps API[34]) versus curator-provided strings; e) an infographic showing a dataset with increasingly interactive views; and f) an explanatory graphic for viscoelastic data. These examples created for the materials science domain represent a small subset of the variety of datasets and visualizations made possible by using SPARQL queries and Vega-Lite specifications to capture interactive views of content from a knowledge graph database.

The examples presented here by no means represent the only way to query and display these data. By making available the expressivity offered by SPARQL and Vega-Lite, we encourage experimentation and rich customization in the pursuit of effective means of data exploration for a variety of applications. Any individual data visualization will have finite applicability. However, the collection of such open-source visualizations enabled by this approach can accomplish a variety of tasks and illuminate remote corners of a knowledge graph.

Leveraging dereferenceable URIs in a knowledge graph

To avoid naming collisions, knowledge graphs employ URIs to globally identify resources without ambiguity. Using well-established internet protocols (e.g., HTTP) helps to ensure global uniqueness among distributed systems on the semantic web. A helpful practice for documenting resources involves the owner of a domain having a representation delivered by a server (e.g., HTML page) when a URI is requested through internet protocols. URIs can exist solely as identifiers, but those with available representations on the web are known as dereferenceable URIs.

URIs can be returned in the results of a SPARQL query, but a column of URIs in a table may be less useful than an interactive visualization that allows a user to sort and refine the results of interest, following the guideline "overview first, zoom and filter, then details-on-demand."[33] We identify two encoding channels in Vega-Lite that make the language well-suited to knowledge graphs: the url encoding channel for image marks (Fig. 3a), and the href (hyperlink reference) encoding channel for other data marks such as text (Fig. 3b) or point marks (Fig. 3c). First, images serve as useful visual representations in many scientific domains, and rendering them on-demand via dereferenceable URIs avoids the need to download or cache a full set of images. Second, the practice of hyperlinking to primary sources or representations leverages the notion of linked data by directing to additional information about resources outside the confines of a given chart.
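The two encoding channels can be illustrated with a minimal sketch, written here as Python dictionaries that serialize to Vega-Lite JSON. This is not one of the charts from the gallery: the field names, the sample rows, and every example.org URI are placeholders assumed for illustration, and the schema version is likewise an assumption.

```python
import json

VEGA_LITE_SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"

# Hypothetical rows shaped like tabular SPARQL results.
# Every URI below is a placeholder, not a real resource.
ROWS = [
    {"sample": "A", "modulus": 2.1, "strength": 48.0,
     "image": "https://example.org/images/a.png",
     "article": "https://example.org/articles/a"},
    {"sample": "B", "modulus": 3.4, "strength": 55.5,
     "image": "https://example.org/images/b.png",
     "article": "https://example.org/articles/b"},
]


def point_chart_with_links(rows):
    """Scatter plot whose point marks link to each row's URI (href channel)."""
    return {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"values": rows},
        "mark": "point",
        "encoding": {
            "x": {"field": "modulus", "type": "quantitative"},
            "y": {"field": "strength", "type": "quantitative"},
            "tooltip": {"field": "sample", "type": "nominal"},
            "href": {"field": "article", "type": "nominal"},
        },
    }


def image_chart(rows):
    """Image marks whose pictures are fetched from each row's URI (url channel)."""
    return {
        "$schema": VEGA_LITE_SCHEMA,
        "data": {"values": rows},
        "mark": {"type": "image", "width": 50, "height": 50},
        "encoding": {
            "x": {"field": "sample", "type": "nominal"},
            "url": {"field": "image", "type": "nominal"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(point_chart_with_links(ROWS), indent=2))
```

In both cases the chart stores only the name of the field that holds the URI, so the rendered marks resolve whatever URIs the query returns at the time of rendering.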


Figure 3. Direct linking to representations of resources in the knowledge graph. These charts make use of dereferenceable URIs in the knowledge graph to display or link to resources. a) Image marks with accompanying URL encoding channels are used to display curated sample images from Natarajan et al. (2013)[36] corresponding to the selected points on the adjacent scatter plot. b) Text marks with a hyperlink encoding channel open the URL of a journal article DOI when selected. c) A scatter plot displays charts published to the knowledge graph, arranged by the character length of their Vega-Lite specification and description. Point marks with the hyperlink encoding channel link to a chart page when selected. This final chart is self-referential; the highlighted point mark represents the chart itself.

Interactive data visualization offers myriad ways to explore a dataset, and we describe how knowledge graphs with dereferenceable URIs can expand the reach of these graphics to the entire web through hyperlinks. By combining the strengths of knowledge graphs for storing knowledge and interactive visualizations for accessing knowledge, this approach provides a means for communicating data in a way that builds trust and makes data analysis more transparent, building on the idea that sharing the graphic should equate to sharing the data.[37]

Interoperability with other web platforms

The semantic web facilitates data exchange in a distributed manner by building on the infrastructure of the internet and encouraging the use of common vocabularies and ontologies. One demonstration of interoperability enabled by SPARQL is the extension for federated querying. Federated queries aggregate data from multiple sources by running sub-queries across distributed SPARQL endpoints on the internet. Furthermore, the ability to send a query to a public SPARQL endpoint via HTTP GET request and receive machine-readable results (e.g., JSON) enables other web platforms to query and process data from a knowledge graph.
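The HTTP GET pattern described above can be sketched with the Python standard library alone. The sketch assumes an endpoint that follows the SPARQL 1.1 Protocol (query passed in the `query` parameter, JSON results requested through the `Accept` header); the function names are illustrative and the endpoint URL is supplied by the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_get_request(endpoint, query):
    """Build an HTTP GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of plain dictionaries."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query over the network and return tabular rows."""
    with urlopen(build_get_request(endpoint, query)) as response:
        return bindings_to_rows(json.load(response))
```

The flattened rows are in the shape that a Vega-Lite specification accepts as inline data values, which is what lets an external platform go from query to chart in a few steps.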

Here, we demonstrate a two-fold example of interoperability by showing an example chart from MaterialsMine, with federated querying of DBpedia[38], all within a reactive computational notebook on Observable (Fig. 4). Platforms such as Observable, which natively supports Vega-Lite, can fetch a chart’s metadata, parse the query and chart specification, run the query for the chart’s data (in this case, at the same endpoint), then render those data as an interactive Vega-Lite chart. In this example, the query contains a SERVICE clause to the DBpedia SPARQL endpoint to return the English-text abstract for the material compound “Silicon dioxide” from Wikipedia, and the Vega-Lite specification displays this abstract as a text mark on the chart (Fig. 4, red dotted lines). At present, federated querying adds several seconds to the query runtime, so such queries need to be optimized during development.
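A federated sub-query of this kind might look like the following sketch, which builds only the DBpedia portion of a query. It is not the query used for Fig. 4: the function name is hypothetical, and the sketch assumes that DBpedia exposes the abstract through `dbo:abstract` on the resource `dbr:Silicon_dioxide` at the public endpoint shown.

```python
from textwrap import dedent

# Assumed address of the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def federated_abstract_query(resource="Silicon_dioxide", lang="en"):
    """Return a SPARQL query that delegates one pattern to DBpedia via SERVICE."""
    return dedent(f"""\
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?abstract WHERE {{
          SERVICE <{DBPEDIA_ENDPOINT}> {{
            dbr:{resource} dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "{lang}")
          }}
        }}
        """)


if __name__ == "__main__":
    print(federated_abstract_query())
```

Keeping the remote pattern as narrow as possible (one resource, one language) is one way to limit the extra runtime that the remote call introduces.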


Figure 4. Interoperability with other web platforms and FAIR data sources. The ability to send queries to public SPARQL endpoints and receive data through internet protocols enables interoperability within a query (e.g., federated querying from DBpedia[38]) as well as displaying and processing information from the knowledge graph using external web-based platforms, such as an Observable notebook (https://observablehq.com/@mdeagen/figure-4-notebook).

Interoperability is arguably the most challenging of the FAIR principles to implement, and we have shown how a SPARQL-equipped knowledge graph can interoperate with other public SPARQL endpoints as well as display charts and their metadata on an external platform that supports Vega-Lite. In the next section, we present design considerations for queries and chart specifications that arise in this approach to FAIR scientific data visualization.

Decoupling (meta)data from graphical representation

Data graphics assemble and contextualize information for scientists, similar to how metadata package and describe data for machines. By choosing to model a data graphic (e.g., interactive Vega-Lite chart) as a form of metadata itself, researchers can simultaneously capture human-interpretable and machine-interpretable representations of their research output. This FAIR approach to data visualization leverages Vega-Lite’s grammar of interactive graphics, which differs fundamentally from conventional tools (e.g., Excel, Plotly, MATLAB). By describing an interactive representation of data as a JSON object, a Vega-Lite specification illuminates the inherent structure of most data graphics, as opposed to a chart typology that requires many preset chart types to achieve expressivity. Upon introducing the ability to encode URIs as hyperlinks in data marks, Vega-Lite becomes an ideal tool for combining with semantic web technologies. While a formal grammar of graphics ontology falls outside the scope of the present work, such an effort could build upon these demonstrations of the reciprocal benefits of SPARQL and Vega-Lite and include stakeholders from both the semantic web and data visualization communities.

To further illustrate the benefits of the combined approach of SPARQL and Vega-Lite, we can consider the substitution of either tool with traditional alternatives. In the case of SPARQL with a typology-based plotting tool, one loses expressivity in terms of building interactive data graphics and may obscure the visual meaning captured in the rendered graphic. The inverse case—an isolated tabular dataset with a Vega-Lite specification—may lack sufficient metadata and semantic context necessary to interpret the raw data. With the combined approach, data and visual representations exist as metadata, with the added benefit that interactive charts can use hyperlink encoding channels to provide direct access to dereferenceable resources in the knowledge graph. Jointly, these tools embody FAIR scientific data visualization, and we elaborate further on the framing of specific FAIR guiding principles around these notions in the Methods section.

Discussion

In the paradigm of charts as metadata, the data instances that populate a chart are absent from the chart specification. This may seem counter-intuitive, but the resulting specification describes what data to retrieve (i.e., semantic context) and how to display it (i.e., visual context). As a result, these metadata-defined charts represent interactive lenses, each with a particular vantage view of the knowledge graph, that display the most up-to-date instances from the knowledge graph at the time of rendering. Many approaches to designing static visualizations no longer apply when visualizations become interactive and subject to changing data.[39] We organize these design considerations into three broad categories: portability, persistence, and performance.

Portability poses a key challenge for web-based charts and interactive charts in general. The wide adoption of a given approach or toolset hinges on its reliability and compatibility with a diverse set of platforms and devices. To serve the intended use as a means for analyzing or disseminating data, an interactive data graphic must retain its ability to respond to user input when embedded in some other format (e.g., offline document, presentation slide), or offer a pre-recorded animation displaying its contents. Two recent projects, Chameleon and Loom, have begun to tackle some of these challenges around portability of interactive data graphics.[40][41]

Persistence, or the ability to continue existing as a useful data graphic, largely depends on the stability of the underlying data representations and their ability to be interpreted in the future. In a Vega-Lite chart specification, one may specify the schema used as a form of version control against future software changes in Vega-Lite. On the data query side, if vocabulary URIs or the way data are modeled in the knowledge graph change, SPARQL queries may cease to function as originally intended. We experienced this challenge when converting terms in the MaterialsMine ontology from their prior namespace (http://nanomine.org/ns/) to a new namespace (http://materialsmine.org/ns/). Updates to charts typically only involved a one-line change in the SPARQL PREFIX header of the query, but the issue highlighted the effect of upstream changes involving URIs on downstream resources such as charts. To mitigate these issues, communities should invest requisite resources to ensure robust ontologies and stable SPARQL endpoints that provide reliable access to data and a consistent semantic representation.
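Both persistence measures can be sketched in a few lines. The namespace IRIs are the ones given above; the function names are illustrative, the string replacement is a simplification of whatever process was actually used to update the charts, and the schema URL pattern is an assumption about how a specification pins its Vega-Lite version.

```python
import re

OLD_NS = "http://nanomine.org/ns/"
NEW_NS = "http://materialsmine.org/ns/"


def migrate_namespace(query, old=OLD_NS, new=NEW_NS):
    """Point PREFIX declarations and full IRIs at the new namespace."""
    return query.replace("<" + old, "<" + new)


def schema_version(spec):
    """Read the Vega-Lite version pinned by a chart's "$schema" entry, if any."""
    match = re.search(r"/vega-lite/v(\d+(?:\.\d+)*)\.json$", spec.get("$schema", ""))
    return match.group(1) if match else None
```

Because prefixed names in the body of the query resolve through the PREFIX header, a query written with prefixes needs only that one declaration changed.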

Performance of these charts may involve technological or data limitations. Query runtimes and chart rendering are necessarily impacted by the quantity of data available in the knowledge graph and how much data can be stored in memory. Moreover, responsiveness of public SPARQL endpoints, particularly with respect to federated querying, remains an ongoing challenge. For visualization and interaction design, accounting for future data involves considering how new data may impact scale extents, latency, or occlusion of data marks. Consideration of the scope of the data graphic becomes important, for example, splitting a large dataset into views that give a high-level overview with access to individual instances through interaction, in keeping with the guideline "overview first, zoom and filter, then details-on-demand."[33] On a more technical note, rendering images in Vega-Lite requires the use of image URIs within the same domain or ensuring that images from external domains have the appropriate HTTP header enabling cross-origin resource sharing (CORS). Finally, scalability of a gallery of charts from a knowledge graph involves considerations of the ease with which domain experts can search and navigate the collection of charts available.
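As a small illustration of the CORS requirement, the following sketch inspects the response headers of an image server. It assumes the relevant header is `Access-Control-Allow-Origin` and covers only the two simplest cases (a wildcard or an exact origin match); the function name and the origin passed in are hypothetical.

```python
def allows_cross_origin(headers, origin):
    """Check whether response headers permit use of the resource from `origin`."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin
```

A check like this could be run on image URIs before they are curated into the knowledge graph, so that charts relying on image marks do not fail silently at render time.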

Relational and non-relational databases (e.g., SQL, NoSQL) provide a limited account of the relationships between individual data objects, simplifying initial development of limited-scope data resources but hindering the later integration and interoperability with other data resources as these models and applications scale in complexity. Knowledge graph databases, on the other hand, use a graph data model upfront to capture these abstract relationships and semantics. Backend database performance still remains a concern when metadata employ a graph data model, but the use of shared ontologies mitigates the scaling issues around interoperability. When graph databases build upon the infrastructure of the WWW and employ globally unique and dereferenceable identifiers (URIs), they lower the barriers for distributed data exchange and can benefit from a web-based interactive visualization grammar such as Vega-Lite.

Defining charts as metadata in a knowledge graph captures semantic context and visual context while providing interactive, human-interpretable documentation of the contents of a knowledge graph. These chart representations may also be considered a form of “visualization data,” an emerging data format relevant to the application of artificial intelligence (AI) to visualization generation, enhancement, and analysis.[42] The complementarity of SPARQL and Vega-Lite makes this approach to scientific data visualization well-aligned with the FAIR principles by preserving machine-interpretability of underlying data while simultaneously providing an interactive means for domain experts to explore the contents of a knowledge graph.

Methods

In this section we describe the metadata model for charts, show a minimal example of a functional chart, and describe how charts are created and managed.

Metadata for a chart

Expressing data queries and chart specifications as text allows them to be stored as string literals in the knowledge graph. We assign each chart URI to the class sio:Chart from the Semanticscience Integrated Ontology[43], along with metadata corresponding to the widely adopted Dublin Core, Schema.org, and FOAF vocabularies. In addition to the SPARQL query (i.e., semantic context) and Vega-Lite specification (i.e., visual context), we include a title, description, and thumbnail depiction of each chart (Fig. 5). When published to the knowledge graph, provenance metadata (when a chart was created and by which logged-in user) are captured as extensions of a named graph using the nanopublication framework.[44]


Figure 5. Metadata describing a chart resource in the knowledge graph. Each chart instance is a member of the class sio:Chart, with metadata including a thumbnail depiction (created at the time of chart publication) as well as string literals defining the title, description, query, and chart specification. URI namespace prefixes are shown at the bottom.
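The shape of such a chart record can be sketched as a JSON-LD-style dictionary. The class sio:Chart and the four vocabularies come from the description above, but the specific predicates chosen here are assumptions for illustration (in particular, `ex:vegaLiteSpecification` and the `ex` namespace are hypothetical), and the prefix IRIs are the commonly used ones rather than values read from Fig. 5.

```python
import json

# Prefix IRIs for the vocabularies named in the text. The "ex" namespace and
# the choice of predicates in chart_record are illustrative assumptions.
CONTEXT = {
    "sio": "http://semanticscience.org/resource/",
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "https://example.org/ns/",
}


def chart_record(uri, title, description, query, spec, thumbnail_uri):
    """Describe a chart as metadata: text fields plus query and spec as strings."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "sio:Chart",
        "dcterms:title": title,
        "dcterms:description": description,
        "schema:query": query,  # assumed predicate for the SPARQL query
        "ex:vegaLiteSpecification": json.dumps(spec, sort_keys=True),  # hypothetical
        "foaf:depiction": {"@id": thumbnail_uri},
    }
```

Storing the specification as a serialized string mirrors the point made above that queries and chart specifications are kept as string literals, so the record contains no data instances of its own.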

Concise specification of chart metadata

The raw data within a chart are not explicitly enumerated in its metadata but are instead captured implicitly via the SPARQL query. This method allows charts to accommodate data instances added to or updated within the knowledge graph at a future point in time. Here, we demonstrate a minimal (non-interactive) Vega-Lite chart that displays the count of research articles curated into the knowledge graph as a function of the year each article was published (Fig. 6). Combined, the query and chart specification require only 300 characters. Behind the scenes, the SPARQL engine processes the query to collect all available matches in the knowledge graph. The result of this query is a set of tabular data with two variable attributes (i.e., DOI, Year) which occupy nearly 10,000 characters if serialized as a single string. Tabular query results are passed to the Vega-Lite renderer, which processes the chart specification, performs an aggregation operation, formats the axes, and draws the data marks according to the specified encodings and default parameter values. As content is added to the knowledge graph, the tabular data returned by running the query will capture those new instances, and the Vega-Lite chart will reflect those data when compiled and rendered. Although the bar chart represents a minimal example, the figure shows how the sizes of the SPARQL query and Vega-Lite specification compare to other, more elaborate data graphics presented in this article.
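The flow from query to rendered chart can be sketched as follows. This is not the chart shown in Fig. 6: the query predicate and the `ex` namespace are hypothetical, and the aggregation function only mimics, in plain Python, the count that the Vega-Lite renderer would perform.

```python
from collections import Counter

# Hypothetical query: the predicate for the publication year is a placeholder,
# not the vocabulary of any particular knowledge graph.
CHART_QUERY = """
PREFIX ex: <https://example.org/ns/>
SELECT ?doi ?year WHERE { ?doi ex:publicationYear ?year }
"""

# The stored specification names fields and encodings but holds no data.
CHART_SPEC = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}


def bind_data(spec, rows):
    """Return a copy of the spec with query results attached at render time."""
    return {**spec, "data": {"values": rows}}


def count_by_year(rows):
    """Mimic the per-year 'count' aggregation performed by the renderer."""
    return dict(sorted(Counter(row["year"] for row in rows).items()))
```

Calling `bind_data` again after new articles are curated yields a chart with updated counts, while `CHART_QUERY` and `CHART_SPEC` stay unchanged, which is the sense in which the data are captured implicitly.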


Figure 6. Minimal example of a chart specification. The bar chart in the lower left, showing the counts of curated research articles grouped by year of publication, was generated from a query and chart specification each containing approximately 150 characters. The faceted plot in the lower right, also generated using SPARQL and Vega-Lite, encodes character counts as the size of point marks to compare the relative brevity of this minimal example query and chart specification to other interactive, layered, and stylized charts featured in this article.

Browsing and creating charts

We use Whyis[45], a Python Flask application for knowledge graphs, to upload and manage charts in the knowledge graph. All instances of sio:Chart currently populate a paginated gallery featuring the thumbnail depiction, chart title, preview of the description, and link to the chart URI. By clicking on a chart, a user is directed to a chart instance view which queries the knowledge graph, displays the chart title and description, and renders the Vega-Lite chart. Icons above the chart allow the user to view the SPARQL query and Vega-Lite chart specification. Given the many possible ways to visualize a tabular dataset, we also enable the user to explore the raw data returned by the query inside an instance of Data Voyager[46][47], which provides a drag-and-drop interface for defining chart encodings and exploring recommended views.

To add a chart to the knowledge graph, a user enters the SPARQL query, Vega-Lite specification, title, and description into the custom chart editor interface in Whyis. SPARQL query syntax highlighting is enabled by embedding a YASGUI interface.[48] On the opposite panel of this interface, the user can toggle between views of the raw data as a table or Vega-Lite chart. When a chart is saved, a nanopublication is published to the knowledge graph, a backup of the chart metadata is created in MongoDB, and the chart joins the gallery of charts. At present, all charts published to the knowledge graph are publicly available. Future deployments may consider whether to offer tiered access or intermediate publication, for example saving progress on a chart under development or publishing with a limited scope (e.g., to a research team). In the meantime, development of charts can occur by using an offline platform such as Visual Studio Code with Vega-Lite plug-in, or by using an online platform such as Observable.

Framing the FAIR guiding principles for data graphics

Here, we highlight several of the guiding principles of FAIR laid out by Wilkinson et al.[5] and relate them to design decisions around the combined approach of SPARQL and Vega-Lite for scientific data visualization:

  • “F1. (Meta)data are assigned a globally unique and persistent identifier”: Chart objects, modeled as the combination of a SPARQL query and Vega-Lite specification among other metadata, are assigned a globally unique URI.
  • “A1. (Meta)data are retrievable by their identifier using a standardized communications protocol”: A chart object is dereferenceable through its HTTP URI, and data objects within a chart that have their own dereferenceable URIs can use image marks or the hyperlink encoding channel in the Vega-Lite specification.
  • “A1.1. The protocol is open, free, and universally implementable”: A public SPARQL endpoint provides access to the data, and the free, open-source software developed by the Vega-Lite community enables the rendering of valid chart specifications.
  • “A1.2. The protocol allows for an authentication and authorization procedure, where necessary”: To ensure provenance of chart objects, posting to the knowledge graph is limited to authenticated users who are logged into the web application.
  • “I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation”: The RDF (meta)data model has been employed to capture the semantic relationships between a chart object, its associated metadata, and its provenance.
  • “I2. (Meta)data use vocabularies that follow FAIR principles”: We use the Semanticscience Integrated Ontology[43] along with Dublin Core, Schema.org, and FOAF vocabularies.
  • “R1.2. (Meta)data are associated with detailed provenance”: The Whyis knowledge graph framework[45], using the concept of nanopublications[44], captures provenance metadata for charts posted to the knowledge graph, such as the creator and time of publication.
  • “R1.3. (Meta)data meet domain-relevant community standards”: Vega-Lite provides a high degree of expressivity, of which the figures in this article provide a small sample. The approach could also apply to scientific domains that utilize geospatial visualizations using Vega-Lite’s geographic projection abilities. Scientific visualizations outside of the scope of Vega-Lite (e.g., 3D models, molecular structures, annotated schematics, force-directed graphs, animations) can instead use static depictions with image marks and a hyperlink encoding channel to link to a more advanced representation.

Abbreviations, acronyms, and initialisms

  • AI: artificial intelligence
  • CORS: cross-origin resource sharing
  • DOI: digital object identifier
  • FAIR: findable, accessible, interoperable, reusable
  • HTTP: hypertext transfer protocol
  • RDF: Resource Description Framework
  • URI: uniform resource identifier
  • WWW: World Wide Web

Acknowledgements

The authors gratefully acknowledge support from the National Science Foundation through the NSF CSSI program (OAC-1835648, OAC-1835782), NSF DIBBs program (ACI-1640840), and NSF DMREF program (CMMI-1818574, CMMI-1729743, CMMI-1729452) in addition to ongoing collaborations with NIST, CHiMaD at Northwestern University, and funding from the Tetherless World Constellation at Rensselaer Polytechnic Institute. The authors also recognize the contributions from the Interactive Data Lab at the University of Washington and its alumni in developing and maintaining the open-source code and high-quality documentation of Vega-Lite; the teams behind Observable, DBpedia, and the SPARQL 1.1 query language for making possible the demonstration of interoperability; and Leland Wilkinson for an inspiring life and career that fundamentally reimagined data visualization through the grammar of graphics.

Data availability

The “living” versions of these interactive charts, which require an operational SPARQL endpoint and Whyis application, can be found in the MaterialsMine Gallery of Interactive Charts at https://materialsmine.org/wi/gallery. As a backup, archival versions of each interactive chart featured in this article (using a static snapshot of queried data) are available on Observable (https://observablehq.com/@mdeagen/archival-interactive-charts). A zipped folder with the query and chart specification for each chart featured in this article, as well as a snapshot of the data retrieved by the query, is available on Figshare (https://doi.org/10.6084/m9.figshare.19352258).[49]

Code availability

Source code for the Whyis application framework can be found on GitHub at https://github.com/tetherless-world/whyis/. Source code and documentation for the Vega-Lite project can be found on GitHub at https://github.com/vega/vega-lite. The W3C Recommendation for the SPARQL 1.1 Query Language can be found at https://www.w3.org/TR/sparql11-query/.

Conflict of interest

The authors declare no competing interests.

References

  1. Friendly, Michael (2008), "A Brief History of Data Visualization" (in en), Handbook of Data Visualization (Berlin, Heidelberg: Springer Berlin Heidelberg): 15–56, doi:10.1007/978-3-540-33037-0_2, ISBN 978-3-540-33036-3, http://link.springer.com/10.1007/978-3-540-33037-0_2. Retrieved 2024-06-16 
  2. Yi, Ji Soo; Kang, Youn ah; Stasko, John; Jacko, J.A. (1 November 2007). "Toward a Deeper Understanding of the Role of Interaction in Information Visualization". IEEE Transactions on Visualization and Computer Graphics 13 (6): 1224–1231. doi:10.1109/TVCG.2007.70515. ISSN 1077-2626. https://ieeexplore.ieee.org/document/4376144/. 
  3. Heer, Jeffrey; Shneiderman, Ben (1 April 2012). "Interactive dynamics for visual analysis" (in en). Communications of the ACM 55 (4): 45–54. doi:10.1145/2133806.2133821. ISSN 0001-0782. https://dl.acm.org/doi/10.1145/2133806.2133821. 
  4. Borgman, Christine L. (1 June 2012). "The conundrum of sharing research data" (in en). Journal of the American Society for Information Science and Technology 63 (6): 1059–1078. doi:10.1002/asi.22634. ISSN 1532-2882. https://onlinelibrary.wiley.com/doi/10.1002/asi.22634. 
  5. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC4792175. PMID 26978244. https://www.nature.com/articles/sdata201618.
  6. Draxl, Claudia; Scheffler, Matthias (1 September 2018). "NOMAD: The FAIR concept for big data-driven materials science" (in en). MRS Bulletin 43 (9): 676–682. doi:10.1557/mrs.2018.208. ISSN 0883-7694. http://link.springer.com/10.1557/mrs.2018.208. 
  7. Himanen, Lauri; Geurts, Amber; Foster, Adam Stuart; Rinke, Patrick (1 November 2019). "Data‐Driven Materials Science: Status, Challenges, and Perspectives" (in en). Advanced Science 6 (21): 1900808. doi:10.1002/advs.201900808. ISSN 2198-3844. PMC6839624. PMID 31728276. https://onlinelibrary.wiley.com/doi/10.1002/advs.201900808.
  8. Brinson, L. Catherine; Deagen, Michael; Chen, Wei; McCusker, James; McGuinness, Deborah L.; Schadler, Linda S.; Palmeri, Marc; Ghumman, Umar et al. (18 August 2020). "Polymer Nanocomposite Data: Curation, Frameworks, Access, and Potential for Discovery and Design" (in en). ACS Macro Letters 9 (8): 1086–1094. doi:10.1021/acsmacrolett.0c00264. ISSN 2161-1653. https://pubs.acs.org/doi/10.1021/acsmacrolett.0c00264.
  9. Horton, M. K.; Dwaraknath, S.; Persson, K. A. (14 January 2021). "Promises and perils of computational materials databases" (in en). Nature Computational Science 1 (1): 3–5. doi:10.1038/s43588-020-00016-5. ISSN 2662-8457. https://www.nature.com/articles/s43588-020-00016-5. 
  10. Warren, James A.; Ward, Charles H. (1 September 2018). "Evolution of a Materials Data Infrastructure" (in en). JOM 70 (9): 1652–1658. doi:10.1007/s11837-018-2968-z. ISSN 1047-4838. http://link.springer.com/10.1007/s11837-018-2968-z. 
  11. Berners-Lee, Tim; Hendler, James; Lassila, Ora (1 May 2001). "The Semantic Web". Scientific American 284 (5): 34–43. doi:10.1038/scientificamerican0501-34. ISSN 0036-8733. https://www.scientificamerican.com/article/the-semantic-web. 
  12. Hogan, Aidan; Blomqvist, Eva; Cochez, Michael; D’amato, Claudia; Melo, Gerard De; Gutierrez, Claudio; Kirrane, Sabrina; Gayo, José Emilio Labra et al. (31 May 2022). "Knowledge Graphs" (in en). ACM Computing Surveys 54 (4): 1–37. doi:10.1145/3447772. ISSN 0360-0300. https://dl.acm.org/doi/10.1145/3447772. 
  13. Polleres, Axel; Kamdar, Maulik Rajendra; Fernández, Javier David; Tudorache, Tania; Musen, Mark Alan (31 January 2020). Hitzler, Pascal; Janowicz, Krzysztof. eds. "A more decentralized vision for Linked Data". Semantic Web 11 (1): 101–113. doi:10.3233/SW-190380. https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/SW-190380. 
  14. Skjæveland, Martin G. (2015), Simperl, Elena; Norton, Barry; Mladenic, Dunja et al., eds., "Sgvizler: A JavaScript Wrapper for Easy Visualization of SPARQL Result Sets" (in en), The Semantic Web: ESWC 2012 Satellite Events (Berlin, Heidelberg: Springer Berlin Heidelberg) 7540: 361–365, doi:10.1007/978-3-662-46641-4_27, ISBN 978-3-662-46640-7, http://link.springer.com/10.1007/978-3-662-46641-4_27. Retrieved 2024-06-16
  15. Alonen, Miika; Kauppinen, Tomi; Suominen, Osma; Hyvönen, Eero (2013), Salinesi, Camille; Norrie, Moira C.; Pastor, Óscar, eds., "Exploring the Linked University Data with Visualization Tools", Advanced Information Systems Engineering (Berlin, Heidelberg: Springer Berlin Heidelberg) 7908: 204–208, doi:10.1007/978-3-642-41242-4_25, ISBN 978-3-642-38708-1, http://link.springer.com/10.1007/978-3-642-41242-4_25. Retrieved 2024-06-16 
  16. Graves, Alvaro (12 June 2013). "Creation of visualizations based on linked data" (in en). Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics (Madrid Spain: ACM): 1–12. doi:10.1145/2479787.2479828. ISBN 978-1-4503-1850-1. https://dl.acm.org/doi/10.1145/2479787.2479828. 
  17. Thellmann, Klaudia; Galkin, Michael; Orlandi, Fabrizio; Auer, Sören (2015), Arenas, Marcelo; Corcho, Oscar; Simperl, Elena et al., eds., "LinkDaViz – Automatic Binding of Linked Data to Visualizations" (in en), The Semantic Web - ISWC 2015 (Cham: Springer International Publishing) 9366: 147–162, doi:10.1007/978-3-319-25007-6_9, ISBN 978-3-319-25006-9, http://link.springer.com/10.1007/978-3-319-25007-6_9. Retrieved 2024-06-16
  18. Krommyda, Maria; Kantere, Verena (1 September 2019). "Understanding SPARQL Endpoints through Targeted Exploration and Visualization". 2019 First International Conference on Graph Computing (GC) (Laguna Hills, CA, USA: IEEE): 21–28. doi:10.1109/GC46384.2019.00012. ISBN 978-1-7281-4129-9. https://ieeexplore.ieee.org/document/9030980/. 
  19. De Donato, Renato; Garofalo, Martina; Malandrino, Delfina; Pellegrino, Maria Angela; Petta, Andrea; Scarano, Vittorio (2020), Blomqvist, Eva; Groth, Paul; de Boer, Victor et al., eds., "QueDI: From Knowledge Graph Querying to Data Visualization" (in en), Semantic Systems. In the Era of Knowledge Graphs (Cham: Springer International Publishing) 12378: 70–86, doi:10.1007/978-3-030-59833-4_5, ISBN 978-3-030-59832-7, PMC7586436, http://link.springer.com/10.1007/978-3-030-59833-4_5. Retrieved 2024-06-16
  20. Li, Haotian; Wang, Yong; Zhang, Songheng; Song, Yangqiu; Qu, Huamin (1 January 2022). "KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation". IEEE Transactions on Visualization and Computer Graphics 28 (1): 195–205. doi:10.1109/TVCG.2021.3114863. ISSN 1077-2626. https://ieeexplore.ieee.org/document/9552844/. 
  21. Papadaki, Maria-Evangelia; Spyratos, Nicolas; Tzitzikas, Yannis (25 January 2021). "Towards Interactive Analytics over RDF Graphs" (in en). Algorithms 14 (2): 34. doi:10.3390/a14020034. ISSN 1999-4893. https://www.mdpi.com/1999-4893/14/2/34. 
  22. Wilkinson, Leland (2012), Gentle, James E.; Härdle, Wolfgang Karl; Mori, Yuichi, eds., "The Grammar of Graphics" (in en), Handbook of Computational Statistics (Berlin, Heidelberg: Springer Berlin Heidelberg): 375–414, doi:10.1007/978-3-642-21551-3_13, ISBN 978-3-642-21550-6, http://link.springer.com/10.1007/978-3-642-21551-3_13. Retrieved 2024-06-16 
  23. Bostock, M.; Heer, J. (1 November 2009). "Protovis: A Graphical Toolkit for Visualization". IEEE Transactions on Visualization and Computer Graphics 15 (6): 1121–1128. doi:10.1109/TVCG.2009.174. ISSN 1077-2626. http://ieeexplore.ieee.org/document/5290720/. 
  24. Bostock, M.; Ogievetsky, V.; Heer, J. (1 December 2011). "D³ Data-Driven Documents". IEEE Transactions on Visualization and Computer Graphics 17 (12): 2301–2309. doi:10.1109/TVCG.2011.185. ISSN 1077-2626. http://ieeexplore.ieee.org/document/6064996/. 
  25. Wickham, Hadley (1 March 2011). "ggplot2" (in en). WIREs Computational Statistics 3 (2): 180–185. doi:10.1002/wics.147. ISSN 1939-5108. https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.147. 
  26. Satyanarayan, Arvind; Russell, Ryan; Hoffswell, Jane; Heer, Jeffrey (31 January 2016). "Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization". IEEE Transactions on Visualization and Computer Graphics 22 (1): 659–668. doi:10.1109/TVCG.2015.2467091. ISSN 1077-2626. http://ieeexplore.ieee.org/document/7192704/. 
  27. Satyanarayan, Arvind; Moritz, Dominik; Wongsuphasawat, Kanit; Heer, Jeffrey (1 January 2017). "Vega-Lite: A Grammar of Interactive Graphics". IEEE Transactions on Visualization and Computer Graphics 23 (1): 341–350. doi:10.1109/TVCG.2016.2599030. ISSN 1077-2626. http://ieeexplore.ieee.org/document/7539624/. 
  28. Stolte, C.; Tang, D.; Hanrahan, P. (January–March 2002). "Polaris: a system for query, analysis, and visualization of multidimensional relational databases". IEEE Transactions on Visualization and Computer Graphics 8 (1): 52–65. doi:10.1109/2945.981851. http://ieeexplore.ieee.org/document/981851/. 
  29. Hanrahan, Pat (27 June 2006). "VizQL: a language for query, analysis and visualization" (in en). Proceedings of the 2006 ACM SIGMOD international conference on Management of data (Chicago IL USA: ACM): 721. doi:10.1145/1142473.1142560. ISBN 978-1-59593-434-5. https://dl.acm.org/doi/10.1145/1142473.1142560. 
  30. Tang, Nan; Wu, Eugene; Li, Guoliang (25 June 2019). "Towards Democratizing Relational Data Visualization" (in en). Proceedings of the 2019 International Conference on Management of Data (Amsterdam Netherlands: ACM): 2025–2030. doi:10.1145/3299869.3314029. ISBN 978-1-4503-5643-5. https://dl.acm.org/doi/10.1145/3299869.3314029. 
  31. Zhao, He; Wang, Yixing; Lin, Anqi; Hu, Bingyin; Yan, Rui; McCusker, James; Chen, Wei; McGuinness, Deborah L. et al. (1 November 2018). "NanoMine schema: An extensible data representation for polymer nanocomposites" (in en). APL Materials 6 (11): 111108. doi:10.1063/1.5046839. ISSN 2166-532X. https://pubs.aip.org/apm/article/6/11/111108/121743/NanoMine-schema-An-extensible-data-representation. 
  32. McCusker, Jamie P.; Keshan, Neha; Rashid, Sabbir; Deagen, Michael; Brinson, Cate; McGuinness, Deborah L. (2020), Pan, Jeff Z.; Tamma, Valentina; d’Amato, Claudia et al., eds., "NanoMine: A Knowledge Graph for Nanocomposite Materials Science" (in en), The Semantic Web – ISWC 2020 (Cham: Springer International Publishing) 12507: 144–159, doi:10.1007/978-3-030-62466-8_10, ISBN 978-3-030-62465-1, https://link.springer.com/10.1007/978-3-030-62466-8_10. Retrieved 2024-06-16
  33. Shneiderman, B. (1996). "The eyes have it: a task by data type taxonomy for information visualizations". Proceedings 1996 IEEE Symposium on Visual Languages (Boulder, CO, USA: IEEE Comput. Soc. Press): 336–343. doi:10.1109/VL.1996.545307. ISBN 978-0-8186-7508-9. http://ieeexplore.ieee.org/document/545307/. 
  34. Hu, Bingyin; Lin, Anqi; Brinson, L. Catherine (1 December 2021). "ChemProps: A RESTful API enabled database for composite polymer name standardization" (in en). Journal of Cheminformatics 13 (1): 22. doi:10.1186/s13321-021-00502-6. ISSN 1758-2946. PMC7955638. PMID 33712066. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00502-6. 
  35. Bandyopadhyay, A.; De Sarkar, M.; Bhowmick, A. K. (1 October 2005). "Poly(vinyl alcohol)/silica hybrid nanocomposites by sol-gel technique: Synthesis and properties" (in en). Journal of Materials Science 40 (19): 5233–5241. doi:10.1007/s10853-005-4417-y. ISSN 0022-2461. http://link.springer.com/10.1007/s10853-005-4417-y. 
  36. Natarajan, Bharath; Li, Yang; Deng, Hua; Brinson, L. Catherine; Schadler, Linda S. (9 April 2013). "Effect of Interfacial Energetics on Dispersion and Glass Transition Temperature in Polymer Nanocomposites" (in en). Macromolecules 46 (7): 2833–2841. doi:10.1021/ma302281b. ISSN 0024-9297. https://pubs.acs.org/doi/10.1021/ma302281b. 
  37. Lebo, T.; Graves, A.; McGuinness, D.L. (1 October 2013). "Content-Preserving Graphics". Tetherless World Publications. Rensselaer Polytechnic Institute. https://hdl.handle.net/20.500.13015/4523. 
  38. Lehmann, Jens; Isele, Robert; Jakob, Max; Jentzsch, Anja; Kontokostas, Dimitris; Mendes, Pablo N.; Hellmann, Sebastian; Morsey, Mohamed et al. (2015). "DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia". Semantic Web 6 (2): 167–195. doi:10.3233/SW-140134. https://www.medra.org/servlet/aliasResolver?alias=iospress&doi=10.3233/SW-140134. 
  39. Walny, Jagoda; Frisson, Christian; West, Mieka; Kosminsky, Doris; Knudsen, Soren; Carpendale, Sheelagh; Willett, Wesley (1 January 2020). "Data Changes Everything: Challenges and Opportunities in Data Visualization Design Handoff". IEEE Transactions on Visualization and Computer Graphics 26 (1): 12–22. doi:10.1109/TVCG.2019.2934538. ISSN 1077-2626. https://ieeexplore.ieee.org/document/8816695/. 
  40. Masson, Damien; Malacria, Sylvain; Lank, Edward; Casiez, Géry (21 April 2020). "Chameleon: Bringing Interactivity to Static Digital Documents" (in en). Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI USA: ACM): 1–13. doi:10.1145/3313831.3376559. ISBN 978-1-4503-6708-0. https://dl.acm.org/doi/10.1145/3313831.3376559. 
  41. Raji, Mohammad; Duncan, Jeremiah; Hobson, Tanner; Huang, Jian (1 September 2021). "Dataless Sharing of Interactive Visualization". IEEE Transactions on Visualization and Computer Graphics 27 (9): 3656–3669. doi:10.1109/TVCG.2020.2984708. ISSN 1077-2626. https://ieeexplore.ieee.org/document/9056539/. 
  42. Wu, Aoyu; Wang, Yun; Shu, Xinhuan; Moritz, Dominik; Cui, Weiwei; Zhang, Haidong; Zhang, Dongmei; Qu, Huamin (1 December 2022). "AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization". IEEE Transactions on Visualization and Computer Graphics 28 (12): 5049–5070. doi:10.1109/TVCG.2021.3099002. ISSN 1077-2626. https://ieeexplore.ieee.org/document/9495259/. 
  43. Dumontier, Michel; Baker, Christopher JO; Baran, Joachim; Callahan, Alison; Chepelev, Leonid; Cruz-Toledo, José; Del Rio, Nicholas R; Duck, Geraint et al. (1 December 2014). "The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery" (in en). Journal of Biomedical Semantics 5 (1): 14. doi:10.1186/2041-1480-5-14. ISSN 2041-1480. PMC4015691. PMID 24602174. https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-5-14. 
  44. Kuhn, Tobias; Meroño-Peñuela, Albert; Malic, Alexander; Poelen, Jorrit H.; Hurlbert, Allen H.; Ortiz, Emilio Centeno; Furlong, Laura I.; Queralt-Rosinach, Núria et al. (2018). "Nanopublications: A Growing Resource of Provenance-Centric Scientific Linked Data". arXiv. doi:10.48550/ARXIV.1809.06532. https://arxiv.org/abs/1809.06532. 
  45. McCusker, J.; Rashid, S.; Agu, N. et al. (1 October 2018). "The Whyis Knowledge Graph Framework in Action". Tetherless World Publications. Rensselaer Polytechnic Institute. https://hdl.handle.net/20.500.13015/4443. 
  46. Wongsuphasawat, Kanit; Moritz, Dominik; Anand, Anushka; Mackinlay, Jock; Howe, Bill; Heer, Jeffrey (31 January 2016). "Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations". IEEE Transactions on Visualization and Computer Graphics 22 (1): 649–658. doi:10.1109/TVCG.2015.2467191. ISSN 1077-2626. http://ieeexplore.ieee.org/document/7192728/. 
  47. Wongsuphasawat, Kanit; Qu, Zening; Moritz, Dominik; Chang, Riley; Ouk, Felix; Anand, Anushka; Mackinlay, Jock; Howe, Bill et al. (2 May 2017). "Voyager 2: Augmenting Visual Analysis with Partial View Specifications" (in en). Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver Colorado USA: ACM): 2648–2659. doi:10.1145/3025453.3025768. ISBN 978-1-4503-4655-9. https://dl.acm.org/doi/10.1145/3025453.3025768. 
  48. Rietveld, Laurens; Hoekstra, Rinke (2013), Salinesi, Camille; Norrie, Moira C.; Pastor, Óscar, eds., "YASGUI: Not Just Another SPARQL Client", Advanced Information Systems Engineering (Berlin, Heidelberg: Springer Berlin Heidelberg) 7908: 78–86, doi:10.1007/978-3-642-41242-4_7, ISBN 978-3-642-38708-1, http://link.springer.com/10.1007/978-3-642-41242-4_7. Retrieved 2024-06-16 
  49. Deagen, Michael (2022), "Chart metadata and snapshots of data from March 13, 2022", Figshare, doi:10.6084/m9.figshare.19352258.v1, https://figshare.com/articles/dataset/Chart_metadata_and_snapshots_of_data_from_March_13_2022/19352258/1 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage were substantially updated for improved readability. In some cases, important information was missing from the references, and that information was added.