Journal:Data sharing at scale: A heuristic for affirming data cultures

From LIMSWiki
Full article title: Data sharing at scale: A heuristic for affirming data cultures
Journal: Data Science Journal
Author(s): Poirier, Lindsay; Costelloe-Kuehn, Brandon
Author affiliation(s): University of California - Davis, Rensselaer Polytechnic Institute
Primary contact: Email: lnpoirier at ucdavis dot edu
Year published: 2019
Volume and issue: 18(1)
Page(s): 48
DOI: 10.5334/dsj-2019-048
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2019-048/
Download: https://datascience.codata.org/articles/10.5334/dsj-2019-048/galley/896/download/ (PDF)

Abstract

Addressing the most pressing contemporary social, environmental, and technological challenges will require integrating insights and sharing data across disciplines, geographies, and cultures. Strengthening international data sharing networks will not only demand advancing technical, legal, and logistical infrastructure for publishing data in open, accessible formats; it will also require recognizing, respecting, and learning to work across diverse data cultures. This essay introduces a heuristic for pursuing richer characterizations of the “data cultures” at play in international, interdisciplinary data sharing. The heuristic prompts cultural analysts to query the contexts of data sharing for a particular discipline, institution, geography, or project at seven scales: the meta, macro, meso, micro, techno, data, and nano. The essay articulates examples of the diverse cultural forces acting upon and interacting with researchers in different communities at each scale. The heuristic we introduce in this essay aims to elicit from researchers the beliefs, values, practices, incentives, and restrictions that impact how they think about and approach data sharing. Rather than attempting to iron out differences between disciplines, this essay intends to showcase and affirm the diversity of traditions and modes of analysis that have shaped how data gets collected, organized, and interpreted in diverse settings.

Keywords: data sharing, data culture, ethnography, data friction, metadata

Introduction

In the 1980s, the European Organization for Nuclear Research (CERN) was the most prominent particle physics laboratory in the world and at the cutting edge of coordinating international scientific research. Herwig Schopper, Director-General of CERN from 1981 to 1988, describes the time as provoking “a new ‘sociology’ for international scientific collaboration”[1]; with over 30 countries participating in experiments, the challenges of keeping track of researchers, workflows, and scientific data were enormous.

CERN hired Tim Berners-Lee as a contract programmer in 1980. To help keep track of projects, he toyed with designing Enquire, a knowledge organization system that enabled users to organize their data by creating links between documents stored in separate locations. Berners-Lee landed a fellowship in the Data Acquisition and Control division in 1983, a time when CERN was upgrading its computing infrastructure to better network globally distributed researchers in laboratories that each followed their own methods, used their own operating systems, and often spoke different languages. In describing the systems that were proposed for addressing these challenges, Berners-Lee writes[2]:

I had seen numerous developers arrive at CERN to tout systems that “helped” people organize information. They’d say, “To use this system all you have to do is divide all your documents into four categories” or “You just have to save your data as a WordWonderful document” or whatever. I saw one protagonist after the next shot down in flames by indignant researchers because the developers were forcing them to reorganize their work to fit the system. I would have to create a system with common rules that would be acceptable to everyone. This meant as close as possible to no rules at all.

The challenge was not to compel researchers to adopt a new standard; instead the challenge was learning to recognize and respect the different data cultures that guided how diverse researchers approached their work.[a] Berners-Lee’s Enquire eventually evolved into a proposal for what inevitably became the World Wide Web, perhaps the most widely adopted information infrastructure in the world, in large part because the system has very few rules prescribing how users should organize their knowledge within it.

Today, we are contending with a sociology for international scientific collaboration on a much larger scale. Addressing the most pressing contemporary social, environmental, and technological challenges will require integrating insights and sharing data across disciplines, geographies, and cultures. Research into the socio-technical challenges of data sharing has begun to characterize complications that arise as researchers in different communities work to align their data cultures.[3] The process of integrating complex and heterogeneous data generated in different geographies, according to different disciplinary standards, and motivated by different epistemic commitments and incentive structures, can produce “friction,” demanding that researchers make compromises to find common ground.[4] Different disciplines may speak different “languages,” making it difficult to devise shared schemas and ontologies. Perhaps most notably, researchers in different settings often have diverse rationales for valuing data preservation, contextualization, integration, and dissemination. Strengthening international data sharing networks will not only demand advancing technical, legal, and logistical infrastructure for publishing data in open, accessible ways; it will also require recognizing, respecting, and learning to work across diverse data cultures. As Berners-Lee observed of collaborative research practice at CERN in the 1980s, prescriptively forcing researchers to reorganize their work to fit a standard limits adoption and collaboration. This essay, informed by our work exploring diverse data sharing communities at the Research Data Alliance (RDA), will introduce a heuristic we’ve developed in order to pursue richer characterizations of the “data cultures” at play.

Footnotes

  a. We refer to “researchers” here quite expansively to denote any individual involved in the collection, designation, analysis, stewardship, and/or use of empirical data. This may refer to scientists, humanists, industrial analysts, and government actors in a variety of locations.

References

  1. Schopper, H. (28 March 2014). "The 1980s: spurring collaboration". CERN Courier. https://cerncourier.com/a/viewpoint-the-1980s-spurring-collaboration/. 
  2. Berners-Lee, T. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. HarperCollins. p. 15. ISBN 9780062515872. 
  3. Borgman, C.L. (2012). "The conundrum of sharing research data". Journal of the American Society for Information Science and Technology 63 (6): 1059–78. doi:10.1002/asi.22634. 
  4. Edwards, P.N.; Mayernik, M.S.; Batcheller, A.L. et al. (2011). "Science friction: Data, metadata, and collaboration". Social Studies of Science 41 (5): 667–690. doi:10.1177/0306312711413314.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation and grammar. In some cases important information was missing from the references, and that information was added. The original article had citations listed alphabetically; they are listed in the order they appear here due to the way the wiki works. To more easily differentiate footnotes from references, the original footnotes (which were numbered) were updated to use lowercase letters.