Journal:Building open access to research (OAR) data infrastructure at NIST
Full article title | Building open access to research (OAR) data infrastructure at NIST |
---|---|
Journal | Data Science Journal |
Author(s) | Greene, Gretchen; Plante, Raymond; Hanisch, Robert |
Author affiliation(s) | National Institute of Standards and Technology |
Primary contact | Email: gretchen dot greene at nist dot gov |
Year published | 2019 |
Volume and issue | 18(1) |
Page(s) | 30 |
DOI | 10.5334/dsj-2019-030 |
ISSN | 1683-1470 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://datascience.codata.org/articles/10.5334/dsj-2019-030/ |
Download | https://datascience.codata.org/articles/10.5334/dsj-2019-030/galley/861/download/ (PDF) |
Abstract
At the U.S. National Institute of Standards and Technology (NIST), a National Metrology Institute (NMI), scientists, engineers, and technology experts conduct research across a full spectrum of physical science domains. NIST is a non-regulatory agency within the U.S. Department of Commerce with a mission to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life. NIST research results in the production and distribution of standard reference materials, calibration services, and datasets. These are generated from a wide range of complex laboratory instrumentation, expert analyses, and calibration processes. In response to a government open data policy, and in collaboration with the broader research community, NIST has developed a federated Open Access to Research (OAR) scientific data infrastructure aligned with FAIR (findable, accessible, interoperable, reusable) data principles. Through the OAR initiatives, NIST's Material Measurement Laboratory Office of Data and Informatics (ODI) recently released a new scientific data discovery portal and public data repository. These science-oriented applications provide dissemination and public access for data from across the broad spectrum of NIST research disciplines, including chemistry, biology, materials science (such as crystallography and nanomaterials), physics, disaster resilience, cyberinfrastructure, communications, forensics, and others. NIST's public data consist of carefully curated Standard Reference Data, legacy high-value data, and new research data publications. The repository is thus evolving both in content and features as the nature of research progresses. Implementation of the OAR infrastructure is key to NIST's role in sharing high-integrity, reproducible research for measurement science in a rapidly changing world.
Keywords: data repository, FAIR, research metadata, metrology, data portal, government
Introduction
NIST research is predominantly characterized as “long tail” in terms of the data produced, i.e., small datasets that are highly varied in topic and content.[1] This is colloquially described as “a mile wide and an inch deep” and may be classified as big data in the context of variety and veracity. Newer laboratory instrumentation such as nuclear magnetic resonance spectrometers, electron microscopes, synchrotron beamlines, and high-performance computers ushers NIST into the realm of managing the velocity and volume of big data. Furthermore, new strategic initiatives in the areas of artificial intelligence (AI) require an infrastructure designed to support digital mining and transformation. Management and exchange of the underlying research domain-specific data with both internal and external communities are important considerations for the OAR architecture and implementation.
The overarching goal of OAR is to deliver a robust research data infrastructure to share the results of NIST research with the community at large. Our strategy for achieving this goal involves collaborative data science as demonstrated through usage statistics from astronomical archives’ data discovery and access patterns.[2] Organizations face many challenges in striving to balance rapid advancements in technology and data-driven research with internal operational costs and constraints. To meet these challenges, NIST assembled a diverse group of experts with key leaders and engaged stakeholders via cross-organizational advisors. This resulted in a joint effort to build an integrated system engineered to support data workflow processes, systems infrastructure, and public dissemination with secure, publicly accessible platforms for scientific collaboration.
At the outset of the OAR project, priority was placed on developing a system that would allow us to comply with government open data policy.[3] This resulted in a baseline Minimum Viable Product (MVP), delivering a NIST public data listing (PDL) which enforces adherence to a new government data standard semantic model, the Project Open Data (POD) schema. The NIST PDL continues to be routinely harvested by the Department of Commerce and made available through the U.S. data.gov web portal, which hosts records of all POD-compliant government public datasets. Following enactment of the OPEN Government Data Act[4], updates and compliance of our OAR infrastructure will be further advanced.
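To make the idea of a POD-compliant record concrete, the following is a minimal sketch in Python of checking one dataset entry against the dataset-level fields that the POD v1.1 metadata schema marks as required for federal agencies. It is not the NIST PDL implementation: the function name `check_pod_record` is hypothetical, and every value in the example record (identifier, codes, contact details) is an illustrative placeholder rather than a real NIST entry.

```python
# Dataset-level fields that the POD v1.1 schema marks as always required
# for U.S. federal agencies ("required if applicable" fields are omitted here).
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel", "bureauCode", "programCode",
)

# The three access levels defined by the POD schema.
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def check_pod_record(record: dict) -> list:
    """Return a list of problems found in one dataset record (empty if none)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not record.get(name)
    ]
    level = record.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"invalid accessLevel: {level}")
    return problems


# Hypothetical record; all values are placeholders for illustration only.
example = {
    "title": "Example measurement dataset",
    "description": "Placeholder description of a small research dataset.",
    "keyword": ["metrology", "example"],
    "modified": "2019-01-29",
    "publisher": {"@type": "org:Organization", "name": "Example Agency"},
    "contactPoint": {
        "@type": "vcard:Contact",
        "fn": "Jane Doe",
        "hasEmail": "mailto:jane.doe@example.gov",
    },
    "identifier": "EXAMPLE-0001",
    "accessLevel": "public",
    "bureauCode": ["000:00"],
    "programCode": ["000:000"],
}

if __name__ == "__main__":
    print(check_pod_record(example))
```

A production listing would more likely validate records against the published POD JSON Schema; the hand-written check above only illustrates the kind of constraint that such a listing enforces.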
References
- "Long Tail of Data: e-IRG Task Force Report" (PDF). e-IRG Secretariat. September 2016. http://e-irg.eu/documents/10920/238968/LongTailOfData2016.pdf. Retrieved 29 January 2019.
- White, R.L.; Accomazzi, A.; Berriman, G.B. et al. (2009). "The High Impact of Astronomical Data Archives". Astro2010: The Astronomy and Astrophysics Decadal Survey: 64. https://ui.adsabs.harvard.edu/abs/2009astro2010P..64W/abstract.
- Burwell, S.M.; VanRoekel, S.; Park, T.; Mancini, D.J. (9 May 2013). "Open Data Policy—Managing Information as an Asset" (PDF). M-13-13 Memorandum for the Heads of Executive Departments and Agencies. https://obamawhitehouse.archives.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf. Retrieved 20 April 2019.
- "Title II - Open Government Data Act". HR 4174: Foundations for Evidence-Based Policymaking Act of 2018. 115th Congress. 2018. https://www.congress.gov/bill/115th-congress/house-bill/4174/text#toc-H8E449FBAEFA34E45A6F1F20EFB13ED95. Retrieved 29 January 2019.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version—by design—lists them in order of appearance.