Journal:Principles of metadata organization at the ENCODE data coordination center

From LIMSWiki
Revision as of 19:04, 6 September 2016 by Shawndouglas (talk | contribs) (Authors)
Full article title: Principles of metadata organization at the ENCODE data coordination center
Journal: Database
Author(s): Hong, Eurie L.; Sloan, Cricket A.; Chan, Esther T.; Davidson, Jean M.; Malladi, Venkat S.; Strattan, J. Seth; Hitz, Benjamin C.; Gabdank, Idan; Narayanan, Aditi K.; Ho, Marcus; Lee, Brian T.; Rowe, Laurence D.; Dreszer, Timothy R.; Roe, Greg R.; Podduturi, Nikhil R.; Tanaka, Forrest; Hilton, Jason A.; Cherry, J. Michael
Author affiliation(s): Stanford University, University of California - Santa Cruz
Primary contact email: cherry at stanford dot edu
Year published: 2016
Page(s): baw001
DOI: 10.1093/database/baw001
ISSN: 1758-0463
Distribution license: Creative Commons Attribution 4.0 International
Website: http://database.oxfordjournals.org/content/2016/baw001
Download: http://database.oxfordjournals.org/content/2016/baw001.full.pdf+html (PDF)

Abstract

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/).

Database URL: www.encodeproject.org

Introduction

The goal of the Encyclopedia of DNA Elements (ENCODE) project is to annotate functional regions in the human and mouse genomes. Functional regions include those that encode protein-coding or non-coding RNA gene products, as well as regions that could have a regulatory role.[1][2] To this end, the project has surveyed the landscape of the human genome using over 35 high-throughput experimental methods in more than 250 different cell and tissue types, resulting in over 4000 experiments.[1][3] These datasets are submitted to a Data Coordinating Center (DCC), whose role is to describe, organize and provide access to these diverse datasets.[4]

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.