Difference between revisions of "User:Shawndouglas/sandbox/sublevel2"
Revision as of 23:03, 13 August 2018
This is sublevel2 of my sandbox, where I play with features and test MediaWiki code. If you wish to leave a comment for me, please see my discussion page instead.
Sandbox begins below
Full article title | Big data in the era of health information exchanges: Challenges and opportunities for public health |
---|---|
Journal | Informatics |
Author(s) | Baseman, Janet G.; Revere, Debra; Painter, Ian |
Author affiliation(s) | University of Washington |
Primary contact | Email: jbaseman at uw dot edu |
Editors | Ge, Mouzhi; Dohnal, Vlastislav |
Year published | 2017 |
Volume and issue | 4(4) |
Page(s) | 39 |
DOI | 10.3390/informatics4040039 |
ISSN | 2227-9709 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | http://www.mdpi.com/2227-9709/4/4/39/htm |
Download | http://www.mdpi.com/2227-9709/4/4/39/pdf (PDF) |
This article should not be considered complete until this message box has been removed. This is a work in progress.
Abstract
Public health surveillance of communicable diseases depends on timely, complete, accurate, and useful data that are collected across a number of health care and public health systems. Health information exchanges (HIEs), which support electronic sharing of data and information between health care organizations, are recognized as a source of "big data" in health care and have the potential to provide public health with a single stream of data collated across disparate systems and sources. However, because these data are not collected specifically to meet public health objectives, it is unknown whether a public health agency’s (PHA’s) secondary use of the data supports, or presents additional barriers to, meeting disease reporting and surveillance needs. To explore this issue, we assessed the big data available to a PHA through its participation in an HIE: laboratory test results and clinician-generated notifiable condition report data.
Keywords: big data, communicable diseases, data mining, data quality, epidemiology, health information exchange, infectious diseases, population surveillance, public health
Introduction
We evaluated two datasets, one for sexually transmitted infections (STIs) and one for non-STIs, covering the period from January 1, 2012 to September 15, 2013. Both are used by a PHA that is part of one of the largest and oldest HIE infrastructures in the U.S. The two datasets were independently analyzed for their data quality, utility, and appropriateness for meeting public health surveillance objectives: (1) timeliness, defined as the difference between the earliest date of a disease report and the date the report is received at the PHA; (2) volume, defined as the number of disease report cases received by the PHA; and (3) completion, defined as the number of days to close a disease case report.
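As a rough illustration of the three measures defined above, the following Python sketch computes them for a handful of made-up case reports. It is not the authors' analysis code. The `CaseReport` record, its field names, and the sample dates are hypothetical, and the sketch assumes that the completion clock starts when the PHA receives the report, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class CaseReport:
    """Hypothetical record layout; the field names are illustrative assumptions."""
    earliest_report_date: date          # earliest date associated with the disease report
    received_at_pha: date               # date the report was received at the PHA
    closed_date: Optional[date] = None  # date the case report was closed (None if still open)


def timeliness_days(report: CaseReport) -> int:
    """Timeliness: days between the earliest report date and receipt at the PHA."""
    return (report.received_at_pha - report.earliest_report_date).days


def completion_days(report: CaseReport) -> Optional[int]:
    """Completion: days to close a case report (None while the case is open).

    Assumption: the clock starts at receipt by the PHA.
    """
    if report.closed_date is None:
        return None
    return (report.closed_date - report.received_at_pha).days


def summarize(reports: List[CaseReport]) -> dict:
    """Volume is the count of reports; the other two measures are summarized by their median."""
    timeliness = [timeliness_days(r) for r in reports]
    completion = [d for d in (completion_days(r) for r in reports) if d is not None]
    return {
        "volume": len(reports),
        "median_timeliness_days": median(timeliness) if timeliness else None,
        "median_completion_days": median(completion) if completion else None,
    }


# Made-up sample data, for illustration only.
sample_reports = [
    CaseReport(date(2012, 1, 1), date(2012, 1, 4), date(2012, 1, 10)),
    CaseReport(date(2012, 3, 1), date(2012, 3, 2)),  # still open
    CaseReport(date(2013, 9, 1), date(2013, 9, 6), date(2013, 9, 8)),
]

print(summarize(sample_reports))
```

Using the median rather than the mean is a design choice of this sketch. It keeps a few very late or long-running reports from dominating the summary.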
Our assessment uncovered the following challenges for effective utilization of big data by public health:
- While PHAs almost exclusively rely on secondary use data for surveillance, big data that has been collected for clinical purposes omits data fields of high value for public health.
- Big data is not always smart data, especially when the context within which the data is collected is absent.
- Data collected by disparate, varying systems and sources can introduce uncertainties and limit trustworthiness in the data, which may diminish its value for public health purposes.
- The process by which data is obtained needs to be evident in order for big data to be useful to public health.
- Big data for public health purposes needs to answer both "what" and "why" questions.
Despite these and other issues—such as measurement error and confounding, well-known challenges to both big and small data—strategies traditionally employed by public health epidemiologists and other public health professionals can uncover limitations and contribute to the design of solutions in collection, integration, warehousing, and analysis of big data so its value and utility to public health can be optimized.
In recognition of the 10-year anniversary of the incorporation of the internet search firm Google, the journal Nature issued a special supplement on big data and what the availability of large datasets meant and will mean for scientists and researchers [1]. In particular, the supplement focused on the opportunities that will become possible once issues such as interoperable data infrastructures, security, data standardization, storage and transfer requirements, and data governance are resolved. Now, nearly 10 years later, users of big data, which is characterized by the 5 Vs (huge volume, high velocity, high variety, low veracity, and high value), still encounter the issues presented in the Nature special supplement [2]. The primary challenges to utilizing big data center on the diversity of data types (variety), the resources required to handle data collection, storage, and processing (velocity), and the uncertainties inherent in mixing and cleaning data from varied data streams that generate unpredictability in the data (veracity) [3].
References
Notes
This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added. Several URLs from the original were dead, and more current URLs were substituted.