Difference between revisions of "Journal:The development of data science: Implications for education, employment, research, and the data revolution for sustainable development"
Shawndouglas (talk | contribs) (Created stub. Saving and adding more.) |
Shawndouglas (talk | contribs) (Saving and adding more.) |
Revision as of 18:56, 16 July 2018
Full article title | The development of data science: Implications for education, employment, research, and the data revolution for sustainable development |
---|---|
Journal | Big Data and Cognitive Computing |
Author(s) | Murtagh, Fionn; Devlin, Keith |
Author affiliation(s) | University of Huddersfield, Stanford University |
Primary contact | Email: fmurtagh at acm dot org |
Year published | 2018 |
Volume and issue | 2(2) |
Page(s) | 14 |
DOI | 10.3390/bdcc2020014 |
ISSN | 2504-2289 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | http://www.mdpi.com/2504-2289/2/2/14/htm |
Download | http://www.mdpi.com/2504-2289/2/2/14/pdf (PDF) |
This article should not be considered complete until this message box has been removed. This is a work in progress. |
Abstract
In data science, we are concerned with the integration of relevant sciences in observed and empirical contexts. This results in the unification of analytical methodologies, and of observed and empirical data contexts. Given the dynamic nature of convergence, the origins and many evolutions of the data science theme are described. This article covers the following: the rapidly growing provision of post-graduate university courses in data science; a preliminary study of employability requirements; and how past eminent work in the social sciences and other areas, certainly mathematics, can be of immediate and direct relevance and benefit for innovative methodology, and for addressing the ethical aspects of big data analytics that relate to data aggregation and scale effects. The direct and indirect outcomes and consequences of data science include decision support and policy making, with both qualitative and quantitative outcomes. For these reasons, we note the importance of how data science builds collaboratively on other domains, potentially with innovative methodologies and practice. Further sections point towards some of the major current research issues.
Keywords: big data training and learning, company and business requirements, ethics, impact, decision support, data engineering, open data, smart homes, smart cities, IoT
Introduction: Data science as the convergence and bridging of disciplines
The context of our problem solving and analytics will always be fundamental, specific, and oriented towards a particular purpose. (Section 4 of this article draws some relevant implications of this.) This article is oriented towards the commonality and mutual influence of methodologies, and of analytical processes and procedures. A good example of the parallel nature of such things is how "big data analytics" is often considered a synonym of "data science." Section 2.2, for instance, mentions how public transport may use wireless connection data from smartphones and mobile phones to observe the locations of individuals. This close association, or perhaps even identity, of big data analytics and data science will have growing importance with the internet of things (IoT), smart cities, smart homes, and so on (as noted in Section 8). The McKinsey Global Institute provided an outstanding perspective on this idea in their paper The age of analytics: Competing in a data-driven world.[1]
Sections 8 and 9 of this article address important developments, encompassing newly oriented and newly pursued methodologies and the integration of research domains. Section 7 notes how important this content is to sustainable development. The phrase "data revolution" is based here on ongoing work by the United Nations and by many of us in this domain, and on discussions of these issues by national authorities in Africa and the Middle East at the most recent (2017) World Statistics Congress.
This converging and bridging of disciplines is increasingly important. For example, Mahabal et al.[2] discuss the parallels between astronomy and Earth science data, methodology transfer, and the crucial role of metadata and ontologies. They claim the convergence or bridging of disciplines must address “non-homogeneous observables, and varied spatial, temporal coverage at different resolutions.”[2] This requirement is familiar from the way NoSQL databases are now widely used alongside traditional relational databases. Another example is how text mining, social media, and many other domains have become important in many contexts. Then, given computational support, “it is the complexity more than the data volume that proves to be a bigger challenge.”[2] Further benefits of this data science convergence are termed here "tractability" and "reproducibility." Mahabal et al.[2] also discuss the complexity relating to resolution and distributions. In a separate work, Murtagh[3] characterized this in terms of data encoding. Much work now emphasizes the importance of p-adic data encoding (binary or ternary when p = 2 or 3), compared with real-valued encoding (m-adic, especially when m = 10).
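As a rough illustration of the encodings mentioned above, the following Python sketch expands an integer into its digits for a chosen base (p = 2 or 3, or m = 10) and computes the standard p-adic distance between two integers. It is not taken from Murtagh's book or from Mahabal et al.; the function names `digits_base`, `p_adic_valuation`, and `p_adic_distance` are hypothetical and chosen only for this example, and the sketch assumes non-negative integer inputs for the digit expansion.

```python
def digits_base(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer n in the given base, least significant first."""
    if n < 0 or base < 2:
        raise ValueError("n must be >= 0 and base must be >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits


def p_adic_valuation(n: int, p: int) -> int:
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    if p < 2:
        raise ValueError("p must be >= 2")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_distance(a: int, b: int, p: int) -> float:
    """p-adic distance p**(-v), where v is the p-adic valuation of a - b."""
    if a == b:
        return 0.0
    return float(p) ** -p_adic_valuation(a - b, p)


# The same integer under binary, ternary, and decimal encodings
# (digits are listed least significant first).
binary_digits = digits_base(10, 2)    # [0, 1, 0, 1]
ternary_digits = digits_base(10, 3)   # [1, 0, 1]
decimal_digits = digits_base(10, 10)  # [0, 1]

# 1 and 9 share their three lowest binary digits, so they are 2-adically close.
d2 = p_adic_distance(1, 9, 2)  # 0.125
d3 = p_adic_distance(1, 9, 3)  # 1.0
```

In this sketch, two integers are close in the p-adic sense when their base-p expansions agree in many low-order digits, which is one way of reading an encoding as a hierarchy. The resulting distance satisfies the strong triangle (ultrametric) inequality.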
Mahabal et al. emphasize the convergence and bridging of disciplines as follows[2]:
Methodology transfer can almost never be unidirectional. Diverse fields grow by learning tricks employed by other disciplines. The important thing is to abstract data—described by meaningful metadata—and the metadata in turn connected by a good ontology.
They go on to describe the role of data science[2]:
We have described here a few techniques from astroinformatics that are finding use in geoinformatics. There would be many from earth science that space science would do well to emulate. Even other disciplines like bioinformatics provide ample opportunities for methodology transfer and collaboration. With growing data volumes, and more importantly the increasing complexity, data science is our only refuge. Collaboration in data science will be beneficial to all sciences.
References
1. Henke, N.; Bughin, J.; Chui, M. et al. (December 2016). "The age of analytics: Competing in a data-driven world". McKinsey & Company. pp. 136. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world. Retrieved 18 June 2018.
2. Mahabal, A.A.; Crichton, D.; Djorgovski, S.G. et al. (2017). "From Sky to Earth: Data Science Methodology Transfer". Proceedings of the International Astronomical Union: 1–10. doi:10.1017/S1743921317000060.
3. Murtagh, F. (2017). Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics. CRC Press. pp. 206. ISBN 9781498763936.
Notes
This presentation is faithful to the original, with only a few minor changes to grammar, spelling, and presentation, including the addition of PMCID and DOI when they were missing from the original reference. The original inline citation method was unorthodox; these inline citations have been made clearer with the addition of the author of the citation.