Journal:Big data management for cloud-enabled geological information services

From LIMSWiki
Revision as of 23:43, 19 February 2018 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title Big data management for cloud-enabled geological information services
Journal Scientific Programming
Author(s) Zhu, Yueqin; Tan, Yongjie; Luo, Xiong; He, Zhijie
Author affiliation(s) China Geological Survey, Ministry of Land and Resources, University of Science and Technology Beijing, Beijing Key Laboratory of Knowledge Engineering for Materials Science
Editors Liu, A.
Year published 2018
Volume and issue 2018(2018)
Page(s) 1327214
DOI 10.1155/2018/1327214
ISSN 1875-919X
Distribution license Creative Commons Attribution 4.0 International
Website https://www.hindawi.com/journals/sp/2018/1327214/
Download http://downloads.hindawi.com/journals/sp/2018/1327214.pdf (PDF)

Abstract

Cloud computing, a powerful technology for massive-scale and complex computation, plays an important role in implementing geological information services. In the era of big data, data are being collected at an unprecedented scale. Therefore, to ensure successful data processing and analysis in cloud-enabled geological information services (CEGIS), we must address the challenging and time-demanding task of big data processing. This review starts by elaborating on the system architecture and the requirements for big data management. It then analyzes the application requirements and technical challenges of big data management for CEGIS in China. Finally, it presents the application development opportunities and technical trends of big data management in CEGIS, including collection and preprocessing, storage and management, analysis and mining, parallel computing-based cloud platforms, and technology applications.

Introduction

In the era of big data, the data-driven modeling method enables us to exploit the potential of massive amounts of geological data easily.[1][2][3] In particular, by mining the data scientifically, one can offer new services that bring higher value to customers. Furthermore, it is now possible to implement the transition from digital geology to intelligent geology by integrating multiple systems in geological research through the use of big data and other technologies.[4]

The application of geological data management in the cloud makes it possible to fully utilize structured and unstructured data, including geology, minerals, geophysics, geochemistry, remote sensing, terrain, topography, vegetation, architecture, hydrology, disasters, and other digital geological data distributed in every place on the surface of the earth.[4][5] Moreover, the geological cloud will enable the integration of data collection, resource integration, data transmission, information extraction, and knowledge mining, which will pave the way for the transition from data to information, from information to knowledge, and from knowledge to wisdom. In addition, it supports data analysis, mining, organization, and management services for the scientific management of land resources, prospecting breakthrough strategic action and social services, while conducting multilevel, multiangle, and multiobjective demonstration applications on geological data for government decision-making, scientific research, and public services.[5]

Big data technologies are bringing unprecedented opportunities and challenges to various application areas, especially to geological information processing.[2][6][7] Under these circumstances, some advances have been achieved in this area.[8][9] Furthermore, various disciplines of science and engineering have shown growing interest in research on the geological data generated in geological information services (GIS). We analyzed the number of such documents indexed in the “Web of Science” research database.[10] Figures 1 and 2 show that, over the past ten years, the number of documents with “geological data” in the title and in the topic, respectively, has been increasing. Hence, geological data analysis in GIS is currently an interesting and important research topic.


Figure 1. The trend of the number of documents in which “geological data” is in the title, from 2007 to 2016.


Figure 2. The trend of the number of documents in which “geological data” is in the topic, from 2007 to 2016.

Considering the development status of cloud-enabled geological information services (CEGIS) and the application requirements of big data management analysis, this article describes the significant impact and revolution on GIS brought by the advancement of big data technologies. Furthermore, this article outlines the future application development and technology development trend of big data management analysis in CEGIS.

The remainder of this article is organized as follows. In the next section we provide a review on CEGIS, with an emphasis on the descriptions for the system architecture and those requirements from big data management. Then, the challenges for big data management in CEGIS are presented. The key technologies and trends on big data management in CEGIS are analyzed afterwards, and finally we draw conclusions from the research.

Review on cloud-enabled geological information services

The construction of a geological cloud differs from the current big data analysis based on the internet of things (IoT). A deep understanding of data characteristics is necessary to collect, process, analyze, and interpret data in different fields, because the nature and types of data vary across fields and problems. Geology is a data-intensive science, and geological data are characterized by multisource heterogeneity, spatiotemporal variation, correlation, uncertainty, fuzziness, and nonlinearity. Therefore, the geological cloud has a certain degree of confidentiality and is highly domain-specific; meanwhile, it is developed on the basis of a large amount of geological data accumulated over a long period of time.[5][11] Many real-time data are also generated from geological disasters and the geological environment. The geological cloud includes core basic data, which can be divided into three parts: an existing structured database, some unstructured data, and public application data. Therefore, it is important to take good advantage of the existing traditional structured data, use big data technologies to deal with the relevant unstructured data, and also consider the peripheral public data.

Geological big data are multidimensional, and they consist of both structured and unstructured data.[12] The technical methods of big data analysis differ greatly from those of professional databases. Long-term geological survey and study have yielded years of geological information, forming a rich professional database that provides an important foundation for land and resources science management, geological survey, and geological information public service.[13] This “professional cloud” objectively requires technology research and development, such as the construction of a professional local area network, a data sharing platform, and geological big data visualization services. Hence, the construction of a geological cloud service is closely related to land resource management, deployment decisions, and the application demand of public service. The key technologies for research and development include the following: unstructured data extraction and mining analysis, mixed storage and management of structured and unstructured data, a big data sharing platform, data transmission, and visualization.[11]

Generally, the construction of a geological cloud is a long-term systematic project. It must therefore follow the basic principles of “standing on the reality, focusing on the future” and “focusing on the long-term and overall situation, embarking on the current and local situation,” so that the analysis and application of geological cloud public data and core data are achieved gradually, in accordance with the technical route of big data analysis, until the geological cloud is fully implemented. For the earth, land and resource management should cover many aspects, including human behavior, climate change, development and utilization of various resources, natural disasters, environmental pollution, and the ecosystem cycle. The introduction of big data technologies can integrate this type of resource information and provide the ability to deal uniformly with problems related to the information resources of the entire earth, which has a significant effect on the strategic planning of land and resource management.[3]

The geological cloud is an important component of the scientific process for geological data research. The ultimate goal for developing a geological cloud is to better describe and understand the complex earth system and geological framework, provide the scientific basis for the description of the land surface and the biodiversity characteristics of the earth, and improve the ability to deal with complex social problems.

System architecture

Because the business service functions of each country differ, the system architecture of the geological cloud also will vary. In Figure 3, we present a system architecture, using China as an example.[13]


Figure 3. The system architecture of a geological cloud.

The geological cloud combines the geological survey intranet and the geological survey extranet. It enables the sharing and management of computing resources, storage resources, network resources, software resources, and geological data resources.[14]

The geological cloud can be summarized as having the following characteristics[13]:

(i) “One platform: The geological cloud management platform”: It uniformly manages computing resources, storage resources, network resources, software resources, and geological data resources.

(ii) “Two networks: The geological survey intranet and the geological survey extranet”: The intranet is a network that is physically isolated from the internet. It is developed on the basis of the existing geological survey network, and each node is linked through a dedicated line or bare fiber. All of the internal business management systems, software systems, and data are deployed on the intranet, providing services to 28 local units and to the users of more than 350 geological survey projects. Facilitated by the public geological survey network, the geological survey business management system, geological data information service system, and public geological data can be deployed on the extranet, which is accessible to the general public. Communication between the intranet and the extranet, including data exchange and audit, is carried out via a single-directional light gate.

(iii) “One main node and three domain-specific nodes”: One main node is constructed at the China Geological Survey Development Research Center. In addition, three domain-specific nodes are constructed: the marine node, the geological environment node, and the aviation geophysical exploration and remote sensing node. Each node is configured with the corresponding servers, storage equipment, network equipment, management platform, large-scale specialized data processing software system, and various customized applications. Each node stores huge amounts of geological data and conforms to current data security standards. The main node and the domain-specific nodes are linked via optical fibers. The main node will consist of 200 computing nodes with three petabytes of storage capacity and will be equipped with geological data processing software. It will be hosted in a medium-sized supercomputing center and will support three-dimensional seismic exploration data processing and other large-scale computing. The three domain-specific nodes are to maintain their scale in the near future to facilitate reasonable scheduling and efficient utilization of information resources and data resources.
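The node layout described in item (iii) can be written down as a small configuration sketch. The class and field names below are illustrative assumptions, not part of the system described in the article; only the figures for the main node (200 computing nodes, three petabytes) come from the text, and the sizes of the domain-specific nodes are left unset because the article does not state them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CloudNode:
    """One node of the geological cloud (hypothetical representation)."""
    name: str
    role: str                             # "main" or "domain"
    compute_nodes: Optional[int] = None   # None where the article gives no figure
    storage_pb: Optional[float] = None    # storage capacity in petabytes


def build_topology() -> List[CloudNode]:
    """Return the one main node and three domain-specific nodes from the text."""
    return [
        CloudNode("main", "main", compute_nodes=200, storage_pb=3.0),
        CloudNode("marine", "domain"),
        CloudNode("geological environment", "domain"),
        CloudNode("aviation geophysical exploration and remote sensing", "domain"),
    ]


def storage_per_compute_node_tb(node: CloudNode) -> Optional[float]:
    """Average storage per computing node in terabytes (decimal: 1 PB = 1000 TB)."""
    if node.compute_nodes is None or node.storage_pb is None:
        return None  # not enough information in the source
    return node.storage_pb * 1000 / node.compute_nodes


topology = build_topology()
print(storage_per_compute_node_tb(topology[0]))  # 15.0
```

Under these figures the main node averages 15 TB of storage per computing node, which gives a rough sense of scale for the seismic processing workloads mentioned above.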

A system for geological survey business management and auxiliary decision-making is deployed on the extranet. The system provides a real-time tracking and management function for geological survey projects and various resources.

Main users of the geological cloud include institutional users, geological survey project users, and users from the general public. The institutional users can store the current geological database and newly collected data in the geological cloud through the geological survey business network and can obtain the geological data of other institutions from the cloud as needed. The geological survey project users can access the cloud geological background data through 4G or satellite lines and can collect data through the data collection system.

Requirement from big data management

The construction of a geological cloud must meet customer demand. Big data technologies are then used as the means to implement the geological cloud.

The types and quantity of geological data have been growing continuously over the years. Geological data include all kinds of electronic documents as well as structured, semistructured, and unstructured data, such as various databases (map database, spatial database, and attribute database), pictures, tables, video, and audio. Without requirements to guide the search, important data may remain buried in the massive dataset. Hence, the first step is to understand the user requirements and then gain the capability for large-scale data processing. This is followed by data mining, algorithms, and analysis, which will ultimately generate value. Big data technologies in the field of geography must meet the different needs of people at different levels, including the public demand for geologic data services and the professional data demand of geological research institutions, as well as of related enterprises and government departments.[15]

On the basis of big data analysis technologies, a complete data link is formed connecting data, information, knowledge, and service, through the use of an advanced cloud computing system, IoT, and big data processing flow. It is shown in Figure 4.[5]


Figure 4. Schematic diagram of big data analysis.
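The data, information, knowledge, and service link can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three function names, the record fields, and the toy records are invented and do not come from the article or from any real geological database.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def to_information(raw: List[Record]) -> List[Dict[str, str]]:
    """Data -> information: keep complete records and normalise text fields."""
    cleaned = []
    for rec in raw:
        mineral, region = rec.get("mineral"), rec.get("region")
        if not mineral or not region:
            continue  # an incomplete record cannot be interpreted
        cleaned.append({"mineral": mineral.strip().lower(), "region": region.strip()})
    return cleaned


def to_knowledge(info: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Information -> knowledge: count reports of each mineral per region."""
    summary: Dict[str, Counter] = {}
    for rec in info:
        summary.setdefault(rec["region"], Counter())[rec["mineral"]] += 1
    return summary


def to_service(knowledge: Dict[str, Counter], region: str) -> List[str]:
    """Knowledge -> service: list minerals in a region, most reported first."""
    return [mineral for mineral, _ in knowledge.get(region, Counter()).most_common()]


# Toy records, invented purely for illustration.
raw_records: List[Record] = [
    {"mineral": " Copper", "region": "Region A"},
    {"mineral": "copper", "region": "Region A"},
    {"mineral": "Lithium", "region": "Region A"},
    {"mineral": None, "region": "Region B"},
]

knowledge = to_knowledge(to_information(raw_records))
print(to_service(knowledge, "Region A"))  # ['copper', 'lithium']
```

Each stage consumes only the output of the previous one, so in a real deployment the stages could run on separate cloud resources; that separation is a design choice of this sketch rather than something the article prescribes.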

Challenges for big data management in cloud-enabled geological information services

Geological big data are generated regarding various layers of the earth, the history of the formation and evolution of the earth, and the material composition of the earth and its changes. They also involve the exploration and utilization of mineral resources. In current geological work, the collection, mining, processing, analysis, and utilization of these complex data types closely resemble those of general big data. The “4V” characteristics of big data (volume, velocity, variety, and veracity) also apply to geological big data.

Volume

Currently, there is no consensus on the collective size of geological data. Geological big data include geology, minerals, remote sensing, geophysical exploration, geochemical exploration, surveying, and mapping data, which are interconnected and integrated. There are at least 70,000 mines in China, and some official documents and popular science books indicate that more than 200,000 deposits and minerals have been found. This collection of information is huge and cannot be processed using conventional tools. For example, an Excel spreadsheet cannot contain all the information on 70,000 mining areas. Likewise, it is difficult to classify and rank the 200,000 deposits and minerals, so it is necessary to rely on the concepts and technologies of big data.[16]

In recent years especially, images, video, and other types of data have emerged on a large scale. With the application of 3D scanning and other devices, the data volume has been increasing exponentially. The data are becoming ever more descriptive and approximate the real world more and more closely. The large volume of data is also reflected in a fundamental change in the methods and ideas people use to deal with data. In the early days, people used sampling to process and analyze data, approximating the object of study with a small number of samples. With the development of technologies, the number of samples gradually approaches the size of the entire dataset. Using all the data can lead to higher accuracy and explain things in more detail.[17]
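The shift from sampling toward using all the data can be illustrated numerically. The sketch below is an assumption-laden toy: the values are synthetic (drawn from a log-normal distribution chosen only because it is skewed), and the mean is used as a stand-in for whatever statistic an analyst might estimate.

```python
import random
from statistics import fmean
from typing import List


def sample_estimate(values: List[float], k: int, seed: int = 0) -> float:
    """Estimate the mean from a random sample of k values drawn without replacement."""
    rng = random.Random(seed)
    return fmean(rng.sample(values, k))


# Synthetic "measurements"; not real geological data.
generator = random.Random(42)
values = [generator.lognormvariate(0.0, 1.0) for _ in range(10_000)]
true_mean = fmean(values)

for k in (100, 1_000, 10_000):
    error = abs(sample_estimate(values, k) - true_mean)
    print(f"sample size {k:>6}: absolute error {error:.4f}")
```

When the sample size equals the size of the dataset, the estimate coincides with the full-data value and the sampling error vanishes, which is the point the paragraph above makes about approaching the overall data.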

Recently, the China geological survey system has built many databases, including[16]:

  • a regional geological database (covering the 1 : 2500000, 1 : 1000000, 1 : 500000, 1 : 250000, and 1 : 200000 regional geological map; the national 1 : 200000 natural sand; the isotope geological dating; and the lithostratigraphic unit database);
  • a basic geological database (covering the national rock property database and national geological working degree database);
  • a mineral resources database (covering the national mineral resources, the national mineral resources utilization survey mining resources reserves verification results, the national survey of large and medium-sized mines, the prospect of mineral resources, the survey of the resources potential of major solid mineral resources in China, and the geological and mineral resources database);
  • an oil and gas energy database (covering the oil and gas basins in China, the geological survey results of the national oil and gas resources, the national petroleum and geophysical exploration, national shale gas, national coal bed methane, national natural gas hydrate, and other databases);
  • a geophysical database (covering 1 : 1000000, 1 : 500000, 1 : 250000, 1 : 200000, and 1 : 50000 gravity, national regional gravity, national aeromagnetism, national ground magnetism, national electrical survey, seismic survey, national aviation radioactivity, and national logging database);
  • a geochemical database (covering the databases of national 1 : 250000 and 1 : 20 geochemical exploration, national multiobjective geochemical and national land quality evaluation results);
  • a remote sensing survey database (covering national aeronautical remote sensing image, China resources satellite data, space remote sensing image, national mine environmental remote sensing monitoring, national high-resolution (Gaofen) satellite, and other databases);
  • a drilling database (covering the national geological borehole information, the national important geological borehole, the Chinese mainland scientific drilling core scanning image library, and so on);
  • a hydraulic cycle hazards database;
  • a data literature database;
  • a special subject database (covering the national mineral resources potential evaluation database, the important mineral “three-rate” investigation and evaluation database); and
  • a work management database (covering the national exploration right, mining right, mining right verification, geological information metadata database, and many others).

These databases are still being expanded and refined, and their practical value has not yet been fully realized. However, it is virtually impossible for the vast majority of researchers to obtain all of the above data; at most, they use their own accumulated data. Even so, the quantity and variety of their accumulated data far exceed what was available 10 or 20 years ago, so they are, in fact, already in an era of “relatively big data.” For example, in the project “the Chinese mineralization system and regional metallogenic evaluation,” which ran from 1999 to 2004, 202 national academic experts participated, yet they compiled master data for only 4,500 properties (covering all kinds of minerals). From 2006 to 2013, the study of “national important mineral and regional mineralization laws” was conducted, and the mining areas covered by the mineral resources research institute alone numbered 30,600. Therefore, the growth in information and data volume over the last 10 years is unprecedented.

Variety

From a formal point of view, geological big data are multidimensional, multiscale, and multitemporal. They contain structured, semistructured, and unstructured data and are usually stored as text, graphics, images, databases (including image databases, spatial databases, and attribute databases), tables, videos, and audio, often in a fragmented state. For example, a large number of field outcrop descriptions, borehole core descriptions, geological surveys, exploration reports, geological maps, drawings, and photos were stored and managed on paper for a long time. Even the numerous relational and spatial databases were primarily used to store and manage structured data that are tabulated and vectorized, while text descriptions, records, and summaries were stored directly, with very little standardized processing or structural transformation. Furthermore, there is no tool available to store and manage structured, semistructured, and unstructured data effectively in an integrated way.
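One common way to handle such mixed data, offered here only as a sketch and not as the approach of any system discussed in the article, is to wrap every item in a uniform envelope that records its kind and some shared metadata. The class name, the `site` metadata key, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

KINDS = ("structured", "semistructured", "unstructured")


@dataclass
class GeoRecord:
    """Uniform envelope around one item of geological data (hypothetical)."""
    record_id: str
    kind: str                     # one of KINDS
    payload: Any                  # table row, nested log, free text, ...
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")


def group_by_site(records: List[GeoRecord]) -> Dict[str, List[GeoRecord]]:
    """Group records of any kind by the site named in their metadata."""
    grouped: Dict[str, List[GeoRecord]] = {}
    for rec in records:
        grouped.setdefault(rec.metadata.get("site", "unknown"), []).append(rec)
    return grouped


# Invented examples: a table row, a nested core log, and a free-text note.
records = [
    GeoRecord("r1", "structured", {"depth_m": 120.5, "lithology": "granite"}, {"site": "BH-01"}),
    GeoRecord("r2", "semistructured", {"log": [{"from": 0, "to": 12, "note": "weathered"}]}, {"site": "BH-01"}),
    GeoRecord("r3", "unstructured", "Core is fractured below the contact.", {"site": "BH-02"}),
]

for site, recs in group_by_site(records).items():
    print(site, [r.kind for r in recs])
```

The envelope leaves each payload in its native form and relies on shared metadata for cross-referencing, so a borehole's table rows and its free-text descriptions can be retrieved together.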

Velocity

Geological data are growing very fast, especially in remote sensing geology, aviation geophysical exploration, regional geochemical exploration, and other fields, owing to the introduction of new technologies and methods. High-speed processing is also a characteristic of big data. In addition to analyzing data in real time, people need to present the results of data mining and processing through media such as images and video, which requires effective and efficient processing capabilities. For example, detecting deep earth information requires not only obtaining the parameters of seismic wave reflection and refraction but also processing them quickly, so as to predict in a timely manner whether earthquakes will occur and to forecast their time, location, and intensity. In this way, disasters can be avoided more effectively. When applying a variety of data to a particular mountain, one should learn which data have spatial limitations and which are unrelated to spatiality, so as to deduce the metallogenic law and better guide prospecting.[17]

Veracity

Most people regard big data as having low value density, meaning that the truly useful information in the vast amount of data is very small. Taking video as an example, the useful data may be only a second or two in a continuous monitoring process. At the same time, big data can be of high value without heavy investment; simply collecting information from the internet can bring business value. Therefore, big data is characterized by low value density and high business value. The same is true for geological big data. So far, a lot of geophysical prospecting information has been gathered, but only a small part of it has been confirmed, and even fewer mines have been discovered. Once a breakthrough is made, however, its socioeconomic value is enormous, as with the lithium polymetallic deposit in Tibet and the newly discovered Jima copper polymetallic deposit in the outskirts of Sichuan.[17]
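Value density can be made concrete as the ratio of useful data to total data. The sketch below uses the video example from the text; the 24-hour recording length is an assumption added for illustration, since the article says only that a second or two of a continuous recording may be useful.

```python
def value_density(useful: float, total: float) -> float:
    """Fraction of the data that is actually useful (both in the same unit)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= useful <= total:
        raise ValueError("useful must lie between 0 and total")
    return useful / total


# Assumed scenario: 2 useful seconds in a 24-hour monitoring recording.
seconds_per_day = 24 * 60 * 60
density = value_density(2, seconds_per_day)
print(f"{density:.2e}")  # 2.31e-05
```

A density on the order of one part in tens of thousands is why filtering and mining, rather than storage alone, dominate the cost of extracting value from such data.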

In addition, the spatial attribute and temporal attribute of geological data also bring a big challenge to data accuracy. Any geological data have spatial attributes, and their values are reflected in the spatial law of distribution of mineral resources. For this reason, in the process of establishing the metallogenic series, exploring the metallogenic law, and constructing the mathematical model, the spatial attribute of the metallogenic model should be considered. Obviously, every metallogenic series has its own spatial attributes. Geological data also has the time attribute, which is very different from physical, chemical, and other natural sciences. One of the fundamental pillars of geology is the geological time scale. The rocks, strata, and deposits of different geological periods have different distribution characteristics and regularity, so those data have their own time attribute.

It is obvious that those characteristics of geological big data mentioned above impose very challenging obstacles to the data management in CEGIS. The challenges related to geological big data management can be summarized as follows:

(i) It is quite difficult to describe and model geological big data since there are few effective description mechanisms for characteristics and object modeling approaches under the cloud computing environment.

(ii) There remain many technical issues that must be addressed to fully manage, mine, analyze, integrate, and share geological big data, given their complex characteristics, including multisource heterogeneity, high spatiotemporal variation, high volume, high correlation, and many others.

(iii) Many issues appear in achieving decision support, such as data incompleteness, data uncertainty, and high-dimensionality of data.

The broad range of challenges described here make good topics for research within the field of big data management in CEGIS. They are analyzed in the next section.

Key technologies and trends on big data management in cloud-enabled geological information services (CEGIS)

References

  1. Vermeesch, P.; Garzanti, E. (2015). "Making geological sense of ‘Big Data’ in sedimentary provenance analysis". Chemical Geology 409: 20–27. doi:10.1016/j.chemgeo.2015.05.004. 
  2. Chen, J.; Xiang, J.; Hu, Q. et al. (2016). "Quantitative Geoscience and Geological Big Data Development: A Review". Acta Geologica Sinica 90 (4): 1490–1515. doi:10.1111/1755-6724.12782. 
  3. Zhu, Y.; Tan, Y.; Li, R. et al. (2016). "Cyber-physical-social-thinking modeling and computing for geological information service system". International Journal of Distributed Sensor Networks 12 (11). doi:10.1177/1550147716666666. 
  4. Kim, Y.-H.; Yarlagadda, P. (2013). "Cloud Computing Model for Big Geological Data Processing". Applied Mechanics and Materials 475–476: 306–311. doi:10.4028/www.scientific.net/AMM.475-476.306. 
  5. Chen, J.; Li, J.; Cui, N.; Yu, P. (2015). "The construction and application of geological cloud under the big data background". Geological Bulletin of China 34 (7): 1260–1265. http://caod.oriprobe.com/articles/46629977/The_construction_and_application_of_geological_cloud_under_the_big_dat.htm. 
  6. Li, C. (2010). "The technical infrastructure of geological survey information grid". Proceedings from the 18th International Conference on Geoinformatics 2010: 1–6. doi:10.1109/GEOINFORMATICS.2010.5567743. 
  7. Wu, L.; Xue, L.; Li, C. et al. (2015). "A Geospatial Information Grid Framework for Geological Survey". PLoS One 10 (12): e0145312. doi:10.1371/journal.pone.0145312. 
  8. Evangelidis, K.; Ntouros, K.; Makridis, S. et al. (2014). "Geospatial services in the Cloud". Computers & Geosciences 63: 116–122. doi:10.1016/j.cageo.2013.10.007. 
  9. Huang, M.; Liu, A.; Wang, T.; Huang, C. (2017). "Green data gathering under delay differentiated services constraint for internet of things". Wireless Communications and Mobile Computing. https://www.hindawi.com/journals/wcmc/aip/9715428/. 
  10. "Web of Science". Clarivate Analytics. https://www.webofknowledge.com/. 
  11. Yang, C.; Yu, M.; Hu, F. et al. (2017). "Utilizing cloud computing to address big geospatial data challenges". Computers, Environment and Urban Systems 61 (Part B): 120–128. doi:10.1016/j.compenvurbsys.2016.10.010. 
  12. Wu, L.; Xue, L.; Li, C. et al. (2017). "A Knowledge-Driven Geospatially Enabled Framework for Geological Big Data". International Journal of Geo-Information 6 (6): 166. doi:10.3390/ijgi6060166. 
  13. Tan, Y. (2016). "Architecture and Key Issues of Geological Big Data and Information Service Project". Geomatics World 23 (1): 1–6. http://caod.oriprobe.com/articles/48928882/Architecture_and_Key_Issues_of_Geological_Big_Data_and_Information_Ser.htm. 
  14. He, W.; Wang, Y. (2014). "Prototype system of geological cloud computing". Progress in Geophysics 29 (6): 2886–2896. http://caod.oriprobe.com/articles/45636829/Prototype_system_of_geological_cloud_computing.htm. 
  15. Zhu, Y.; Tan, T.; Zhang, J. et al. (2015). "A framework of hadoop based geology big data fusion and mining technologies". Cehui Xuebao/Acta Geodaetica et Cartographica Sinica 44 (S0): 152–159. doi:10.11947/j.AGCS.2015.F059. 
  16. Wang, D.; Liu, X.; Liu, L. (2015). "Characteristics of big geodata and its application to study of minerogenetic regularity and minerogenetic series". Mineral Deposits 34 (6): 1143–1154. doi:10.16111/j.0258-7106.2015.06.004. 
  17. Pan, B.; Yang, R. (2017). "Management and Utilization of Big Data for Geology". Surveying and Mapping of Geology and Mineral Resources 33 (1): 1–3, 14. https://caod.oriprobe.com/articles/50925192/Management_and_Utilization_of_Big_Data_for_Geology.htm. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Grammar has been updated to make the content more readable.