Journal:Design of a data management reference architecture for sustainable agriculture

From LIMSWiki
Revision as of 23:53, 9 June 2022 by Shawndouglas
Full article title: Design of a data management reference architecture for sustainable agriculture
Journal: Sustainability
Author(s): Giray, Görkem; Catal, Cagatay
Author affiliation(s): Independent researcher, Qatar University
Primary contact: gorkemgiray at gmail dot com (email)
Year published: 2021
Volume and issue: 13(13)
Article #: 7309
DOI: 10.3390/su13137309
ISSN: 2071-1050
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.mdpi.com/2071-1050/13/13/7309/htm
Download: https://www.mdpi.com/2071-1050/13/13/7309/pdf (PDF)

Abstract

Effective and efficient data management is crucial for smart farming and precision agriculture. To realize operational efficiency, full automation, and high productivity in agricultural systems, different kinds of data are collected from operational systems using different sensors, stored in different systems, and processed using advanced techniques, such as machine learning and deep learning. Due to the complexity of data management operations, a data management reference architecture is required. While there are different initiatives to design data management reference architectures, a data management reference architecture for sustainable agriculture is missing. In this study, we follow domain scoping, domain modeling, and reference architecture design stages to design the reference architecture for sustainable agriculture. Four case studies were performed to demonstrate the applicability of the reference architecture. This study shows that the proposed data management reference architecture is practical and effective for sustainable agriculture.

Keywords: sustainability, agriculture, sustainable agriculture, data management, reference architecture, design science research

Introduction

The increase in food demand and its associated large ecological footprint call for action in agricultural production.[1] Inputs and assets should be optimized, and long-term ecological impacts should be assessed for sustainable agriculture. Decision-making processes on optimization and assessment need data on several inputs, outputs, and external factors. To this end, various systems have been developed for data acquisition and management to enable precision agriculture.[1] Precision agriculture refers to the application of technologies and principles for improving crop performance and environmental sustainability.[2] Smart farming extends precision agriculture and enhances decision-making capabilities by using recent technologies for smart sensing, monitoring, analysis, planning, and control.[1] The acquired data are enriched with context, situation, and location awareness.[1] Real-time sensors are utilized to collect various data, and real-time actuators are used to fine-tune production parameters instantly.

In the late 2000s, Murakami et al.[3] and Steinberger et al.[4] pointed out the need for a data storage and processing platform for agricultural production. They utilized web services to send and receive data from a central web application. That web application received, stored, and processed data, and it provided the required outputs to its users or any other system. Similarly, Sørensen et al.[5] listed several data processing use cases to assist farmers’ decision-making processes. More recently, technologies such as the internet of things (IoT) have made digital data acquisition, and hence smart farming, possible.[6] In recent years, many studies have been performed in the fields of smart farming and precision agriculture.[7][8][9][10][11][12][13] At the heart of many of those studies is Industry 4.0, which acts as a transformative force on smart farming processes. Industry 4.0-related technologies, namely IoT, big data, edge computing, 3D printing, augmented reality, collaborative robotics, data science, cloud computing, cyber-physical systems, digital twins, cybersecurity, and real-time optimization, are increasingly integrated into different parts of modern agricultural systems.[14]

To realize operational efficiency, full automation, and high productivity in these systems, different types of data are collected from operational systems using different sensors, stored in big data systems, and processed using machine learning and deep learning approaches. Traditional data management techniques and systems are not sufficient to deal with this scale of data, and as such, big data infrastructures and systems have been designed and implemented. To manage the complexity of this big data, many different aspects of data must be considered during the design of these systems. Different data management reference architectures have been designed to date.[15][16][17] To the best of our knowledge, none of these studies have focused on sustainable agriculture. There exist several practices for sustainable agriculture that can protect the environment, improve soil fertility, and increase natural resources. It is known that agriculture can affect soil erosion, water quality, human health, and pollination services.[18] As such, sustainable agriculture is crucial to minimize the negative effects of agricultural production. Sustainable agriculture requires an iterative process because each actor in the system has a different responsibility, and the success of this process is highly dependent on the success of each actor.

The goal of this study is to present a data management reference architecture for supporting smart farming, sustainable agriculture, and other domains. The study builds on the recent developments in data management and processing, i.e., big data, machine learning, and data lakes. We designed a data management reference architecture for sustainable agriculture and evaluated it using several case studies. Domain scoping, domain modeling, and reference architecture design stages were followed to create the reference architecture. Based on the reference architecture, we can design different application architectures. During the validation stage of this study, using different case studies obtained from the literature, we have shown the applicability of our reference architecture as a novel data management reference architecture for sustainable agriculture.

The structure of this paper follows the outline proposed by Gregor & Hevner[19] for design science research. The next section summarizes the research method adopted in this study, followed by the definition and structuring of the problem by analyzing the existing literature. We then present the related reference architecture studies and explain the solution design process and the reference architecture obtained. That is followed by the evaluation of the reference architecture by deriving application architectures from it based on some requirements from the sustainable agriculture domain. The penultimate section discusses the results, and the final section provides conclusions and plans of future work.

Research method

The design science research (DSR) method proposed by Hevner et al.[20] was followed in this study. DSR is a problem-solving paradigm and seeks to create artifacts through which information systems can be effectively and efficiently engineered.[20] These artifacts are designed to interact with a problem context to improve something in that context.[21]

The activities and the artifacts span two significant dimensions, i.e., problem-solution and theory-practice dimensions.[22] Figure 1 shows the research method used in this study. The first step was the identification of some problem instances occurring in practice and sharing similar aspects. These problem instances were analyzed, and a problem statement was formed using theoretical concepts from the literature. A conceptual solution, i.e., an artifact or artifacts, was designed by following a systematic approach. Domain analysis was used to derive and represent domain knowledge to be used for solution design. Domain analysis involved domain scoping and domain modeling activities.[23] Domain scoping refers to the identification of relevant knowledge sources to derive the key concepts of the solution.[24] To this end, several searches were conducted on the Scopus database using different search strings. Domain modeling aims at unifying and representing the domain knowledge obtained from relevant sources. The feature model was used to represent the output of domain modeling.[25] A reference architecture was designed as a conceptual solution.



Fig. 1 The research method used in this study, which involves the main steps of design science research (DSR), i.e., problem definition, solution design, and validation. The iterative nature of the research method was neglected for the sake of simplicity.

To evaluate the reference architecture, requirements were specified using recent literature on sustainable agriculture.[26][27][28][29][30] Based on these requirements, a concrete application architecture was derived using the reference architecture.

In accordance with DSR and the research method described here, the following sections describe problem definition, design of a solution, and the evaluation of the solution.

Problem definition

This study was motivated by three use cases involving different data management requirements to support sustainable agriculture. The following three use cases were used for understanding and conceptualizing the problem:

  • Case 1: Satellite images (e.g., Sentinel-2 data) can be obtained from a data provider. These images can be processed to derive plant parameters such as Leaf Area Index (LAI), biomass, and chlorophyll content during the growing season.[31] Afterward, the current growth status and development of cultivated crops at each location in the field can be deduced.[32] This information can be used for site-specific plant protection and fertilization measures[33], which support sustainable agriculture.
  • Case 2: Harvested crop volume can be quantified and recorded in real time using numerous sensors.[34] Various parameters such as "quantity per hectare" and "flow" can be calculated, and crop productivity maps can be built.[34] Farmers can use these maps to optimize inputs such as fertilizers, pesticides, and seeding rates, resulting in an increase in yields.[35]
  • Case 3: Machinery process data such as speed, angle, pressure, and flow rate can be obtained through sensors in tractors and equipment.[4] Machine, worker, field, and time slot data can be stored, and basic statistics like minimum, maximum, and standard deviation can be computed.[4] As a result, automated documentation of the production process and site-specific work can be attained.[4]
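As a minimal sketch of Case 3, the snippet below groups machinery process records by machine and field and computes the basic statistics mentioned above. The record layout (`machine`, `field`, `speed_kmh`) and the sample values are hypothetical, chosen only for illustration; using the sample standard deviation (`statistics.stdev`) is also an assumption, since the cited work does not say which estimator is used.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical machinery process records (machine ID, field ID, speed in km/h).
records = [
    {"machine": "tractor-1", "field": "A", "speed_kmh": 7.5},
    {"machine": "tractor-1", "field": "A", "speed_kmh": 8.1},
    {"machine": "tractor-1", "field": "A", "speed_kmh": 7.8},
    {"machine": "tractor-2", "field": "B", "speed_kmh": 6.0},
    {"machine": "tractor-2", "field": "B", "speed_kmh": 6.4},
]

def summarize(records, value_key):
    """Group records by (machine, field) and compute min, max, and sample std dev."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["machine"], r["field"])].append(r[value_key])
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "min": min(values),
            "max": max(values),
            # stdev needs at least two values; report 0.0 for a single reading
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary

if __name__ == "__main__":
    for key, stats in summarize(records, "speed_kmh").items():
        print(key, stats)
```

The same grouping could be keyed by worker or time slot instead, matching the other dimensions listed in Case 3.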

Table 1 summarizes the above-mentioned cases from a data management perspective. As in many cases across various domains, at a high level, digital data are produced and fed to a software system to be processed and stored. Such a system can be called a data management platform, and its outputs can lead to better business outcomes. As per the first case, satellite images can be processed via computer vision algorithms to derive plant parameters such as Leaf Area Index (LAI), biomass, and chlorophyll content, which can in turn be used to track the current growth status of cultivated crops and support decision-making activities.

Table 1. A summary of the three cases presented above from a data management perspective

| Data input | Data processing | Data output | Outcome |
|---|---|---|---|
| Satellite images | Derive plant parameters via computer vision algorithms | Plant parameters such as Leaf Area Index (LAI), biomass, and chlorophyll content | Track current growth status and development of cultivated crops at each location |
| Harvested crop volume via sensors | Build crop productivity maps | Various parameters such as "quantity per hectare" and "flow" on the map | Use such maps to optimize inputs such as fertilizers, pesticides, and seeding rates in order to increase yields |
| Machinery process data via sensors | Compute statistics | Machine, worker, field, and time slot data, along with basic statistics such as minimum, maximum, and standard deviation | Attain automated documentation of the production process and site-specific work |

Solution design and artifact description

This section starts with a summary of related reference architecture studies and then presents the three steps of the solution design phase, namely domain scoping, domain modeling, and reference architecture design.

Related reference architecture studies

Before presenting our reference architecture, we discuss the available reference architectures in the literature. First, while Nikkilä et al.[36] and Kaloxylos et al.[37] presented architectural aspects of Farm Management Information Systems (FMISs), they did not propose a reference architecture, which limited the utility of their research for our purposes. However, Tummers et al.[17] designed a reference architecture for FMISs. They first identified the stakeholders and their concerns. Afterward, a feature model for FMISs was created. The reference architecture was designed and represented via context and decomposition views. Three case studies were performed to show the applicability of the proposed reference architecture.

Köksal & Tekinerdogan[6] proposed a reference architecture for IoT-based FMISs. They proposed an architecture design method and showed that the approach is practical and effective. Their architecture included data acquisition, data processing, data visualization, system management, and external services. Each main feature consisted of several sub-features. For instance, the data processing feature involved sub-features like image/video processing, data mining, decision support, and data logging. They used decomposition, layered, and deployment views to document the reference architecture. Their reference architecture can thus be used to derive concrete FMIS architectures.

Kruize et al.[38] proposed a reference architecture for farm software ecosystems. Farm software ecosystems aim to fulfill the needs of several actors in the smart farming domain. In that respect, their scope is much wider compared to FMISs. The farm software ecosystem reference architecture mainly focused on the problem of bringing various software and hardware components together to form a platform for multiple actors.

To the best of our knowledge, there is no other study that presents a data management reference architecture for sustainable agriculture. Although some of the previous studies mention several data-related components, a complete architectural view of managing data for sustainable agriculture was missing. As such, our reference architecture study aims to fill this research gap.


Domain scoping

The Scopus database was used as the knowledge source for domain scoping. To identify the search keywords, it was crucial to understand the recent factors driving reference architectures for data management. The concept of "big data" has emerged to highlight challenges of data management, including volume, velocity, and variety.[39] Machine learning is another active research topic; it aims to acquire knowledge by extracting patterns from raw data[40][41] and to solve problems using this knowledge. Another recently emerged concept is the "data lake," which addresses the shortcomings of data warehouses. A data lake can be defined as a data management platform that allows the storing of both structured and unstructured data, unlike data warehouses, which handle only structured data. This type of platform is designed to enable big data processing, real-time analytics, and machine learning.

Based on these recent trends, four search strings were used to form the initial paper pool. The phrase “reference architecture” was combined with four phrases representing the recent trends in data processing for sustainable agriculture (i.e., "data management," "big data," "machine learning," and "data lake"). The search keywords were kept general to favor high recall at the cost of relatively low precision. Although this required more screening effort from the authors, a broader initial set of papers decreased the possibility of missing relevant studies.

The database search on Scopus was conducted in February 2021. No criterion was set for the publication date. A total of 270 papers were obtained for the pool of candidate papers. All the results were combined in an Excel sheet that recorded information used in later steps, such as each paper's title, abstract, keywords, and publication date.

To identify the papers relevant for designing a reference architecture, we applied selection criteria to the papers obtained from Scopus. Papers that were duplicates, not written in English, or lacking full text were filtered out. Papers presenting a reference architecture for processing data in any business domain were included. Seven papers included a reference architecture for data processing, along with its essential components.
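The screening step can be sketched as a filter over the exported candidate records. The field names (`title`, `language`, `has_full_text`, `proposes_ref_arch`) are hypothetical; in the study, the inclusion decision was made by the authors reading the papers, so the `proposes_ref_arch` flag merely stands in for that manual judgment.

```python
def screen(candidates):
    """Apply the exclusion and inclusion criteria to candidate paper records.

    Each record is a dict with hypothetical keys: title, language,
    has_full_text, and proposes_ref_arch (the authors' manual judgment).
    """
    seen_titles = set()
    selected = []
    for paper in candidates:
        # Normalize case and whitespace so the same paper returned by
        # several search strings is recognized as a duplicate.
        title_key = " ".join(paper["title"].lower().split())
        if title_key in seen_titles:
            continue  # exclusion: duplicate
        seen_titles.add(title_key)
        if paper["language"] != "English":
            continue  # exclusion: not written in English
        if not paper["has_full_text"]:
            continue  # exclusion: full text unavailable
        if paper["proposes_ref_arch"]:
            selected.append(paper)  # inclusion: reference architecture for data processing
    return selected
```

Title normalization is a simple duplicate heuristic; matching on DOI, where available, would be more robust.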

The data extraction phase followed the selection of relevant papers. The components of the data processing architecture listed in the papers were extracted and recorded in an Excel sheet. These components were unified by reading the definitions presented in the papers. Table 2 shows the unified list of the components and the source papers where each component was identified.

Table 2. The unified list of components identified in the literature
Component Nadal et al.[15] Dayal et al.[42] Suriarachchi and Plale[43] Rao et al.[44] Sang et al.[45] Arass et al.[46] Pääkkönen and Pakkala[47]
Ingestion
Information extraction
Data quality management
Integration
Analysis
Storage
Security/Privacy
Metadata management
Replication/Archiving

All the papers address three main components that deal with collecting data from various sources (i.e., acquisition), processing data to provide some value to data consumers (i.e., analysis), and persisting data (i.e., storage). Pääkkönen and Pakkala[47] identified information extraction as a component to extract structured data from unstructured and semi-structured data, such as email or images. Data quality management is another vital component to handle data quality problems. Data received from various sources should be integrated for further analysis. Security and privacy components are used to protect data from unauthorized access and to ensure proper handling of personal data. Metadata management refers to the creation and storage of metadata to document the meaning of data. The replication component ensures redundant storage of data sources to provide better data availability in case of technical problems, and the archiving component is responsible for storing cold data for possible future needs.

Domain modeling

Feature modeling is one of the approaches to represent domain knowledge in a reusable format.[24][25] Figure 2 shows the feature model, which is derived from the unified list of components presented in Table 2. A feature diagram can include mandatory and optional features. The three components identified by all papers (acquisition, analysis, and storage) are treated as mandatory features. The remaining features are optional and can be used depending on business requirements.
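To make the distinction between mandatory and optional features concrete, the sketch below encodes the top level of the feature model as two sets and checks whether a proposed configuration (the set of features selected for one application) is valid. Representing the model as plain Python sets, and the exact feature names used, are assumptions for illustration; feature diagrams can also express sub-features and alternative groups, which this sketch omits.

```python
# Top-level features, following the mandatory/optional split described in the text.
MANDATORY = {"acquisition", "analysis", "storage"}
OPTIONAL = {
    "information extraction",
    "data quality management",
    "integration",
    "security and privacy",
    "metadata management",
    "replication",
    "archiving",
}

def check_configuration(selected):
    """Return a list of problems with a feature selection (an empty list means valid)."""
    problems = []
    missing = MANDATORY - selected
    if missing:
        problems.append("missing mandatory features: " + ", ".join(sorted(missing)))
    unknown = selected - MANDATORY - OPTIONAL
    if unknown:
        problems.append("unknown features: " + ", ".join(sorted(unknown)))
    return problems

if __name__ == "__main__":
    # A hypothetical configuration for an application that combines several sources.
    print(check_configuration({"acquisition", "analysis", "storage", "integration"}))
```

A derived application architecture then corresponds to one valid selection.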



Fig. 2 The feature model for data management in the sustainable agriculture domain

Data need to be onboarded to a data platform for further processing and storage, a process referred to as the ETL (extract, transform, and load) process in traditional data warehousing architectures.[48] Such architectures possess a pre-defined data schema and data are loaded based on this schema. Data ingestion refers to the process of transferring data from providers to a platform for further processing.[49] The umbrella term for such processes is "data acquisition." Data can be acquired in batches at regular intervals or in real time (or in near real time) as streams. A component acquiring data streams should be able to handle data with high velocity.[50]

Information extraction is intended for obtaining useful information from unstructured and semi-structured data.[51] Unstructured and semi-structured data may include natural language text, image, audio, and video. Several tasks performed under information extraction include classification, named entity recognition, relationship extraction, and structure extraction.[16][48][52] Named entity recognition (i.e., named entity identification) aims at identifying and classifying named entities in unstructured or semi-structured texts into predefined categories such as person, organization, or location. For instance, Gangadharan and Gupta[53] extracted names of crops, soil types, crop diseases, pathogen names, and fertilizers from documents on agriculture. Relation extraction is the task of detecting and classifying predefined types of associations among recognized entities.[52] For example, relationships among crop diseases and locations can be extracted. The sub-features for information extraction can be expanded depending on domain-specific requirements.

Data quality management refers to handling data quality problems that may arise due to several reasons[48], including missing, incorrect, unusable, or redundant data.[54] To address these main data quality problems, missing data can be completed, incorrect data can be corrected, unusable data can be transformed, and redundant data can be cleaned.
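The four remedies named above can be illustrated on a few soil-moisture readings. Everything here is a hypothetical assumption: the record layout, converting string values to floats as the transformation, clamping values outside 0 to 100 percent as the correction, dropping exact duplicates as the cleaning, and filling a missing value with the mean of the remaining readings as the completion. A real platform would choose remedies per data source.

```python
from statistics import mean

def improve_quality(readings):
    """Transform, correct, clean, and complete soil-moisture readings.

    Each reading is a (sensor_id, timestamp, moisture_percent) tuple; moisture
    may be None (missing), a string (unusable format), or out of range (incorrect).
    """
    # Transform: convert string values such as "30.0" to floats.
    transformed = [
        (s, t, float(m) if isinstance(m, str) else m) for s, t, m in readings
    ]
    # Correct: clamp physically impossible percentages into [0, 100].
    corrected = [
        (s, t, None if m is None else min(max(m, 0.0), 100.0))
        for s, t, m in transformed
    ]
    # Clean: drop redundant (exact duplicate) readings, keeping the first occurrence.
    cleaned = list(dict.fromkeys(corrected))
    # Complete: fill missing values with the mean of the known values.
    known = [m for _, _, m in cleaned if m is not None]
    fill = mean(known) if known else None
    return [(s, t, fill if m is None else m) for s, t, m in cleaned]
```

Duplicates are removed before completion so that repeated readings do not bias the fill value.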

In general, data management platforms acquire data from multiple sources that usually differ in data models, schemas, and data semantics.[55] Data integration aims at combining heterogeneous data and providing a unified view of these data.[56] One technique for data integration is schema mapping, which refers to mapping the data schemas of multiple data sources onto one global common schema.[57]
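A minimal sketch of schema mapping: two hypothetical sources describe the same kind of weather observation with different field names and units, and per-source mapping functions convert each record into one global schema. The source names, field names, and the Fahrenheit-to-Celsius conversion are assumptions chosen for illustration.

```python
# Global schema: {"station": str, "date": str (ISO 8601), "temp_c": float}

def from_source_a(rec):
    # Source A already reports Celsius but uses different field names.
    return {"station": rec["stationId"], "date": rec["obsDate"], "temp_c": rec["tempC"]}

def from_source_b(rec):
    # Source B reports Fahrenheit, so the value is converted to Celsius.
    return {
        "station": rec["site"],
        "date": rec["day"],
        "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 2),
    }

# One mapping per source schema onto the global schema.
MAPPINGS = {"source_a": from_source_a, "source_b": from_source_b}

def integrate(batches):
    """Map records from every source into the global schema and combine them."""
    unified = []
    for source, records in batches.items():
        mapper = MAPPINGS[source]
        unified.extend(mapper(r) for r in records)
    return unified
```

Adding a new source then only requires registering one more mapping function rather than changing downstream analysis code.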

Data are analyzed to obtain some value from them. Results of analysis may provide insights to users or constitute intermediate output for further processing.[48] Stream analysis encompasses the timely processing of flowing data to generate the required outputs. For instance, an environmental monitoring system can process raw data coming from sensor networks to identify critical cases.[58] In contrast, batch analysis is conducted on static datasets.[59] Data mining and machine learning, including deep learning algorithms, may be utilized to produce deeper analyses.[60][61]
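The difference between stream and batch analysis can be sketched with sensor readings. The stream version keeps only a small sliding window and emits an alert as soon as the moving average of the latest values crosses a threshold; the batch version summarizes a complete, static dataset in one pass. The window size, the threshold, and treating a high moving average as a "critical case" are hypothetical choices.

```python
from collections import deque

def stream_alerts(readings, window=3, threshold=30.0):
    """Stream analysis: yield (index, moving_average) whenever the windowed average exceeds threshold.

    readings can be any iterable, including an unbounded sensor feed.
    """
    buf = deque(maxlen=window)  # only the latest `window` values are kept
    for i, value in enumerate(readings):
        buf.append(value)
        if len(buf) == window:
            avg = sum(buf) / window
            if avg > threshold:
                yield i, avg

def batch_summary(readings):
    """Batch analysis: summarize a complete, static dataset."""
    return {"count": len(readings), "mean": sum(readings) / len(readings), "max": max(readings)}
```

Because `stream_alerts` is a generator with bounded memory, it can run over an endless feed, whereas `batch_summary` needs the whole dataset up front.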

Storage is a feature supporting other features and refers to the temporary and persistent storage of data. To manage the increased volume, velocity, and variety of data[39], different types of data stores have been developed. Therefore, the storage feature involves various database management systems (e.g., Microsoft SQL Server, Oracle, PostgreSQL, or MongoDB) implementing different data models, such as relational or nonrelational (NoSQL).

The security and privacy feature addresses authentication and authorization, access tracking, and data anonymization.[48] Several standards, guidelines, and mechanisms can affect the realization of this feature, such as data encryption standards and mechanisms, access guidelines, and remote access standards.[62]
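One common building block for the privacy side of this feature is pseudonymizing direct identifiers before data are shared for analysis. The sketch replaces a farmer identifier with a keyed HMAC-SHA-256 digest using Python's standard `hmac` and `hashlib` modules; the record fields and the key are hypothetical. Keyed hashing is pseudonymization rather than full anonymization: whoever holds the key can re-link records, and quasi-identifiers such as field coordinates may still allow re-identification.

```python
import hashlib
import hmac

def pseudonymize(record, key, fields=("farmer_id",)):
    """Return a copy of record with the given identifier fields replaced by HMAC-SHA-256 digests.

    The same key always maps the same identifier to the same digest, so records
    can still be joined on the pseudonym without exposing the original value.
    """
    out = dict(record)  # leave the caller's record unchanged
    for field in fields:
        if field in out:
            digest = hmac.new(key, str(out[field]).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
    return out

if __name__ == "__main__":
    # Hypothetical key; in practice it would come from a secrets store, not source code.
    print(pseudonymize({"farmer_id": "F-001", "yield_t": 4.2}, b"demo-key"))
```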

Metadata management is related to planning, implementation, and control activities to enable access to metadata.[62] This feature mainly involves capabilities related to collecting and integrating metadata from diverse sources and providing a standard way to access these metadata.[62]

The replication feature manages the storage of the same data on multiple storage devices.[62] While replicated instances help keep data highly available, keeping the replicas consistent may become an issue to deal with. The archiving feature addresses the movement of infrequently used data onto media with lower retrieval performance.[62]
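The core decision of the archiving feature, which data count as infrequently used, can be sketched as a policy that moves records not accessed within a cutoff period from a hot store to an archive store. The two in-memory dictionaries and the 365-day cutoff are hypothetical stand-ins for real storage tiers and a real retention policy.

```python
from datetime import datetime, timedelta

def archive_cold_records(hot_store, archive_store, now, max_idle=timedelta(days=365)):
    """Move records whose last access is older than max_idle into the archive store.

    Both stores map record IDs to dicts containing a "last_access" datetime.
    Returns the list of moved record IDs.
    """
    cutoff = now - max_idle
    # Collect IDs first so the hot store is not modified while iterating over it.
    moved = [rid for rid, rec in hot_store.items() if rec["last_access"] < cutoff]
    for rid in moved:
        archive_store[rid] = hot_store.pop(rid)
    return moved

if __name__ == "__main__":
    hot = {"a": {"last_access": datetime(2019, 1, 1)}, "b": {"last_access": datetime(2021, 1, 1)}}
    archive = {}
    print(archive_cold_records(hot, archive, now=datetime(2021, 2, 1)))
```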

Reference architecture design

Based on the abstraction derived from the three cases presented earlier, Figure 3 shows the context diagram of a data management system. The context diagram shows the overall purpose of the system and its interfaces with the external environment.[63] At a very high level, some data providers send data to a data management system. These may be humans entering data through a graphical user interface (GUI) or external systems providing input data to be processed. Data obtained from data providers are stored, processed, and served to data consumers based on their requirements.



Fig. 3 The context diagram for a data management system to conceptualize the cases identified in the sustainable agriculture domain

Figure 4 shows the decomposition view of the data management reference architecture proposed for data processing.



Fig. 4 Decomposition view of the data management reference architecture

Acquisition components are responsible for onboarding data to the data management platform for further processing. Useful information such as named entities and relations can be extracted from the acquired data. Quality problems can be resolved by completing, correcting, transforming, and cleaning the acquired data to obtain more accurate results from analyses. The data obtained from various sources need to be integrated to yield better and richer insights. Various components can be used to analyze data to support decision-making. The storage component handles different modes of storing data. The security and privacy component is needed to protect data from unauthorized access. Metadata, i.e., data describing the acquired data, are handled by the metadata management component. Replication may be needed for high availability. An archive component is usually required to manage the process of archiving unused historical data.

The next section presents how an application architecture can be derived from the reference architecture based on the requirements in sustainable agriculture.

Validation

To evaluate the reference architecture, a set of requirements was extracted from recent papers on sustainable agriculture.[26][27][28][29][30] The high-level requirements address crop yield prediction[27], irrigation management[28], real-time variable-rate fertilization[30], and monitoring of exotic animal infectious diseases.[26] These diverse high-level requirements address various data management aspects in sustainable agriculture.

Sustainability is a long-term, high-level goal involving several aspects and sub-goals. Generally, there is a gap between software requirements and sustainability goals. While software requirements tend to be more tangible, sustainability goals tend to be more intangible. Therefore, a tangible decomposition of sustainability and the ability to map it to concrete software requirements are required to monitor the achievement of sustainability goals.[64]

Penzenstadler and Femmer proposed a reference model to show the dimensions of sustainability (i.e., individual, social, economic, environmental, and technical) and map them to high-level software requirements.[65] Figure 5 shows the high-level software requirements obtained from the literature and how they are mapped in the sustainability model proposed by Penzenstadler & Femmer.[64][65]



Fig. 5 The requirements obtained from the literature on sustainable agriculture. The requirements are mapped to sustainability dimensions using the sustainability model proposed by Penzenstadler & Femmer.[64][65]

The goal of sustainability has several dimensions, such as economic and environmental. Values are moral and natural goods that are perceived as an expression of a specific dimension.[64] Long-term profit and a healthy environment are two examples of values contributing to the economic and environmental dimensions, respectively. Indicators are qualitative or quantitative metrics that express a specific degree or score regarding a value.[64] Consumption amounts of resources, water, agricultural pesticide, and weedicide are examples of indicators. Activities are measures taken to contribute to values[64]; they can have different levels of granularity and are associated with each other. Lower-level activities such as crop phenotypic monitoring, shown in Figure 5, can contribute to a higher-level activity such as variable-rate fertilization management. Leaf-level activities, such as predicting crop yield, shown in Figure 5, can be treated as the high-level software requirements (e.g., high-level use cases or epics) against which one or more system features are developed.

Components of the data management reference architecture

The components of the data management reference architecture used to realize each high-level software requirement are described as follows.

Crop yield prediction is an essential task for growers and farmers to decide on what and when to grow.[66] However, it is extremely challenging due to numerous complex factors.[67] To overcome this challenge, researchers started to use machine learning algorithms to predict crop yield based on various input variables.[27] Mohsen et al.[27] suggest using weather, soil, plant population, and planting process data. These historical data are extracted from various sources such as surveys[68][69] and stored (acquisition: batch). Unusable data such as rows with missing values are removed (cleaning) and some of the values are normalized (transformation). The data obtained from different sources are combined (integration). Several machine learning models are built through experimentation (machine learning: model development). One of the obtained machine learning models that exhibits a satisfactory performance is deployed (machine learning: model deployment). The performance of the deployed model should be monitored for a possible performance degradation.[70][71] When the performance does not meet expectations, a new machine learning model should be trained and deployed.
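The model development and monitoring steps can be sketched with a deliberately simple model: a one-variable least-squares line relating a single hypothetical predictor (seasonal rainfall) to yield, followed by a check that flags the deployed model for retraining when its mean absolute error on new observations exceeds a tolerance. The predictor, the toy numbers, and the tolerance are assumptions; the cited work uses several inputs (weather, soil, plant population, planting process) and does not prescribe this model.

```python
def fit_line(xs, ys):
    """Model development: ordinary least squares for y = a + b * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    a, b = model
    return a + b * x

def needs_retraining(model, xs, ys, tolerance):
    """Monitoring: True if the mean absolute error on newly observed data exceeds tolerance."""
    mae = sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)
    return mae > tolerance

if __name__ == "__main__":
    # Hypothetical history: rainfall (mm) and yield (t/ha).
    model = fit_line([300, 400, 500, 600], [3.0, 3.5, 4.0, 4.5])
    print(model, needs_retraining(model, [450], [3.75], tolerance=0.1))
```

When `needs_retraining` returns True, a new model would be trained on updated data and redeployed, closing the loop described above.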

Wireless sensor networks using IoT technologies can be utilized for irrigation management.[28] Sensors can measure real-time environmental data such as soil moisture at predefined intervals and send these data to the data storage over the internet (acquisition: stream).[72] A weather forecast can be obtained from a data provider (acquisition: batch) to manage irrigation by considering the conditions that affect the irrigation process, such as rain or strong winds.[72] The measurement and forecast data are combined (integration) along the time dimension, i.e., data obtained from different sources must fit into the same temporal window.[73] Based on the combined data, irrigation decisions can be made using predefined rules[28] or a prediction model (machine learning).[73] The resulting decision can be sent to an actuator to control irrigation.
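Combining the two sources within the same temporal window and then applying a predefined rule can be sketched as below. Sensor readings are bucketed into hourly windows and joined with a forecast for the same hour; irrigation is recommended when average soil moisture is below a threshold and rain is unlikely. The hourly window, the 25 percent moisture threshold, the rain-probability cut-off, and the record shapes are all hypothetical assumptions, not values from the cited studies.

```python
from collections import defaultdict
from datetime import datetime

def hour_window(ts):
    """Truncate a timestamp to the start of its hour (the shared temporal window)."""
    return ts.replace(minute=0, second=0, microsecond=0)

def irrigation_decisions(moisture_readings, forecasts, moisture_min=25.0, rain_prob_max=0.5):
    """Join sensor readings and forecasts per hourly window and apply a simple rule.

    moisture_readings: list of (timestamp, moisture_percent) from the sensor stream
    forecasts: dict mapping an hourly window start to a rain probability (0..1)
    Returns {window_start: irrigate?}; windows without a forecast are skipped.
    """
    by_window = defaultdict(list)
    for ts, value in moisture_readings:
        by_window[hour_window(ts)].append(value)
    decisions = {}
    for window, values in by_window.items():
        if window not in forecasts:
            continue  # the two sources do not overlap in this window
        avg = sum(values) / len(values)
        decisions[window] = avg < moisture_min and forecasts[window] < rain_prob_max
    return decisions

if __name__ == "__main__":
    readings = [(datetime(2021, 6, 1, 8, 10), 20.0), (datetime(2021, 6, 1, 8, 40), 22.0)]
    print(irrigation_decisions(readings, {datetime(2021, 6, 1, 8): 0.1}))
```

Replacing the rule in the last step with a trained prediction model would correspond to the machine learning variant mentioned above.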

Crop phenotypic information can be used to enable real-time variable-rate fertilization.[74][75] The predictors of phenotypic information involve crop three-dimensional size, biomass, and vegetation index, as well as other indicators.[30] These data can be acquired using aviation-based[76][77] and ground-based[78] approaches. Data obtained through sensors mounted on unmanned aerial vehicles (UAVs)[79] or ground-based phenotypic platforms with a series of sensors and a GPS[80][81] can be ingested into a data management platform (acquisition: stream). To obtain accurate predictions of crop phenotypic information, it is necessary to combine multi-source sensor data such as color, depth, and spectral data with environmental and crop physiology data[30] (integration) and develop machine learning models (machine learning: model development). Those models need to be deployed (machine learning: model deployment) and used for real-time variable-rate fertilization. As a result, improvements in fertilizer utilization efficiency yield environmental and economic sustainability benefits by maximizing crop output and minimizing fertilizer input.[30]

Exotic animal infectious diseases pose considerable threats to global health security and economic stability.[26][82] Online news platforms are a vital source for detecting signals of disease. Data can be collected from platforms such as Google News (acquisition: batch) and stored for further analysis (storage). Named entities such as location, date, disease, hosts, and number of cases can be extracted using natural language processing (NLP) techniques[26] (information extraction: named entity recognition). In addition, predefined types of relationships among the recognized entities can be detected (information extraction: relation extraction). For example, the sentence “12 pigs have been infected by African swine fever in Poland” yields the following entities[26]: number of cases = 12; host = pig; disease = African swine fever; location = Poland.
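To make the two information extraction steps concrete, the following Python sketch processes the example sentence above. It is deliberately naive and is not the NLP method of [26]: a small hypothetical gazetteer and a regular expression replace a trained named entity recognizer, date extraction is omitted, and relations of two hypothetical predefined types (`affects`, `reported_in`) are emitted whenever both argument entities occur in the same sentence.

```python
import re

# Hypothetical gazetteers standing in for trained NER models.
DISEASES = ["African swine fever", "avian influenza", "foot-and-mouth disease"]
HOSTS = {"pig": "pig", "pigs": "pig", "cattle": "cattle", "chicken": "chicken", "chickens": "chicken"}
LOCATIONS = ["Poland", "France", "Germany"]

# Hypothetical predefined relation types: (subject entity, relation, object entity).
RELATION_TYPES = [("disease", "affects", "host"), ("disease", "reported_in", "location")]


def extract_entities(sentence):
    """Toy named entity recognition using dictionary lookup and a regular expression."""
    entities = {}
    lowered = sentence.lower()
    for disease in DISEASES:
        if disease.lower() in lowered:
            entities["disease"] = disease
            break
    for location in LOCATIONS:
        if re.search(rf"\b{re.escape(location)}\b", sentence):
            entities["location"] = location
            break
    # A number directly followed by a known host word is read as a case count.
    for match in re.finditer(r"\b(\d+)\s+([A-Za-z]+)\b", sentence):
        host = HOSTS.get(match.group(2).lower())
        if host is not None:
            entities["number_of_cases"] = int(match.group(1))
            entities["host"] = host
            break
    return entities


def extract_relations(entities):
    """Naive relation extraction: emit each predefined relation whose two entity types co-occur."""
    return [(entities[subj], rel, entities[obj])
            for subj, rel, obj in RELATION_TYPES
            if subj in entities and obj in entities]


sentence = "12 pigs have been infected by African swine fever in Poland"
entities = extract_entities(sentence)
print(entities)
print(extract_relations(entities))
```

Co-occurrence is only a baseline for relation extraction; it would produce wrong pairs for sentences that mention several diseases or locations.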

Table 3 shows the functionalities extracted from the three problem cases (PC) and the four validation cases (VC). Figure 6 presents how these functionalities are mapped to the components of the reference architecture proposed in this study.

Table 3. The functionalities in the problem cases (PC) and validation cases (VC)
Case # Functionality
PC1 PC1.1 Obtain and store satellite images
PC1.2 Process images to derive plant parameters, such as leaf area index (LAI), biomass, and chlorophyll content
PC1.3 Deduce the current growth status and development of cultivated crops at each location in the field
PC2 PC2.1 Obtain and store harvested crop volume in real time using sensors
PC2.2 Calculate parameters such as quantity per hectare and flow and build crop productivity maps
PC3 PC3.1 Obtain and store machinery process data such as speed, angle, pressure, and flow rate through sensors in tractors
PC3.2 Compute basic figures, such as minimum, maximum, and standard deviation, and produce documentation of the production process
VC1 VC1.1 Acquire and store historical data on weather, soil, plant population, and planting process
VC1.2 Remove unusable data and normalize the remaining data
VC1.3 Combine data on weather, soil, plant population, and planting process
VC1.4 Build machine learning models and deploy the one that best satisfies requirements
VC2 VC2.1 Measure and store real-time environmental data
VC2.2 Obtain and store weather forecast
VC2.3 Combine measurement and forecast data
VC2.4 Decide on irrigation based on predefined rules or a prediction model
VC3 VC3.1 Obtain and store data, such as crop three-dimensional size, biomass, and vegetation index
VC3.2 Combine multisource sensor data, such as color, depth, and spectral data, with environmental and crop physiology data
VC3.3 Build machine learning models and deploy the one that best satisfies requirements for real-time variable-rate fertilization
VC4 VC4.1 Obtain and store news from online news platforms
VC4.2 Extract named entities, such as location, date, disease, hosts, and number of cases
VC4.3 Detect predefined types of relationships among recognized entities



Fig. 6 Mapping functionalities to the reference architecture components
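As a rough, non-authoritative illustration of how an application architecture can be derived by mapping case functionalities to reference architecture features, the Python sketch below encodes the feature annotations given in the case descriptions for VC2 to VC4 (e.g., "acquisition: stream"). The dotted feature identifiers and the function name are hypothetical, VC1 and the problem cases are left out, and VC2.4 is mapped to machine learning even though predefined rules are an alternative.

```python
# Hypothetical encoding of the feature annotations stated for VC2 to VC4.
FUNCTIONALITY_FEATURES = {
    "VC2.1": {"acquisition.stream", "storage"},
    "VC2.2": {"acquisition.batch", "storage"},
    "VC2.3": {"integration"},
    # Only needed when a prediction model, not predefined rules, drives the decision.
    "VC2.4": {"machine_learning"},
    "VC3.1": {"acquisition.stream", "storage"},
    "VC3.2": {"integration"},
    "VC3.3": {"machine_learning.model_development", "machine_learning.model_deployment"},
    "VC4.1": {"acquisition.batch", "storage"},
    "VC4.2": {"information_extraction.named_entity_recognition"},
    "VC4.3": {"information_extraction.relation_extraction"},
}


def derive_application_architecture(case_id):
    """Return (components, features) of the reference architecture used by one case."""
    features = set()
    for functionality_id, feats in FUNCTIONALITY_FEATURES.items():
        if functionality_id.split(".")[0] == case_id:
            features |= feats
    # The part before the first dot names the top-level component.
    components = {feature.split(".")[0] for feature in features}
    return sorted(components), sorted(features)


cases = ("VC2", "VC3", "VC4")
for case in cases:
    components, _ = derive_application_architecture(case)
    print(case, components)

shared = set.intersection(*(set(derive_application_architecture(c)[0]) for c in cases))
print("shared by all:", sorted(shared))
```

In this encoding, acquisition and storage are common to all three cases, which is in line with the observation in the discussion that smart farming scenarios share features from a data management perspective.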

Discussion and caveats

This research presents a novel data management reference architecture for sustainable agriculture, designed using well-established architecture modeling techniques. As such, this study can pave the way for similar studies on data management reference architectures. The reference architecture was designed based on domain analysis. Application architectures for specific data management scenarios can be derived from this reference architecture by selecting from the variant features specified in this study.

The features presented in this study were obtained from peer-reviewed papers indexed in the Scopus database. Including other databases may reveal additional features, and the presented reference architecture could be extended accordingly. Since precision agriculture and smart farming are still evolving, new features can be integrated in the future and the data management reference architecture adapted to them. Practitioners and researchers in smart farming are currently developing various tools, techniques, and systems; we therefore expect new papers to appear in these databases that may add functionalities and features to the presented reference architecture. The presented methodology and the overall reference architecture can, however, be readily updated to reflect such developments.

The reference architecture was used to derive application architectures following a multi-case study approach. A threat to validity in the multi-case study is the misinterpretation of applied concepts. Although we discussed the concepts carefully and iteratively, some concepts may have been interpreted differently from how they are presented in the selected studies. The findings should be generalized with caution, because other case studies may require new functionalities. Although many different application scenarios can be identified in smart farming, these scenarios share some features from a data management perspective. Therefore, four case studies were used to demonstrate the applicability of the proposed reference architecture. Other researchers can also evaluate the applicability of this architecture using different smart farming case studies and create application architectures for their own purposes.

This study showed that reference architecture design is useful for the agri-food domain. The study focused on sustainability; however, it can be extended to a broader context by covering other critical aspects of agriculture. For sustainable agriculture, the presented features are beneficial when designing new systems for agricultural production. Further research is needed to evaluate the applicability of the data management reference architecture in other application domains. We expect that an increasing number of researchers will focus on sustainability in agriculture in the near future and develop novel models that address sustainability comprehensively from several perspectives. Advances in machine learning, and in deep learning in particular, can also contribute to the development of such models.

Conclusions and future work

In this study, a data management reference architecture for sustainable agriculture was proposed and evaluated using several case studies. To the best of our knowledge, this is the first study that focuses on sustainability in the context of a data management reference architecture. The design science research (DSR) method was applied to design the reference architecture, following the domain scoping, domain modeling, and reference architecture design stages. Domain scoping was performed based on relevant papers, the domain model was represented as a feature model, and the reference architecture was built in the reference architecture design stage. Four case studies were investigated from different perspectives to evaluate the applicability of the data management reference architecture. We believe that this research can advance research on data management for sustainable agriculture and pave the way for designing smart systems for smart farming and precision agriculture.

As future work, we plan to extend this study with new case studies and evaluate the applicability of the presented reference architecture in different scenarios. Another planned study involves mapping the reference architecture to the components of farm management information systems and platforms used in industry. This mapping can help identify possible missing components in the reference architecture, as well as enhancement opportunities for the systems and platforms used in industry.

Acknowledgements

Author contributions

Conceptualization, G.G. and C.C.; methodology, G.G.; investigation, G.G.; data curation, G.G.; validation, G.G. and C.C.; writing—original draft preparation, G.G.; writing—review and editing, G.G. and C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of interest

The authors declare no conflict of interest.

References

  1. Wolfert, Sjaak; Goense, Daan; Sorensen, Claus Aage Gron (1 April 2014). "A Future Internet Collaboration Platform for Safe and Healthy Food from Farm to Fork". 2014 Annual SRII Global Conference (San Jose, CA, USA: IEEE): 266–273. doi:10.1109/SRII.2014.47. ISBN 978-1-4799-5193-2. http://ieeexplore.ieee.org/document/6879694/. 
  2. Pierce, Francis J.; Nowak, Peter (1999), "Aspects of Precision Agriculture" (in en), Advances in Agronomy (Elsevier) 67: 1–85, doi:10.1016/s0065-2113(08)60513-1, ISBN 978-0-12-000767-7, https://linkinghub.elsevier.com/retrieve/pii/S0065211308605131 
  3. Murakami, Edson; Saraiva, Antonio M.; Ribeiro, Luiz C.M.; Cugnasca, Carlos E.; Hirakawa, Andre R.; Correa, Pedro L.P. (1 August 2007). "An infrastructure for the development of distributed service-oriented information systems for precision agriculture" (in en). Computers and Electronics in Agriculture 58 (1): 37–48. doi:10.1016/j.compag.2006.12.010. https://linkinghub.elsevier.com/retrieve/pii/S0168169907000609. 
  4. Steinberger, Georg; Rothmund, Matthias; Auernhammer, Hermann (1 March 2009). "Mobile farm equipment as a data source in an agricultural service architecture" (in en). Computers and Electronics in Agriculture 65 (2): 238–246. doi:10.1016/j.compag.2008.10.005. https://linkinghub.elsevier.com/retrieve/pii/S0168169908002226. 
  5. Sørensen, C.; Bildsøe, P.; Fountas, S. et al. (31 May 2011). "Integration of Farm Management Information Systems to support real-time management decisions and compliance of management standards". CORDIS. European Commission. https://cordis.europa.eu/project/id/212117. 
  6. Köksal, Ö.; Tekinerdogan, B. (1 October 2019). "Architecture design approach for IoT-based farm management information systems" (in en). Precision Agriculture 20 (5): 926–958. doi:10.1007/s11119-018-09624-8. ISSN 1385-2256. http://link.springer.com/10.1007/s11119-018-09624-8. 
  7. Groeneveld, Desirée; Tekinerdogan, Bedir; Garousi, Vahid; Catal, Cagatay (1 August 2021). "A domain-specific language framework for farm management information systems in precision agriculture" (in en). Precision Agriculture 22 (4): 1067–1106. doi:10.1007/s11119-020-09770-y. ISSN 1385-2256. https://link.springer.com/10.1007/s11119-020-09770-y. 
  8. Jin, Xue-Bo; Yu, Xing-Hong; Wang, Xiao-Yi; Bai, Yu-Ting; Su, Ting-Li; Kong, Jian-Lei (14 February 2020). "Deep Learning Predictor for Sustainable Precision Agriculture Based on Internet of Things System" (in en). Sustainability 12 (4): 1433. doi:10.3390/su12041433. ISSN 2071-1050. https://www.mdpi.com/2071-1050/12/4/1433. 
  9. Kaya, Aydin; Keceli, Ali Seydi; Catal, Cagatay; Yalic, Hamdi Yalin; Temucin, Huseyin; Tekinerdogan, Bedir (1 March 2019). "Analysis of transfer learning for deep neural network based plant classification models" (in en). Computers and Electronics in Agriculture 158: 20–29. doi:10.1016/j.compag.2019.01.041. https://linkinghub.elsevier.com/retrieve/pii/S0168169918315308. 
  10. Loures, Luís; Chamizo, Alejandro; Ferreira, Paulo; Loures, Ana; Castanho, Rui; Panagopoulos, Thomas (6 May 2020). "Assessing the Effectiveness of Precision Agriculture Management Systems in Mediterranean Small Farms" (in en). Sustainability 12 (9): 3765. doi:10.3390/su12093765. ISSN 2071-1050. https://www.mdpi.com/2071-1050/12/9/3765. 
  11. Podlasek, Anna; Koda, Eugeniusz; Vaverková, Magdalena Daria (8 January 2021). "The Variability of Nitrogen Forms in Soils Due to Traditional and Precision Agriculture: Case Studies in Poland" (in en). International Journal of Environmental Research and Public Health 18 (2): 465. doi:10.3390/ijerph18020465. ISSN 1660-4601. PMC PMC7827450. PMID 33430097. https://www.mdpi.com/1660-4601/18/2/465. 
  12. Verdouw, Cor; Tekinerdogan, Bedir; Beulens, Adrie; Wolfert, Sjaak (1 April 2021). "Digital twins in smart farming" (in en). Agricultural Systems 189: 103046. doi:10.1016/j.agsy.2020.103046. https://linkinghub.elsevier.com/retrieve/pii/S0308521X20309070. 
  13. Keceli, Ali Seydi; Catal, Cagatay; Kaya, Aydin; Tekinerdogan, Bedir (1 March 2020). "Development of a recurrent neural networks-based calving prediction model using activity and behavioral data" (in en). Computers and Electronics in Agriculture 170: 105285. doi:10.1016/j.compag.2020.105285. https://linkinghub.elsevier.com/retrieve/pii/S0168169919312220. 
  14. Catal, Cagatay; Tekinerdogan, Bedir (2019). "Aligning Education for the Life Sciences Domain to Support Digitalization and Industry 4.0" (in en). Procedia Computer Science 158: 99–106. doi:10.1016/j.procs.2019.09.032. https://linkinghub.elsevier.com/retrieve/pii/S1877050919311901. 
  15. Nadal, Sergi; Herrero, Victor; Romero, Oscar; Abelló, Alberto; Franch, Xavier; Vansummeren, Stijn; Valerio, Danilo (1 October 2017). "A software reference architecture for semantic-aware Big Data systems" (in en). Information and Software Technology 90: 75–92. doi:10.1016/j.infsof.2017.06.001. https://linkinghub.elsevier.com/retrieve/pii/S0950584917304287. 
  16. Avci Salma, Cigdem; Tekinerdogan, Bedir; Athanasiadis, Ioannis N. (2017), "Domain-Driven Design of Big Data Systems Based on a Reference Architecture" (in en), Software Architecture for Big Data and the Cloud (Elsevier): 49–68, doi:10.1016/b978-0-12-805467-3.00004-1, ISBN 978-0-12-805467-3, https://linkinghub.elsevier.com/retrieve/pii/B9780128054673000041 
  17. Tummers, J.; Kassahun, A.; Tekinerdogan, B. (1 February 2021). "Reference architecture design for farm management information systems: a multi-case study approach" (in en). Precision Agriculture 22 (1): 22–50. doi:10.1007/s11119-020-09728-0. ISSN 1385-2256. https://link.springer.com/10.1007/s11119-020-09728-0. 
  18. DeLonge, Marcia S.; Miles, Albie; Carlisle, Liz (1 January 2016). "Investing in the transition to sustainable agriculture" (in en). Environmental Science & Policy 55: 266–273. doi:10.1016/j.envsci.2015.09.013. https://linkinghub.elsevier.com/retrieve/pii/S1462901115300812. 
  19. Gregor, Shirley; Hevner, Alan R. (2 February 2013). "Positioning and Presenting Design Science Research for Maximum Impact". MIS Quarterly 37 (2): 337–355. doi:10.25300/misq/2013/37.2.01. ISSN 0276-7783. https://doi.org/10.25300/MISQ/2013/37.2.01. 
  20. 20.0 20.1 Hevner; March; Park; Ram (2004). "Design Science in Information Systems Research". MIS Quarterly 28 (1): 75–105. doi:10.2307/25148625. https://www.jstor.org/stable/10.2307/25148625. 
  21. Wieringa, Roel J. (2014). Design Science Methodology for Information Systems and Software Engineering (1st ed.). Berlin, Heidelberg: Springer. ISBN 978-3-662-43839-8. 
  22. Runeson, Per; Engström, Emelie; Storey, Margaret-Anne (2020), Felderer, Michael; Travassos, Guilherme Horta, eds., "The Design Science Paradigm as a Frame for Empirical Software Engineering" (in en), Contemporary Empirical Methods in Software Engineering (Cham: Springer International Publishing): 127–147, doi:10.1007/978-3-030-32489-6_5, ISBN 978-3-030-32488-9, http://link.springer.com/10.1007/978-3-030-32489-6_5 
  23. Koksal, Omer; Tekinerdogan, Bedir (1 June 2017). "Feature-Driven Domain Analysis of Session Layer Protocols of Internet of Things". 2017 IEEE International Congress on Internet of Things (ICIOT) (Honolulu, HI, USA: IEEE): 105–112. doi:10.1109/IEEE.ICIOT.2017.19. ISBN 978-1-5386-2011-3. http://ieeexplore.ieee.org/document/8039061/. 
  24. van Geest, Maarten; Tekinerdogan, Bedir; Catal, Cagatay (1 January 2021). "Design of a reference architecture for developing smart warehouses in industry 4.0" (in en). Computers in Industry 124: 103343. doi:10.1016/j.compind.2020.103343. https://linkinghub.elsevier.com/retrieve/pii/S0166361520305777. 
  25. Tekinerdogan, Bedir; Öztürk, Karahan (2013), Mahmood, Zaigham; Saeed, Saqib, eds., "Feature-Driven Design of SaaS Architectures", Software Engineering Frameworks for the Cloud Computing Paradigm (London: Springer London): 189–212, doi:10.1007/978-1-4471-5031-2_9, ISBN 978-1-4471-5030-5, http://link.springer.com/10.1007/978-1-4471-5031-2_9 
  26. Arsevska, Elena; Valentin, Sarah; Rabatel, Julien; de Goër de Hervé, Jocelyn; Falala, Sylvain; Lancelot, Renaud; Roche, Mathieu (3 August 2018). Dórea, Fernanda C., ed. "Web monitoring of emerging animal infectious diseases integrated in the French Animal Health Epidemic Intelligence System" (in en). PLOS ONE 13 (8): e0199960. doi:10.1371/journal.pone.0199960. ISSN 1932-6203. PMC PMC6075742. PMID 30074992. https://dx.plos.org/10.1371/journal.pone.0199960. 
  27. Shahhosseini, Mohsen; Hu, Guiping; Huber, Isaiah; Archontoulis, Sotirios V. (1 December 2021). "Coupling machine learning and crop modeling improves crop yield prediction in the US Corn Belt" (in en). Scientific Reports 11 (1): 1606. doi:10.1038/s41598-020-80820-1. ISSN 2045-2322. PMC PMC7810832. PMID 33452349. http://www.nature.com/articles/s41598-020-80820-1. 
  28. Sanjeevi, P.; Prasanna, S.; Siva Kumar, B.; Gunasekaran, G.; Alagiri, I.; Vijay Anand, R. (1 December 2020). "Precision agriculture and farming using Internet of Things based on wireless sensor network" (in en). Transactions on Emerging Telecommunications Technologies 31 (12): e3978. doi:10.1002/ett.3978. ISSN 2161-3915. https://onlinelibrary.wiley.com/doi/10.1002/ett.3978. 
  29. Sharma, Rohit; Kamble, Sachin S.; Gunasekaran, Angappa; Kumar, Vikas; Kumar, Anil (1 July 2020). "A systematic literature review on machine learning applications for sustainable agriculture supply chain performance" (in en). Computers & Operations Research 119: 104926. doi:10.1016/j.cor.2020.104926. https://linkinghub.elsevier.com/retrieve/pii/S0305054820300435. 
  30. Shi, Yinyan; Zhu, Yang; Wang, Xiaochan; Sun, Xin; Ding, Yangfen; Cao, Weixing; Hu, Zhichao (1 December 2020). "Progress and development on biological information of crop phenotype research applied to real-time variable-rate fertilization" (in en). Plant Methods 16 (1): 11. doi:10.1186/s13007-020-0559-9. ISSN 1746-4811. PMC PMC6998365. PMID 32042303. https://plantmethods.biomedcentral.com/articles/10.1186/s13007-020-0559-9. 
  31. Clevers, Jan; Kooistra, Lammert; van den Brande, Marnix (25 April 2017). "Using Sentinel-2 Data for Retrieving LAI and Leaf and Canopy Chlorophyll Content of a Potato Crop" (in en). Remote Sensing 9 (5): 405. doi:10.3390/rs9050405. ISSN 2072-4292. http://www.mdpi.com/2072-4292/9/5/405. 
  32. Bach, H.; Mauser, W.; Angermair, W. et al. (2010). "An integrative approach of using satellite-based information for precision farming: TalkingFields". Proceedings of the 61st International Astronautical Congress. http://iafastro.directory/iac/archive/browse/IAC-10/B5/1/6894/. 
  33. Bach, Heike; Mauser, Wolfram (2018), Mathieu, Pierre-Philippe; Aubrecht, Christoph, eds., "Sustainable Agriculture and Smart Farming" (in en), Earth Observation Open Science and Innovation (Cham: Springer International Publishing): 261–269, doi:10.1007/978-3-319-65633-5_12, ISBN 978-3-319-65632-8, http://link.springer.com/10.1007/978-3-319-65633-5_12 
  34. Burlacu, George; Costa, Ruben; Sarraipa, Joao; Jardim-Gonçalves, Ricardo; Popescu, Dan (2014), Camarinha-Matos, Luis M.; Barrento, Nuno S.; Mendonça, Ricardo, eds., "A Conceptual Model of Farm Management Information System for Decision Support", Technological Innovation for Collective Awareness Systems (Berlin, Heidelberg: Springer Berlin Heidelberg) 423: 47–54, doi:10.1007/978-3-642-54734-8_6, ISBN 978-3-642-54733-1, http://link.springer.com/10.1007/978-3-642-54734-8_6 
  35. Srivastava, S. (2002). "Space inputs for precision agriculture: scope for prototype experiments in the diverse Indian agro-ecosystems". Proceedings of the Map Asia 2002. https://www.geospatialworld.net/article/space-inputs-for-precision-agriculture-scope-for-proto-type-experiments-in-the-diverse-indian-agro-ecosystems/. 
  36. Nikkilä, Raimo; Seilonen, Ilkka; Koskinen, Kari (1 March 2010). "Software architecture for farm management information systems in precision agriculture" (in en). Computers and Electronics in Agriculture 70 (2): 328–336. doi:10.1016/j.compag.2009.08.013. https://linkinghub.elsevier.com/retrieve/pii/S0168169909001859. 
  37. Kaloxylos, Alexandros; Groumas, Aggelos; Sarris, Vassilis; Katsikas, Lampros; Magdalinos, Panagis; Antoniou, Eleni; Politopoulou, Zoi; Wolfert, Sjaak et al. (1 January 2014). "A cloud-based Farm Management System: Architecture and implementation" (in en). Computers and Electronics in Agriculture 100: 168–179. doi:10.1016/j.compag.2013.11.014. https://linkinghub.elsevier.com/retrieve/pii/S0168169913002846. 
  38. Kruize, J.W.; Wolfert, J.; Scholten, H.; Verdouw, C.N.; Kassahun, A.; Beulens, A.J.M. (1 July 2016). "A reference architecture for Farm Software Ecosystems" (in en). Computers and Electronics in Agriculture 125: 12–28. doi:10.1016/j.compag.2016.04.011. https://linkinghub.elsevier.com/retrieve/pii/S0168169916301296. 
  39. McAfee, A.; Brynjolfsson, E. (October 2012). "Big Data: The Management Revolution". Harvard Business Review. https://hbr.org/2012/10/big-data-the-management-revolution. 
  40. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep learning. Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press. ISBN 978-0-262-03561-3. 
  41. Ashmore, Rob; Calinescu, Radu; Paterson, Colin (30 June 2022). "Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges" (in en). ACM Computing Surveys 54 (5): 1–39. doi:10.1145/3453444. ISSN 0360-0300. https://dl.acm.org/doi/10.1145/3453444. 
  42. Dayal, U.; Akatsu, M.; Gupta, C. et al. (2014). "Expanding Global Big Data Solutions with Innovative Analytics". Hitachi Review 63 (6): 333–39. https://www.hitachi.com/rev/archive/2014/r2014_06.html. 
  43. Suriarachchi, Isuru; Plale, Beth (1 October 2016). "Crossing analytics systems: A case for integrated provenance in data lakes". 2016 IEEE 12th International Conference on e-Science (e-Science) (Baltimore, MD, USA: IEEE): 349–354. doi:10.1109/eScience.2016.7870919. ISBN 978-1-5090-4273-9. http://ieeexplore.ieee.org/document/7870919/. 
  44. Rao, Wei; Jiang, Jing; Yang, Ming; Peng, Wei; Zhou, Aihua (2017), Li, Kang; Xue, Yusheng; Cui, Shumei et al., eds., "Research on Energy Interconnection Oriented Big Data Sharing Platform Reference Architecture", Advanced Computational Methods in Energy, Power, Electric Vehicles, and Their Integration (Singapore: Springer Singapore) 763: 217–225, doi:10.1007/978-981-10-6364-0_22, ISBN 978-981-10-6363-3, http://link.springer.com/10.1007/978-981-10-6364-0_22 
  45. Sang, Go Muan; Xu, Lai; de Vrieze, Paul (2017), Camarinha-Matos, Luis M.; Afsarmanesh, Hamideh; Fornasiero, Rosanna, eds., "Simplifying Big Data Analytics Systems with a Reference Architecture" (in en), Collaboration in a Data-Rich World (Cham: Springer International Publishing) 506: 242–249, doi:10.1007/978-3-319-65151-4_23, ISBN 978-3-319-65150-7, https://link.springer.com/10.1007/978-3-319-65151-4_23 
  46. Arass, M.E.; Ouazzani-Touhami, K.; Souissi, N. (25 August 2020). "Data Life Cycle: Towards a Reference Architecture". International Journal of Advanced Trends in Computer Science and Engineering 9 (4): 5645–5653. doi:10.30534/ijatcse/2020/215942020. http://www.warse.org/IJATCSE/static/pdf/file/ijatcse215942020.pdf. 
  47. Pääkkönen, P.; Pakkala, D. (1 December 2020). "Extending reference architecture of big data systems towards machine learning in edge computing environments" (in en). Journal of Big Data 7 (1): 25. doi:10.1186/s40537-020-00303-y. ISSN 2196-1115. https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00303-y. 
  48. Maier, M. (31 October 2013). "Towards a big data reference architecture". Eindhoven University of Technology. https://research.tue.nl/en/studentTheses/towards-a-big-data-reference-architecture. 
  49. Meehan, J.; Aslantas, C.; Zdonik, S. et al. (2017). "Data Ingestion for the Connected World". Proceedings of the 8th Biennial Conference on Innovative Data Systems Research: 1–11. https://dblp.org/db/conf/cidr/cidr2017.html. 
  50. Stonebraker, Michael; Madden, Sam; Dubey, Pradeep (1 May 2013). "Intel "big data" science and technology center vision and execution plan". ACM SIGMOD Record 42 (1): 44–49. doi:10.1145/2481528.2481537. ISSN 0163-5808. https://doi.org/10.1145/2481528.2481537. 
  51. Adnan, Kiran; Akbar, Rehan (1 December 2019). "An analytical study of information extraction from unstructured and multidimensional big data" (in en). Journal of Big Data 6 (1): 91. doi:10.1186/s40537-019-0254-8. ISSN 2196-1115. https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0254-8. 
  52. Singh, Sonit (2018). "Natural Language Processing for Information Extraction". arXiv. doi:10.48550/ARXIV.1807.02383. https://arxiv.org/abs/1807.02383. 
  53. Gangadharan, Veena; Gupta, Deepa (2020). "Recognizing Named Entities in Agriculture Documents using LDA based Topic Modelling Techniques" (in en). Procedia Computer Science 171: 1337–1345. doi:10.1016/j.procs.2020.04.143. https://linkinghub.elsevier.com/retrieve/pii/S1877050920311224. 
  54. Oliveira, P.; Rodrigues, F.; Henriques, P.R. (2005). "A Formal Definition of Data Quality Problems". Proceedings of ICIQ 2005. https://dblp.uni-trier.de/db/conf/iq/iq2005.html#OliveiraRH05. 
  55. Ziegler, Patrick; Dittrich, Klaus R. (2007), Krogstie, John; Opdahl, Andreas Lothe; Brinkkemper, Sjaak, eds., "Data Integration — Problems, Approaches, and Perspectives" (in en), Conceptual Modelling in Information Systems Engineering (Berlin, Heidelberg: Springer Berlin Heidelberg): 39–58, doi:10.1007/978-3-540-72677-7_3, ISBN 978-3-540-72676-0, http://link.springer.com/10.1007/978-3-540-72677-7_3 
  56. Lenzerini, Maurizio (2002). "Data integration: a theoretical perspective" (in en). Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '02 (Madison, Wisconsin: ACM Press): 233. doi:10.1145/543613.543644. ISBN 978-1-58113-507-7. http://portal.acm.org/citation.cfm?doid=543613.543644. 
  57. Batini, C.; Lenzerini, M.; Navathe, S. B. (11 December 1986). "A comparative analysis of methodologies for database schema integration" (in en). ACM Computing Surveys 18 (4): 323–364. doi:10.1145/27633.27634. ISSN 0360-0300. https://dl.acm.org/doi/10.1145/27633.27634. 
  58. Cugola, Gianpaolo; Margara, Alessandro (1 June 2012). "Processing flows of information: From data stream to complex event processing" (in en). ACM Computing Surveys 44 (3): 1–62. doi:10.1145/2187671.2187677. ISSN 0360-0300. https://dl.acm.org/doi/10.1145/2187671.2187677. 
  59. Carbone, P.; Katsifodimos, A.; Ewen, S. et al. (2015). "Apache Flink: Stream and Batch Processing in a Single Engine". Bulletin of the Technical Committee on Data Engineering 38 (4): 28–38. http://sites.computer.org/debull/A15dec/issue1.htm. 
  60. Begoli, Edmon (2012). "A short survey on the state of the art in architectures and platforms for large scale data analysis and knowledge discovery from data" (in en). Proceedings of the WICSA/ECSA 2012 Companion Volume on - WICSA/ECSA '12 (Helsinki, Finland: ACM Press): 177. doi:10.1145/2361999.2362039. ISBN 978-1-4503-1568-5. http://dl.acm.org/citation.cfm?doid=2361999.2362039. 
  61. Chen, Jinchuan; Chen, Yueguo; Du, Xiaoyong; Li, Cuiping; Lu, Jiaheng; Zhao, Suyun; Zhou, Xuan (1 April 2013). "Big data challenge: a data management perspective" (in en). Frontiers of Computer Science 7 (2): 157–164. doi:10.1007/s11704-013-3903-7. ISSN 2095-2228. http://link.springer.com/10.1007/s11704-013-3903-7. 
  62. Henderson, Deborah; Earley, Susan; Data Administration Management Association, eds. (2017). DAMA-DMBOK: data management body of knowledge (2nd ed.). Basking Ridge, New Jersey: Technics Publications. ISBN 978-1-63462-236-3. OCLC 1012690183. https://www.worldcat.org/title/mediawiki/oclc/1012690183. 
  63. Kim, Cheol-Han; Weston, R.H.; Hodgson, A.; Lee, Kyung-Huy (1 January 2003). "The complementary use of IDEF and UML modelling approaches" (in en). Computers in Industry 50 (1): 35–56. doi:10.1016/S0166-3615(02)00145-8. https://linkinghub.elsevier.com/retrieve/pii/S0166361502001458. 
  64. Penzenstadler, B.; Femmer, H. (26 November 2012). "A Generic Model for Sustainability" (PDF). Technische Universität. https://mediatum.ub.tum.de/attfile/1121449/hd2/incoming/2012-Nov/561736.pdf. 
  65. Penzenstadler, Birgit; Femmer, Henning (2013). "A generic model for sustainability with process- and product-specific instances" (in en). Proceedings of the 2013 workshop on Green in/by software engineering - GIBSE '13 (Fukuoka, Japan: ACM Press): 3. doi:10.1145/2451605.2451609. ISBN 978-1-4503-1866-2. http://dl.acm.org/citation.cfm?doid=2451605.2451609. 
  66. van Klompenburg, Thomas; Kassahun, Ayalew; Catal, Cagatay (1 October 2020). "Crop yield prediction using machine learning: A systematic literature review" (in en). Computers and Electronics in Agriculture 177: 105709. doi:10.1016/j.compag.2020.105709. https://linkinghub.elsevier.com/retrieve/pii/S0168169920302301. 
  67. Khaki, Saeed; Wang, Lizhi (22 May 2019). "Crop Yield Prediction Using Deep Neural Networks". Frontiers in Plant Science 10: 621. doi:10.3389/fpls.2019.00621. ISSN 1664-462X. PMC PMC6540942. PMID 31191564. https://www.frontiersin.org/article/10.3389/fpls.2019.00621/full. 
  68. U.S. Department of Agriculture (17 December 2015). "Natural Resources Conservation Service Web Soil Survey". Ag Data Commons. U.S. Department of Agriculture. https://data.nal.usda.gov/dataset/natural-resources-conservation-service-web-soil-survey. Retrieved 23 April 2021. 
  69. National Agricultural Statistics Service (2019). "Surveys". U.S. Department of Agriculture. https://www.nass.usda.gov/Surveys/. 
  70. Wan, Zhiyuan; Xia, Xin; Lo, David; Murphy, Gail C. (2020). "How does Machine Learning Change Software Development Practices?". IEEE Transactions on Software Engineering: 1–1. doi:10.1109/TSE.2019.2937083. ISSN 0098-5589. https://ieeexplore.ieee.org/document/8812912/. 
  71. Yokoyama, Haruki (1 March 2019). "Machine Learning System Architectural Pattern for Improving Operational Stability". 2019 IEEE International Conference on Software Architecture Companion (ICSA-C) (Hamburg, Germany: IEEE): 267–274. doi:10.1109/ICSA-C.2019.00055. ISBN 978-1-7281-1876-5. https://ieeexplore.ieee.org/document/8712157/. 
  72. Glória, André; Dionisio, Carolina; Simões, Gonçalo; Cardoso, João; Sebastião, Pedro (4 March 2020). "Water Management for Sustainable Irrigation Systems Using Internet-of-Things" (in en). Sensors 20 (5): 1402. doi:10.3390/s20051402. ISSN 1424-8220. PMC PMC7085535. PMID 32143482. https://www.mdpi.com/1424-8220/20/5/1402. 
  73. G. S. Campos, Nidia; Rocha, Atslands R.; Gondim, Rubens; Coelho da Silva, Ticiana L.; Gomes, Danielo G. (29 December 2019). "Smart & Green: An Internet-of-Things Framework for Smart Irrigation" (in en). Sensors 20 (1): 190. doi:10.3390/s20010190. ISSN 1424-8220. PMC PMC6983084. PMID 31905749. https://www.mdpi.com/1424-8220/20/1/190. 
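  74. Wang, Xiaochan; Chen, Man; Sun, Guoxiang; Zhang, Yu; Zhang, Yongnian (11 November 2015). "Design and test of a control system for a variable-rate fertilizer applicator for winter wheat" (in zh). Transactions of the Chinese Society of Agricultural Engineering 31 (Z2): 88–92. doi:10.11975/j.issn.1002-6819.2015.z2.013. http://www.tcsae.org/nygcxb/article/abstract/2015Z213. 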
  75. Yinyan, Shi; Man, Chen; Xiaochan, Wang; Odhiambo, Morice Oluoch; Weimin, Ding (1 January 2018). "Numerical simulation of spreading performance and distribution pattern of centrifugal variable-rate fertilizer applicator based on DEM software". Computers and Electronics in Agriculture 144: 249–259. doi:10.1016/j.compag.2017.12.015. https://linkinghub.elsevier.com/retrieve/pii/S0168169917311213. 
  76. Boegh, Eva; Soegaard, H.; Broge, N.; Hasager, C.B.; Jensen, N.O.; Schelde, K.; Thomsen, A. (1 August 2002). "Airborne multispectral data for quantifying leaf area index, nitrogen concentration, and photosynthetic efficiency in agriculture". Remote Sensing of Environment 81 (2-3): 179–193. doi:10.1016/S0034-4257(01)00342-X. https://linkinghub.elsevier.com/retrieve/pii/S003442570100342X. 
  77. Deery, David M.; Rebetzke, Greg J.; Jimenez-Berni, Jose A.; James, Richard A.; Condon, Anthony G.; Bovill, William D.; Hutchinson, Paul; Scarrow, Jamie et al. (6 December 2016). "Methodology for High-Throughput Field Phenotyping of Canopy Temperature Using Airborne Thermography". Frontiers in Plant Science 7: 1808. doi:10.3389/fpls.2016.01808. ISSN 1664-462X. PMC5138222. PMID 27999580. http://journal.frontiersin.org/article/10.3389/fpls.2016.01808/full. 
  78. Qinghua, G.; Weicai, Y.; Fangfang, W. et al. (2018). "高通量作物表型监测:育种和精准农业发展的加速器" [High-throughput crop phenotyping: An accelerator for breeding and precision agriculture development]. Bulletin of Chinese Academy of Sciences (Chinese Version). doi:10.16418/j.issn.1000-3045.2018.09.007. https://www.cnki.net/kcms/doi/10.16418/j.issn.1000-3045.2018.09.007.html. 
  79. Tian, M.; Ban, S.; Chang, Q. et al. (2016). "Use of hyperspectral images from UAV-based imaging spectroradiometer to estimate cotton leaf area index". Transactions of the Chinese Society of Agricultural Engineering 32 (21): 102–108. https://www.ingentaconnect.com/content/tcsae/tcsae/2016/00000032/00000021/art00014. 
  80. Busemeyer, Lucas; Mentrup, Daniel; Möller, Kim; Wunder, Erik; Alheit, Katharina; Hahn, Volker; Maurer, Hans; Reif, Jochen et al. (27 February 2013). "BreedVision — A Multi-Sensor Platform for Non-Destructive Field-Based Phenotyping in Plant Breeding". Sensors 13 (3): 2830–2847. doi:10.3390/s130302830. ISSN 1424-8220. PMC3658717. PMID 23447014. http://www.mdpi.com/1424-8220/13/3/2830. 
  81. Sharma, L.K.; Bu, H.; Franzen, D.W.; Denton, A. (1 June 2016). "Use of corn height measured with an acoustic sensor improves yield estimation with ground based active optical sensors". Computers and Electronics in Agriculture 124: 254–262. doi:10.1016/j.compag.2016.04.016. https://linkinghub.elsevier.com/retrieve/pii/S0168169916301399. 
  82. Arsevska, Elena; Roche, Mathieu; Hendrikx, Pascal; Chavernac, David; Falala, Sylvain; Lancelot, Renaud; Dufour, Barbara (1 April 2016). "Identification of terms for detecting early signals of emerging infectious disease outbreaks on the web". Computers and Electronics in Agriculture 123: 104–115. doi:10.1016/j.compag.2016.02.010. https://linkinghub.elsevier.com/retrieve/pii/S0168169916300473. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The original citation five for Sørensen et al. had a dead URL; an alternate source was found with the same information and used for this version.