Journal:SCADA system testbed for cybersecurity research using machine learning approach
Full article title | SCADA system testbed for cybersecurity research using machine learning approach |
---|---|
Journal | Future Internet |
Author(s) |
Teixeira, Marcio Andrey; Salman, Tara; Zolanvari, Maede; Jain, Raj; Meskin, Nader; Samaka, Mohammed |
Author affiliation(s) |
Federal Institute of Education, Science, and Technology of Sao Paulo, Washington University in Saint Louis, Qatar University |
Primary contact | Email: marcio dot andrey at ifsp dot edu dot br |
Year published | 2018 |
Volume and issue | 10(8) |
Page(s) | 76 |
DOI | 10.3390/fi10080076 |
ISSN | 1999-5903 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.mdpi.com/1999-5903/10/8/76/htm |
Download | https://www.mdpi.com/1999-5903/10/8/76/pdf (PDF) |
Abstract
This paper presents the development of a supervisory control and data acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank’s control system, which is a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed. During the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and k-nearest neighbors (KNN). The trained models were then deployed in the network, where new tests were run on live network traffic. The performance obtained during the training and testing of the machine learning models was compared with the performance obtained during their online deployment in the network. The results show that the machine learning models detect the attacks efficiently in real time. The testbed also provides insight into the effects and consequences of attacks on real SCADA environments.
Keywords: cybersecurity, machine learning, SCADA system, network security
Introduction
Supervisory control and data acquisition (SCADA) systems are industrial control systems (ICS) widely used by industries to monitor and control processes such as oil and gas pipelines, water distribution systems, and electrical power grids. These systems provide automated control and remote monitoring of services used in daily life. For example, state and municipal governments use SCADA systems to monitor and regulate water levels in reservoirs, pipe pressure, and water distribution.
A typical SCADA system includes components such as computer workstations, a human-machine interface (HMI), programmable logic controllers (PLCs), sensors, and actuators.[1] Historically, these systems ran on private, dedicated networks. However, with the widespread adoption of remote management, open IP networks (e.g., the internet) are now used for SCADA system communication.[2] This exposes SCADA systems to cyberspace and makes them vulnerable to cyber-attacks launched over the internet.
Machine learning (ML) and artificial intelligence techniques have been widely used to build intelligent and efficient intrusion detection systems (IDS) dedicated to ICS. However, researchers generally develop and train their ML-based security systems using network traces obtained from publicly available datasets. Because malware evolves and attack strategies change, systems trained on these datasets may fail to detect new types of attacks; consequently, benchmark datasets should be updated periodically.
This paper presents the deployment of a SCADA system testbed for cybersecurity research and investigates the feasibility of using ML algorithms to detect cyber-attacks in real time. The testbed was built using equipment deployed in real industrial settings. Sophisticated attacks were conducted on the testbed to develop a better understanding of the attacks and their consequences in SCADA environments. The network traffic, both normal and abnormal, was captured; the behavior of both types of traffic was analyzed, and features were extracted to build a new SCADA-IDS dataset. This dataset was then used to train and test ML models, which were subsequently deployed in the network. Because the performance of an ML model depends heavily on the dataset it is trained on, one of the main contributions of this paper is a new dataset that includes recent and more sophisticated attacks. We argue that an IDS using ML models trained on a dataset generated at the process control level could be more efficient, less complicated, and more cost-effective than traditional protection techniques.
Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and KNN. Once trained and tested, the ML models were deployed in the network, where real network traffic was used to analyze their effectiveness and efficiency in a real-time environment. We compared the performance obtained during the training and test phases of the ML models with the performance obtained during their online deployment in the network. This online deployment is another contribution of this paper, since most published papers present the performance of ML models obtained only during the training and test phases. The goal of this research is to build IDS software, based on ML models, that can be deployed in ICS/SCADA systems.
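The offline-versus-online comparison described above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbors (KNN) classifier stands in for the trained models, the two-feature vectors and their values are hypothetical toy data rather than features from the SCADA-IDS dataset, and the generator `live_traffic()` is a hypothetical stand-in for flows captured from the network.

```python
import math
from collections import Counter

# Hypothetical two-feature traffic vectors, e.g. (packets per second, mean
# packet size) rescaled to small numbers. Toy values, not data from the
# paper's SCADA-IDS dataset.
TRAIN = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((5.0, 5.0), "attack"), ((5.2, 4.8), "attack"), ((4.9, 5.1), "attack"),
]
OFFLINE_TEST = [((1.1, 1.0), "normal"), ((5.1, 5.0), "attack")]


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def evaluate(train, samples, k=3):
    """Return the fraction of (features, label) pairs classified correctly.

    `samples` may be a list (the offline test split) or a generator that
    yields flows one at a time (a stand-in for live traffic).
    """
    total = correct = 0
    for features, label in samples:
        total += 1
        correct += knn_predict(train, features, k) == label
    return correct / total


def live_traffic():
    """Hypothetical stand-in for labeled flows arriving in real time."""
    yield (0.9, 1.2), "normal"
    yield (4.7, 5.3), "attack"


offline_accuracy = evaluate(TRAIN, OFFLINE_TEST)
online_accuracy = evaluate(TRAIN, live_traffic())
print(f"offline: {offline_accuracy:.2f}, online: {online_accuracy:.2f}")
```

The same `evaluate` function scores both the fixed test split and the streamed flows, so the two numbers are directly comparable. In a real deployment the stream would carry features extracted from captured traffic, and any gap between the two scores would show how well the offline dataset represents live conditions.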
The remainder of this paper is organized as follows. The next section presents a brief background of the ICS-SCADA system reference model and related works. Afterwards, we describe the developed SCADA system testbed, and then we describe the ML algorithms and the performance measurements used in this work. The last three sections describe the attack scenarios conducted and the main features of the dataset used to train the algorithms; present the results and their interpretation; and summarize the main points and outcomes.
Background
In this section, we briefly describe the ICS-SCADA reference model and review related work on ML algorithms for SCADA system security.
ICS reference model
"ICS" is a general term that covers numerous control systems, including SCADA systems, distributed control systems, and other control system configurations.[3] An ICS consists of combinations of control components (e.g., electrical, mechanical, hydraulic, pneumatic) that are used to achieve various industrial objectives (e.g., manufacturing, transportation of matter or energy). Figure 1 shows an example of an ICS reference model.[4]
As can be seen from Figure 1, the ICS model is divided into four levels, numbered 3 to 0. Level 3 (the corporate network) consists of traditional information technology, including the general deployment of services and systems such as file transfer, websites, mail servers, resource planning, and office automation systems. Level 2 (the supervisory control local area network) includes the functions involved in monitoring and controlling the physical processes, and the general deployment of systems such as HMIs, engineering workstations, and history logs. Level 1 (the control network) includes the functions involved in sensing and manipulating physical processes, e.g., receiving information, processing data, and triggering outputs, all of which are performed by PLCs. Level 0 (the I/O network) consists of devices (sensors and actuators) that are directly connected to the physical process.
As shown in Figure 1, Level 3 is composed of the traditional IT infrastructure system (internet access service, file transfer protocol server, virtual private network (VPN) remote access, etc.). Levels 2, 1, and 0 represent a typical SCADA system, which is composed of the following components:
- HMI: Used to observe the status of the system or to adjust system parameters for process control and management purposes
- Engineering workstation: Used by engineers for programming the control functions of the HMI
- History logs: Used to collect data in real time from the automation processes for current or later analysis
- PLCs: Slave stations in the SCADA architecture that are connected to sensors or actuators
The SCADA communication protocol
Several communication protocols have been developed for use in SCADA systems. These protocols define the standard message format for all inter-device communications in the network. One popular protocol, widely used in SCADA system environments, is the Modbus protocol.[5] Modbus is an application-layer messaging protocol that provides client/server communication between devices; its Modbus/TCP variant carries these messages over Ethernet (TCP/IP) networks. Modbus services are specified by function codes, which tell the server what action to take. For example, a client can read the status of discrete outputs (coils) or the values of discrete inputs from the PLC, or it can read and write the contents of a group of registers inside the PLC. Figure 2 illustrates an example of Modbus client/server communication.
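The client/server exchange described above can be made concrete at the byte level. The following Python sketch assumes the Modbus/TCP variant and uses hypothetical helper names; it builds a "read holding registers" request (function code 0x03) and decodes the matching response. The MBAP header layout follows the Modbus/TCP framing published by the Modbus Organization, while the transaction ID, unit ID, and register values are illustrative only.

```python
import struct

# A few public function codes from the Modbus Application Protocol Specification.
READ_COILS = 0x01
READ_DISCRETE_INPUTS = 0x02
READ_HOLDING_REGISTERS = 0x03
READ_INPUT_REGISTERS = 0x04


def build_read_request(transaction_id, unit_id, function_code, start_address, quantity):
    """Build a Modbus/TCP read request: MBAP header followed by the PDU."""
    # PDU: function code (1 byte), starting address (2 bytes), quantity (2 bytes),
    # all big-endian.
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    # MBAP header: transaction ID, protocol ID (always 0 for Modbus), length of
    # the bytes that follow (unit ID + PDU), and unit ID.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_register_response(frame):
    """Decode a function 0x03/0x04 response into a list of 16-bit register values."""
    _tid, _pid, _length, _unit, function_code = struct.unpack(">HHHBB", frame[:8])
    if function_code & 0x80:
        # Exception responses set the high bit and carry a one-byte exception code.
        raise ValueError(f"Modbus exception code {frame[8]}")
    byte_count = frame[8]
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))


request = build_read_request(1, 1, READ_HOLDING_REGISTERS, 0, 2)
print(request.hex())  # 000100000006010300000002
response = bytes.fromhex("000100000007010304000a0014")
print(parse_register_response(response))  # [10, 20]
```

Sending `request` over a TCP connection to a PLC (conventionally on port 502) would ask unit 1 for two holding registers starting at address 0; the sample response carries the values 10 and 20.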
Modbus register addresses are divided into four data reference types,[5][6] which are summarized in Table 1. The “xxxx” following the leading digit represents a four-digit address location in the user data memory.
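The “xxxx” convention can be illustrated by decoding a reference into the address a client actually sends. This sketch assumes the common Modicon-style prefixes (0 for coils, 1 for discrete inputs, 3 for input registers, 4 for holding registers) and a one-based four-digit location; the function name `decode_reference` is hypothetical, and because vendors document addresses differently, the prefixes should be checked against the device's own documentation.

```python
# Conventional five-digit Modbus reference prefixes (Modicon convention,
# assumed here), paired with the function code used to read each type.
REFERENCE_TYPES = {
    "0": ("coil", 0x01),              # read/write, 1 bit
    "1": ("discrete input", 0x02),    # read-only, 1 bit
    "3": ("input register", 0x04),    # read-only, 16 bits
    "4": ("holding register", 0x03),  # read/write, 16 bits
}


def decode_reference(reference: str) -> tuple[str, int, int]:
    """Split a reference such as '40001' into (type, read function code, protocol address)."""
    if len(reference) != 5 or not reference.isdigit():
        raise ValueError(f"expected five digits, got {reference!r}")
    prefix, location = reference[0], int(reference[1:])
    if prefix not in REFERENCE_TYPES or location == 0:
        raise ValueError(f"unsupported reference {reference!r}")
    kind, read_code = REFERENCE_TYPES[prefix]
    # The four-digit "xxxx" location is one-based; the address field sent in
    # the request PDU is zero-based.
    return kind, read_code, location - 1


print(decode_reference("40001"))  # ('holding register', 3, 0)
print(decode_reference("10010"))  # ('discrete input', 2, 9)
```

The decoded function code and zero-based address are exactly the two values a client places in the request PDU, so reference 40001 corresponds to a function 0x03 read starting at address 0.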
Related works
Cyber-attacks continuously evolve and change behavior to bypass security mechanisms. Advanced security mechanisms are therefore essential to identify and prevent new attacks, and the development of real testbeds helps advance research in this area.
Morris et al.[7] describe four datasets for cybersecurity research. The datasets include network traffic, process control, and process measurement features from a set of attacks against testbeds that use the Modbus application-layer protocol. The authors argue that several datasets have been developed to train and validate IDS for traditional information technology systems, but that SCADA security research suffers from a lack of available and accessible SCADA network traffic. In our work, a new dataset with new types of attacks was created. Once our dataset is made available, it will give researchers a resource to train and validate their models and to compare their results with those obtained on other datasets.
To investigate the security of the Modbus/TCP protocol, Miciolino et al.[8] explored a complex cyber-physical testbed designed for the control and monitoring of a water system. Their experimental results highlight critical characteristics of Modbus/TCP as a popular communication protocol in ICS environments. They concluded that an attacker who obtains sufficient knowledge of the system is able to change actuator commands or sensor readings to achieve malicious objectives. Gathering knowledge of a system, known as a reconnaissance attack, is the first step in attacking it. Hence, in our work, the ML models are trained to recognize this kind of attack.
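To give a sense of how reconnaissance can look at the traffic level, the sketch below computes simple per-client counts from Modbus requests. It is a hypothetical feature extractor, not the feature set used in this paper: the input tuple layout, the feature names, and the IP addresses are all assumptions. A scan that sweeps many unit IDs or register addresses produces counts that differ from an HMI's repetitive polling, and vectors like these could serve as inputs to classifiers such as the five named earlier.

```python
from collections import defaultdict


def per_client_features(requests):
    """Summarize Modbus requests per client into counts a classifier could use.

    `requests` is an iterable of (source_ip, unit_id, function_code, start_address)
    tuples, e.g. parsed from captured Modbus/TCP traffic (hypothetical layout).
    """
    stats = defaultdict(
        lambda: {"requests": 0, "units": set(), "functions": set(), "addresses": set()}
    )
    for source, unit_id, function_code, address in requests:
        entry = stats[source]
        entry["requests"] += 1
        entry["units"].add(unit_id)
        entry["functions"].add(function_code)
        entry["addresses"].add(address)
    return {
        source: {
            "requests": entry["requests"],
            "distinct_units": len(entry["units"]),
            "distinct_functions": len(entry["functions"]),
            "distinct_addresses": len(entry["addresses"]),
        }
        for source, entry in stats.items()
    }


# Hypothetical traffic: an HMI polling the same two registers three times,
# and a scanner probing unit IDs 1-5 with a single read each.
hmi = [("192.168.1.10", 1, 0x03, 0), ("192.168.1.10", 1, 0x03, 1)] * 3
scan = [("192.168.1.66", unit, 0x03, 0) for unit in range(1, 6)]
print(per_client_features(hmi + scan))
```

In this toy trace the HMI touches one unit and two addresses over six requests, while the scanner touches five different units in five requests; the distinct-unit count alone separates the two.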
References
1. Aragó, A.S.; Martínez, E.R.; Clares, S.S. (2014). "SCADA Laboratory and Test-bed as a Service for Critical Infrastructure Protection". Proceedings of the 2nd International Symposium on ICS & SCADA Cyber Security Research 2014: 25–29. doi:10.14236/ewic/ics-csr2014.4.
2. Communication Technologies, Inc. (October 2004). "Supervisory Control and Data Acquisition (SCADA) Systems" (PDF). Technical Information Bulletin 04-1. National Communications System. https://www.cedengineering.com/userfiles/SCADA%20Systems.pdf. Retrieved 08 August 2018.
3. Filkins, B. (2 February 2016). "IT Security Spending Trends". SANS Analyst Papers. SANS Institute. https://www.sans.org/reading-room/whitepapers/analyst/membership/36697. Retrieved 05 June 2018.
4. Stouffer, K.; Pillitteri, V.; Lightman, S. et al. (May 2015). "Guide to Industrial Control Systems (ICS) Security" (PDF). NIST Special Publication 800-82 Revision 2. National Institute of Standards and Technology. doi:10.6028/NIST.SP.800-82r2. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r2.pdf. Retrieved 05 June 2018.
5. "Modbus Technical Resources". Modbus Organization, Inc. http://www.modbus.org/tech.php. Retrieved 05 December 2017.
6. "Modbus Application Protocol Specification V1.1b3" (PDF). Modbus Organization, Inc. 26 April 2012. http://www.modbus.org/docs/Modbus_Application_Protocol_V1_1b3.pdf. Retrieved 08 August 2018.
7. Morris, T.; Gao, W. (2014). "Industrial Control System Traffic Data Sets for Intrusion Detection Research". Proceedings from the International Conference on Critical Infrastructure Protection VIII: 65–78. doi:10.1007/978-3-662-45355-1_5.
8. Miciolino, E.E.; Bernieri, G.; Pascucci, F.; Setola, R. (2015). "Communications network analysis in a SCADA system testbed under cyber-attacks". Proceedings of the 23rd Telecommunications Forum TELFOR: 341–344. doi:10.1109/TELFOR.2015.7377479.
Notes
This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added.