Model-Based Risk Assessment for Cyber Physical Systems Security
Tantawy, Ashraf;Erradi, Abdelkarim;Abdelwahed, Sherif;Shaban, Khaled
2020-05-28 00:00:00
Ashraf Tantawy, Abdelkarim Erradi, Sherif Abdelwahed, and Khaled Shaban

A. Tantawy and S. Abdelwahed are with the Department of Electrical and Computer Engineering, Virginia Commonwealth University, VA, 23220 USA e-mail: (amatantawy, sabdelwahed@vcu.edu)
A. Erradi and K. Shaban are with the Computer Science and Engineering Department, Qatar University, Doha, Qatar e-mail: (erradi, khaled.shaban)@qu.edu.qa
arXiv:2005.13738v1 [cs.CR] 28 May 2020

Abstract: Traditional techniques for Cyber-Physical Systems (CPS) security design either treat the cyber and physical systems independently, or do not address the specific vulnerabilities of real time embedded controllers and networks used to monitor and control physical processes. In this work, we develop and test an integrated model-based approach for CPS security risk assessment utilizing a CPS testbed with real-world industrial controllers and communication protocols. The testbed monitors and controls an exothermic Continuous Stirred Tank Reactor (CSTR) simulated in real-time. CSTR is a fundamental process unit in many industries, including Oil & Gas, Petrochemicals, Water treatment, and nuclear industry. In addition, the process is rich in terms of hazardous scenarios that could be triggered by cyber attacks due to the lack of possible mechanical protection. The paper presents an integrated approach to analyze and design the cyber security system for a given CPS where the physical threats are identified first to guide the risk assessment process. A mathematical model is derived for the physical system using a hybrid automaton to enumerate potential hazardous states of the system. The cyber system is then analyzed using network and data flow models to develop the attack scenarios that may lead to the identified hazards. Finally, the attack scenarios are performed on the testbed and observations are obtained on the possible ways to prevent and mitigate the attacks. The insights gained from the experiments result in several key findings, including the expressive power of hybrid automaton in security risk assessment, the hazard development time and its impact on cyber security design, and the tight coupling between the physical and the cyber systems for CPS that requires an integrated design approach to achieve cost-effective and secure designs.

Index Terms: CPS, Security, Safety, Hybrid Automaton, Hybrid System, Modeling, Attack tree, Penetration Testing, Risk Assessment, Testbed, SCADA, Industrial Automation, Modbus, CSTR

I. Introduction

Cyber-Physical Systems (CPS) integrate physical elements, for sensing and actuation, with cyber elements, for computation and communication, to automate and control industrial processes. CPS is pervasively used in critical application domains such as health care, traffic management, manufacturing, and energy infrastructures. These systems are increasingly adopting commercial and open source software components and standard communication protocols in order to reduce infrastructure costs and ease integration and connection with corporate networks. However, this has exposed such systems to new security threats and made them a prime target for cyber-attacks to disrupt their normal operation. This may result in profound and catastrophic impacts such as endangering public safety and economic stability. Despite ongoing efforts to secure and protect CPS, these critical infrastructure components remain vulnerable to cyber attacks. Recent intensified sophisticated attacks on these systems have stressed the importance of methodologies and tools to assess and manage cyber security risks [1]. Additionally, it is necessary to identify and address safety and security requirements earlier as part of the system design process [2].

Traditional IT security risk assessment is a well-established domain, guided by several international standards, e.g., [3]. More recently, standards emerged to address the specific needs of CPS domains, such as IEC 62443 for securing industrial automation and control systems. However, these standards provide best practices for the security system independent of the monitored physical system. It is the responsibility of the CPS designer to integrate the safety and security aspects of the CPS, often in an ad-hoc manner. Realizing this gap, the IEC/TC65 plenary board proposed a new group to consider how to bridge functional safety and cyber security for industrial automation systems [4].

In this work, we propose a model-based design approach where physical system modeling, data flow modeling, and attack trees are integrated to deliver a unified design framework of safe and secure CPS. First, the physical system modeling documents all the system components and their interaction interfaces including system sensors, controllers, and the supporting networks and protocols. Then we identify the data flow and information exchange between the system components to enable monitoring and controlling the physical process. The resulting system model and data flow model are used as inputs for threat modeling using attack trees. The latter are conceptual diagrams describing the system threats and possible attacks to realize those threats [5].

We summarize the key contributions of the paper as follows: (1) The introduction of hybrid system automaton as a powerful tool for cyber security risk assessment, and countermeasure & mitigation design. (2) Development of an integrated safety-security model-based co-design approach for CPS. The approach integrates physical and cyber models for attack scenario generation, risk assessment, and countermeasures design. (3) Providing insight on key research directions as revealed throughout the design and implementation of the case study CPS.

We applied and empirically validated our proposed model-based approach for CPS security risk assessment on a CPS testbed that monitors and controls an exothermic Continuous Stirred Tank Reactor (CSTR) simulated in real-time. CSTR is a widely used model for chemical reactor engineering. This process is selected because of its practical importance and its associated hazards that can be triggered by cyber attacks with no possible mechanical protection, including reactor overflow and thermal runaway with the risk of fire, explosion or environmental hazards [2]. A technical report detailing the design of the CPS testbed along with the implementation of the CSTR simulator, the Basic Process Control System (BPCS) and the Safety Instrumented System (SIS) is available online (https://github.com/qucse/CpsSecurity).

The rest of the paper is organized as follows: Section II discusses important related work. Section III presents the overall architecture for the CPS used as a case study throughout the paper. Section IV introduces the hybrid automaton approach for hazard identification. The design of cyber attacks is presented in Section V. Section VI summarizes the penetration testing results and the formal risk assessment. The key insights gained from the case study are summarized in Section VII, and the paper is concluded with future research directions in Section VIII.

II. Related Work

In addition to international standards, the research community recently started to address the CPS security problem in non-traditional ways. A survey on CPS security, challenges, and solutions can be found in [6]. A review of risk assessment methods for SCADA systems is conducted in [7], including attack trees, countermeasure trees, petri nets, as well as several quantitative risk measures. CPS security for industrial processes has been studied in [8], where a multi-layer cyber-security protection architecture is proposed. A system for runtime attack detection and prevention for industrial control systems is proposed in [9]. The application of attack-defense trees to analyze cyber security for CPS is reported in [10]. The authors in [11] proposed an integrated CPS safety and security lifecycle process, merging ISA84/IEC 61511 and ISA99/IEC 62443 lifecycle processes, where a combined failure and attack graph is proposed for risk assessment. A safety/security risk analysis approach that combines bowtie analysis for safety systems with attack tree analysis for security systems is considered in [12]. CPS security for the electric power grid is discussed in [13], where the authors proposed a risk assessment methodology and identified the potential threats for each component of the grid.

As a research aid, researchers build testbeds that represent scaled down versions of actual physical industrial control systems in order to provide a research environment that allows extensive attack-defense experimentation with realistic attack scenarios. The testbed facilitates the vulnerability and risk assessment, impact analysis of cyber-attacks on the controlled process, and risk mitigation, in order to enable the design and evaluation of effective detection and defense mechanisms. Several testbeds developed at various universities and national research labs have been reported in the literature for different CPS application domains, predominantly focusing on critical infrastructures such as the smart power grid testbeds presented in [14], [15], [16]. Early power grid testbeds include the National SCADA TestBed (NSTB) [17] and the Idaho National Labs (INL) SCADA Testbed, which deploys real physical grid components including generators, transmitters, substations and controllers [18]. In addition to the energy domain, a smart transportation system testbed has been proposed in [19]. An Industrial Control Systems SCADA testbed has been discussed in [20], and a water treatment testbed is reported in [21]. However, these testbeds are either very costly to implement and maintain, making them unaffordable for most researchers, or they use commercial hardware and software components that are often treated as black boxes with little ability to model their inner working. The testbed we used for CPS security design, briefly presented in this paper, uses open hardware and software components, yet implements industrial standard communication protocols, e.g., Modbus/TCP.

Recently many CSTR-based testbeds were used to study cyber attacks on chemical processes. The study in [22] addressed the problem of joint distributed attack detection and secure estimation of the system states for a networked CPS over a wireless sensor network (WSN). A malicious adversary simultaneously launches a False Data Injection (FDI) attack at the physical system layer to intentionally modify the system's state and jamming attacks at the cyber layer to block the wireless transmission channels between sensors and remote estimators. The discretized and linearized state-space model of the CSTR near the operating point is used to show the effectiveness of the approach. To improve the overall resilience of CPS, the authors in [23] propose a framework based on a distributed middleware that integrates a multiagent topology. The proposed framework is evaluated using a CPS simulator composed of a CSTR benchmark system model, a Wireless Sensor and Actuator Network (WSAN) and additional remote devices, including a remote controller, a server where the model of the plant is running and an HMI. Jamming and node loss attacks are carried out as experiments for assessing the proposed attack framework. The work in [24] develops a nonlinear system framework for understanding cyberattack-resilience of the CSTR process using three control designs, where the focus is on data injection attacks on sensor measurements to impact the process safety. The study in [25] integrates a neural network (NN)-based detection method and a Lyapunov-based model predictive controller for a class of nonlinear systems. A CSTR with constant volume is used to illustrate the application of the proposed NN-based detection method to handle cyber-attacks. A methodology for detecting abnormal events in aging Industrial Internet of Things (IIoT) has been developed in [26]. The authors developed an efficient anomaly detection methodology that uses the correlation among process variables in order to detect stealthy cyber-attacks, and presented extensive experimental results on a CSTR model. An approach has been proposed in [27] to either certify that a given control system is safe under possible cyber-attacks on the measured data used for feedback and/or the commanded control signals, or alternatively synthesize a particular spoofing attack that corrupts the signals to make the closed-loop system unsafe. A two state CSTR has been used as a testbed for implementing the approach. The research work in [28] employed a CSTR testbed to elucidate the dynamic interaction between feedback control and safety systems in the context of both model-based and classical control systems. To this aim, the interaction of a Model Predictive Control (MPC) system with a safety system is studied in the context of the Methyl isocyanate (MIC) hydrolysis reaction in a CSTR to avoid thermal runaway. The authors develop a specific action for the MPC to take when the safety system is activated due to significant feed disturbances that lead to thermal runaway conditions.

Most of the reviewed testbeds mainly use control-theoretic approaches for the detection of cyber-attacks on chemical processes using state estimation techniques. The approaches in these papers are largely based on theoretical mathematical analysis and can get intractable for realistic CPS. Unlike the work reported in this paper, none of the studies considered the integration of the physical and cyber aspects of the CPS for risk assessment and countermeasures design. Existing work mainly considers the model-based approach for the physical system only, disregarding the cyber system. In addition, the work in this paper uses a standard CPS architecture that resembles industrial installations with industry-standard hardware and software.

In this paper, we develop a model-based approach for CPS security risk assessment. Rather than abstract formulations, we experiment with the design process on an industrial CPS testbed. The key objective of the experiments is to verify the design approach and to gain insights that could be used by the research community for future refinements. The work presented in this paper differentiates itself from existing literature in three main aspects: (1) it follows a model-based approach to couple CPS safety and security throughout the whole lifecycle, (2) it proposes a formal approach to identify system hazards from a given physical model, rather than relying on expert opinion, and (3) it implements the whole risk assessment lifecycle from asset identification to mitigation design, via a case study on an actual testbed, which provides both a clear implementation methodology and insight for further research work.

III. CPS Testbed Architecture

Figure 1 illustrates the overall CPS architecture used in this paper. The physical system is an exothermic Continuous Stirred Tank Reactor (CSTR) simulated in real-time. Simulation data is exchanged over an I/O physical data bus to both the Basic Process Control System (BPCS) and the Safety Instrumented System (SIS). The BPCS implements the process control functions while the SIS implements the process shutdown logic functions, and both run RT Linux OS. The monitoring workstation implements the Human Machine Interface (HMI) for the operator to monitor and control the plant. The HMI has access control mechanisms that allow different levels of control based on the user role (typically operator, supervisor, and engineer). Figure 2 shows the implemented HMI. A control network interconnects the BPCS, SIS, HMI workstation, and the Engineering workstation. A firewall is configured to isolate the control network from the corporate network. A DeMilitarized Zone (DMZ) is formed to host the historian and real time data servers, following the NIST proposed architecture [29]. The RT Server runs the open source MySQL database to store plant data for later retrieval by the HMI application and corporate financial analysis applications.

[Figure 1: CPS Testbed Architecture. Components shown: Engineering WS, HMI WS, Historian Server, Real Time Data Server, SIS (Safety Application, NI RT Linux OS, Controller HW NI cRIO 9064), BPCS (Control Application, NI RT Linux OS, Controller HW NI cRIO 9064), Simulation PC.]

[Figure 2: HMI for the Reactor Process]

A. Physical System

We choose the Continuous Stirred Tank Reactor (CSTR) as the physical system for the testbed. The CSTR is an essential piece of equipment in process plants where new products are formed from raw inlet reactants. The CSTR process possesses several features that make it a good choice for CPS security studies. First, the process variables to be controlled are closely-coupled, hence any change in one process variable will impact other variables as well and manifest itself in the overall process behavior. Second, the process has a number of potential safety hazard scenarios that may be produced by a cyber attack. Finally, mitigation layers for a number of safety hazards rely mainly on the control and safety systems, which are cyber systems that could be compromised by a cyber attack.

1) Process Description: We consider an irreversible exothermic CSTR process, with a first order reaction in the reactant A to produce product B with rate $k$ and an associated heat of reaction: $A \rightarrow B$.

Figure 3 shows the Piping & Instrumentation Diagram (P&ID) for the reactor. The reactor vessel has an inlet stream carrying the reactant A, an outlet stream carrying the product B, and a cooling stream carrying the cooling fluid into the surrounding jacket to absorb the heat of the exothermic reaction. Reactant A enters the reactor with concentration $C_{A0}$, temperature $T_0$ and volumetric flow rate $F_0$. A first order reaction takes place where a mole fraction of reactant A is consumed to produce product B. The outlet stream contains both reactant A and product B, with reactant A concentration $C_A$, outlet temperature $T$, and flow rate $F$. The outlet temperature $T$ is the same as the reactor temperature. The coolant fluid flows into the reactor jacket with temperature $T_{J0}$ and flow rate $F_J$, and leaves the jacket with temperature $T_J$. The total coolant volume in the jacket is designated by $V_J$. The detailed mathematical model is developed in [30].

[Figure 3: Reactor P&ID]

2) Process Measurement & Control: The reactor temperature could be controlled by adjusting the coolant fluid inlet flow. A single variable control loop is used to regulate the reactor temperature. The control loop is composed of a temperature sensor TT-01, Temperature Controller TIC-01, and Control Valve CV-2. A PID control algorithm is used for the controller as it still represents 98% of all feedback control at over fifty thousand manufacturing facilities around the world [31], hence it is the de facto standard in the process control industry. Similarly, the reactor liquid level could be controlled by adjusting the inlet flow rate, using a control loop composed of a level sensor LT-01, Level Controller LIC-01, and Control Valve CV-1. Figure 3 shows the two control loops using ISA standard symbols [32].

3) Process Safety Shutdown: Process hazards are usually identified through a systematic risk assessment process involving hazard studies, e.g., HAZard and OPerability (HAZOP) study. High reactor level will lead to reactor overflow. The hazard of the overflow will depend on several factors, including the toxicity of the reactants, the operating temperature of the reactor, and the occupancy level of operators around the reactor area. Similarly, high reactor temperature will lead to exceeding the reactor design temperature and possible reactor damage. Table I summarizes these two key hazards for the process and the possible initiating events.

Table I: Partial HAZOP sheet for the reactor process
Hazard | Initiating Event (Cause) | Consequences | Safeguards (IPL)
High Level (Reactor overflow) | BPCS failure OR Outlet control valve fully closed OR Inlet valve stuck fully open | 2 or more fatalities (safety), Product loss (financial), Environmental contamination (environment) | Reactor dike (Mitigation)
High Temperature (Reactor Meltdown/explosion) | Coolant inlet control valve fully (partially) closed OR Inlet valve stuck fully open | 10 or more fatalities (safety), Product loss (financial), Environmental contamination (environment) | None

For the reactor overflow hazard, the inlet stream has to be closed. Following the IEC 61511 international standard, the Safety Instrumented Function (SIF) and the control function have to be independent [33]. Therefore, the SIF will be composed of an independent level sensor LT-02, a logic solver implementing an interlock function I-1, and a feed inlet shutdown valve XV-1. Upon detecting a high reactor level, the inlet feed will be stopped via the independent shutdown valve.

For the reactor high temperature hazard, the inlet stream has to be closed. Therefore, the SIF will be composed of an independent temperature sensor TT-02, a logic solver implementing an interlock function I-2, and the inlet stream shutdown valve XV-1. In addition, since it is preferred to keep the inventory, a shutdown valve XV-2 is added to the outlet stream. Therefore, upon detecting a high reactor temperature, both the inlet and outlet stream shutdown valves will be closed. These two safety functions are illustrated in Figure 3.

B. Cyber System Architecture

Understanding the cyber system architecture is essential for CPS security design. Using the cyber architecture, combined with in-depth knowledge about the physical process dynamics and its associated hazards, a model-based approach could be followed to design an optimal security system for the given CPS according to predetermined optimality criteria, such as the minimization of financial loss or the probability of occurrence of a hazardous event.

Figure 4 shows the data flow diagram for the testbed. There is a one-way data communication from SIS to BPCS, and the HMI does not have direct communication to the SIS as per industrial practice. The HMI has direct communication with the BPCS for monitoring and supervisory control actions. Plant data is stored in the RT server database, for later on-demand retrieval by the HMI application running on the monitoring workstation and the corporate analysis application running on corporate PCs. As the RT Server has multiple communication links, it will play a key role in attacker penetration from the corporate LAN to the control LAN as detailed in the paper.

[Figure 4: CPS Testbed Data flow diagram. Nodes shown: Plant, SIS (Safety Logic), BPCS (Control Logic), RT Server (Database), Corporate PC (Corporate Analysis), Monitoring WS (HMI).]

IV. Identification of Cyber-Related Hazards

A. Hybrid Automaton Formulation

CPS security attacks distinguish themselves from generic IT system attacks in that the main attacker's goal is to cause damage to the end physical system. Although CPS attacks may have the objective of stealing confidential information that may impact company business, such attacks could be launched on the business LAN level, and do not require the extra effort to penetrate down to the control network. Accordingly, the first natural step to analyze CPS security threats is to identify cyber attacks that are hazard-related. Given the complete process design with control and safety systems, as represented by the P&ID in Figure 3, it is possible to construct the hybrid system automaton and utilize it to identify the process hazards and the associated cyber-controlled components that may be manipulated by an attacker to cause the process hazard.

Hybrid system automaton is a formal model that describes hybrid dynamical systems that have both discrete and continuous dynamics [34]. A simplified mathematical model for the hybrid automaton is represented as a collection $H = (Q, E, \delta, X, \mathrm{Init}, \mathrm{Inv}, \mathrm{Flow}, \mathrm{Jump})$, where $Q$ is a finite set of modes, $E$ is a finite set of event names, $\delta : Q \times E \rightarrow Q$ is a transition function representing the discrete changes, $X$ is a set of real-valued variables, Init, Inv, and Flow are functions that define the initial values, constraints, and evolution of the state variables $X$ for each mode, respectively, and finally Jump assigns to each labelled edge a guard condition.

According to the hybrid automaton definition, we can describe the CSTR process by the hybrid automaton in Figure 5. The set of modes $Q = \{S_0, S_1, \ldots, S_7\}$ is defined by the status of the three process streams, namely inlet stream $I_s$, outlet stream $O_s$, and cooling jacket stream $J_s$. The corresponding set of events is $E = \{I_s = 0, I_s = 1, O_s = 0, O_s = 1, J_s = 0, J_s = 1\}$, where a value of zero refers to a closed stream and a value of 1 refers to an open stream. The set of state variables $X = \{L, T, T_j, C_A\}$ describes the reactor level, temperature, coolant temperature, and inlet product mole concentration, respectively. The initial state is $S_0$, designating normal reactor operation when all streams are open. The state constraints and evolution are expressed according to the reactor dynamics and omitted for brevity. Finally, the Jump describing the guard conditions for each edge is described in terms of respective valve positions.

[Figure 5: Reactor hybrid automaton. For transition labels, X refers to a shutdown valve and C refers to a control valve. Subscript 'o' refers to valve status 'open', while 'c' refers to valve status 'closed'. It is assumed that the associated Boolean variable is high when the valve is closed. Reactant stream: $X_1 \| C_1$, Coolant stream: $C_2$, Outlet stream: $X_2 \| C_3$.]

In addition to the hybrid system automaton, the system is usually constrained to run inside an operating envelope, $\mathcal{E} \subseteq \mathbb{R}^n$, where $n$ is the number of state variables, i.e., $n = |X|$. A system hazard occurs when one or more of the state variables are outside the operating envelope, i.e., $X \notin \mathcal{E}$. This criterion was applied to the CSTR system via simulation to identify the hybrid automaton hazardous states. Four hazardous states were identified, $S_2$, $S_4$, $S_6$, and $S_7$. In state $S_2$, the coolant stream is closed. The reactor and coolant fluid temperatures exceed the design limit of 550 K after 14 minutes and 23 minutes, respectively, until reaching the thermal equilibrium point $T = T_j = 582$ K via heat exchange after 60 minutes. We define the process hazard time as the minimum time taken by the process to exceed the design limit; for this state, the process hazard time is therefore 14 min. In state $S_4$, the outlet and coolant streams are closed. If the coolant stream is closed first, then the system will transition through state $S_2$ and the reactor will stabilize at the unsafe temperature $T = T_j = 582$ K. State $S_6$, where the inlet and coolant streams are closed, and state $S_7$, where the reactor is completely isolated, behave similarly. In summary, the reactor will be subject to a high temperature beyond the design limit in all hazardous states, potentially leading to reactor runaway. Hazardous states are designated in Figure 5 by the red border lines.

Given the hybrid automaton $H$ of the system with the designated hazardous states, we can extract the hazard initiating events. Algorithm 1 presents a formal method to obtain the attack scenarios that potentially cause a process hazard. The input to the algorithm is the hybrid system automaton, the designated hazardous states, and the initial system state $S_0$. The algorithm returns the hazard execution tree, where the tree paths are the shortest paths from the initial state $S_0$ to each hazardous state. These traces are later linked to cyber components that manipulate field devices to cause the process hazard. The algorithm can use any variant of the shortest path algorithm [35]. Algorithm 1 was applied to the CSTR hybrid automaton, and Figure 6 shows the hazard execution tree output. Hazard states will always be leaf nodes in the hazard tree. Example hazard execution traces are: $s_0 \xrightarrow{C_{2c}} s_2$ and $s_0 \xrightarrow{X_{2c} \| C_{3c}} s_1 \xrightarrow{C_{2c}} s_4$.

Algorithm 1: Generation of hazard execution tree
  input : $H$, $S_0$, $\mathcal{S}$ (Automaton, initial state, hazardous states)
  output: $M$ (Adjacency matrix for the hazard execution tree)
  for $S \in \mathcal{S}$ do
    ShortestPath($S_0$, $S$, $M$);
  end
  return $M$;

[Figure 6: Hazard execution tree for the CSTR process]

It should be highlighted that for large systems with a large number of inputs, the number of states of the hybrid automaton may be prohibitively large. One approach to solve this problem is to decompose the large system into smaller components and apply Algorithm 1 to each individual component. In fact, this is the industrial practice when conducting manual hazard analysis by experts, e.g., HAZard and OPerability (HAZOP) study [36]. In HAZOP, the system is divided into nodes and each node is studied separately. This approach works well as long as the coupling between system components is captured in the analysis of each individual component.

Figure 6 shows how a process hazard is generated in terms of process actions, such as opening or closing a valve, but it does not show how these actions could be taken by an attacker. To do so, the cyber architecture of the CPS is required. From the cyber architecture, the cyber components that are linked to process actions could be identified along with their associated vulnerabilities that could be exploited to launch the cyber attacks. Once vulnerabilities are identified, attack trees could be used for threat modeling [5].

In order to convert the hazard tree into an attack tree, we need a mapping from process actions to cyber component actions. From Figures 1 and 3, the control valves are connected to the BPCS while the shutdown valves are connected to the SIS. We designate a relevant valve action using the terminology <System>(<Valve tag> <Action>), with system values being B for BPCS and S for SIS, and action values being 'C' for Close and 'O' for Open. Therefore, $B(C_2\,C)$ refers to closing valve $C_2$ via exploitation of a relevant BPCS vulnerability. Figure 7 shows a partial abstract attack tree generated from the hazard tree in Figure 6 using the introduced terminology.

[Figure 7: Partial attack tree for the CSTR hazard tree in Figure 6]

From the attack tree, it is evident that $B(C_2\,C)$, i.e., closure of valve $C_2$ via BPCS vulnerability exploitation, is sufficient to cause a reactor hazard. Therefore, in the next section we focus on the compromise of the BPCS to cause a process hazard. It should be highlighted that in general, it may be required to take multiple actions by different cyber components to achieve a process hazard. The CSTR process is a special case in the sense that the closure of $C_2$ is necessary, and sufficient, to cause a process hazard.

B. Hybrid Automaton Expressiveness

The analysis shows that the hybrid system automaton is a rich diagram, from which important information could be inferred. First, a hazard execution tree could be generated, where hazard states are identified and execution traces that lead to process hazards along with the associated actions could be extracted. Second, the time it takes the process to produce a hazard is labeled inside the hazardous state, enabling an accurate assessment of the required response time. Finally, different ways to take the process out of the hazardous state before the time to hazard elapses could be inferred from the outward transitions, noting that the system could return to the previous state by reversing the transition action (e.g., opening a valve that was closed). Therefore, it is possible to identify the physical countermeasures in order to protect the process in case of an attack.

V. Design of Cyber Attacks

Figure 7 illustrates the fact that valve CV-2 closure via BPCS is a necessary and sufficient cause for reactor hazard. Therefore, this section focuses on the design of cyber attacks for BPCS to cause the desired valve closure, either by integrity or DoS attacks.

From the CPS architecture, the attack entry point is the corporate PC. This could represent an insider attack, using a personal laptop or a corporate PC, or an outside attacker over the Internet who compromised a corporate PC. The mechanism by which the outside attacker could gain access to the corporate PC either locally or remotely via a remote session has been studied extensively in the IT Security literature and will not be treated in the paper. For further details about IT security vulnerabilities, refer to [37]. In the rest of the paper, we will assume the attacker has full privileged access to a corporate PC, but not to the control network.

Connectivity Graph: CV-2 is part of a temperature control loop that stabilizes the reactor temperature. The full control loop, including process I/O communication and the PID controller, is implemented on the BPCS. The compromise of CV-2 thus could be achieved by compromising the BPCS itself or any of its connected cyber components to act as a pivot. Figure 8 shows the connectivity graph for the BPCS, constructed from the CPS architecture and the data flow graph in Figures 1 and 4, respectively. The number of all possible paths between the corporate PC and the BPCS target grows exponentially with the graph size. However, given that most paths are very unlikely due to the inherent difficulty, e.g., the presence of a firewall or absence of any data communication between the connected nodes, we exclude paths that have a firewall and lack data communication between nodes. The probability of a cyber attack across these excluded paths will be insignificant and their impact on the risk assessment process could be neglected. It should be highlighted that for typical CPS networks, the number of nodes is small so the extraction of all possible paths is still feasible, although computationally expensive.

[Figure 8: BPCS connectivity graph. Different line styles designate different connectivity patterns and hence the difficulty of attacks. Only one DMZ server is shown. Nodes: BPCS, SIS, Corp PC, RT Server, HMI, EWS. Line styles: Firewall, no communication; Firewall, communication; No Firewall, communication; No Firewall, No communication.]

Figure 9 shows the attack tree that enumerates the likely paths to compromise the reactor process starting from the corporate PC. The root of the tree represents the cyber attack objective of closing valve CV-2, which is the leaf node in the abstract attack tree in Figure 7. The basic events in the tree in Figure 9 are modeled using detailed attack trees in this section.

In the following, vulnerability scanning tools were used to identify vulnerabilities and to design cyber attacks for the identified nodes in Figure 9, namely Real time server, BPCS, and HMI. Whether the vulnerabilities are exploitable is verified by penetration testing. Client-side attacks, such as phishing emails, are not considered for the BPCS, as it is an embedded unattended node (no direct HMI attached), and for the HMI, as it is not an industrial practice to run any internet application such as a browser or email client

version. In addition, nmap identified the target as NI cRIO 9049 running RT Linux OS.
There are many other open ports reported that are used for vendor- speci c services, which could be exploited with proper knowledge about vendor hardware and software. We limit our discussion to known services only to keep the treatment general. According to open ports and services, Figure 11 shows the BPCS attack tree. Port 502 listens to Modbus communication, so a Modbus malicious write attack, with and without a speci c register address, is HM I-BPS Da ta RT Server BPCS D irect considered. Information about register addresses could flow Comp romise Compromise Compromise be obtained from controller con guration data, either (Figure 12) (Figure 10) (Figure 11) as a hard copy or stored on local controller drive. Figure 9: Reactor Runaway Abstract Attack Tree. At- Lack of a speci c register address for valve CV-2 will tack scenarios are sequential, starting from RT server require writing to an address range, which may lead compromise, e.g., RT server compromise followed by a to a random valve con guration that won't cause a BPCS compromise represents one attack scenario. reactor hazard, e.g., closing all inlet and outlet valves. To increase the probability of success of such an attack, Port State Service Version Vulnerabilities process knowledge is required. Further, Modbus write 3306/tcp open sql 5.7.28, Pro- Authentication attack will not be successful unless no other Modbus tocol 10 bypass, passwords master is writing to the same register. For the given dump, Brute testbed, HMI can write to valve registers after setting force passwords process controllers to manual mode. This requires that 22/tcp open ssh OpenSSH 7.9p1 the reactor hazard time is less than the HMI-BPCS Ubuntu 10 (a)periodic communication rate. With SSH port open, another possible attack is Table II: RT Server vulnerabilities - nmap scan Brute-force SSH login attempt to guess admin pass- word and gain privileged access to the BPCS. 
This attack will succeed only if the SSH server is con gured on the HMI. Table X in the appendix summarizes the with no maximum number of password attempts. After tools and commands used. gaining the required privileged access, BPCS controller could be shutdown to cause DoS attack, or the control A. Design of RT Server Cyber Attacks We start from the compromised corporate PC that has legitimate access to the Real Time (RT) server RT Server Compromise for data logging purposes. Using nmap port scanning command against RT server, we get the information reported in Table II for mysql server and ssh ser- Linux root access vices. Mysql could be exploited in dierent ways in- cluding authentication bypass, password hash dump, and brute-force login, depending on the speci c server settings. Figure 10 shows an attack tree to compromise the RT server by escalating the attacker privileges via Mysql priviledged access 14 c c c 16 17 15 mysql exploitation. Remote desktop Cr ack Linux Apparmor SSH Enabled (Port From the attack tree, the probability of RT Server enabled (Port Passwor d Hash disable d for 22) 3389) Dump mysql compromise could be approximated by: P [Srv-Comp] = c c (c + c )(c c + c ) (1) 14 15 16 17 11 12 13 Penetration testing would assisst in assigning probabil- ity measures to the success of each attack action. c c c 11 12 13 Cr acking Bypass mysql Brut e-Force passwor ds fr om B. Design of BPCS Cyber Attacks Authentication mysql Login hashes Using nmap port scanning tool, Table III lists the Figure 10: RT Server Attack Tree. open ports and associated listening services and service 9 TANTAWY et al. 
program could be overwritten to cause an integrity attack, assuming the attacker has sufficient knowledge about the hardware as well as the necessary software tools.

Table III: BPCS Vulnerabilities - nmap scan
Port | State | Service | Version | Vulnerabilities
22/tcp | open | ssh | OpenSSH 7.4 | Brute-force SSH login
502/tcp | open | mdps | Modbus/TCP | Modbus Write, Stop, MITM

Finally, a DoS attack could also be designed by exploiting the Modbus STOP CPU function or by using a SYN flood attack against the Modbus or SSH open ports. This latter attack will be successful only if the BPCS controller scheduler is not a real time scheduler that gives priority to control tasks. It should be noted that a DoS attack will not succeed in causing a reactor hazard if process outputs are configured as fail-safe. In such a case, the controller will automatically write the pre-configured safe value to the process valves if the controller software crashes.

Figure 11: BPCS Attack Tree. Probabilities related to the corporation are designated by c and to the attacker by a
[Figure 11 nodes: BPCS Direct Compromise; DoS; Integrity; Fail Safe Output Configured Incorrectly; HMI Periodic Comm period > Reactor runaway time; Overwrite control program; Modbus Write to CV-2; Comm Task is not running in real time mode, SYN Flood Attack; No Max Password Attempts Configured; Brute-force SSH Login, Shutdown Controller; Modbus Write to Address Range Hit; Obtain Controller I/O Configuration (Insider); I/O Configuration Stored on Local Drive; Modbus STOP CPU Attack; Process Knowledge.]

From the BPCS attack tree, the probability of BPCS direct compromise could be approximated by:

$$P[\text{BPCS-Comp}] = c_9 [c_3 c_4 + (c_1 + c_2)] + a_1 c_3 c_4 + c_8 [a_2 c_7 + (c_5 + c_6)] \tag{2}$$

C. Design of HMI-BPCS Data Flow Attacks

In this attack vector, the HMI has to be compromised first in order to mount the malicious data over the HMI-BPCS data flow stream. Table IV shows the nmap scan results for the HMI. The machine is running Windows 10 and is fully patched, therefore very few vulnerabilities were detected. The mysql service is running to communicate with the RT Server for real time data and trend display for the operator. The Modbus master service is running to poll and write data to the BPCS controller. Remote desktop is turned on to grant different operators access to HMI machines for other plant zones.

Table IV: HMI Vulnerabilities - nmap scan
Port | State | Service | Version | Vulnerabilities
502/tcp | open | mdps | Modbus/TCP | Modbus write, stop
3306/tcp | open | mysql | MySQL |
3389/tcp | open | rdp | |

Figure 12 shows the HMI attack tree. Remote desktop is exploited using either a brute-force password attack or a password leak. In addition, the open Modbus port is exploited to write malicious values to the relevant registers, provided process and I/O configuration knowledge are available. Since the HMI has a mysql client connection to the RT Server, mysql client side attacks could be considered as well. However, given the low number of mysql client side vulnerabilities and the associated difficulty, we ignored mysql client side attacks in the penetration testing.

Figure 12: HMI-BPCS Data flow Attack Tree
[Figure 12 nodes: HMI-BPCS Dataflow Compromise; Remote Desktop Enabled; Password Leak (Insider); Obtain Controller I/O Configuration; Modbus Write to CV-2; Process Knowledge; No RDP Lockout policy; Brute-force RDP password attack.]

From the HMI attack tree, the probability of HMI-BPCS data flow compromise could be approximated by:

$$P[\text{HMI-BPCS}] = c_5 c_7 + c_{21} (c_{18} + c_{19} c_{20}) \tag{3}$$

It should be highlighted that the presented attacks are not meant to be comprehensive. As an example, we did not consider Man In The Middle (MITM) attacks, as the compromised RT server is behind a firewall and hence cannot monitor the traffic or cause ARP poisoning unless one node on the control network is compromised. Also, we assume perfect security around the company premises, which does not allow for direct physical access to hardware. Finally, we assume the IT security policy does not allow foreign USB devices to be plugged into plant control equipment. Should any of these assumptions be violated, the attack trees have to be expanded to accommodate these new threats. Moreover, the design of the cyber attacks does not include day zero attacks, which are unknown at design time and could only be detected online using anomaly detection techniques [38].

VI. Penetration Testing and Risk Assessment

Penetration testing was carried out according to the attack trees designed in Section V to verify the identified vulnerabilities and to support the risk assessment process by quantifying the probability of success of different attack actions.

A. RT Server

Table V summarizes the penetration testing results for the attack tree in Figure 10. The bypass mysql authentication attack failed due to the enforced authentication configuration. The brute-force mysql login attack succeeded in obtaining the weak mysql root password "sql2563". With the mysql root password, the Linux password hash file was loaded successfully into mysql using the LOAD FILE command and dumped for password cracking. Finally, the Linux hashed password for root was recovered. The RT server is then under full control using the root password and either SSH or remote desktop login. Attack injection is semi-automated using Metasploit modules.

Table V: Real Time Server Penetration Testing Results
Vulnerability | Exploit Tool | Exploit Result | Notes
Bypass mysql Authentication | Metasploit | Fail | mysql server configured to enforce authentication
Brute-force mysql login | Metasploit | Success | Weak mysql root password used
Linux password hash dump | Metasploit | Success | Apparmor disabled for mysql
Crack Linux hashed passwords | Metasploit | Success | root password recovered

From the penetration testing results, it can be safely assumed that mysql login authentication is enforced, especially for industrial plants, therefore $c_{11} = 0$. In addition, we can assume that only one remote access capability is allowed, either remote desktop or SSH, for operational purposes. We arbitrarily choose SSH, hence $c_{16} = 0$. Therefore, the probability of RT Server compromise reduces to:

$$P[\text{Srv-Comp}] = c_{13} c_{14} c_{15} c_{17} \tag{4}$$

The probability of such an sql attack could be made zero by either closing the SSH port or enabling OS security on the mysql database. Both options may not be possible for operational purposes. However, the probability could be made arbitrarily small by enforcing a strong password policy for both the mysql database and the Linux server. It should be noted, however, that the enforcement of a very strong password policy may lead to passwords that are difficult to remember, and hence to written down passwords that may be leaked.

B. BPCS

Table VI summarizes the penetration testing results for the attack tree in Figure 11. The Modbus STOP attack failed as it is not supported by the controller Modbus implementation. The SYN flood attack also did not work because the BPCS controller runs RT Linux OS, which gives minimum guarantees on the control algorithm task periodicity. The brute-force SSH login attack was successful in acquiring the root password, but unsuccessful in causing a DoS attack as the fail safe output is correctly configured on the controller. However, with the controller root password, it is possible to overwrite the control program with malicious code and cause an integrity attack. The Modbus write attack was successful in writing the required value to the pre-determined register, but the written value did not persist due to the periodic HMI-BPCS communication that continuously writes to the same register. Finally, a random write to a contiguous address range did not cause a hazard without sufficient knowledge about the output-register mapping.

Table VI: BPCS Penetration Testing Results
Vulnerability | Exploit Tool | Exploit Result | Hazard Caused | Notes
Modbus STOP CPU Attack | Metasploit | Fail | No | Function not supported by BPCS Modbus implementation
SYN Flood Attack to ports 22, | Metasploit | Fail | No | RTOS does not degrade control algorithm performance with communication spikes
Brute-force SSH login | Metasploit | Success | Y/N | user: admin, pass: "niroot" successfully deduced. DoS attack was unsuccessful due to fail safe setting. Integrity attack succeeded with proper software tools
Modbus write to contiguous address range | Metasploit | Success | No | Random write to Modbus registers did not cause a process hazard
Modbus write to a specific register | Metasploit | Success | No | HMI has periodic communication with BPCS that overwrites written data every 5 sec.

From the penetration testing results, we can make some simplifying assumptions about the attack tree probabilities. The probability of incorrect configuration of the fail safe output could be set to 0, assuming the configuration is made once per controller software lifetime (we ignore here the possibility of a human error during controller reprogramming). Also, the probability that a random Modbus write would result in a hazard is very small given the large number of registers typically used in an industrial setup, so $a_2 = 0$. Finally, strong password policy enforcement decreases the probability of a successful brute-force SSH attack significantly, so that probability can also be set to 0. Accordingly, the probability of BPCS compromise reduces to:

$$P[\text{BPCS-Comp}] = c_8 (c_5 + c_6) \tag{5}$$

C. HMI

Table VII summarizes the penetration testing results for the attack tree in Figure 12. The RDP brute-force password attack was successful with user: operator and password: "reactorws". Such passwords are not uncommon in industrial plants, as there is often a repeated naming convention composed of the plant unit name and WS for WorkStation. With remote access granted, it was possible to access the HMI, switch the reactor controller to manual mode, and close the coolant valve. The Modbus attack was also successful in writing to Modbus registers at the HMI. However, the period of writing data has to be much shorter than both the Modbus poll delay between the HMI and BPCS and the period of the GUI-Modbus register writing loop. This is mainly to avoid the malicious data being overwritten. This latter Modbus attack assumes sufficient process knowledge and Modbus configuration data.

Table VII: HMI Penetration Testing Results
Vulnerability | Exploit Tool | Exploit Result | Hazard Caused | Notes
Brute-force RDP login | Hydra | Success | Yes | user: admin, pass: "reactorws" successfully deduced. HMI access gained
Modbus write to contiguous address range | Metasploit | Success | Yes | Process and Modbus configuration knowledge assumed

One finding from penetration testing that could simplify the probability of HMI-BPCS compromise in (3) is that an RDP lockout policy is an easy fix that can be configured in the Windows 10 settings. Therefore, setting the no-lockout probability to 0 in (3), so that the brute-force term $c_{19} c_{20}$ vanishes, we get:

$$P[\text{HMI-BPCS}] = c_5 c_7 + c_{18} c_{21} \tag{6}$$

D. Overall Risk Assessment

To assess the overall risk, a formula for risk scoring in terms of pre-defined risk metrics is required. This formula usually gives different weights to different risk metrics according to the organizational policy. The two key metrics in any risk assessment are the threat likelihood, $L$, and the cost of the consequences, $q$. In industry, and to simplify the analysis, the risk scoring function is usually categorical, using risk ranking tables. The risk ranking table is in matrix form, where the rows represent a finite set of likelihood categories and the columns represent a finite set of consequence categories. The row-column intersection represents the risk rank, e.g., No, Low, Medium, and High risk. According to the risk rank, a Target Mitigated Event Likelihood (TMEL) is defined. If the event likelihood exceeds the TMEL, then an additional protection layer is required to reduce the likelihood to the TMEL. More information on Layer Of Protection Analysis (LOPA) can be found in [33].

In this work, we adopt a continuous (rather than categorical) function that directly calculates the risk score. We define four risk metrics: likelihood $L$, and three consequences: safety loss $S$, financial loss $F$, and environmental loss $E$. We designate the target corporation risk score to be $r$. We define $q$ to be the normalized cost given by:

$$q = \alpha \frac{S}{S_m} + \beta \frac{F}{F_m} + \gamma \frac{E}{E_m}, \qquad 1 = \alpha + \beta + \gamma \tag{7}$$

$S_m$, $F_m$, $E_m$ are normalization factors representing the maximum value set for each relevant category, and $\alpha$, $\beta$ and $\gamma$ are weight factors defining the contribution of each metric to the overall risk score, usually defined by the organization. The total cost $Q$ is a random variable defined on the sample space $\Omega = \{0, 1, 2, \ldots\}$ representing the number of hazardous events $N$ in a time interval. The best-known risk score is given by the expected value of the total cost:

$$R = E[Q] = E[qN] = qE[N] = qL \tag{8}$$

where $L$ is the likelihood of the hazardous event in terms of the number of events per time interval. The target risk score $r$ is chosen according to the consequence. We assume the following linear relationship:

$$\log r = -\kappa q \tag{9}$$

where $\kappa$ is a proportionality constant. The risk score is given by:

$$R = L \left( \alpha \frac{S}{S_m} + \beta \frac{F}{F_m} + \gamma \frac{E}{E_m} \right) \tag{10}$$

The risk score has to satisfy $R \le r$. In addition, since the likelihood is usually very small, we adopt the log function for the risk:

$$\log L \le \log r - \log \left( \alpha \frac{S}{S_m} + \beta \frac{F}{F_m} + \gamma \frac{E}{E_m} \right) = -(\kappa q + \log q) \tag{11}$$

To calculate the likelihood $L$, we combine (4), (5), and (6) according to the attack tree in Figure 9. Ignoring higher order probability terms, the probability of reactor runaway is given by:

$$P[\text{Runaway}] = c_{13} c_{14} c_{15} c_{17} [c_8 (c_5 + c_6) + c_5 c_7 + c_{18} c_{21}] \tag{12}$$

where, without loss of generality, we assume a unity attack event per the chosen time interval. Assuming the probability that an attacker could compromise a corporate PC is given by $P_c$, the overall probability of reactor runaway is given by:

$$L = P_c P[\text{Runaway}] = P_c c_{13} c_{14} c_{15} c_{17} [c_8 (c_5 + c_6) + c_5 c_7 + c_{18} c_{21}] \tag{13}$$

Combining (11) and (13):

$$P_c c_{13} c_{14} c_{15} c_{17} [c_8 (c_5 + c_6) + c_5 c_7 + c_{18} c_{21}] \le 10^{-(\kappa q + \log q)} \tag{14}$$

Equation (14) represents the design constraint for CPS security. Table VIII enumerates the design variables with a description and possible countermeasures to reduce or eliminate the associated probabilities. It should be highlighted that some vulnerabilities could be entirely eliminated by taking the extreme policy of blocking the relevant services. However, the price paid is less flexibility in asset management and a potentially higher cost of ownership.

E. Risk Assessment Results

We apply the risk assessment methodology to the case study presented in the paper. It should be highlighted that the risk assessment methodology works by defining a target tolerable risk level, then working backwards to find an upper bound on the probability of cyber failures for CPS components. This is in contrast to the forward risk assessment process, where the risk is assessed for a given system using its actual failure probability figures. The advantage of the backward approach is that it is proactive and suitable for design time as well as runtime.

Equation (14) represents the design equation for the system, where the exponent on the right hand side is calculated from (11) using the desired target risk level. All probabilities are considered design variables that need to be specified to achieve the target risk level. To illustrate the process, Table IX shows an instance of the design parameters and risk target for the reactor runaway risk scenario.
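As a rough numerical illustration of the backward calculation, the sketch below evaluates (7), (11), and (13) in Python with the parameter values of Table IX. The names `alpha`, `beta`, `gamma`, and `kappa` are labels assumed here for the three weight factors and the proportionality constant, and the probability values passed to `runaway_likelihood` are hypothetical placeholders, not measurements from the testbed. The HMI term is taken as the product of $c_{18}$ and $c_{21}$, following (6).

```python
import math

def normalized_cost(S, F, E, S_m, F_m, E_m, alpha, beta, gamma):
    """Normalized consequence cost q as in (7); the weights must sum to 1."""
    assert math.isclose(alpha + beta + gamma, 1.0)
    return alpha * S / S_m + beta * F / F_m + gamma * E / E_m

def likelihood_bound(q, kappa):
    """Upper bound on L from (11): log10(L) <= -(kappa*q + log10(q))."""
    return 10.0 ** (-(kappa * q + math.log10(q)))

def runaway_likelihood(P_c, c):
    """Likelihood L as in (13); c maps a subscript to a probability."""
    rt_server = c[13] * c[14] * c[15] * c[17]   # reduced RT server term, (4)
    bpcs = c[8] * (c[5] + c[6])                 # reduced BPCS term, (5)
    hmi = c[5] * c[7] + c[18] * c[21]           # reduced HMI-BPCS term, (6)
    return P_c * rt_server * (bpcs + hmi)

# Table IX values: 10 fatalities, 5 M of 10 M financial loss,
# 10,000 of 2 M environmental loss, weights 0.7 / 0.2 / 0.1, constant 6.
q = normalized_cost(S=10, F=5e6, E=1e4, S_m=10, F_m=10e6, E_m=2e6,
                    alpha=0.7, beta=0.2, gamma=0.1)
bound = likelihood_bound(q, kappa=6)

# Hypothetical design point: every corporate probability 0.1, P_c = 0.01.
c = {i: 0.1 for i in (5, 6, 7, 8, 13, 14, 15, 17, 18, 21)}
L = runaway_likelihood(P_c=0.01, c=c)

print(f"q = {q:.4f}, bound on L = {bound:.3e}, L = {L:.3e}")
print("design constraint satisfied:", L <= bound)
```

A design point is acceptable when `L <= bound`; otherwise one or more countermeasures have to lower the corresponding probabilities.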
The values in Table IX are used in (7) to yield a normalized cost $q = 0.8005$. With $\kappa = 6$, the target risk score is obtained from (9) as $r \simeq 10^{-5}$. With these values, we can substitute in (14) to obtain the relationship between the design variables:

$$P_c c_{13} c_{14} c_{15} c_{17} [c_8 (c_5 + c_6) + c_5 c_7 + c_{18} c_{21}] \le 10^{-5} \tag{15}$$

Table IX: Reactor Runaway Risk Scenario - Design Example
Design Parameter | Description | Value
$\alpha$ | Safety loss weight | 0.7
$\beta$ | Financial loss weight | 0.2
$\gamma$ | Environmental loss weight | 0.1
$S$ | Safety loss | 10 fatalities
$S_m$ | Maximum safety loss | 10 fatalities
$F$ | Financial loss | USD 5 M
$F_m$ | Maximum financial loss | USD 10 M
$E$ | Environmental loss | USD 10,000
$E_m$ | Maximum environmental loss | USD 2 M
$\kappa$ | Target risk prop. const. | 6

Table VIII: CPS Security Design Variables and Countermeasures
Design Variable | Description | Possible Countermeasures | Drawbacks
$P_c$ | Probability of compromising a corporate PC | IT security policy |
$c_5$ | Probability of leaking I/O and Modbus configuration | Non-technical - HR Policy |
$c_6$ | Probability of storing configuration documentation on the controller | Prevent local storage | Local storage is more convenient and guarantees no data loss
$c_7$ | Probability of leaking process information | Non-technical - HR Policy |
$c_8$ | Probability that HMI-BPCS comm. period is greater than reactor hazard time | Very low probability except for small reactors - Increase HMI-BPCS comm. frequency |
$c_{13}$ | Probability of successful brute-force mysql login attack | Enforce strong password policy | Forgetting passwords and writing them down
$c_{14}$ | Probability of successful password recovery from a hash dump | Enforce strong password policy | Forgetting passwords and writing them down
$c_{15}$ | Probability that security monitoring app is disabled for mysql | Increase security armoring for mysql application | Less flexibility in database management and interaction
$c_{17}$ | Probability that SSH is enabled for RT Server | Block remote access | Operational inflexibility and higher operational cost
$c_{18}$ | Probability of leaking operation passwords | Non-technical - HR Policy |
$c_{21}$ | Probability that RDP is enabled for HMI workstations | Disable RDP | Inflexible operational environment and higher HMI configuration cost

To get more insight into the design process, we make some assumptions regarding the design variables. First, we assume that the probability of leaking information is the same whether it is software configuration or process documentation. Therefore, $c_5 = c_7$. Second, we assume the same password policy is enforced for different systems, including the OS, the database server, and the remote desktop connection. Therefore, $c_{13} = c_{14} = c_{18}$. Third, the probability that the HMI-BPCS communication period is greater than the reactor hazard time is negligibly small for all practical purposes, so $c_8 = 0$. Fourth, since we chose SSH as the only remote configuration tool for the RT server and disabled the remote desktop capability, we can set $c_{17} = 1$. Finally, we assume remote desktop is disabled for all operator HMIs despite the operational inconvenience, hence $c_{21} = 0$. This leads to the reduced design equation:

$$\underbrace{(c_5^2 c_{13}^2 c_{15})}_{P_{CPS}} P_c \le 10^{-5} \tag{16}$$

where the first term $P_{CPS}$ represents the cyber attack failure probability due to CPS security weaknesses and $P_c$ represents the cyber attack failure probability due to IT security weaknesses. Equation (16) formalizes the interplay between IT security, which ends at the corporate network level, and CPS security, which extends from the isolating firewall down to the plant floor. Figure 13 is a log-log plot of (16), where the design space is the area under the curve. Any point outside the design region results in a higher risk level than the tolerable value specified by the corporation. In addition, as one probability increases, the other probability has to decrease to compensate and achieve the target risk level. Examples are the extreme points $(1, 10^{-5})$ and $(10^{-5}, 1)$. In practice, an operating point that represents a compromise between IT security and CPS security should be selected, e.g., $(P_{CPS}, P_c) = (10^{-4}, 10^{-2})$ in Figure 13. If the design constraint cannot be achieved, then a process modification, e.g., a safety instrumented function, needs to be added. This example shows the interplay between security and safety assessment: they cannot be carried out sequentially, but rather in an integrated way. This is one of the main insights gained from this work, which is further discussed in Section VII.

Figure 13: Cyber Security Design Curve. Any point in the area under the curve represents a valid design that satisfies the target mitigated risk level

F. Hazard Mitigation

The purpose of the risk assessment and countermeasure design is to reduce the probability of an attack that would cause a process hazard to the target mitigated likelihood. However, there is still a probability that an attack will be launched that causes a process hazard. In such a case, a hazard mitigation strategy is mandatory.

One of the advantages of the process hybrid automaton in Figure 5 is that it shows the best course of action when the process is in a hazardous state. The guard conditions on the outward transitions identify the process manipulation required to get out of the hazardous state. As an example, if the process is in a given hazardous state and a hazard is developing, then opening valve CV-2 will cause a transition to a safe state. However, two key questions are in order: (1) how can the current process state be estimated, and (2) how can we take an action if the cyber system is compromised? The first question concerns the problem of state estimation in hybrid systems, which has been studied extensively in the literature [39]. As for the second question, an action with a compromised cyber system can be taken in one of two ways: either by restoring a backup image of the whole system, or by diagnosing the system online to isolate the attack and regain control over the CPS. In both cases, the time taken to implement the mitigation action, $\tau_m$, should satisfy:

$$\tau_m < \tau_p + \tau_s - \tau_d \tag{17}$$

where $\tau_s$ is the time the system can operate outside its safe operating envelope with no damage, $\tau_d$ is the time taken to detect the hazard, and $\tau_p$ is the process hazard development time, as reported in the system hybrid automaton. If this condition is not satisfied, e.g., for a covert attack that misleads the operator via malicious HMI data and results in a high detection time $\tau_d$, then a process hazard will take place, and the mitigation action may or may not reduce the damage, depending on the process design. In such cases, a non-cyber (usually mechanical) mitigating solution has to be implemented in the process. For the reactor system, control valve CV-2 could have a manual override in the field to open or close it by a manual action.

VII. Discussion

The design and analysis process as applied to the case study in this paper could be generalized as in Figure 14, where the physical process is included in the design cycle and hence subject to several iterations. A truly integrated approach to designing both the safety and security systems should include both the physical and cyber systems at early stages of the design process.

The physical process modeling using a hybrid automaton revealed several important insights. First, careful process modeling is crucial to identify the true hazards and the associated cyber components in order to design a fit-for-purpose security system. Second, the time to develop a process hazard is an important parameter that should be taken into account during risk assessment and mitigation design. Along with the detection system, this may lead to more cost-effective cyber designs. Finally, as hazard generation depends on both the sequence of the attack and the time spent in each state, combining the attack traces from the hybrid automaton with the attacker probabilistic model results in more accurate risk assessment.
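The attack traces mentioned above come from the hazard execution tree of Algorithm 1. A minimal Python sketch of that procedure is given below, under the assumption that the automaton is reduced to an unweighted directed graph of discrete states and that "shortest" means fewest transitions, found here by breadth-first search. The state names and the toy graph are hypothetical and do not reproduce the reactor automaton of Figure 5.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search; returns the list of states from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:        # walk parents back to the start state
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def hazard_execution_tree(adj, start, hazardous):
    """Adjacency matrix M of the union of shortest paths to each hazardous state."""
    states = sorted(set(adj) | {v for vs in adj.values() for v in vs})
    index = {s: i for i, s in enumerate(states)}
    M = [[0] * len(states) for _ in states]
    for h in hazardous:
        path = shortest_path(adj, start, h)
        if path is None:                # hazardous state unreachable from start
            continue
        for u, v in zip(path, path[1:]):
            M[index[u]][index[v]] = 1
    return states, M

# Hypothetical automaton: S1 is the initial state, S4 and S5 are hazardous.
adj = {"S1": ["S2", "S3"], "S2": ["S4"], "S3": ["S4", "S5"], "S4": [], "S5": []}
states, M = hazard_execution_tree(adj, "S1", ["S4", "S5"])
for s, row in zip(states, M):
    print(s, row)
```

Each nonzero entry of `M` is a transition that appears on some shortest route to a hazard; attaching the transition's guard condition and the time to hazard of the target state to that entry would give the information needed to build the abstract attack tree.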
[Figure 14: Model-Based Design Cycle, as followed in the case study. Diagram blocks: Start; Identify process hazards & related cyber components; Develop hybrid automaton; Develop data flow diagram; Develop abstract attack tree; Vulnerability analysis & Pentest; Risk assessment; Design countermeasures & Implementation.]

There is a tight coupling between safety and security in terms of their impact on the physical system. Designing both systems independently may not lead to an optimal design. As the case study shows, only one physical component and two cyber nodes are critical; all other components play a secondary role.

The model-based design approach followed in this work, in which physical system modeling, data flow modeling, and attack trees are integrated, provides one unified framework to design safe and secure CPS. The adoption of this integrated approach in industry is contingent on the development of software tools that automate most of the tasks that were carried out manually in this work. These include model development, hazard identification, data flow graph development, vulnerability scanning, attack tree development, penetration testing, and countermeasure design. Although several tools exist to automate individual tasks, an integrated tool that defines and implements the interface between the different tasks is necessary.

The value of working backwards by first identifying the process hazards, though it significantly reduces the number of attack scenarios, may not be readily obvious in centralized architectures. For example, the work presented shows that valve CV-2 is the sole critical component, and therefore BPCS and HMI are the cyber components that should be protected. However, BPCS is the main process controller that implements all control algorithms, so the identification of CV-2 did not lead to effort or cost savings, as the whole BPCS will be hardened anyway. For centralized systems like the CPS presented in this paper, this may lead to program segmentation and special protection for CV-2 related software modules. The significant value of this approach is revealed when considering distributed systems, where smart sensors and actuators are embedded devices that implement their own software and communicate over a common bus, without the need for a centralized controller.

Finally, it should be highlighted that the presented model-based design approach works for both offline and online risk assessment. However, for online risk assessment, it is also desirable to detect day zero attacks that are not available in vulnerability databases. In such a case, the presented framework needs to be augmented with online anomaly detection to perform a complete risk assessment. The architecture of this solution is shown in Figure 15, including the possible countermeasure design. Figure 15 represents a natural extension to the work presented in this paper.

[Figure 15: Model-Based risk assessment with day zero attacks. Diagram blocks: Configuration data; CPS Host/Network Sensors; Vulnerability Analysis; Vulnerability Database; Attack Trees; Attack Design; Penetration Testing; Anomaly Detection; Online Risk Assessment; Countermeasure Selection.]

VIII. Conclusion and Future Work

In this paper, we pursued an integrated approach to design the security system of a given CPS. The key finding, obtained by exploring the physical system behavior, is that not all attacks can cause a system hazard, that a hazard may take time to develop, and that it may be possible to nullify the attack effect. Therefore, the paper highlights the need for an integrated approach to design the safety and security systems. Model-based design is a cornerstone for this integrated approach to be successful. For successful industrial adoption, a design automation tool chain that integrates physical and cyber domain modeling, as well as attack modeling and penetration testing, needs to be developed.

Several research directions could be identified from this work. First, the attacker profile was ignored in the risk assessment (all probabilities were assumed to be 1, i.e., certain events). This may lead to non-optimal system designs. Attackers possess different knowledge and skill sets, and this needs to be captured in the risk assessment process. In addition, the probability that a process hazard is not produced even though a process upset is caused, due to random attacker actions, was not considered. This factor may significantly decrease the overall likelihood of a process hazard following a cyber attack. Second, most successful attacks in the CPS domain are covert attacks that deceive the user via concurrent HMI manipulation. These attacks have complex structures and multiple objectives that need to be studied in more depth. Third, cyber attacks may lead to a system malfunction, and therefore should be included in the safety risk assessment as an initiating cause. This may lead to a reformulation of the current industrial practice for hazard identification and protection studies. Fourth, when simulating the physical process for different cyber attack actions, it was assumed that the system disturbances are within operating limits. This assumption may be violated if the attack is organized against multiple system units simultaneously, resulting in the system being subject to concurrent disturbances and a cyber attack. The study of the composition of component risk assessments to yield an overall system risk assessment measure is an interesting research direction. Finally, the risk assessment process presented in this work is model-based, hence relying on known vulnerabilities and attack scenarios. However, day zero attacks represent a significant threat and challenge to CPS security. The integration of the model-based approach with the machine learning approaches used for day zero attack detection, in the context of online risk assessment, is an important future research direction.

Acknowledgment

This research was made possible by NPRP 9-005-1-002 grant from the Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Table X: Vulnerability Scanning and Penetration Testing Commands

| Node | Tool | Module | Purpose | Commands |
|---|---|---|---|---|
| RT Server | nmap | - | Scan | nmap -p- -sT -sV 192.170.1.2 |
| RT Server | Metasploit | mysql_authbypass_hashdump | Bypass auth. | set RHOSTS 192.170.1.2, set THREADS 50, exploit |
| RT Server | Metasploit | mysql_login | Brute-force SQL login | set RHOSTS 192.170.1.2, set USERNAME root, set PASS_FILE ./passfile.lst, exploit |
| BPCS | nmap | - | Scan | nmap -p- -sT -sV 192.168.1.10 |
| BPCS | Metasploit | modicon_command | Controller Remote Start/Stop | set RHOSTS 192.168.1.10, set RPORT 502, set MODE STOP, exploit |
| BPCS | Metasploit | synflood | BPCS SYN Flood attack | set RHOSTS 192.168.1.10, set RPORT 22, set RPORT 502, exploit |
| BPCS | Metasploit | ssh_login | Brute-force SSH login | set RHOSTS 192.168.1.10, set USERPASS_FILE ./passfile.list, exploit |
| BPCS | Metasploit | modbusclient | Write to a single Modbus register | set RHOSTS 192.168.1.10, set RPORT 502, set DATA_ADDRESS 0, set DATA 0, exploit |
| BPCS | Metasploit | modbusclient | Write to multiple Modbus registers | set RHOSTS 192.168.1.10, set RPORT 502, set DATA_ADDRESS 0, set DATA_REGISTERS 0,100,100, exploit |
| HMI | nmap | - | Scan | nmap -p- -sT -sV 192.168.1.20 |
| HMI | Hydra | - | Brute force RDP | Hydra -v -f -l operator -P ./passfile.list rdp://192.168.1.20 |
| HMI | Metasploit | modbusclient | Write to multiple Modbus registers | set RHOSTS 192.168.1.20, set RPORT 502, set DATA_ADDRESS 0, set DATA_REGISTERS 0,100,100, exploit |
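To make the single-register write listed in Table X (DATA_ADDRESS 0, DATA 0) more concrete, the sketch below encodes the corresponding Modbus/TCP request as bytes. It assumes the write maps to Modbus function code 0x06 (Write Single Register); the transaction and unit identifiers are arbitrary illustrative values, and the function name is hypothetical. The code only builds the frame and does not open any network connection.

```python
import struct


def build_write_single_register(transaction_id, unit_id, address, value):
    """Encode a Modbus/TCP 'Write Single Register' request (function 0x06)."""
    # PDU: function code (1 byte), register address (2 bytes), value (2 bytes).
    pdu = struct.pack(">BHH", 0x06, address, value)
    # MBAP header: transaction id, protocol id (0 for Modbus),
    # length of everything after the length field (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Illustrative identifiers; address and value follow the Table X entry.
frame = build_write_single_register(transaction_id=1, unit_id=1, address=0, value=0)
print(frame.hex())
```

The frame contains no credential or signature field, since the base Modbus protocol defines none. This is consistent with the emphasis in the case study on protecting the nodes that can reach the controller on port 502.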