Measuring the Wind: Determining a System’s Cyber Combat Survivability Level
by William Bryant
Today’s combat aircraft, in addition to being threatened by many traditional kinetic weapons that attack from air, land, and sea domains, are threatened by a new class of weapons that attack from cyberspace. While these cyber weapons differ in many ways from kinetic weapons, the proven fundamentals of Aircraft Combat Survivability (ACS) can still be applied to them through the Aircraft Cyber Combat Survivability (ACCS) discipline, which has been developed in partnership with Dr. Robert Ball and detailed in a series of four articles previously published in the Aircraft Survivability journal [1–4].
Like ACS, ACCS considers both the aircraft’s susceptibility—its inability to avoid being hit by a cyber weapon—and its vulnerability—its inability to withstand a hit and continue to accomplish its mission. In the cyber survivability realm, it is important that these terms be clearly defined because most cybersecurity experts use a much broader definition for vulnerability, and this difference can sometimes confuse cybersecurity and survivability experts working together [2]. ACCS also uses a probabilistic kill chain, similar to that of kinetic ACS, to model the success of attacks. However, the kill chain is slightly modified to account for the different characteristics and “physics” that apply to cyber vs. kinetic weapons [2].
In theory, the probabilistic kill chain provides a simple and robust way to measure a system’s level of survivability; however, in practice, determining what probabilities are relevant and reasonable can be difficult and expensive, even for kinetic ACS. Because of the extreme expense involved in destroying test assets, a combination of analysis, modeling and simulation (M&S), small-scale component testing, and minimal full-scale live fire testing typically determines the level of ACS. These approaches should support and rely on each other in an integrated way. For example, M&S is much less expensive than a destructive test, so it should be run before testing, with the M&S results informing the selection of test cases for physical testing and the test results validating that the M&S is reasonably accurate.
ACCS can follow a similar process, even though the analysis, M&S, and testing will all likely have higher levels of uncertainty given the nature of cyber weapons and our relative immaturity in understanding and modeling them. But without some meaningful measurement, it becomes extremely challenging to understand which Cyber Survivability Enhancement Concepts (CSECs) and Cyber Survivability Enhancement Features (CSEFs) to implement in a design, or how survivable a system will be when executing a particular mission in an expected cyber-contested operating environment.
NO MANAGEMENT WITHOUT MEASUREMENT
As illustrated in Figure 1, we can create a meaningful measurement of ACCS that is similar to measuring kinetic ACS using a combination of risk assessment, M&S, component-level testing, and selected full-scale live fire testing.
While cyber survivability damage may not be as “visible” as the physical damage left by, say, a 20-mm cannon shell after a kinetic test, the effects of cyber survivability damage on the ability of a system to function and accomplish its mission can still be seen and measured. This concept is similar to our inability to see the wind but still be able to measure its effect on a physical device, such as an anemometer. And because, as the well-known aphorism goes, “you can’t manage what you don’t measure,” if we want to effectively manage our aircraft systems’ cyber survivability going forward, a proven ability to measure it must be established.
RISK ANALYSIS
We create all of our aviation systems for a reason—to provide some needed capability. Loss of that capability results in a functional or mission kill, so we start the process of measuring ACCS with a Mission-Based Cyber Risk Assessment (MBCRA). An MBCRA helps connect potential cyber attacks to a mission effect, which is a weak area of much current cyber analysis and testing on aircraft and weapon systems. Cyber test reports sometimes provide long lists of vulnerabilities discovered; however, if it is unclear what those vulnerabilities ultimately mean for the mission, decision-makers do not know which issues to address and which they can ignore or defer.
One MBCRA tool/process that we have used successfully on a number of systems is the Unified Risk Assessment and Measurement System (URAMS®) [5]. As illustrated in Figure 2, this system includes a diverse set of integrated qualitative and quantitative tools that provide risk management for weapon systems and aviation platforms throughout the development life cycle and across a range of contested cyberspace environments.
URAMS starts with an engineering analysis, and our preferred tool for this is System-Theoretic Process Analysis for Security (STPA-Sec). This tool was developed by leveraging the safety analysis work performed at the Massachusetts Institute of Technology (MIT), and it has since been used with great effectiveness across a range of military weapon systems and civilian aerospace systems. STPA-Sec is grounded in systems engineering and focused on mission-level losses as the true drivers of relevant security design. The tool also enables analysis of a system’s security posture early in the life cycle, allowing security to be truly “baked in.” (Note, for a detailed description of STPA, see Leveson and Thomas [6]; for a detailed description of STPA-Sec, see Young [7].)
From the analysis, a set of risk scenarios specific to the system under consideration and its expected operating environment is developed. Those risk scenarios are then scored using any of a wide range of available scoring tools. URAMS scoring tools are characterized first by the model of risk and what factors are assumed to contribute to overall risk, and second by input type. Inputs can be provided as single-point values, point values with a confidence score, three-point estimates, or 90% confidence intervals (CIs). Selection of input type depends on the training and experience of the assessors, as well as how important uncertainty is to decision-makers. While human subject-matter experts (SMEs) are used as the basis for scoring in URAMS, automated- and algorithmic-based risk approaches can and should be used to inform those SMEs when available.
The risk scenarios can then be combined using a simple Monte Carlo simulation to determine the overall risk for a system or portfolio of systems. Combining risk facilitates building a structured assurance case that includes the analyzed mission structure connected to the specific risk scenarios and their scores, which flow up through the mission elements to the overall system.
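To make the combination step concrete, the short Python sketch below aggregates a handful of hypothetical risk scenarios, each scored with a three-point estimate of the fractional mission loss it would produce, into an overall risk distribution. The scenario names, the values, and the independence assumption (losses combined as 1 minus the product of the individual survival fractions) are illustrative only and are not drawn from URAMS.

```python
import random

random.seed(0)

# Hypothetical three-point estimates (low, most likely, high) of the
# fractional mission loss attributed to each risk scenario. The scoring
# model that produces these numbers is a URAMS detail not reproduced here.
scenarios = {
    "GPS spoofing":           (0.01, 0.03, 0.10),
    "Bus message injection":  (0.00, 0.02, 0.08),
    "Mission data tampering": (0.01, 0.05, 0.15),
}

def one_trial():
    """One Monte Carlo trial: sample each scenario's loss from a triangular
    distribution and combine, assuming independence, as 1 - prod(1 - loss)."""
    survive = 1.0
    for low, mode, high in scenarios.values():
        survive *= 1.0 - random.triangular(low, high, mode)
    return 1.0 - survive

samples = sorted(one_trial() for _ in range(50_000))
mean = sum(samples) / len(samples)
lo, hi = samples[int(0.05 * len(samples))], samples[int(0.95 * len(samples))]
print(f"overall risk: mean {mean:.1%}, 90% interval {lo:.1%} to {hi:.1%}")
```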
MODELING ACCS
With an understanding of what potential adversary attacks can produce unacceptable mission effects, the ACCS probabilistic kill chain can model those specific attacks. As illustrated in Figure 3, the six steps of the ACCS probabilistic kill chain are analogous to the six steps of the ACS probabilistic kill chain.
Note that the kill chain starts at the top of the figure, where the aircraft enters the combat zone. The probability that the adversary has an active cyber weapon searching for the aircraft is PA, and the complementary probability that the adversary does not is PCA. If the adversary does have a weapon searching, then the calculation proceeds to the next step of considering if the aircraft is detected in cyberspace. If the adversary does not, the aircraft has survived, and no more calculation is required. The process continues for each of the six steps, and all six must occur for a cyber weapon to kill the aircraft.
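Because all six steps must occur, the chain’s arithmetic reduces to a product. A sketch of that relationship, using generic step indices rather than the formal ACCS notation, is

$$P_K \;=\; \prod_{i=1}^{6} P_i,$$

where $P_1 = P_A$ and each subsequent $P_i$ is the probability that step $i$ occurs given that all earlier steps have occurred; if any single factor is zero, the aircraft survives the cyber attack.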
With the probabilistic kill chain as the model of how to kill an aircraft, the next step is a simple 1-vs.-1 simulation of one cyber weapon against one aircraft. Because the level of uncertainty in a number of the probabilities will likely be high, one technique to incorporate this uncertainty is to use 90% CIs as our inputs to the probabilities instead of point values. An input with a high uncertainty will have a wide 90% CI, such as 20–80%; and one with low uncertainty will have a narrow 90% CI, such as 45–55%.
A simple Monte Carlo simulation can calculate the probability of a successful attack, or PK, as shown in the example in Figure 4 (taken from Bryant and Ball [4]).
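The sketch below shows one way such a simulation could be set up, assuming standard-library Python and a logit-normal distribution fit to each 90% CI; the six interval values are hypothetical placeholders, not the inputs used in Figure 4.

```python
import math
import random

random.seed(1)

def sample_prob(lo, hi):
    """Draw a probability whose 5th/95th percentiles match a 90% CI,
    assuming a normal distribution on the log-odds (logit) scale.
    This is one convenient choice of input distribution, not the only one."""
    logit = lambda p: math.log(p / (1.0 - p))
    mu = (logit(lo) + logit(hi)) / 2.0
    sigma = (logit(hi) - logit(lo)) / (2.0 * 1.6449)  # a 90% CI spans +/-1.6449 sigma
    return 1.0 / (1.0 + math.exp(-random.gauss(mu, sigma)))

# Hypothetical 90% CIs for the six kill-chain steps, each conditioned on
# the previous step having occurred (a wide interval means high uncertainty).
steps = [(0.30, 0.70), (0.40, 0.80), (0.50, 0.90),
         (0.60, 0.90), (0.50, 0.80), (0.40, 0.70)]

trials = 100_000
pk = sorted(math.prod(sample_prob(lo, hi) for lo, hi in steps) for _ in range(trials))
mean_pk = sum(pk) / trials
print(f"mean PK = {mean_pk:.1%}, "
      f"90% CI = {pk[int(0.05 * trials)]:.1%} to {pk[int(0.95 * trials)]:.1%}")
```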
Note that the six probabilities in Figure 3 are notional examples listed left to right across the top of Figure 4, and the result, in this case, is a mean PK of 16% with a 90% CI of 13–20%. The math is relatively simple and uncontroversial here; the difficulty comes in developing meaningful probabilities to input into the simulation. For kinetic ACS, inputs such as the probability of detection for a given surface-to-air missile (SAM) system against a particular aircraft at a particular angle and range can be calculated from known physical characteristics of the aircraft, such as radar cross section, and the capabilities of the detection system on the SAM. However, with the current state of cyber weapons knowledge, those types of high-certainty calculated probabilities are extremely rare. Instead, human experts normally serve as the measurement tools that provide the probabilities, dramatically increasing the values’ uncertainty.
Human experts have many weaknesses and known biases thoroughly documented in the academic literature on human decision-making. (For a nonacademic summary of the literature, see Kahneman [8]. For a more academic summary, as well as a large number of relevant articles, see Kahneman, Slovic, and Tversky [9] and Gilovich, Griffin, and Kahneman [10].) While education on some key heuristics and biases can improve performance in certain cases, and calibration has been shown to decrease overconfidence, humans remain flawed measurement tools that should be augmented by other approaches, such as direct-attack simulations and testing, whenever available. The expected high level of uncertainty in the inputs always needs to be tracked and presented to decision-makers.
With the ability to calculate the PK for individual cyber weapons against individual aircraft, those values can be used in simulation tools (such as the Advanced Framework for Simulation Integration and Modeling [AFSIM]) that can model many-vs.-many environments (such as an airborne strike package attempting to strike a set of targets). Campaign-level modeling tools can then use these results to understand the impact cyber weapons can have on an entire campaign.
To illustrate the modeling of a cyber weapon at the campaign level, the Combat Forces Assessment Model (CFAM) was used with an unclassified air campaign modeled on the 1991 Desert Storm air campaign. In the baseline historical case, the campaign took 44 days to complete without any cyber weapons, which is close to what actually happened. When a cyber weapon similar to the one in Figure 4 was postulated to attrit 15% of F-16 sorties only, the air campaign took 2 days longer, an increase of about 5% in overall campaign length. When a more virulent cyber weapon that attritted 30% of F-16 sorties was modeled, the air campaign took an additional 6 days, a 14% increase in overall campaign length. The total number of sorties flown by aircraft type for both scenarios is shown in Figure 5.
At the campaign level, there is some ability for other aircraft types to fly more sorties to make up for attrition in a particular type, which is why there was only a 2-day increase in overall campaign length with a loss of 15% of F-16 sorties. However, when attrition was increased to 30% of F-16 sorties, that excess capacity had already been used, so doubling the attrition produced a disproportionate impact of 4 additional days (6 days in total). Note that, in this scenario, we modeled only the more common cyber mission kills: the sorties were lost, but the aircraft remained available to try again the next day. If the aircraft had instead been destroyed via attrition kills, the loss would have affected the campaign much more dramatically.
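A toy model can illustrate this nonlinearity. The sketch below is not CFAM and does not reproduce the Figure 5 sortie counts; the fleet sizes, the surge limit, and the assumption that a campaign amounts to delivering a fixed number of effective sorties are all invented for illustration.

```python
import math

def campaign_days(work, fleets, attrition=None, surge=0.05):
    """Days needed to deliver `work` effective sorties.
    fleets:    daily sorties per aircraft type
    attrition: fraction of a type's sorties mission-killed by a cyber weapon
    surge:     fraction above normal rate that unaffected types can add"""
    attrition = attrition or {}
    delivered = sum(n * (1 - attrition.get(t, 0.0)) for t, n in fleets.items())
    shortfall = sum(n * attrition.get(t, 0.0) for t, n in fleets.items())
    spare = sum(n * surge for t, n in fleets.items() if t not in attrition)
    return math.ceil(work / (delivered + min(shortfall, spare)))

fleets = {"F-16": 60, "F-15E": 40, "A-10": 40, "B-52": 10}  # hypothetical daily sorties
work = 44 * sum(fleets.values())                            # calibrated to a 44-day baseline

print(campaign_days(work, fleets))                   # baseline, no cyber attrition
print(campaign_days(work, fleets, {"F-16": 0.15}))   # modest attrition, mostly absorbed by surge
print(campaign_days(work, fleets, {"F-16": 0.30}))   # surge exhausted, disproportionately longer
```

Because the surge capacity of the unaffected types is fixed, the extra attrition from the more virulent weapon falls entirely on campaign length, mirroring the pattern seen in the CFAM runs.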
These results illustrate how modeling can quantify the impact of potential cyber weapons in terms much more meaningful to senior leaders and Warfighters. A Combined Forces Air Component Commander (CFACC) immediately understands what increasing the campaign length by 5% or 14% means vs. a more nebulous sense of cyber risk. Note also that this simulation was for a single, modestly effective cyber weapon; in a full-scale conflict, we expect to see multiple cyber weapons, some of which may have extremely high levels of effectiveness.
COMPONENT TESTING
As discussed previously, the greatest difficulty in the M&S of ACCS is likely to be the extremely high level of uncertainty, because most of the analysis will be driven by probabilities assessed by human experts. Component testing provides a way to validate those probabilities and narrow the uncertainty.
Component testing is a common practice in assessing kinetic ACS. Typically, a small part of a system, such as a piece of armor, is tested against a threat or threat component. For example, a specified amount of explosive in a casing might be detonated at a set distance from the armor piece and the results recorded. These tests typically have much more in common with laboratory experiments under strictly controlled conditions than full-scale operational testing, which tries to replicate expected operational conditions. In addition, component testing does not attempt to test an attack’s entire probabilistic kill chain at once but normally focuses on one element. Testers often assume the other steps have already occurred, or will occur, outside the component test.
Despite their limited nature, component-level tests are extremely important and form a critical link between modeling and full-scale live fire testing. Component-level testing can also be significantly less expensive than full-scale testing and can be used to validate the M&S results. Smart Design of Experiments (DOE) techniques use statistical methods to minimize the number of test points required to validate that a model’s predictions are reasonable across a range of conditions [11].
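As a small illustration of the DOE idea, the sketch below builds a standard 2^(4-1) half-fraction design: eight component-test runs that still allow the main effects of four two-level factors to be estimated. The factor names are hypothetical and chosen only to suggest a cyber component test.

```python
from itertools import product

# Hypothetical two-level factors for a component-level cyber test
factors = ["interface", "message_rate", "firmware_version", "bus_load"]

# A full 2^4 factorial would need 16 runs; a 2^(4-1) half fraction needs 8.
# The fourth factor's level is the product of the first three
# (defining relation I = ABCD), a standard resolution-IV construction.
runs = []
for a, b, c in product((-1, 1), repeat=3):
    levels = (a, b, c, a * b * c)
    runs.append(dict(zip(factors, levels)))

for i, run in enumerate(runs, 1):
    print(i, run)
```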
For cyber weapons, component testing is sometimes referred to as blue team testing because the testers work closely with the design engineers. Cyber-physical system components (such as avionics boxes, individual cards, or software elements) can be tested using a range of techniques [12]. For example, static analysis of software code looks for errors and vulnerabilities in the instructions provided to the computing hardware, whether accomplished manually or by an automated test system. Dynamic analysis tests the code in operation, and many other types of tests can be executed depending on the software, hardware, and technology used in the system.
Whatever testing techniques are used, component testing should occur across the life cycle, from very early in design through sustainment. The results inform M&S and risk analysis and can serve as important inputs into the last step of full-scale live fire testing.
FULL-SCALE LIVE FIRE TESTING
Because of the perceived inadequacy of earlier methods of measuring aircraft survivability, major U.S. combat systems have required live fire testing since 1987 [13]. Live fire testing integrates knowledge of threat systems with the full system under test. It requires using test articles fully configured for combat, including all flammables and explosives that would normally be carried in the operational environment. In addition, where possible, actual threat systems are fired at a system in as accurate an environment as can be achieved. (For aircraft, this often involves simulating flight by passing air over them at high speed.) And the results are observed and recorded to validate all the previous analysis, modeling, and component-level testing.
Understandably, these types of tests are often enormously expensive, at times allowing only a few test points to be collected, even with the largest and best-funded programs. Additional data points can sometimes be collected by repairing aircraft between tests, but some live fire testing is so destructive that reuse of a test asset is not always feasible or practical. Furthermore, in some cases full-scale live fire testing is simply impractical, so an alternative approach must be developed (and approved by Congress).
Full-scale testing integrates the entire kill chain, although it may not all be done in a single test. For example, an aircraft may be flown against a ground-based threat system to test that system’s ability to track it; other tests will validate the ability of the threat’s missile to fly out to the aircraft; and live fire testing will validate the models that predict the amount of damage the aircraft is expected to sustain. All of these data are integrated into the complete probabilistic kill chain and used to validate that the M&S and component testing are reasonably accurate and did not miss anything significant. For example, suppose a combat helicopter’s rotors catastrophically fail due to a hit from small arms fire that the models did not predict. In that case, the models will likely have to be adjusted, and design changes may require further testing.
The closest equivalent to live fire testing in the cyber world is red team testing, or penetration testing. These terms can be defined differently, but for ACCS measurement, we mean a test of the full system in its “combat configuration,” with as much of its supporting infrastructure in place as possible, conducted by threat-representative attackers (a red team) working from outside the system. However, as with live fire testing, testing the entire kill chain at once may not be practical, and tests can sometimes be separated. For example, a test team may establish that a particular system component can be accessed using a particular approach. In future tests, that access may then be assumed so that the testers can start on the inside of the system instead of spending much of their limited test time repeating the previous test.
Test time available for full red teams is typically limited, as there are few red teams certified by the Department of Defense (DoD) to do this type of testing. Additionally, most of those teams are focused on traditional IT systems, so it is challenging for programs to get access to high-level red teams familiar with the specific techniques used to attack aircraft and weapon systems. This lack of availability may often drive programs to combine certified red team testing with testing performed by aviation cyber experts. For example, a certified red team might demonstrate the ability to access a system, while aviation-focused cyber experts demonstrate the effects that can be generated within the aircraft.
Realistically but safely generating cyber effects within aviation systems is a significant concern. Not many pilots want to fly an aircraft recently full of cyber weapons that the testers are “almost certain” they removed. One approach to this issue is testing in a System Integration Lab (SIL), essentially the cyber-physical components of an aircraft set up in a ground lab. These components can come in different levels of fidelity, from just a few networked avionics buses to an almost complete set of aircraft systems. And because they cannot physically crash, SILs can be a good place for potentially destructive cyber testing.
Cyber ranges intended for exercising and testing cyber combat can be another way to accomplish this full-scale testing. In some cases, aviation-specific components can be attached to a larger cyber range or lab. There are numerous Government-run cyber ranges; some companies have also constructed capable labs that can be used.
Finally, the aircraft itself is typically the most accurate test article, either on the ground or in the air. Testing on an actual aircraft can be, and has been, done even in flight, although care needs to be taken to work with safety authorities to ensure that the test is safe and will not create future safety issues. For potentially catastrophic types of cyber attacks, one approach might be to leverage kinetic live fire testing and run the most dangerous cyber attacks against a system that will subsequently be shot at and destroyed physically during live fire testing.
REPORTING THE RESULTS
With all the analysis, modeling, and testing complete, the final step of measuring ACCS is to effectively report the results to decision-makers who will act. After all, without the ability to influence programmatic and operational decisions, all the work accomplished to this point will have no meaningful impact.
Unfortunately, senior leaders do not have the time (or, in many cases, the training) to be able to read numerous different test reports and integrate their results into a coherent overall picture. Thus, one proven approach to integrating a mass of data into a coherent picture is through structured assurance cases. These cases were developed and are widely used in the European aviation safety world, as well as by the National Aeronautics and Space Administration (NASA). An assurance case builds the argument that a particular claim is, or will be, met using a set format, such as the Goal Structuring Notation (GSN) shown in Figure 6.
The top-level goal in the assurance case is whatever mission we designed our system to accomplish. For example, if URAMS was used for the risk assessment, the mission structure can be constructed from the results of the STPA-Sec analysis; however, an assurance case can also be used if another approach, such as a mission thread analysis, was followed. At the bottom of the mission structure sit the specific risk scenarios identified and tested, and the actual evidence, such as specific test results, is placed beneath them.
To illustrate how an assurance case could be used without highlighting any potential issues in real systems, a purely notional unmanned aerial system (UAS) was designed to the conceptual level and run through the analysis process. The top-level results are shown in Figure 7.
The mission structure was developed using STPA-Sec, and 33 risk scenarios were developed and scored. Because the 33 risk scenarios overwhelmed the figure, they were grouped into 5 risk groups at the bottom of Figure 7. Each risk group has an Expected Mission Loss (EML) score, and the complete system has a total EML of 10.4%. Underneath the risk groups are the individual risks, their scoring, and the evidence (such as test results) that informs those scores.
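One way to picture the machinery behind Figure 7 is a small tree that carries risks, evidence, and EML scores and rolls them up through the mission structure. The sketch below is only illustrative: the node names and values are invented, and the simple independence-based combination (1 minus the product of the individual survival fractions) stands in for, and is not, the actual URAMS scoring math.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    eml: float = 0.0                              # leaf risks carry their scored EML
    evidence: list = field(default_factory=list)  # e.g., test report identifiers
    children: list = field(default_factory=list)

def rollup(node):
    """Combine child EMLs up through the mission structure, assuming
    independent risks: combined EML = 1 - prod(1 - child EML)."""
    if not node.children:
        return node.eml
    survive = 1.0
    for child in node.children:
        survive *= 1.0 - rollup(child)
    node.eml = 1.0 - survive
    return node.eml

# Hypothetical structure loosely patterned on Figure 7 (names and values invented)
uas_mission = Node("Notional UAS accomplishes its mission", children=[
    Node("Risk group: navigation", children=[
        Node("GPS spoofing scenario", eml=0.02, evidence=["SIL test 7"]),
        Node("INS data corruption scenario", eml=0.01, evidence=["static analysis report"]),
    ]),
    Node("Risk group: datalink", children=[
        Node("C2 link message injection scenario", eml=0.04, evidence=["red team report 3"]),
    ]),
])

print(f"Total expected mission loss: {rollup(uas_mission):.1%}")
```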
An assurance case still has a tremendous amount of information that can appear overwhelming. However, the simple three-step process shown in Figure 8 can help enable decision-makers to step through it while only needing to concentrate on a single piece of the puzzle at a time.
In the first step, the decision-maker considers the overall mission structure to determine if all key mission elements are included correctly. Once the decision-maker is satisfied that the mission structure is correct, he/she can turn to each risk to consider the risk scoring and the evidence that supports that scoring. If a risk assessment system (such as URAMS) that enables the mathematically legitimate combination of risk scores was used, each risk can be considered separately. Finally, the risk scores are combined using the mission structure, and the decision-maker determines if the level of risk is acceptable. In the case of the aforementioned notional UAS, it is expected to lose about 10% of mission capability while under cyber attack. That loss of capability may be acceptable or not, depending of course on the mission and risk tolerance of the Warfighter.
CONCLUSIONS
As discussed herein, measuring ACCS is difficult because ACCS is not directly observable. However, it can be measured effectively using a similar approach to ACS through linked risk analysis, modeling, component testing, and selected full-scale live fire testing. This combined approach fits well within current engineering practices and provides the best opportunity to meaningfully measure and manage an aviation system’s vulnerability to cyber weapons.
While this approach can be successful, challenges still remain. The first is developing meaningful inputs to the risk assessment process. Even if the risk assessment process itself is perfect, the outputs will not be meaningful if the inputs do not reflect the real state of the system and its environment; moreover, there is only a minimal amount of historical data with which to validate any proposed inputs.
The second issue is finding people with enough expertise in ACCS and related fields to execute ACCS-related risk assessment, simulation, and testing. These experts are extremely hard to find, and when they are found, they tend to be very expensive, which contributes to the third challenge: cost. Programs are typically measured in terms of cost, schedule, and performance against requirements, and improving the ACCS of platforms tends to add cost, extend schedule, and affect performance against other requirements.
Finally, there is a broad cultural issue across the DoD of many people not considering cyber weapons as real threats against platforms that require action beyond complying with a set of rules. As long as threat information remains highly classified and there are no major publicized cyber attacks on platforms, it will be hard to make much headway in changing the culture.
On the positive side, tremendous progress has been made in the last few years that gives hope for the future. While finding the right cyber-focused requirements is still difficult, many programs are trying hard to incorporate cyber survivability into their systems. Cyber risk assessments have become much more common, and cyber testing is being incorporated into many test plans. Future work that connects these various cyber-related activities while grounding their potential impact in the mission can help move things further forward. Education and training across the force can help more people understand the importance of cyber weapons and their potential effects on platforms, and senior leadership is placing more emphasis on these threats, which is providing needed funding. All of these factors improve the potential to successfully measure ACCS using the process outlined here, which will ultimately enable the DoD to manage our risk to this new category of weapons.
ABOUT THE AUTHOR
Dr. William “Data” Bryant is a cyberspace defense and risk leader who currently works for Modern Technology Solutions, Incorporated (MTSI). His diverse background in operations, planning, and strategy includes more than 25 years of service in the Air Force, where he was a fighter pilot, planner, and strategist. Dr. Bryant helped create Task Force Cyber Secure and also served as the Air Force Deputy Chief Information Security Officer while developing and successfully implementing numerous proposals and policies to improve the cyber defense of weapon systems. He holds multiple degrees in aeronautical engineering, space systems, military strategy, and organizational management. He has also authored numerous works on various aspects of defending cyber physical systems and cyberspace superiority, including International Conflict and Cyberspace Superiority: Theory and Practice [14].
References
[1] Bryant, William D., and Robert E. Ball. “Developing the Fundamentals of Aircraft Cyber Combat Survivability: Part 1.” Aircraft Survivability, spring 2020.
[2] Bryant, William D., and Robert E. Ball. “Developing the Fundamentals of Aircraft Cyber Combat Survivability: Part 2.” Aircraft Survivability, summer 2020.
[3] Bryant, William D., and Robert E. Ball. “Developing the Fundamentals of Aircraft Cyber Combat Survivability: Part 3.” Aircraft Survivability, fall 2020.
[4] Bryant, William D., and Robert E. Ball. “Developing the Fundamentals of Aircraft Cyber Combat Survivability: Part 4.” Aircraft Survivability, spring 2021.
[5] Bryant, William D. The Unified Risk Assessment and Measurement System (URAMS) for Weapon Systems and Platforms: Cutting the Gordian Knot. Version 2.0, www.mtsi-va.com/weapon-systems-cybersecurity/, Modern Technology Solutions Inc., 2022.
[6] Leveson, Nancy G., and John P. Thomas. STPA Handbook. http://psas.scripts.mit.edu/home/get_file.php?name=STPA_handbook.pdf, March 2018.
[7] Young, William. “Basic Introduction to STPA for Security (STPA-Sec).” Presented at the 2020 System-Theoretic Accident Model and Process (STAMP) Workshop, http://psas.scripts.mit.edu/home/wp-content/uploads/2020/07/STPA-Sec-Tutorial.pdf, 22 July 2020.
[8] Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus, and Giroux, 2011.
[9] Kahneman, Daniel, Paul Slovic, and Amos Tversky (editors). Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press, 1982.
[10] Gilovich, Thomas, Dale Griffin, and Daniel Kahneman (editors). Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge University Press, 2002.
[11] Johnson, Tom, John Haman, Heather Wojton, and Mark Couch. “Designs of Experiments (DOE) in Survivability.” Aircraft Survivability, summer 2019.
[12] Bryant, William D., and R. Lane Odom. “Integrating Test Into a Secure Systems Engineering.” ITEA Journal of Test and Evaluation, vol. 41, pp. 92-97, 2020.
[13] Ball, Robert E. The Fundamentals of Aircraft Combat Survivability Analysis and Design. Second edition, American Institute of Aeronautics and Astronautics, p. 172, 2003.
[14] Bryant, William D. International Conflict and Cyberspace Superiority: Theory and Practice. New York: Routledge, 2015.