The State of Incident Response by Bruce Schneier 4: OODA Loops in Cybersecurity

This entry explains the concept of OODA loops, which originated in the U.S. Air Force, and extends it to digital incident response.

Alright, so, people, process, and technology. The key here is making it scale. I’m at the follow-on sentence from Lorrie Cranor. She wrote: “However, there are some tasks for which feasible, or cost effective, alternatives to humans are not available. …” She means we don’t have robots yet. “… In these cases, system designers should engineer their systems to support the humans in the loop, and maximize their chances of performing their security-critical functions successfully.” So in places where you can’t remove humans from the loop, you have to build technology to support humans in their critical tasks. Think of any emergency response system – think of police, think of fire, think of medical, think of military. That’s what technology does: technology supports the humans who are critical in the response system.

In IT security, in response, we need technology that aids people, and not the other way around. And the goal here is resilience. Very strongly, the goal here is to build resilient systems. We are not going to build impenetrable systems, and we shouldn’t build fragile systems. And a lot of the response strategies echo resilience: mitigation, survivability, recoverability, adaptability. These are all ways to achieve resilience. And because response is real-time, because response is the closest thing we have in IT to dogfighting, this is all about feedback loops.

OODA loop, in a nutshell

And there is a really nice piece of systems theory coming from the U.S. Air Force that talks about this; actually it comes from dogfighting. It’s a notion of “OODA loops”. OODA loops are starting to be talked about in IT. I think there’s a danger we are going to overuse this concept, but I think it’s extraordinarily valuable and something we need to think about. OODA stands for “observe, orient, decide, act”, and it is a cycle. It was developed by Air Force military strategist John Boyd for thinking about dogfights: a pilot in a dogfight is continuously going through this OODA loop in his mind – observe, orient, decide, and act. This is the process of collecting, evaluating, and then doing. And this type of process is widely applicable in any real-time adversarial situation. You’ll see articles that talk about not only airplane dogfights, but strategic military planning, business competition, anything else.

And it is, by definition, an iterative process. Someone in this kind of situation is continually going through OODA loops in their head. And what Boyd observed is that speed is essential here: if you can make your OODA loop faster than the adversary’s, if you can get (the phrase he uses) inside the other person’s OODA loop, then you have an enormous advantage. You can respond effectively faster than he can react to your response.

There’s some good writing about applying this to cybersecurity and incident response. There are papers – I recommend just googling the term and wandering around a bit. The reason I like this framework is it gives us a way of discussing effective tools for incident response. Really, what this talk is at this point is a plea for tools. We need good IR tools to facilitate all of these steps. And we can break them down.

The first step is “observe” – knowing what’s happening on our networks in real time. That’s real-time threat detection from IDSs, that’s log monitoring, log analysis tools, network performance analysis tools, network management tools, physical security information – pretty much everything. We need to be able to get all of that data in a place where it can be monitored in real time, both before and during an attack.

“Orient” – understanding what this information means in context. And context is critical in any response. So, in the context of the organization – what’s happening in the company at the time? In the context of the greater Internet community – what kind of malware is out there? What kind of zero-days are we just seeing? What kind of geopolitical situation is going on? Is there some new vulnerability that was just discovered or announced? Is the organization rolling out a new piece of software? Are they planning layoffs? Is there a merger? Has the organization seen attacks from this IP address before? Has the network been opened up to a partner? So, you are thinking of data feeds from the news, from intelligence feeds, from the rest of the organization, just ways to put what’s going on in context.

Third is “decide” – just figuring out what to do in the moment. This is actually hard. Who has the power to make a decision? How do they make the decision? What sort of executive input is required? Is there marketing input? Is there PR input? Is there legal input? How do you justify the decision? Because after the fact you’ll have to stand up in front of some investigative body, either in your company or in some lawsuit, to justify why you did what you did. That’s all part of the decision process.

And then “act” – being able to make changes quickly on the network. And again, here a lot of organizations fall down, because the people in the IR team might not be authorized to make changes all the way over there, they might not have the right authorities. And we won’t know what authorities they need until this all starts. So it’s going to require broad access, continual training. We need tools for all of these things. We need tools that are powerful, flexible, intuitive, tools that aid people. And we need a lot of them. This isn’t one thing. This is a whole ecosystem of incident response products and services that do this basket of things.
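As a rough illustration of how the four steps chain together, here is a minimal Python sketch of one pass through the loop. Everything in it is an assumption made for the example: the Event type, the feed names, the rule that a previously seen address gets blocked, and the set of authorized actions are all hypothetical, not any product's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str        # hypothetical feed name, e.g. "ids" or "auth_log"
    src_ip: str
    description: str
    context: dict = field(default_factory=dict)


def observe(feeds):
    """Observe: gather raw events from every monitoring feed into one place."""
    return [event for feed in feeds for event in feed]


def orient(events, known_bad_ips, org_state):
    """Orient: attach context (attack history, what the organization is doing)."""
    for event in events:
        event.context["seen_before"] = event.src_ip in known_bad_ips
        event.context["org_state"] = org_state
    return events


def decide(events):
    """Decide: pick an action per event and record a justification for later review."""
    decisions = []
    for event in events:
        if event.context["seen_before"]:
            action, reason = "block_ip", "source address seen in earlier attacks"
        else:
            action, reason = "monitor", "no prior history for this source"
        decisions.append({"event": event, "action": action, "reason": reason})
    return decisions


def act(decisions, authorized_actions):
    """Act: carry out only what the team is authorized to do; escalate the rest."""
    outcomes = []
    for d in decisions:
        status = "executed" if d["action"] in authorized_actions else "escalated"
        outcomes.append((d["event"].src_ip, d["action"], status))
    return outcomes


def ooda_iteration(feeds, known_bad_ips, org_state, authorized_actions):
    """One pass through the loop; a real responder repeats this continuously."""
    events = orient(observe(feeds), known_bad_ips, org_state)
    return act(decide(events), authorized_actions)


# Made-up inputs using documentation-reserved IP addresses.
ids_feed = [Event("ids", "203.0.113.7", "port scan")]
log_feed = [Event("auth_log", "198.51.100.4", "failed logins")]
outcomes = ooda_iteration(
    feeds=[ids_feed, log_feed],
    known_bad_ips={"203.0.113.7"},
    org_state="software rollout in progress",
    authorized_actions={"monitor"},
)
print(outcomes)
```

With these made-up inputs, the block request for the address seen in earlier attacks is escalated rather than executed, because the team in this example is only authorized to monitor. That is the authority gap described above, and it only shows up once the loop actually runs.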

Incident response is getting more important. It’s getting more important for a lot of reasons. Attacks are getting more sophisticated. The regulatory environment is getting more complicated. Litigation is getting much more common. Geopolitical factors are major. And again, organizations underspend on prevention. The neat thing and the reason I’m really optimistic about this in the next few years is that IR software is not going to be like the rest of security software; that the requirements – none of that stuff I just listed – none of them are those non-functional “why” requirements. They are all stuff that products and services have to do. And this means the good stuff is going to beat out the mediocre stuff.

Co3 Systems' vision of end-to-end incident response

And we as engineers need to start building the good stuff, because it’s important. I’ve started doing this. I have a company, Co3 Systems, and I’m trying to build a management platform to coordinate incident response. That’s just one piece of it. I think it’s an important one and a cool one, but a lot of things have to feed into it, and it has to feed into a lot of things – it all has to work together. The goal here is to bring people, process, and technology together in a way that hasn’t been done before, in a way that is going to mirror less IT and more things like generic crisis management. We have a lot to learn from other disciplines that have been doing this sort of thing for decades. And this is how we’re going to defend against threats, this is what’s going to work.
