This section covers the final Q&A part of Ben Hagen’s talk at the 29th Chaos Communication Congress in Germany, digging further into election campaign security.
Question: It would be interesting to know what kind of technologies you were using for your web applications, like Python, Ruby, .NET, or something else?
Ben Hagen: We had a lot of different stuff we used; probably the three most common stacks were Ruby on Rails, obviously, running on Apache; Python with Flask behind Nginx; and a few applications in PHP, also built on Apache. Back-ends were generally Amazon-provided services like RDS, which is essentially a hosted MySQL service.
Question: How much of your system was in the cloud? And why did you decide to choose Amazon Web Services instead of a private cloud like OpenStack? And you talked about open source: which networking devices were you using if you had a physical setup?
Ben Hagen: So, in terms of our footprint on the Internet and our web applications, something like 99% of it was in the cloud – almost everything. We worked with Amazon because they have the most mature offering, with a lot of different services you can use. If you need a database, they have a database; if you need queuing, they have queuing; if you need scaling, they have scaling. I think private clouds are really interesting, OpenStack is really interesting, but in terms of having the capacity to scale dramatically in a very short period of time, you really need to go with one of the bigger providers that has the infrastructure built out already. If you’re relying on a private cloud or something, you hit some sort of limitation at some point: you might have a data center that can go this far, but it can’t go any further. Amazon lets you go basically infinitely large, which is something we needed on occasion. In terms of hardware in the campaign, for the security stuff we were setting up, it was just a stock server with 4 CPUs and some RAIDed storage. Nothing special, basically; just base 1U servers.
Question: I assume that you had a pretty decent SIP infrastructure. I was wondering if you saw any interesting exploits that involved SIP specifically, where people were trying to exploit the phones?
Ben Hagen: We saw scanning, we saw SIP-focused scanning; we were actually using a lot of Microsoft’s Lync back-end for SIP, which is an interesting conglomeration of different services. Aside from scanning, I don’t think we saw anything particularly interesting from it.
Question: Did you see any impersonation attacks? You know, “My alternative Barack Obama site, give me your credit card numbers;” and how quickly were you able to take them down if they were hosted some place dodgy?
Ben Hagen: I think we certainly did see that, and I think the Romney campaign saw the same thing, and you have very limited options other than communicating with law enforcement or with the hosting providers to have that kind of thing taken down. We relied on the community – the security community, but also places like Reddit – where that kind of thing bubbles up very quickly, and monitoring those kinds of sources for impersonation sites is a really great way to find them fast. And I think the only options you have are to warn people or to approach law enforcement and have the site taken down.
Question: What did your preparations for disaster recovery look like?
Ben Hagen: Amazon makes things interesting, and we had a number of potential disaster situations come up in the course of the campaign, where we had a hurricane coming down on one of the major Amazon data centers. Preparations for that generally involved replicating as much of the infrastructure as possible into a different availability zone, so getting as much as possible working in another data center, essentially. In terms of the campaign itself, we had kind of the typical corporate disaster recovery, with secure offsite storage of critical files and disaster recovery of individual laptops, with essential backup of information, imaging of computers, that kind of thing. So, nothing terribly sophisticated on the corporate side; I think the interesting disaster recovery stuff is with the cloud services.
Question: Could you elaborate a little bit more on how you detect those fraud calls where someone is using your system in order to make Romney calls? How did you detect that and how did you prevent that from happening?
Ben Hagen: Basically, we were looking at the velocity of potential calls coming from individual users, and also at the responses they were recording for those calls. So, essentially, if you made a call, we asked you to record information like: did somebody answer the phone? Were they not at home? Were they a supporter – pro-Obama, pro-Romney? There are several possible answers for any call. Looking at the typical spread of answers, and at the velocity with which people can realistically make calls and get legitimate answers, gives you a lot of information about what is and is not a valid user. So, detecting fraudulent activity becomes pretty easy as long as the attacker isn’t using sophisticated impersonation tactics. We assumed they didn’t have insight into what typical behavior looked like, so we could rely on that.
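The heuristic Ben describes – flag users whose call rate is implausibly high, or whose answers show almost no spread – can be sketched roughly as follows. This is a minimal illustration, not the campaign’s actual code: the thresholds, answer labels, and log layout are all assumptions made for the example.

```python
from collections import Counter

def is_suspicious(call_log, max_calls_per_hour=30, max_single_answer_ratio=0.9):
    """call_log: list of (timestamp_seconds, answer) tuples for one user.

    Thresholds are illustrative, not the campaign's real values.
    """
    if len(call_log) < 10:
        return False  # too little data to judge
    # Velocity check: a human can only dial and record so many calls per hour.
    times = sorted(t for t, _ in call_log)
    duration_hours = max((times[-1] - times[0]) / 3600.0, 1.0 / 60.0)
    if len(call_log) / duration_hours > max_calls_per_hour:
        return True
    # Spread check: legitimate users produce a mix of outcomes
    # (answered, not home, supporter, ...), not one answer repeated
    # for nearly every call.
    counts = Counter(answer for _, answer in call_log)
    top = counts.most_common(1)[0][1]
    return top / len(call_log) > max_single_answer_ratio
```

A user who logs a call every few seconds, or who reports the same outcome for almost every call, trips one of the two checks; a volunteer making a handful of calls per hour with mixed outcomes passes both.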
Host: There is another question from the Internet, and it’s a follow-up to the previous one: I’d like to know what the procedures were, how things were documented, and how you tested what impact disasters might have.
Ben Hagen: Speaking of documentation, we would provide as detailed a synopsis as possible of how to actually carry out disaster recovery; basically, step by step was the goal. That obviously doesn’t always happen, but the goal was that everybody in DevOps should be able to handle any kind of emergency because it’s been documented – DevOps being development operations, the people actually deploying things to the Internet.
In terms of assessing the impact of different scenarios, we had things we called “game days”, where we would set aside a weekend, or a day of a weekend, and force developers and DevOps people to contend with random issues with their applications. For example, on a staging or replica environment, we would say: “Your RDS MySQL instance is down. Your application needs to be able to recover from that. Let’s try it out.” And we’d kill the database connection, see how it responded, and try to get as friendly a response out of that as possible – either failing over to another application, redirecting the user, or presenting some kind of static information. So, going through that entire stack of possible problems and figuring out the most graceful way to fail was a big part of those game days, and the big goal in preparation for things like Election Day.
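The graceful-failure pattern exercised in those game days – the database dies, and the page degrades to static content instead of an error – can be sketched in a few lines. This is a hypothetical illustration, not the campaign’s code: the exception type, fallback text, and function names are invented for the example.

```python
# Sketch of "fail gracefully when the database is down": catch the
# connection failure and serve static content instead of a 500 error.
# In a real app you would catch the DB driver's specific connection
# exceptions rather than a custom class like this one.

STATIC_FALLBACK = "We're experiencing heavy load. Please check back shortly."

class DatabaseDown(Exception):
    pass

def fetch_events_from_db():
    # Stand-in for a real query; a game-day exercise simulates this
    # failing by killing the database connection.
    raise DatabaseDown("connection refused")

def events_page():
    try:
        return fetch_events_from_db()
    except DatabaseDown:
        # Degrade to static content rather than surface the failure.
        return STATIC_FALLBACK
```

The point of the game day is to verify that every page in the stack has a branch like the `except` clause above, so that a dead back-end produces a friendly page rather than a stack trace.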