Dwelling on the ways to ensure accurate botnet evaluation, Tillmann Werner focuses on distinguishing peers and introduces the specially tailored Prowler tool.
What you see here is an analysis of convergence for the P2P botnets we crawled (see right-hand image). On the left-hand side, you see a curve similar to the one on the previous slide, reflecting the actual number of machines that we identified. The upper curve is ZeroAccess, which, as I mentioned, is pretty large and gets way more hits. The one at the bottom is for a botnet called Sality, which I haven't looked into myself, but a friend of mine has, and he provided these numbers. Depending on the size of the botnet, the scale is different, but the shape is more or less the same: you can see that all of them converge. When the curve flattens out into a straight line, you know you are pretty much done. You can also look at the population increase, which is displayed on the right-hand side and basically correlates with the other graphs.
So, how do you distinguish peers? I already talked about that: you have unique IP addresses vs. unique IDs, in the case where you have IDs. In the case where you don't, you can still draw some conclusions from other cases where IDs are available. I'm cheating a little bit here, because these graphs are not generated by crawling; this is the Kelihos.C botnet (see left-hand image), the last version that was attacked. In this case we did node injection: we propagated a special peerlist entry in the P2P network, it became very prominent, and then all the other peers reached out to that machine. This way you even get the ones that are not directly reachable, because at some point the entries propagate through NAT, through gateways, and so on.
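The convergence criterion above ("when the curve flattens out, you are pretty much done") can be turned into a simple stopping rule. Below is a minimal sketch, assuming the crawler records the cumulative number of unique peers once per fixed interval. The function name `has_converged` and the `window` and `min_growth` thresholds are hypothetical illustration choices, not values from the talk.

```python
from collections.abc import Sequence


def has_converged(counts: Sequence[int], window: int = 5, min_growth: float = 0.01) -> bool:
    """Return True once the cumulative unique-peer count has flattened out.

    counts: cumulative number of unique peers seen, one sample per interval.
    window: number of recent intervals to compare (hypothetical default).
    min_growth: relative growth over the window below which we stop.
    """
    if len(counts) <= window:
        return False  # not enough samples yet to judge the trend
    old, new = counts[-window - 1], counts[-1]
    if old == 0:
        return False  # nothing found yet, so keep crawling
    # Cumulative counts never shrink, so this ratio is >= 0.
    return (new - old) / old < min_growth
```

A crawler would call this after each interval and stop, or switch to pure re-checking of known peers, once it returns True.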
So this gives you far more accurate numbers, and it allows us to compare the IP address count with the ID count. Green here is the total number of bots, i.e. the total number of unique IDs, and blue is the total number of unique IP addresses. You can see that the IP count keeps going up even though we have seen almost all unique IDs, so the slope of the green line is much flatter. The ratio between the two after, say, 24 or 48 hours is almost the same for all botnets we have looked at.
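To make the ID-vs-IP comparison concrete, here is a small sketch that counts unique IP addresses and unique bot IDs observed within a time window (24 hours by default, matching the window discussed in the talk). The `Observation` record and `unique_counts` function are hypothetical. A common reason the IP count keeps climbing is that the same bot reappears under new addresses, for example after dynamic IP reassignment; the example data below is invented to show that effect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    timestamp: float        # seconds since the crawl started
    ip: str
    bot_id: Optional[str]   # None for botnet families without unique IDs


def unique_counts(observations, window_seconds: float = 24 * 3600):
    """Return (unique IPs, unique bot IDs) seen within the first window_seconds."""
    ips, ids = set(), set()
    for obs in observations:
        if obs.timestamp > window_seconds:
            continue  # outside the evaluation window
        ips.add(obs.ip)
        if obs.bot_id is not None:
            ids.add(obs.bot_id)
    return len(ips), len(ids)


# Bot "A" shows up under two addresses, so IPs overcount bots.
sample = [
    Observation(0, "198.51.100.1", "A"),
    Observation(3600, "198.51.100.2", "A"),
    Observation(7200, "203.0.113.7", "B"),
    Observation(100_000, "192.0.2.9", "C"),  # after the 24-hour window
]
```

For the sample, `unique_counts(sample)` counts three IP addresses but only two bots.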
So you can see that after 24 hours, that's where the two lines cross. Even if you do not have unique IDs, you can say: "I look at the IP addresses I collected over 24 hours, and that probably gives me pretty accurate numbers."
I already mentioned speed. Speed is important; you want to be as fast as possible, but being fast is not easy (see right-hand image). If the protocol is UDP-based, it's a little easier, because you don't have to worry about session establishment, timeouts, etc. Most of these botnets use UDP for a reason: it's way simpler. Usually people have two threads, one that sends out messages and one that consumes incoming messages. Many bots work that way, actually most of the UDP ones we have seen. If you do that, you have to worry about synchronization: you need a peerlist that you lock when you want to send out stuff, and when you receive data you probably also want to lock the peerlist. So you have to synchronize the two.
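The two-thread design described above (one sender, one receiver, one shared peerlist behind a lock) can be sketched as follows. To keep the example self-contained and runnable, an in-process `queue.Queue` stands in for the UDP socket, and a dictionary stands in for the botnet. `PeerList`, `crawl`, and the simulated `network` are hypothetical names. This is an illustration of the locking pattern, not the speaker's code.

```python
import queue
import threading
import time

DONE = object()  # sentinel telling the receiver thread to exit


class PeerList:
    """Peer list shared by the sender and receiver threads, guarded by one lock."""

    def __init__(self, seeds):
        self._lock = threading.Lock()
        self.known = set(seeds)
        self._pending = list(seeds)  # known but not yet contacted
        self._outstanding = 0        # requests sent whose response is unprocessed

    def take_next(self):
        """Return (peer, finished); finished=True means the crawl is complete."""
        with self._lock:
            if self._pending:
                self._outstanding += 1
                return self._pending.pop(), False
            return None, self._outstanding == 0

    def handle_response(self, peers):
        """Merge a received peerlist and mark one request as answered, atomically."""
        with self._lock:
            for p in peers:
                if p not in self.known:
                    self.known.add(p)
                    self._pending.append(p)
            self._outstanding -= 1


def crawl(seeds, network):
    """Crawl a simulated network mapping each peer to the peerlist it returns."""
    peers = PeerList(seeds)
    inbox = queue.Queue()  # stands in for the receive side of a UDP socket

    def sender():
        while True:
            peer, finished = peers.take_next()
            if finished:
                inbox.put(DONE)
                return
            if peer is None:
                time.sleep(0.001)  # responses still in flight; wait for new peers
                continue
            # Simulated send: the peer "answers" with its peerlist. A real crawler
            # would also need a timeout for peers that never answer.
            inbox.put(network.get(peer, []))

    def receiver():
        while True:
            msg = inbox.get()
            if msg is DONE:
                return
            peers.handle_response(msg)

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peers.known
```

The receiver merges new peers and decrements the outstanding count inside one lock acquisition. Because of that, the sender can never observe "nothing pending, nothing outstanding" while undiscovered peers are still being added.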
When you are talking TCP, it's a little more difficult. You have to establish TCP connections, and you have to worry about timeouts because you don't want to get DoS'ed, right? If you don't take care of all these things when you crawl the network, peers might create half-open connections and not respond to you at all, or keep established connections open forever. Then you run out of file descriptors and your crawling doesn't work anymore. So you probably want a limited set of file descriptors, or sessions, that you are able to handle. What our code does is allocate a fixed number of session slots, and that's the number of simultaneous sessions it can handle. When it wants to contact a new peer, it takes the next free slot from that array. That way you make sure your crawler doesn't get DoS'ed.
Another thing: if you talk to a peer, then you can definitely say that it's live, that it exists. The question is how long you want to keep it flagged as active in your peerlist (see left-hand image). As I said previously, you want to distinguish between the IP addresses, or peers, that you have encountered and the ones that you can actually talk to, which are live. If you talk to a peer that's live, for how long do you want to consider it live? Do you want to consider it live for 24 hours, or only for three minutes? Or do you want to periodically re-contact it and, if it doesn't respond anymore, say it's no longer live? These parameters might not sound important, but they really are, and you might want to tune them for the specific botnet you're crawling to get accurate numbers.
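The fixed-slot approach for TCP sessions can be sketched as a small table. Its capacity bounds the number of simultaneous sessions, and expired sessions are reaped so that a peer holding a connection open forever cannot exhaust the crawler. `SessionSlots`, `acquire`, `release`, and `reap` are hypothetical names. In a real crawler each slot would also own a socket, which `reap` would close. Here, time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    peer: str
    started: float  # time the session was opened


class SessionSlots:
    """Fixed-size table of concurrent sessions; new peers wait for a free slot."""

    def __init__(self, capacity: int, timeout: float):
        self.slots: list[Optional[Session]] = [None] * capacity
        self.timeout = timeout

    def acquire(self, peer: str, now: float) -> Optional[int]:
        """Claim the next free slot for peer, or return None if all are busy."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = Session(peer, now)
                return i
        return None

    def release(self, index: int) -> None:
        """Free a slot after its session finished normally."""
        self.slots[index] = None

    def reap(self, now: float) -> list[str]:
        """Free slots whose session exceeded the timeout; return those peers."""
        expired = []
        for i, slot in enumerate(self.slots):
            if slot is not None and now - slot.started > self.timeout:
                expired.append(slot.peer)
                self.slots[i] = None  # a real crawler would close the socket here
        return expired
```

With two slots, a third peer is refused until `reap` frees the slot of a peer that stalled past the timeout.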
This is especially true when you're talking UDP: you can send out lots of UDP packets per unit of time, and if you fill up your own line with UDP packets, at some point you will have packet loss and get funny results. Either get a bigger line with more bandwidth, or slow down a little. So you want a parameter that allows you to slow down the whole crawling process.
So, Prowler is the name of the tool that was recently released (see right-hand image). As I said, it just implements the crawling framework, so to speak, and you have to add the protocol implementation yourself. It provides you with some stub functions that get called, and that's where you implement the protocol. So, if you want to check it out, please do. It only supports TCP for now. You can see what it looks like at the bottom of the slide; you can even see that it distinguishes between known peers and active peers. If you look at the last two lines, you can see that the number of active peers goes down from 719 to 717. That is because after some time some peers stop responding, so they are no longer considered active and get flagged as inactive.
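The talk only asks for "a parameter that allows you to slow down the whole crawling process". One common way to provide such a parameter is a token bucket, sketched below. This is an assumed technique, not necessarily what Prowler does. The hypothetical `rate` parameter caps packets per second, and `burst` allows short spikes. The clock is injectable so that the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Token-bucket limiter: about `rate` packets per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate           # tokens added per second (the "slow down" knob)
        self.capacity = burst      # maximum number of tokens that can accumulate
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def try_send(self) -> bool:
        """Consume one token if available; False means the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A crawler's send loop would call `try_send()` before each packet and sleep briefly whenever it returns False. Lowering `rate` keeps the crawler from saturating its own uplink.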
In that case, we were crawling Kelihos.C, and that was in February. The peerlist I started off with contained only two entries; you can see that on the right-hand side. Kelihos always shares 250 entries, which is why, if you look at the first line, it immediately goes to 250 known peers: it contacts one peer, learns 250 entries, so it immediately knows 250 other peers, and then it continues from there. Now look at the two graphs again: the green line is active peers that you can talk to, and the red line is peers that I have seen in peerlists. You can see that the green line levels off very quickly, so it converges really fast. That is because Kelihos also favors more recent peers. They have this backbone of what they call 'router nodes', and there are never more than around 700 of them. That's why we are never able to talk to more than around 700 peers at a time.
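The distinction between known peers (the red line) and active peers (the green line), together with the "how long is a peer live" parameter discussed earlier, can be captured in a small table. This is a hypothetical sketch, not Prowler's implementation. `active_ttl` is the tunable liveness window; the three minutes used in the usage example come from the talk's "only three minutes" option, and the peer names and timestamps are invented.

```python
class PeerTable:
    """Tracks peers seen in peerlists (known) vs. peers that answered (active)."""

    def __init__(self, active_ttl: float):
        self.active_ttl = active_ttl  # seconds a response keeps a peer "active"
        self.known: set[str] = set()
        self.last_response: dict[str, float] = {}

    def saw_in_peerlist(self, peer: str) -> None:
        """Record a peer learned from someone else's peerlist."""
        self.known.add(peer)

    def got_response(self, peer: str, now: float) -> None:
        """Record a peer that actually answered us at time `now`."""
        self.known.add(peer)
        self.last_response[peer] = now

    def active(self, now: float) -> set[str]:
        """Peers whose last response is no older than active_ttl."""
        return {p for p, t in self.last_response.items() if now - t <= self.active_ttl}

    def stats(self, now: float) -> str:
        return f"known={len(self.known)} active={len(self.active(now))}"


table = PeerTable(active_ttl=180)  # three minutes
for p in ["a", "b", "c", "d", "e"]:
    table.saw_in_peerlist(p)
table.got_response("a", 0)
table.got_response("b", 100)
table.got_response("c", 150)
```

Here `table.stats(160)` reports 5 known and 3 active peers. By time 200, peer "a" has aged out, which is the same kind of drop as the 719 to 717 step in Prowler's output.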
You can also see sharp rises, or steps, in the red curve. When a new peer comes online, it propagates through the peer-to-peer network and immediately shows up in the peerlists of all the peers that are online, and that's what causes these steps.
I'm almost done here. This is the repository where you can check out the code (see URL above). As I said, I will hopefully add a UDP version soon.
I would also like to talk about the alternative that I've already touched on briefly, which is node injection (see left-hand image). By crawling, you will never be able to reach the peers that are behind gateways, network address translation, and so on. So instead you can actively participate in the peer-to-peer network and propagate your own IP addresses. At some point, depending on the popularity of your node, the other peers will reach out to you and say "Take me down" or "Send me commands". This is a comparison between tracking based on sensor injection and crawling (see right-hand image). Again, this is P2P Zeus: we have unique IDs and IP addresses, and of course the number of IP addresses is much higher. The top two lines are what we achieved through sensor injection, and the other lines are what we achieved through crawling. The bottom lines are the active IP addresses, or the active peers that we can talk to, so you can see there are far fewer of them than peers that show up in the peerlists.
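The comparison between sensor injection and crawling boils down to set arithmetic over what each method observed. Below is a minimal sketch, assuming the sensor logs `(ip, bot_id)` pairs for every peer that reaches out to it, and the crawler produces the bot IDs it could contact directly. `compare_sensor_and_crawl` and its input formats are hypothetical. Bots seen only by the sensor are plausible candidates for peers behind NAT or gateways, but that is an inference, not something this calculation proves.

```python
def compare_sensor_and_crawl(sensor_contacts, crawled_active):
    """Summarize what an injected sensor saw versus what crawling reached.

    sensor_contacts: iterable of (ip, bot_id) pairs that contacted the sensor.
    crawled_active: iterable of bot IDs the crawler could talk to directly.
    """
    contacts = list(sensor_contacts)
    sensor_ids = {bot_id for _, bot_id in contacts}
    sensor_ips = {ip for ip, _ in contacts}
    crawl_ids = set(crawled_active)
    return {
        "sensor_ids": len(sensor_ids),
        "sensor_ips": len(sensor_ips),  # usually higher than IDs (address churn)
        "crawl_active": len(crawl_ids),
        # Bots that reached the sensor but were never reachable by crawling.
        "unreachable_by_crawl": len(sensor_ids - crawl_ids),
    }
```

In a real deployment, the inputs would come from the sensor's connection log and the crawler's active-peer set over the same time window.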
Okay, that’s basically it for my presentation. Thank you!