Tillmann Werner delves into the details of peer-to-peer botnet architecture and describes protocols used in the Miner and different versions of ZeroAccess.Interestingly, for all botnets that you’ve seen on the previous list the architecture is not purely peer-to-peer. It’s hybrid architecture. That’s what you see here (see right-hand image). The thing at the bottom is the actual peer-to-peer network, and the dashed lines represent a peer being in the peerlist of another peer. But when they want to receive commands for sending out spam or something like that, they still reach out to central components. And the boxes you see in the middle are proxy servers, they usually have another layer in between so that when some of the proxy servers get taken down they can easily replace them without losing their command and control infrastructure.
And then, there’s the command and control server on top – that is the actual backend. There might actually be multiple layers between the peer-to-peer network and the C2, but unless you get access to one of the proxy servers you don’t see what’s behind it. We are fairly certain that in most cases these are proxy servers because, you know, for example when they speak HTTP and respond with Nginx, then you can be certain it’s a proxy, most likely at least.Okay, let’s take a look at some protocol examples so that you get an idea of what these people create and come up with. This (see left-hand image) is the already mentioned Miner bot, and as I’ve said, that was a really trivial and also stupid protocol. It was HTTP-based and all the bots implemented their own tiny HTTP server which was a very rudimentary one that was backed up by the file system. So, if you would issue a GET request with the ‘search’ parameter and the ‘ip_list_2’ value, that filename would be looked up in the respective directory and then delivered to the requesting host. If there were other files on the file system you could request them as well with this method, and that was probably not intended by them.
So, you can see the response here. In that case, I think the Nginx server header is fake, they just copied that from somewhere and sent it with the responses. And you can see at the bottom is the actual payload, a list of other peers, a list of IP addresses. Miner always responds with the entire peerlist that it has, all peers that it knows about. And that’s stupid because it can be huge. And also that makes it easy for us to enumerate the bots and understand how many infected machines there are if you want to attack it, for example. You can see it’s 11107 in size, and this is by far not the largest response we’ve seen.You can try to recreate this peer-to-peer graph (see right-hand image), because it’s basically a graph – it’s nodes who know about other nodes and talk to other nodes, and so on. You can try to recreate that graph by crawling peers, and we will talk more about crawling, that’s the topic of the presentation, right? If you request a peerlist from one peer you can recreate these links on the graph and then take the IP address from the response you got back and plot pretty pictures like this one here. I think that’s about 37000 nodes, which is only a subset of the Miner botnet at the time, but it takes ages to render this picture here, so we only did that for a subset of the nodes we found. You can see that other P2P protocols are somewhat similar. This is ZeroAccess version 1 (see left-hand image); there are two versions out there, this is the earlier version. Again, it’s proprietary protocol that they implemented. They define, I think, six different message types, and one is a ‘getL’, which means ‘get peerlist from another peer’, and the ‘retL’ is the ‘return peerlist’ message. This is what you get when you reverse-engineer the message form and decode it. It’s not plaintext; I think ZeroAccess version 1 had a 4-byte key that it hashed with MD5 and then used that MD5 hash as an RC4 key to decrypt its messages, but it’s always the same key, so it was basically symmetric encryption with a static key. And version 2 just used XOR with another key.
So, if you undo the encryption you end up with something like this and you can see here, in the case of ZeroAccess version 1 a peerlist has 256 entries, so it always returns up to 256 entries. But since the botnet is large enough, every peer always has more than 256 entries at any time. So, whenever you ask a peer for its peerlist you will most likely get these 256 entries.
As you can see, there’s some order there. The first number is a timestamp, or time delta, so to speak, because the botnet favors peers that have recently been active. And that makes sense because you don’t want to keep the peers in your peerlist that might be offline already, or they reboot from time to time, getting your IP address, so the entry becomes invalid. So you might want to favor peers that have recently become online or that you have recently talked to; that’s why they sort this peerlist by the time delta and then return the 256 most recent ones.They changed this protocol a little bit in ZeroAccess version 2 (see right-hand image). You can see there, again, the two message types. I’ve already mentioned that the encryption is slightly different but for the most part the protocol is very similar. So there’s ‘getL’ and ‘retL’; again, you have the timestamps and you have the IP address. But they figured that they don’t need to send back 256 IP addresses, that’s way too much. It’s sufficient if you respond with only 16 IP addresses. That makes the messages smaller, hence less overall communication in the botnet. ZeroAccess version 2 is really huge. We’ve crawled some of the botnets, and they count around 3.7 million infected machines – I think it was Conficker. And if you have 3.7 million machines talking to each other, that’s a lot of traffic, so you might want to reduce the message size. That’s what they did.
If you take a look at the IP addresses you might notice that the last octet looks a little bit strange, it’s always very high. And that is because they do some deduplication. You don’t want two or multiple entries with the same IP address in your peerlist, obviously, because if you allow that it’s trivial for other people to poison your peerlist and inject one entry multiple times, overwriting all the legitimate ones, and then you are not connected to the P2P botnet anymore. So, that’s why they do deduplication, and in order to do that they sort the IP addresses, then go over the sorted list and if they have two consecutive entries that have the same IP address – they kick one out. But because IP addresses, at least on PCs, are sorted in little-endian, you have as a result these IP addresses with the high last octet in the response.
What’s interesting is that they do that but they don’t filter out invalid IP addresses. So, when you crawl the botnet you come across IP addresses like 255.255.255.255, which obviously is an invalid IP address, it shows up on this list because when you sort the list in decreasing order, it’s the topmost entry and it’s always included. And they have some other garbage in there; for some reason they don’t filter out these entries, which is interesting.