Andrew Morris, formerly an Intrepidus Group employee and currently researcher at Endgame, proves at ShmooCon that threat intel doesn’t have to be expensive.
Andrew Morris: So, this is “No Budget Threat Intelligence – Tracking Malware Campaigns on the Cheap”. First of all, thank you guys all for being here at 10:00 am after the ShmooCon party, when you’re all really hungover. Hopefully, some of you guys are still drunk, because it’s going to make this talk a lot more interesting. Actually, I wanted to give a quick shout-out to people who almost got in a fight last night when we were standing next to the ash tray, and one of them was like “Dude, AT&T syntax is so much better.” I was like “What?!”
Alright, my name is Andrew Morris. I work at Intrepidus Group, which is part of NCC Group. My background is actually in offense, and this is more of a defense-oriented talk. I don’t really have much incident response or operations defense experience, so if I say anything that’s stupid or if there’s anything that is not completely accurate or anything like that, feel free to shoot me an email or whatever. This is my information down here (see right-hand image) if you guys want to follow me on any various social media. So, we are going to go through the background: a little bit of info on threat intelligence, why you should care, a little bit of previous work that I’ve done in the same topic. We are going to talk about the infrastructure – setting up your no budget threat intelligence infrastructure. We are going to just quickly kind of breeze over that, because I’ve actually done another presentation that focuses on that a lot more, which I’ll talk about in a little bit. Discovery and investigation – we are going to look at analyzing sensor data, honeypot data, securing malware samples and doing a little bit of reverse engineering to look at the capabilities and look at some of the stuff that malware is talking to and things like that.
And then, we are going to talk about automation. We are going to talk about this thing called the “Animus”, which is something that I have been building for a little while. We are going to talk about publishing automated reports, automating mass scanning and looking for adversary infrastructure, and publishing signatures. And then, defensive thoughts – we are going to talk about hardening machines, leveraging the data (how you can use the data that you collect doing the stuff that we are going to talk about today), implementing firewall rules, sharing IOCs and stuff like that. And we are going to talk about the roadmap for the future, some of the stuff I want to look at doing in this space.
Background
Let’s start off with the background (see right-hand image). We are going to have a quick threat intelligence primer; set up cheap honeypots; examine attacks being executed on the open Internet; manage and aggregate data; locate malware artifacts; emulate malware traffic; track DDoS targets; automate C2 discovery; and we are going to report some data. So, threat intelligence (see left-hand image). What is threat intelligence? If you just break down the word, “threat” refers to bad guys, and “intelligence” refers to predicting the future, so “threat intelligence” refers to studying bad guys to predict what they are going to do, usually to defend yourself, but not always. Conventional threat intelligence – there’s a lot of people who actually do this and aren’t just random assholes giving presentations about it. They kind of study bad guys to develop IOCs, which are “indicators of compromise”. IOCs can take a bunch of different forms. Basically, an IOC can be anything from the MD5 of a file that is known bad; or it could be a URL, it could be an IP address, it could be a domain name, it could be a registry key – a ton of different forms that an IOC can take. A lot of these threat intelligence vendors, people who provide this stuff, are going to deploy agents on endpoints of their customer network (see right-hand image). It’s kind of like A/V – they are going to have sensors that are going to sit on desktops that are going to, basically, flag on anomalous behavior like seeing an indicator or something like that. Once is bad, right? You got one flag, one thing that shoots off – okay, that’s bad. It’s not that bad. But if you get two or more across the enterprise, that’s where they are going to say “Oh, this is really bad, it might be an APT,” whatever. And that’s how it usually functions. Threat intelligence vendors do a lot more than that also. I mean, they do a lot of big write-ups and reports talking about tactics and procedures and all this other stuff.
But why? What’s the difference between that? Antivirus, as I would say, is so 2005. It’s dead. Everyone keeps saying that. Antivirus is dead. It’s checking some file, and you can just change one byte in that, and it’s going to change the checksum completely. Modifying binaries is really easy for bad guys to do, so A/V is super-dead.
Threat intelligence is so 2015. It’s really hard to change infrastructure, and it’s really-really hard to change your tactics as a bad guy. I mean, if the products and if the people are actually looking at the tactics that you are using as a bad guy, you can’t just change the way you operate in order to get around defenders anymore, or at least you are forcing people to.
Today the bad guys, the actual adversaries that I’m going to talk about, are bad guys that target the open Internet (see left-hand image). They just target everything that they can possibly see. I don’t do incident response; this was kind of the network that I had to work with. The bad guys we are going to be talking about are not terribly smart. They are not advanced. There’s a lot of them, and they compromise a lot of machines. It turns out you don’t actually have to be super-advanced to compromise lots of machines on the Internet. And they use really-really lame stuff like SSH default creds, open JMX consoles, shellshock, MS08_067 facing the Internet, whatever. We are not talking about these people (see leftmost part of the image). We are talking about these people (right-hand part of the image). It’s a spray and pray. It’s like Modern Warfare 2. It’s the numbers game trying to pop as many boxes as humanly possible, missing 99.9% of the time.
Infrastructure (TL;DR)
So, again, the tl;dr of infrastructure – we are going to talk about honeypots briefly. Raise your hand if you know what a honeypot is. Okay, almost everybody’s hand just went up. That’s good. I was giving a very similar talk to this elsewhere and I was like “Okay, raise your hand if you know what a honeypot is,” and literally nobody in the room raised their hand. I was like “Oh god, this talk is going to suck so bad for all of you guys, I’m so sorry…” So, in case you aren’t aware, a honeypot is a machine or a service that serves no business purpose whatsoever. Its only job is to attract the attention of bad guys.
In terms of infrastructure, like I said, we are going to kind of breeze over this a little bit (see right-hand image), because I have another presentation that talks about this a lot more in-depth. Basically, I set up lots of cheap honeypots that are sitting directly on the Internet. I use a lot of Kippo, and I’m going to talk about that in a little bit; some Dionaea, which is another type of honeypot; and a bunch of empty Apache servers which actually, believe it or not, are really cool because they are going to collect logs on people that are just blasting the Internet looking for the presence of a file, or looking for shellshock, or looking for a bunch of stuff like that. We are going to talk about centrally managing and aggregating the data. Something that I use is called MHN, which is Modern Honey Network. It’s developed by ThreatStream, it’s awesome!
And then, I’m going to talk about some stupid-cheap hosting. Cloud at Cost – if you guys aren’t familiar with it – is an awesome company that you can rent just stupid-cheap VPS’s with. You can get them for $1 a month, or you can get them for $35 one-time fee and you have them forever. Can confirm, have had Cloud at Cost VPS for, like, two years that I paid $35 for two years ago, so now I have, like, 30 of them. It’s awesome! And then there’s also AWS Free Tier, which you can get four or five of them and you pay nothing. You get cheap, low-performance VPS’s that sit on the Internet that you can do whatever you want with. So, if you want to set up 10 sensors for $40 a year, you can get five Amazon Free Tier boxes, four or five Cloud at Cost boxes, and you’ve just paid $40 for one year of having 10 servers that sit directly on the Internet. They might not have the best performance in the world, because they are Free Tier and they are Cloud at Cost, but honeypots don’t really necessarily take up a lot. They’re not super CPU-intensive or anything, and they’re not really going to be doing anything crazy.
So, I’ll talk about Kippo a little bit (see left-hand image). I use lots of Kippo. Kippo is a medium-interaction SSH honeypot. “Medium-interaction” basically means it sits between low and high interaction. A low-interaction honeypot can be classified as something that just sits there and maybe looks at things that are port-scanning it, or anything like that. It doesn’t actually emulate the service. Or maybe it even gives something like a banner. Kippo is a great example of a medium-interaction honeypot because it emulates the service itself, what an SSH client is used to looking at. So, it’s kind of tough to tell whether this is a honeypot or not when you’re just looking at it, because you’re actually interacting with it, you’re typing commands, you’re authenticating, all that stuff. And then, a high-interaction honeypot will be something like it is the code, it’s not emulating, and maybe it’s just ephemeral, maybe it goes away once the bad guy logs out or something like that.
Kippo logs bad guys’ terminal sessions for playback, which is really cool. So, when a bad guy logs into your Kippo session or logs into your Kippo instance trying to do bad stuff or whatever, it actually records the TTY log, and you can play it back. And sometimes it’s hilarious, because sometimes the bad guys have no idea what they are doing. You’ll see them, like, type a command and then the command will fail, and you’ll see them pause, and then you’ll see them type another command – backspace it and type another command. You’re like, alright man, come on, get it together.
You can configure what credentials you want it to allow. By default there’s, like, one password that will let bad guys in. You can make it allow any password. I actually haven’t done that. That’s hilarious, I should do that. You can set up a list of, you know, five passwords that you want. And sometimes you do have to be careful with the passwords that you want it to accept, because sometimes bad guys will actually not use really-really-really easy passwords, believe it or not. Or if there is more than one password that is accepted on a box – the bad guys will know that it’s a honeypot and they won’t execute any activity on it, which is weird, but that’s what they’ve started doing recently.
The things that Kippo logs are the username that somebody is trying to authenticate with, the password, the source IP address it’s coming from, and the SSH library version. And then, once attackers actually get into Kippo, it has a fake ‘wget’ command that actually hooks an HttpGet request or whatever. So, if a bad guy logs in and tries something like ‘wget’ a piece of malware – even though Kippo is a honeypot, it’s not actually real – it will still reach out and grab that malware sample and pull it in so that you can analyze it later. Unrelated – I actually wrote a Metasploit module which identifies Kippo instances externally. I don’t think this has been fixed yet, so you can still identify, which is funny because that’s something that bad guys should do, but they don’t. It’s probably going to be fixed pretty soon.
I did some no budget illustrations in MS Paint (see right-hand image), because I don’t have Photoshop and I kind of wanted to stick with the theme. This is your machine talking to a honeypot. This is you managing a honeypot, and a bad guy is going to be attacking the honeypot. This is really entry-level and blown out of budget. You got one honeypot, you got your box, you log in, you look at attacks that people are doing, and that’s just kind of how it looks. And MHN comes in (see left-hand image), which is really handy for a number of different things. Yeah, it should say “Modern” instead of “Managed”, I’m sorry. It’s developed by ThreatStream. The developer is awesome for answering all my dumb ass questions. It’s “open source-ish”. You can still download and implement it, but they have a pay model. It allows you to deploy honeypots really easily. It’s got deployment scripts that you can just paste into your box when you are configuring them. It sets everything up. It’s super-easy, and it will configure it in five or ten minutes. It aggregates the data for you. It’s got the Mnemosyne database that sits on top of MongoDB, so if you have ten honeypots that are all configured to use hpfeeds and talk to an MHN instance, then it will aggregate everything together and you can query it kind of centrally, which is awesome. The API is awesome for it. It’s not documented currently, so you have to literally read through the Python code or, in my case, just email the developer until he emails you back, which he does. He’s getting sick of it, but he does. And yes, Mnemosyne is awesome. It’s something that somebody else wrote that, basically, sits in between; it expects hpfeeds data and it writes it to MongoDB. It’s really cool. It looks like this (see right-hand image). I took this about an hour ago. Usually those little question marks there are actually the country’s flag of where the IP address is originating from. 
In this case, I’m getting hit a lot by this group in Hong Kong. For some reason the geo data doesn’t report that it’s in Hong Kong, but it is in Hong Kong. These people are crazy. They’ll hit you with 200,000 attempts per day. It’s actually nuts. A couple of gotchas about MHN (see left-hand image). I really recommend that you update the deploy scripts so that you have more stuff, like include your own SSH public key or update the hostname, have it install packages – you can update all this stuff in the deployment scripts, which I didn’t realize for a long time. MHN pulls ThreatStream forks of popular GitHub repos by default, so if you want to use MHN to deploy a Kippo instance onto a box, then it will pull from the ThreatStream fork of Kippo instead. You can consider forking your own repos. I have actually just done this recently, so I can update Kippo and I can update the version that it pulls and I can add my own stuff and I don’t have to worry about doing it after the fact.
And then, also some other things. I try to make a habit of maintaining a safe list, like a whitelist. If I’m testing my honeypots or whatever and I don’t want my benign data to contaminate otherwise 100% attacker data, it’s a good idea to maintain a safe list of the IP addresses that you’re coming from so that you can ‘grep -v’ that later or suck that out of the database.
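The safe-list idea can be sketched in a few lines of Python. This is only an illustration of the ‘grep -v’ step, not how MHN or Mnemosyne does it: the `source_ip` field name and the event dictionaries are assumptions, and the addresses are documentation ranges rather than real data.

```python
import ipaddress


def load_safe_list(entries):
    """Parse single IPs or CIDR ranges into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_safe(ip, safe_networks):
    """Return True if the address falls inside any safe-listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in safe_networks)


def drop_safe_events(events, safe_networks, ip_field="source_ip"):
    """Keep only events whose source address is NOT on the safe list.

    `ip_field` is a hypothetical key name; adjust it to whatever your
    honeypot records actually call the attacker address.
    """
    return [e for e in events if not is_safe(e[ip_field], safe_networks)]


# Example: your own test traffic comes from 192.0.2.10 and 198.51.100.0/24.
safe = load_safe_list(["192.0.2.10", "198.51.100.0/24"])
events = [
    {"source_ip": "192.0.2.10", "password": "test"},
    {"source_ip": "203.0.113.5", "password": "root"},
]
attacker_only = drop_safe_events(events, safe)
```

Filtering at query time like this keeps the raw data intact, so you can still change the safe list later without losing anything.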
So, after implementing MHN and some other stuff, my no budget threat intel stuff is kind of more blown up to look a little bit more like this (see right-hand image). We’ve got more no budget architecture diagrams right now. Didn’t use Visio because I got no budget. I used MS Paint. We’ve got untrusted honeypots that are sitting out there on the Internet. They are all centrally talking back to a semi-trusted MHN instance. It’s semi-trusted in that I still don’t have anything that’s on there. You can think of it like a DMZ almost. And then, behind that I have my trusted machine which connects to that. And I do actually trust the machine at the bottom; it’s configured with a password that I actually use, and things like that. Then my machine connects to the trusted machine. Everything sucks logs ever so. You can assume that if a honeypot just blows up or gets completely compromised – it doesn’t matter. You don’t trust any data on it, you’re not losing anything and so on and so forth. If this kind of stuff is interesting to you, just the infrastructure itself, see the “Ballin on a Budget” talk that I did at BSides Charleston.
Discovery & Investigation
We are now going to talk about discovery and investigation. Bad guys are still using Shellshock to propagate pretty heavily on the Internet. You are still going to see a good bit of that. It’s still working, there’s still a bunch of stuff that’s unpatched for that, believe it or not. If you want to start no budget threat intel tracking bad guys that are propagating with Shellshock, just look at all of your Apache logs and ‘grep’ for the standard Shellshock characters that you are going to see in Shellshock requests, the standard things like that, which is right here (see right-hand image). I discovered a couple of groups that are still propagating with Shellshock, with a lot of boxes that they are using: one group in Russia, one group in the Netherlands. But mostly, the stuff that I look at is SSH, because it’s super-common on the Internet. There’s a lot of SSH that’s facing the Internet. There’s a lot of SSH that’s configured poorly, a lot of really bad credentials that are being used. So it’s a number one kind of trace for bad guys to use. Bad guys try lots and lots of passwords on the Internet. There’s a group in Hong Kong – I was actually just talking about them earlier – that I’ve seen over 100,000 authentication attempts per box per day from them. They’ll literally just sit there and just try to authenticate with everything. And this is the range that they are coming from (see left-hand image).
If you ever feel like checking your SSH logs or anything like that, I guarantee you 100% you’re going to have authentication attempts from them. And the thing is, you’ll look at the passwords that they try, and they do “password 1” and then “Password 1”. They do that stuff, but they actually try some really-really advanced crazy passwords that I’m pretty sure have come from password dumps from elsewhere, or they just are actually banking on doing brute-force attacks that are real, actual brute-force attacks. They are just going to keep on doing it forever.
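Going back to the Apache log search mentioned above, here is a rough Python equivalent of that ‘grep’. It assumes the “standard Shellshock characters” are the `() {` function-definition prefix that Shellshock payloads start with, and that each log line begins with the client IP address, as in Apache’s common and combined log formats. The sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Shellshock payloads start with a bash function definition: "() {"
SHELLSHOCK_MARKER = re.compile(r"\(\)\s*\{")


def shellshock_sources(log_lines):
    """Count Shellshock-looking requests per source IP.

    Assumes the first whitespace-separated field of each line is the
    client address (true for Apache common/combined log formats).
    """
    hits = Counter()
    for line in log_lines:
        if SHELLSHOCK_MARKER.search(line):
            source_ip = line.split(" ", 1)[0]
            hits[source_ip] += 1
    return hits


# Made-up example lines; in practice you would iterate over open("access.log").
sample_lines = [
    '203.0.113.9 - - [17/Jan/2015:10:00:00 +0000] "GET /cgi-bin/a.cgi HTTP/1.1" '
    '404 200 "-" "() { :;}; /bin/bash -c \'echo hi\'"',
    '198.51.100.4 - - [17/Jan/2015:10:00:05 +0000] "GET / HTTP/1.1" '
    '200 45 "-" "Mozilla/5.0"',
]
for ip, count in shellshock_sources(sample_lines).most_common():
    print(ip, count)
```

Counting per source address is a design choice for this sketch; it makes it easy to spot the handful of boxes responsible for most of the noise.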
Usually the stuff behind the SSH people is just automated scripts. You don’t really see actual operators too often log into Kippo instances. It’s usually something to log in, run an automated “uname -a”, wget a piece of malware based on the output of that, and then it will execute it or whatever. But you do still get actual operators because, obviously, that’s not going to work all the time, and so sometimes you’ll see an actual person, a human being, that logs in and actually checks “Oh, what’s going on here?”
Again, a lot of really cool SSH data – these are the passwords that people actually have been trying (see right-hand image). I’ve seen around 24,000 instances of people trying “root” as the password and so on and so forth. There’s a bunch of SSH library versions that people use as well (see left-hand image). The most common is SSH-2.0-PuTTY. That’s kind of a weird statistic here just because that is actually just what the Hong Kong group that I’ve been talking about uses. It’s not actual PuTTY, it’s just whatever they configured the brute-forcer that they wrote to use as the banner. You can think of it kind of like a user agent. A funny thing about this is you’ll see the names of hacking tools in the library versions. It will be, like, SSH-2.0_Medusa. Well, that’s definitely not a regular remote administration tool. A couple of SSH gotchas (see right-hand image). Bad guys love using SFTP, and Kippo doesn’t include SFTP by default. So, if they try to negotiate an SFTP session or whatever, it’s going to fail by default. But some guy who’s a lot smarter than me wrote an SFTP patch. You can incorporate that into your honeypots, and you will get so much more malware when you do that. A lot of people log in to do ‘wget’, but bad guys are going to want to just do it in line with SFTP. So I actually forked over a version of Kippo. I added an SFTP patch, an option to disable this weird fake jail that Kippo does, which I hate. I added some more default creds and I got rid of the port 80 ‘wget’ limitation that Kippo has by default. And the reason the developer put a limitation on Kippo was because he didn’t want people using Kippo instances as port scanners, but I don’t care, I’d rather get more malware. You are going to see a lot of this (see right-hand image) when you start doing this. You are going to see a ton of these HFS web servers when you start looking at attacks like this. They are all in Chinese, of course. 
And you are going to see the filename, the size of the file in here, the date uploaded, and the amount of downloads, which is important because that can let you track how big a botnet may be. If you see that something was uploaded three days ago and you see that it’s got 9,000 downloads, then you can usually say, okay, these people probably have 8,000-9,000 bots sitting on their thing – just from this as the source. And you are going to see a ton of these (see left-hand image). I mean, they are everywhere. This one version in particular – bad guys just love this stuff. I don’t know why. It literally does the same thing as Apache. It has directory listing enabled by default, so if there’s one sample or ten samples, you can get all of them just as a result of getting access or seeing the path of one. So, some no budget tactics for this kind of stuff (see right-hand image). You can Google dork for these web servers. Google is weird about indexing things that aren’t on port 80, though, so that’s a little bit difficult. Intext:“httpfileserver” – you can look for that. If you feel like grotesquely violating the Computer Fraud and Abuse Act, HFS is vulnerable to a really bad RCE bug, and no one ever uses the updated version. So, if you do feel like getting criminal and executing code on their boxes or whatever – you can, there’s an exploit for it, it works. Reversing these samples (see right-hand image) is a talk in and of itself. Actually, it’s reverse engineering the malware samples that you find. And I’m not the best reverse engineer, so don’t listen to anything that I say about it. If you are also bad at reversing, check out malwr.com or virustotal.com, because they are two malware sandboxes. Malwr.com is awesome because they don’t share anything unless you let them. Virustotal.com has a cooler, prettier engine, but it will share your stuff; it’s owned by Google, so you can put two and two together. 
So, I was getting hit a lot by one particular IP address (see left-hand image). It was hitting my sensors a lot using a lot of different passwords, and whenever they would get in they would try to pull a lot of malware. They were running a pretty big campaign. They guessed one of my good passwords that I had configured, and they logged in and they ‘wgot’ a malware sample, so to speak. The same web server that they were grabbing it from had directory traversal. Obviously, they were grabbing an ELF binary, because it was a Linux box, but I noticed that that same web server had directory listing enabled, so I grabbed all of the malware samples that were on there. There were a couple of Windows samples. And so, I ran it and I was doing some manual reversing, and I found that it was passing a ton of IP addresses in ASCII, which was weird, over this custom binary protocol over port 36000 (see right-hand image). I was executing malware on the machine that I was analyzing it on, it was reaching out and trying to talk to the C2, and the C2 was just passing this weird binary protocol back. It wasn’t IRC, it wasn’t HTTP, it wasn’t anything like that, and in it was a bunch of IP addresses. And those IP addresses were DDoS targets (see left-hand image). So, you know, that’s pretty standard: it’s reaching out and the server is just giving it back “Hey, here’s all the DDoS targets, we want you to DDoS this box, we want you to DDoS this box.” And it was sending that out to all of their bots. The C2 was architected to pass this to everyone. If I wrote malware, that’s not how I would do it, but I guess it’s a good thing I don’t write malware. The bots receive the IP addresses and they start spraying traffic at them. If you reverse the malware samples, you are going to actually find the function names, like SYN flood or UDP flood or whatever.
At this point, I’m trying to reverse a bunch more stuff. If you have ever seen #MalwareMustDie on Twitter, which is either a person or it’s a group of people – all I know is that he is insane. I hit him up and I’m like “Hey man, can you help me reverse some of these malware samples?” He’s like “Yes! Send me all of them!” And I was like “Ok, dude…” And so, he helped me reverse a bunch of these malware samples, and I was like “Dude, this is awesome, can I donate, like, 20 bucks to you guys via PayPal or something?” He’s like “No! We don’t do this for money – we do it because we hate malware!” I thought, oh my god, I wish I loved anything as much as you hate malware, man.
So, in a bunch of materials that he was publishing I saw this (see right-hand image). He published a video of all the crazy stuff that he does, and this was a video of him recording a screen sharing session of one Chinese operator training another Chinese operator how to use a product that he had developed and sold him, which generates ELF binaries and has the C2 package and all that stuff. I don’t know about you guys, but I don’t speak Chinese. But I did notice in there, a little bit closer, in the bottom right, the port number is 36000 by default (see left-hand image). And I was like “Huh, I bet that’s the same family of malware as this Windows binary that I found, and I’ve actually been seeing other things that have been speaking in the same protocol or whatever.” By the way, if you already know what I’m talking about and you know where this is going, if you ever get a copy of this software, I would love for you to send it to me. I’ve been trying to find it everywhere, but I don’t want to buy it from the Chinese dude because I’m not trying to be on a watchlist and stuff. So, I realized that this C2 was one of lots of different C2s. I fingerprinted the C2 network service and I wrote a scanner for it (see right-hand image). It’s on my GitHub page. I also wrote an NSE script for it, but I don’t actually know Lua. I mean, it works and it gets the job done if you want to use it. I also stared at Wireshark for what felt like an eternity, and I basically built a scanner that logs into these C2 servers and it reports back to me all of the IP addresses that it’s currently targeting. It turns out it’s actually really hard to write a client for a server that you don’t control (see left-hand image). I’m trying to write a client for this malware C2 thing, and the server is going up and down and it’s using a protocol I don’t understand. I had to cycle through a couple of different C2s to actually write the client out. 
It’s like trying to learn Spanish when you got two Spanish dudes in the room with you, but you don’t know Spanish and they are just talking in Spanish to each other, and then they keep walking out and you have to go and find more Spanish dudes. That’s basically what it’s like. But yeah, I wrote a scanner, I was going to demo it, but all the C2s that I’m looking at right now are down, so I couldn’t. But this (see right-hand image) is a screenshot of what the scanner looks like. It logs in and it pulls all of the DDoS targets. I guess the reason why this is important is because it’s really cool, as an outsider or as someone running your no budget threat intel – organization or company, whatever – you can actually see who these bad guys are targeting, which can be cool for a number of different reasons. You can see who they are sending the DDoS attacks to. It can help you identify who they are. It can help you make the world a better place. You can warn them, you can do a lot of stuff using this information. And there’s a thousand ways that they could do it better so that you couldn’t, but they don’t, thank god.
Threat Reporting Automation
So, now I’m going to talk about automating a lot of the stuff that I have been talking about so far. There’s this thing called the Animus (see right-hand image), and it’s kind of an automated threat reporting system. I’m building this thing, and it’s basically taking the sources of all my data, it’s aggregating it in a certain way, and it’s publishing it out for everyone to look at for free. This is the GitHub page, I literally just put this up yesterday. I’ve got a lot of data. It was on a development branch, or whatever, on my GitHub, and so I changed it to an actual organization GitHub page as opposed to mine. You can find it at github.com/animus-project. Currently, I’m only publishing SSH threat reports, because that’s the only one that I’m doing really well so far. I’m building it out to try to do a bunch of other stuff, which I’ll talk about in a little bit.
Currently, it only includes the following information: you’ve got the attacker IP addresses, which I’ve got a shitload of; credentials that are being attempted, which is actually pretty cool if you work in offense. I’ve got some pretty sweet wordlists that you can use, I mean, they are great for password cracking. They are awesome because they are tried and true by these bad guys – they are using these passwords for a reason. They are using these passwords because they work. So, if you are in offense as well, take the wordlists from the passwords that I’ve looked at, take the user lists and use those for what you are doing, maybe. And the SSH library versions that are used – that’s not really useful from a defense perspective, but it’s kind of cool data.
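To make the report contents concrete, here is a minimal Python sketch of how a daily summary with those three pieces of information could be tallied. This is not the Animus code: the event dictionaries and the `source_ip`, `password` and `ssh_version` keys are hypothetical stand-ins for whatever your honeypot data actually looks like.

```python
from collections import Counter


def build_daily_report(events, top_n=10):
    """Summarise one day of SSH honeypot events.

    Each event is assumed to be a dict with the hypothetical keys
    'source_ip', 'password' and 'ssh_version'.
    """
    attackers = Counter(e["source_ip"] for e in events)
    passwords = Counter(e["password"] for e in events)
    versions = Counter(e["ssh_version"] for e in events)
    return {
        "total_attacks": len(events),
        "unique_attackers": len(attackers),
        "top_attackers": attackers.most_common(top_n),
        "top_passwords": passwords.most_common(top_n),
        "top_ssh_versions": versions.most_common(top_n),
    }


# Made-up events using documentation IP ranges.
events = [
    {"source_ip": "203.0.113.5", "password": "root", "ssh_version": "SSH-2.0-PuTTY"},
    {"source_ip": "203.0.113.5", "password": "admin", "ssh_version": "SSH-2.0-PuTTY"},
    {"source_ip": "198.51.100.8", "password": "root", "ssh_version": "SSH-2.0-libssh2_1.4.3"},
]
report = build_daily_report(events)
for ip, count in report["top_attackers"]:
    print(f"{ip}\t{count}")
```

The same password counter doubles as the wordlist mentioned above: sort it by frequency and write one password per line.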
The Animus threat reports that I’m building look like this (see left-hand image). This is the daily report generated by the Animus system on January 17, 2015. We had about 250,000 attacks. These are the top ten attacker IP addresses that we saw during the day. And then further down it’s going to list the passwords being used. I’ve got data going back to October now (see right-hand image). At this point, I’ve got about 5500 unique attacker IP addresses that I’ve collected since October. And the number has been actually increasing pretty fast, both between attackers discovering my infrastructure and attacking it more, and them scaling up their attacks. I have seen stuff increase in the last couple of months. So, if you want to go back and look at the historical data that I was seeing – how it was different in October to how it is now – you can use these stats. I’ve been adding more infrastructure, because when you start doing this stuff it’s so addicting. Unrelated fun fact about GitHub – I didn’t know this, this has nothing to do with anything, but I just learned that GitHub trusts your client’s clock (see left-hand image), so when you are checking something into GitHub you can commit changes that happened “in the past” by changing your clock. I didn’t know that, but that was really cool. I was trying to figure out how to get my reporting engine to go through and publish my reports for data that happened yesterday and the day before and the day before. So I wrote a for-loop that, basically, changed my system’s clock by one day backwards, and then it ‘grep’ed the logs for that date, and then it published it. GitHub has these little blocks showing how often you commit code, so I was expecting to see one really-really dark-green block from a hundred commits or whatever, but it actually went back and committed these all throughout October. And I’m like, oh, the more you know. Anyway, it’s a fun fact. 
The Animus system that I wrote is constantly mass-scanning the Internet to locate these Chuilang C2s that I was talking about before (see right-hand image). Once a C2 has been located, it will connect to it and start logging the DDoS targets that it’s looking at. So, as they put their infrastructure up, it’s going to find it and it’s going to connect to it. There’s a really easy way to get around that, I’m not even going to say it because I’m afraid they are going to find it, but it has to do with default port numbers. I published an alpha NSE script for looking at Chuilang C2s, so if you have some boxes that you are already looking at, you can incorporate this into your Nmap and you can look at stuff like this. I built this thing called Threatbot (see left-hand image). He, or it, has a GitHub page. I’m kind of attached, so I call it “he”, don’t judge me. You can tweet to @threatbot on Twitter with one or more IP addresses, and he’ll tweet back to you if that IP address has ever conducted any attacks that I’ve seen. Well, he’ll tweet back to you no matter what. It will tweet back with a little quick report. It will say “Hey, we’ve seen this many attacks from that IP address; we started seeing attacks on this day; the most recent attack we’ve seen is this day.” Right now Threatbot is only hooked up to my last two weeks of data, so it’s not a ton of stuff. You can tweet at him and you can check. If it’s a big attacker, then he’ll report back to you and you may be able to see something. I have about six months of data that I haven’t incorporated into the same database that he’s looking at. I need help with that. I need to find somebody who knows MongoDB and Mnemosyne better than me. If you fit this description, then please hit me up because I suck at that kind of stuff. He also tweets daily statistics of how many attacks we’ve seen and the IP address of today’s top attacker. So, if you are interested in that kind of crap you can follow him on Twitter or whatever. 
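To make the Threatbot lookup concrete, here is a small sketch of pulling IP addresses out of free-form tweet text and building the kind of quick report described above. It assumes the attack history is available as an in-memory dict mapping each IP to a list of dates. The real bot sits on a database, so `sightings`, `extract_ips`, `summarize` and `reply` are all illustrative names, not Threatbot's actual code.

```python
import ipaddress
import re
from datetime import date

# Loose candidate pattern; ipaddress does the real validation afterwards.
CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(text):
    """Return the valid IPv4 addresses in `text`, in order, without duplicates."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            ip = str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue  # looked like an address but is not one, e.g. 999.1.1.1
        if ip not in found:
            found.append(ip)
    return found


def summarize(ip, sightings):
    """One-line report: attack count, first seen and most recent date."""
    days = sightings.get(ip, [])
    if not days:
        return f"{ip}: no attacks seen"
    return (f"{ip}: {len(days)} attacks, first seen {min(days)}, "
            f"most recent {max(days)}")


def reply(tweet_text, sightings):
    """Build one report line per IP address mentioned in the tweet."""
    return [summarize(ip, sightings) for ip in extract_ips(tweet_text)]
```

Anything in the tweet that is not a valid address is simply ignored, so a message with several addresses mixed with other words still produces one line per address.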
Here’s what the reports look like (see right-hand image). That’s HK-47 from Star Wars: The Old Republic, if you are as nerdy as I am. You can tweet at him and he’ll tweet you back. I actually built kind of a cool regular expression thing for IP addresses. You can tweet at him with, like, five IP addresses, or you can write gibberish – he’ll filter it out and he’ll do the queries and all that stuff, so that’s kind of cool.
Defensive Strategies
So, a couple of defensive strategies (see right-hand image). It’s, basically, standard threat intelligence stuff, whatever you want to do with the data. I mean, you can check for connections to or you can block known C2s, which is a pretty standard thing; flag connections to known-malicious subnets, same thing; look for connections to malware distribution web servers – those HFS boxes that I was talking about before, you should never talk to those under any circumstance. If something is talking to one of those, it’s probably bad, you probably got a compromise. You can check standard indicator stuff, presence of files with MD5s or Yara signatures of any of the malware that’s collected. Defending against attacks on SSH – this is so easy (see left-hand image). It’s stupid-easy. Use SSH keys, disable password authentication. If this is not possible for whatever reason, then you can audit: use strong passwords, audit against John the Ripper with the wordlists that I’m providing. You can blast it with Medusa if you want. Blast your own environment with Medusa, with the password list that I provided and see if any bad guys could potentially get into your stuff if it’s using password authentication.
Roadmap
So, roadmap of some of the stuff I want to do in the future, and there’s a lot of this. This is my to-do list for this stuff, and it just keeps getting bigger and bigger. The problem that I’ve actually run into is I did these slides in, like, five minutes, because I keep working on it. I wrote Threatbot about two days ago, and everything just keeps blowing up bigger and bigger, and there are so many different things that I’ve been looking at, and I’m like, man, I really need to start doing my slides more than two days in advance.
Recap – real quick just what we’ve talked about (see right-hand image). We’ve talked about going from sensors to attacks, to malware artifacts (malware samples), to DDoS target leaks, to mass scanning the Internet. We don’t even need to capture samples to find C2s anymore, at least for this family. Conventionally, it’s going to be like, oh hey, I have a piece of malware, and I want to find the C2 for that piece of malware. And you still need samples to find C2s and stuff, but with this family, with this one thing that I’ve been looking at obsessively, you don’t even need to have a malware sample that talks to that C2. You’d scan the whole Internet, it’s easy, people do it every day. You could scan the Internet in, like, five minutes if you use masscan and don’t mind responding to the abuse complaint reports that you’re going to get a lot of.
Here are some stats of stuff that I’ve seen to date (see left-hand image). I’ve seen 6,279,676 authentication attempts – this is over the course of six months, but I’d need to graph it out properly. I’ve seen 2 million in the last two weeks. I’ve seen 5,573 unique IP addresses since October of 2014. I’ve seen over 500,000 unique passwords being used. I’ve located a total of 30 Chuilang C2s. I’ve identified 27 malware samples – that’s right, that number is less than the number of C2s, think about that. And I’ve leaked 750 different DDoS targets belonging to 40 different organizations, and that’s just in one month, because that is when I got the Chuilang logger working.
Future plans: I want to build more signatures to identify different types of C2s (see right-hand image). I know there are more C2s that use other binary protocols and things that can be identified. Obviously, you’re going to have HTTP stuff, you’re going to have IRC stuff. It’s going to be a little harder, but it’s still doable. I want to expand Threatbot’s capability.
I want you to be able to email him, I want you to be able to Pinterest to him, I want you to be able to do whatever you want. And I want to build that so that he’s got more data and he’s got more things that he can report back to you with more useful information, especially doing more things other than just Twitter. That’s just because maybe you don’t want people to see the IP addresses that you’re looking at, and I’m aware of this.
I want to deploy more sensors, because hell yeah, I want more data. I want to build automation for warning that DDoS attacks are coming. I want to build something that reaches out to the DDoS targets’ abuse contact thing, and I want to let them know: hey, you are probably about to get a DDoS attack, it’s going to come from these people. I want to expand more stuff with Shellshock and Heartbleed and other vulnerabilities that are being executed. I’m focusing really heavily on SSH right now, just because the capability is already there, the attacks are happening, and it’s really cool stuff. I want to build an HFS web server watch script, something that, when I find known HFS malware hosting repositories or whatever you want to call it – I want to build something that reaches out and looks at those all the time, and it’s always refreshing that and it’s checking those, and whenever new malware goes up I want to grab it.
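The core of a watch script like that is remembering which files a listing has already shown and flagging anything new. A minimal sketch, assuming the file server returns a plain HTML page with one `<a href>` link per hosted file. The page layout, `LinkCollector`, `listed_files` and `new_files` are assumptions for illustration, and the fetching and downloading steps are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def listed_files(base_url, listing_html):
    """Absolute URLs of the files linked from one listing page."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return {
        urljoin(base_url, href)
        for href in parser.links
        # Skip folder links and sort/query links, which are not files.
        if not href.endswith("/") and not href.startswith("?")
    }


def new_files(base_url, listing_html, seen):
    """Return URLs not seen on earlier polls and remember them in `seen`."""
    current = listed_files(base_url, listing_html)
    fresh = sorted(current - seen)
    seen.update(current)
    return fresh
```

Run on a schedule against each known server, `new_files` would return only the entries that appeared since the last poll, and those are the ones worth grabbing.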
I want to improve the mass scanning and the dorking for HFS. So, I want to be able to find more of those HTTP file servers that host malware without necessarily looking at an operator log into my box and reach out. It’s 2015, you can mass scan the crap out of the Internet, so I want to find it like that. I want to improve the automated signature generation. I want to be able to build Yara signatures and things like that from malware that I get as it happens, as they come in. And I want to build more useful information into the Animus threat reports, because right now it’s kind of trivial information that can help you, but it can’t really-really help you. And there’s so much more data to collect. I mean, there are so many different IOCs that I could look at, there are more web servers storing malware, there are more C2s sitting out there, there’s a ton of stuff.
So, I just wanted to give some credit to ThreatStream. This dude, Jason, has answered all of my questions with everything. I’d like to also thank the Kippo developers for developing something so awesome. HD Moore helped me with a couple of things; I reached out to him and he responded. If you ever email him, he emails you back in, like, three minutes. It doesn’t matter when you do it. It’s like “Dude, what do you do with your life?” Brian Baskin – this guy helped me reverse some samples. Johnny Vestergaard – he is the developer of Mnemosyne, the thing that sits between MHN and MongoDB. hpfeeds talks to it, it’s an awesome database, and it’s really cool. @MalwareMustDie – that dude is awesome. Rob Blody is a buddy of mine, he helped me reverse some samples. I wanted to really say thanks to ShmooCon for having me. I came here for the first time when I was 18 years old. I was so stoked, and because of it I got my first job in security and all that stuff, so that’s the reason why I’m here right now. And the Linode abuse team – they’ve been pretty nice to me with everything. They would literally email me four times a day, and they’re like “Hey, we got more complaints, gonna have to block these people.” And I’m like “You got it!” So, I want to say thanks to all of them, and thanks to all of you guys for coming here to my talk. Do you guys have any questions?
Question: Do you do any log analysis on TTY logs to identify if this is a human or if this is a bot?
Andrew Morris: No, I don’t. That’s a really good idea. I’ll look at doing something like that.
Question: Can you share some of the names of the targets?
Andrew Morris: No, I’m not going to do that. Sorry. But it’s a good question. Any other questions?
Question: Are you doing any IPv6 yet?
Andrew Morris: I don’t have that capability quite yet. That’s on the to-do list. Thank you, though. But really, you are not going to mass scan IPv6, I don’t know… But that’s a good question, I should look at doing something like that.
Question: What are you doing with the source IP addresses that are coming in? Are you doing any verification to see if the things that are coming in are compromised or anything like that?
Andrew Morris: No, I’m not doing anything like that. I know somebody else who runs a project, where he’ll actually just turn right back around and he’ll ‘nmap’ the box and he’ll look for a web server and stuff. And he’s got good results doing that. That’s something that you can do and I could do, I just haven’t implemented it yet. This is all still pretty half-baked stuff that I’m working with.
Okay, thank you guys so much for coming!