Delving further into techniques to keep payloads undetected by antiviruses, David Maloney analyzes the efficiency of several popular obfuscation methods.Alright, so how do we get around the problem of the EXE Template? Well, like I said, the default template with no payload is 42 detections. We have the ability for users to supply a different EXE Template, and any of the exploits that do executable drops actually have an advanced option on there called EXE Templates. So if you do Show Advanced in Metaspolit Framework you will see that option. You can point it to a totally different template. The problem is this still only cuts the detection rate in about a half. Part of that is because of other things that we are doing to the executable template, and some of it is because of the stager and so forth. So the first thing I set to do was fix our injector method (see left-hand image). The way our injector works is it creates a new segment in the executable, which then calls VirtualAlloc to create RWX memory. We have a couple of different ways of building EXEs, one of which is just to inject, which preserves the original functionality of the executable. And what we did before was we would have this new segment and the segment would call VirtualAlloc to create RWX memory and stick our payload in it and execute it.
Well, that was getting flagged pretty quickly. So this is actually now in Metasploit Framework, this is the only piece of this talk that is already implemented in Metasploit Framework. A lot of this stuff is not going to be directly implemented but that’s why I am sharing the concepts with you guys.What we do is we use Metasm to create a new RWX stub inside our executable. So executables are broken down into sections, one of which is the text section that contains the actual code that is executed, so we just basically create a new text section in that executable (see right-hand image). Then we take our existing payload and just basically decode it back into assembly instructions and shove it into that section (see left-hand image). We have the ability to set a buffer register which is part of the encoders for polymorphic encoding. This basically tells our encoder where the payload is in memory. And then we just write a little assembly stub around it, and all it does is create a thread, execute our payload in it and then jump back to the original entry point of the executable (see right-hand image).
What does it look like if you run this executable? Let’s take calc.exe, it is actually executing a totally different section of code that we have created but all it does is spawn off a little thread in the background that you are probably not going to notice, our payload runs in it, and then it jumps right back to the original part of the code the developer wrote. So your calculator pops up and everything works as normal, but as long as calculator is up you’ve got a Meterpreter shell running in the back. Pretty fun stuff.There are some more technical details, but this cuts the detection rate from 41 detections for the injector method down to 14 (see left-hand image), pretty nice, pretty nice, if you are not dealing with one of those 14 AV vendors. It gets us a little bit closer but doesn’t really solve the problem.
So, what I set about doing was to cut through and just do some basic stuff. What’s getting flagged? Is it the RWX stub creation, is it executing the memory, what specifically is getting detected here?I ran through and created a bunch of different executables. The first one just creates read-writable memory at runtime and that actually got flagged, which seems a little odd. Then we created read-writable executable memory, still did nothing with it, we are not writing anything in there, we are not executing anything; still gets flagged. What if I write non-executable data into that memory but don’t try and execute it, don’t do anything with it? Well, all the serious detections drop off. We still get a couple of warnings, these are, like, suspicious programs, potentially unwanted programs, that kind of thing, but the actual detections start to drop off.
What if we write some actual shellcode in there? Well, we get detections but not that many, right? Let’s try an encoder – that dropped the detections down but we are still getting detected a little bit. When we actually start executing it though – then we see those detections spike back up into the areas that we were expecting to see around nine, ten, plus some warnings.So, from this we can conclude (see left-hand image) that creating RWX memory is not the primary thing that it’s flagging on. And there are safe ways to use VirtualAlloc and there are unsafe ways to use VirtualAlloc in terms of getting detected. It seems to be very contextual, which says that we are not actually getting flagged by pattern matching, we are actually getting flagged by heuristic analysis, so actually running or emulating the run of that executable and detecting behaviour. I was going to have a whole section on this but it turned out to not be all that useful. For anyone who doesn’t know run-time dynamic linking (see right-hand image), Windows provides the ability to, at runtime, load a DLL, grab a function address and call it without having to have that function in the import address table.
This is a way of obfuscating what Windows API calls that we are actually making for static analysis. Unfortunately this only dropped one detection across the board. I applied run-time dynamic linking to all those previous methods and it uniformly dropped one detection. So it turned out to not be all that useful. But it does clue us in that the old signature-based detections that we used to worry about that pattern matching, where you say: “If you just make the code look a little different, you will get past it,” – that obviously doesn’t hold true anymore.So my next thought was: “Maybe it’s just recognizing the shellcode in memory,” because we are just shoving all the shellcode in straight order into memory somewhere, whether it’s in a new code segment, whether it’s in a variable (see left-hand image). What if I scramble that shellcode up? So go back to putting in a variable for a minute, and instead of writing the bytes in order, we write them all out of order, and then we create an index map that just tells us where the bytes actually belong, and then we just unscramble it right before we execute it (see right-hand image). It’s basically an obfuscation technique. The shellcode can’t be pattern matched because it’s just a random series of bytes at the moment. I also experimented with: “Well, maybe if we use a Jmp instead of a Call,” so what we typically do is we write our shellcode and we dereference it as a function and then we call that function. What if we instead use an assembly Jmp instruction to just jump into it as if it weren’t a function? Does anyone think that made any difference? No. It didn’t make any difference.
Not starting to feel too smart at this point, really. Some of these ideas worked a lot better about a year ago when I first started doing this and have since pretty much dried up. So the index map made no difference, the Jmp versus the Call made no difference, again it’s not pattern recognition that is working here, that’s pretty much a given.
Read previous: AV Evasion 2: Hurdles for Metasploit Payload Execution
Read next: AV Evasion 4: Encoders and Fuzzy NOPs Fail