Security researcher and former hacker Samy Kamkar delivers a talk at Defcon called “How I Met Your Girlfriend”, in which he introduces a method of hijacking PHP sessions, illustrated with a Facebook account, for strictly personal purposes. Below is the adapted text version of his talk.
So this is the discovery and execution of entirely new classes of web attacks in order to meet your girlfriend. Before we begin, a little bit about me. I am a security researcher; ‘narcissistic vulnerability pimp’ is what we are being called these days, right? I do not do security professionally, I do it for fun, like most of you guys, or some of you guys. I am known for the ‘Samy worm’1, the worm on MySpace a couple of years ago. I co-founded ‘Fonality Inc.’, an IP PBX company. And I love Lady Gaga.
You’re probably wondering, like, why haven’t I heard of this guy? He has done nothing for a couple of years, so why is that? They didn’t let me touch computers. It’s true. A few years ago I was raided by the Secret Service’s Electronic Crimes Task Force. They came into my home and took all of my computers (well, my laptop), my phone, any CDs and DVDs, and my Xbox. The court forbade me to literally touch computers; I was banned for life. A couple of years later I fought, and I fought, and I am now back. I can touch computers. But I am not allowed on MySpace.
Alright, so what are we going to talk about today? The web. Why the web? Honestly, I got bored of the web a couple of years ago. It’s really cool, there is so much you can do with it, but security is so much broader, right? There is so much cool stuff going on here at Defcon, and in security in general: you have reverse engineering, network security, web application security, hardware hacking, all this cool stuff. Some people have even made ATMs dispense cash; you probably haven’t heard about that, it’s a really cool new thing that they’re talking about.
But the web is actually really cool in another way. Everyone has a web browser. If you have a computer with an operating system, you have a web browser. So it’s the one piece of software that allows me to deliver code to you and have you execute it. It is basically a code delivery mechanism with which I can attack anyone, since everyone has the Internet today.
It’s kind of like when the App Store came out for the iPhone, right? At that point you could deliver any sort of content that you wanted; of course, Apple bans malicious content, and fortunately they give us freedom from porn.
So the web browser is just like that, except no one’s guarding it. No one is checking to make sure that your site’s not malicious. Obviously there are companies working on this in software now, but for a long time there weren’t.
So, this is my home page (see screenshot to the right); it’s probably like yours. Anna Faris: she is amazing, I am in love with her. I was checking out pictures of girls on a social network, as I typically do before I get in a lot of trouble. And I found her, and I thought, man, she is amazing, she is the kind of girl I wanna get to know.
So I am looking at her profile, looking through her pictures. I can’t really see too much; she is not my friend. I thought about that for a second. And then I thought, oh man, I should message her, but then I saw she is in a relationship. Not an open relationship, not “it’s complicated”: she is in a relationship. So, who is this guy, and how am I gonna best him? So I look into him a little bit.
Alright, so this guy is a certified information security specialist professional, chief executive officer of ‘SecTheory LTD’, co-author of ‘XSS exploits’, oh no, author of ‘Detecting Malice’2, co-developer of clickjacking – really cool technology with the really awesome Jeremiah Grossman – runs ha.ckers.org and sl.ackers.org, if you guys have been there, and is a certified ASS, which is an application security specialist.
It’s a pretty impressive resume. A man who needs no introduction: Robert ‘RSnake’ Hansen (on the photo). So here is the problem: I wanna attack this guy. We all know we can attack random people on the web. You put a little bit of malicious content out there and you’ll get some sort of hit rate, and if you have enough visitors you will be attacking random people. But I want to do a targeted attack, on someone who is secure: someone like you people, who understands security and is probably running a lot of technology to help secure himself.
So how do we do it, how do I attack him? You don’t; you do not attack that person. You attack indirectly. His girlfriend? That’s who I am trying to attack. So he is on Facebook. Facebook is an awesome website, a social network; it’s the cool one these days. Now, if we go to Facebook, we’ll see something in the URL bar: index.php. And now you’re thinking: PHP is a computer language, typically used for the web. It’s an extremely common web language. I am sure all of you have at least heard of it, and many of you, I am sure, program in it. It’s great because it’s extremely common, so it’s well understood.
The code is open source. You can all go look at it and see what’s going on in there. It has very good session management that everyone uses: every single person who does PHP and uses sessions typically uses the built-in session management. If you’re using frameworks like CakePHP or Kohana, they are also using this session management.
So PHP sessions: what are they? Basically a random string that’s generated and passed either in the URL or in a cookie. So what are cookies? A cookie is a persistent piece of text that stays with your browser; I am sure all of you are familiar with it. It just contains data, and typically it contains session data. Session data is basically a random string, so that when you go to any page on that website, the server can identify you and match you with other information it has stored locally.
So when you go to Facebook and log in with your username and password, they provide you a random string that is assigned to that username. When you go to other pages later on, they look at the random string you’re sending them and say, oh, I know this guy, this is Samy. So it authenticates you. So let’s try to attack a session. Let’s look at the PHP session code (see image with the code snippet). It’s open source, so we pull up ‘session.c’. This is the function ‘session_start()’; this is what creates a session in PHP. Basically, it creates your random string right here in this ‘spprintf’ call. It’s looking at four things: the IP address of the person authenticating or getting this session; the epoch, which is the number of seconds since January 1st, 1970; the microseconds at which that person acquired the cookie; and a random number that’s created.
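To make the four inputs concrete, here is a minimal Python sketch of how a PHP 5.3-era session ID is assembled. It assumes the format string ‘%.15s%ld%ld%0.8F’ (IP, epoch seconds, microseconds, combined LCG value times 10) from ‘php_session_create_id()’ and the default MD5 hash rendered as hex; check the source of your PHP version before relying on it. The helper name and sample values are hypothetical.

```python
import hashlib


def php53_session_id(remote_addr: str, tv_sec: int, tv_usec: int, lcg: float) -> str:
    """Hypothetical helper: rebuild a PHP 5.3-style session ID from its inputs.

    Assumes spprintf(..., "%.15s%ld%ld%0.8F", ip, sec, usec, lcg * 10)
    followed by MD5, rendered as 32 hex characters.
    """
    # %.15s keeps at most 15 characters of the client IP address.
    raw = f"{remote_addr[:15]}{tv_sec}{tv_usec}{lcg * 10:.8f}"
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical inputs: an attacker must recover all four to reproduce the ID.
sid = php53_session_id("203.0.113.7", 1280000000, 123456, 0.5123456789)
print(len(sid))  # 32
```

Because the hash step is deterministic, whoever pins down all four inputs reproduces the cookie exactly; the hash adds no secrecy, which is why the attack focuses on shrinking the input space.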
So if we take all of that, that’s 160 bits of entropy. Let’s go a little deeper here, just for a bit: 160 bits is a lot to brute-force3. Let’s say I wanna become RSnake on Facebook: what I would do is brute-force the session, that random string. But 160 bits, that’s a lot. Now, bits can be a little confusing, so let’s do a real quick primer. 64 bits is not double 32 bits: every time you add a bit, you double. Here is a little trick: for every 10 bits, add three zeros. So 10 bits is a thousand, 20 bits a million, 30 bits a billion. Also, if you just remember the first ten powers of two, for 0 through 9 bits: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, you can combine them to figure out what you want. So for 25 bits, we know 5 bits is 32, and 20 bits is six zeros, so 25 bits is about 32 million.
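The rule of thumb above can be written as a tiny estimator and compared against the exact powers of two. This is just the primer's arithmetic, not anything from PHP.

```python
# Rule of thumb: every 10 bits is roughly three decimal zeros (2**10 = 1024 ~ 1000),
# and the leftover 0-9 bits come from the table 1, 2, 4, ..., 512.
SMALL_POWERS = [2 ** i for i in range(10)]


def approx_bits(bits: int) -> int:
    """Estimate 2**bits using the 'ten bits = three zeros' trick."""
    tens, rest = divmod(bits, 10)
    return SMALL_POWERS[rest] * 1000 ** tens


for b in (10, 20, 25, 30):
    print(f"{b} bits: approx {approx_bits(b):,}, exact {2 ** b:,}")
```

The estimate runs low by about 2.3% per 10-bit step (1000 versus 1024), which is close enough for the back-of-the-envelope entropy arithmetic in this talk.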
So 160 bits is essentially 10^48. If we could brute-force at 100 trillion values per second, it would take 900 quadrillion eons to brute-force. I didn’t even know what an eon was, I had to look it up: it’s 500 million years. That’s a lot.
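The 900-quadrillion-eon figure can be checked directly. This sketch uses the talk's assumptions (100 trillion guesses per second, 500 million years per eon) and a 365.25-day year; the result is the time to try every value, not the average.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
YEARS_PER_EON = 500e6  # the eon length that makes the talk's arithmetic work
RATE = 100e12          # guesses per second, as assumed in the talk


def eons_to_exhaust(bits: int, rate: float = RATE) -> float:
    """Time to try every value of a `bits`-bit secret, measured in eons."""
    seconds = 2 ** bits / rate
    return seconds / SECONDS_PER_YEAR / YEARS_PER_EON


print(f"2**160 ~ {2 ** 160:.2e}")          # about 1.46e+48
print(f"{eons_to_exhaust(160):.2e} eons")  # about 9.26e+17, i.e. ~900 quadrillion
```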
So, again, 160 bits: we’re not gonna brute-force this, no matter how fast a computer you have. So let’s take a closer look. Well, the microseconds aren’t really 32 bits. There are only a million microseconds per second, and a million, if you remember, is only 20 bits, because it has six zeros. So without doing anything, we just reduced the 160 bits by 12 bits, down to 148 bits. That doesn’t help us much; that’s still a lot.
So let’s take a closer look. If you’re familiar with Facebook, it has chat (see image). When someone logs in, you actually see that person come online in your chat window. How does that happen? It happens with AJAX1: your client is continuously checking with Facebook to see whether someone has logged in or gone offline, to keep the status up to date.
Well, if you use something like Live HTTP Headers (browser add-on) or a packet sniffer2 or something, you can see the HTTP requests going back and forth. And you can recreate those if you’d like. And what you can do is you can just send that request “Is there anyone new online?” every single second.
As you’re sending it every second, one of these times RSnake is gonna come online, because he wants to check if anyone poked him. The cool thing here is the date shown in red: it is sent by the server, so the server tells us its local time. Our own local time doesn’t help us much, because we don’t really know the difference between our clock and the server’s. And the server’s local time is what helped create that cookie, if you recall.
So, those 32 bits: if we are checking every second, we can eliminate them as soon as RSnake comes online, just by watching every second; write a program to do that. We see him come online, we take that date, convert it to epoch, and we just removed 32 bits. We’ve now reduced the 160 bits by 44 bits, down to 116 bits. That’s awesome, but still a lot.
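Converting the server's ‘Date’ response header to epoch seconds takes one standard-library call. The header value here is a made-up example in the usual HTTP date format. The header has one-second resolution, so it recovers the seconds input but says nothing about the microseconds.

```python
from email.utils import parsedate_to_datetime


def date_header_to_epoch(value: str) -> int:
    """Convert an HTTP Date header such as 'Sun, 01 Aug 2010 20:15:03 GMT' to epoch seconds."""
    return int(parsedate_to_datetime(value).timestamp())


print(date_header_to_epoch("Sun, 01 Aug 2010 20:15:03 GMT"))
```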
Let’s go further. He comes online, so we can send him a message. Why not send him to my blog, namb.la? You send him there, and then you just track the IP address. Don’t worry, there is no XSS3; there is nothing on there, so if he is running NoScript or something else that’s protecting him, there is nothing malicious on my website to block. He goes there, nothing happens to his browser, and he sees a really cool blog post about how I did a Defcon talk and everyone loved it.
What we do is track the Apache logs, and we see his IP address. That’s another 32 bits of that cookie. So we’re now down to 84 bits from 160, basically half; well, it’s not really half, okay, bits are so confusing.
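Pulling the visitor's address out of the access log is equally simple. This sketch assumes Apache's common/combined log format, where the client IP is the first field, and a hypothetical page path that only the target was sent; the log lines are fabricated for illustration.

```python
import re

# Common/combined log format: IP, ident, user, [time], "METHOD path PROTO", status, ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')


def visitors_of(path: str, lines):
    """Return client IPs that requested `path` (a hypothetical URL sent only to the target)."""
    ips = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(4) == path:
            ips.append(m.group(1))
    return ips


sample = [
    '198.51.100.23 - - [01/Aug/2010:20:16:10 +0000] "GET /defcon-talk HTTP/1.1" 200 5120',
    '192.0.2.99 - - [01/Aug/2010:20:16:12 +0000] "GET /index.html HTTP/1.1" 200 812',
]
print(visitors_of("/defcon-talk", sample))  # ['198.51.100.23']
```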
So what’s left that we don’t know is the 20 bits of microseconds, and we’re not gonna guess that. There is no way we’re gonna accurately guess the microsecond at which someone logged in on a remote system. You might be able to if you tried really hard, with a system really close to the server and very accurate timing, but it’s not worth it.
So the only other thing left here is this random ‘lcg_value’, 64 bits. What is this? An LCG4 is a linear congruential generator, a pseudo-random number generator. LCGs have been studied for years, since like 25 years ago or something; they’re older than I am. So they are really well studied and really well understood, and you can actually look up information on how to reverse them. The LCG used here is actually two LCGs combined, so it’s a little bit harder. I didn’t understand that too well, but I looked over the code anyway. The first time the LCG, the random number generator, is called, it’s seeded. The seed basically determines every single random number it produces from then on, so the seed is critical to the randomness of the PRNG5.
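For reference, here is a sketch of a combined LCG in the style of L'Ecuyer's combined multiplicative generator. The constants (moduli 2147483563 and 2147483399, multipliers 40014 and 40692) are what I believe PHP's ‘lcg.c’ uses; treat them as an assumption to verify against your PHP version. PHP computes the products with Schrage's method to avoid 32-bit overflow, while Python's big integers let us write the modular multiplication directly (equivalent for seeds in range).

```python
class CombinedLCG:
    """Two multiplicative LCGs combined, modeled on PHP's php_combined_lcg().

    Constants are assumed from L'Ecuyer's combined generator.
    """
    M1, A1 = 2147483563, 40014
    M2, A2 = 2147483399, 40692

    def __init__(self, s1: int, s2: int):
        # The seed is two 32-bit halves: s1 (time-based) and s2 (process ID).
        self.s1, self.s2 = s1, s2

    def next(self) -> float:
        self.s1 = (self.A1 * self.s1) % self.M1
        self.s2 = (self.A2 * self.s2) % self.M2
        z = self.s1 - self.s2
        if z < 1:
            z += 2147483562
        return z * 4.656613e-10  # scale to roughly (0, 1)


# Same seed, same sequence: whoever recovers (s1, s2) can replay every value.
a, b = CombinedLCG(123456789, 4242), CombinedLCG(123456789, 4242)
print([a.next() for _ in range(3)] == [b.next() for _ in range(3)])  # True
```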
So let’s take a look at the seed function here, ‘lcg_seed’, at the bottom of the screen. What you’ll see is that there are 2 parts to the seed, and it’s 64 bits: 64 bits of entropy, and every random number is also 64 bits. It’s split into 2 halves, called s1 and s2, each 32 bits long.
Now s1, as you can see, is ‘tv.tv_sec ^ (~tv.tv_usec)’. What that is: the epoch at the moment the PRNG is seeded, XORed with the ones’ complement of the microseconds at which it was seeded. s2 is the process ID. So let’s take a look at s1 a little bit more. What you are seeing right now in the image is 32 bits of entropy. The interesting thing is that the second at which PHP was seeded was probably when the web server started. We don’t necessarily know when that happened, but we can potentially make an estimate. We can also send thousands and thousands of requests to the web server to get it to reset, and figure out when it started.
One of the issues is that we wanna know which request caused that reset, but if we send enough requests we can assume it happened within the last hour. Now, what they do to make this harder to guess is XOR it with the microseconds. There is no way we’re gonna know the microsecond at which the web server started, but they’re XORing the most variable part of the epoch (its low bits) with the most variable data, the microseconds. The fixed data, like what year, what month, what week it started, remains the same. Basically, the fixed data remains fixed.
So we end up with 12 bits that we can guess if we know when PHP started to within about a 12-day period (2^20 seconds). And again, if we don’t, we can send enough requests to find out. The other 20 bits are just microseconds XORed with the other, variable part of the epoch; we don’t know that, whatever, that’s 20 bits. We just removed 12 bits of entropy from the random number generator.
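The effect of the XOR is easy to check. A minimal sketch, assuming ‘s1’ is kept as 32 bits as in the seed code described above: microseconds never exceed 20 bits, so ‘~tv_usec’ always has its top 12 bits set, and the top 12 bits of ‘s1’ are just the inverted top 12 bits of the start time. Anyone who knows which 2^20-second block (about 12.1 days) the start time falls in knows them. The start time below is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lcg_seed_s1(tv_sec: int, tv_usec: int) -> int:
    """s1 = tv_sec ^ ~tv_usec, truncated to 32 bits."""
    return (tv_sec ^ ~tv_usec) & MASK32


def top12(x: int) -> int:
    """The 12 high bits of a 32-bit value."""
    return x >> 20


start = 1280000000  # hypothetical server start time (epoch seconds)
# Two guesses differing in both seconds and microseconds, inside one 2**20-second block:
s_a = lcg_seed_s1(start, 0)
s_b = lcg_seed_s1(start + 1000, 999_999)
print(top12(s_a) == top12(s_b))  # True
print(2 ** 20 / 86400)           # block length in days, about 12.1
```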
So let’s look at s2: the process ID, from ‘getpid()’. That’s 32 bits of entropy. Well, process IDs on Linux are only 15 bits long by default, so immediately you remove 17 bits. And if you can execute PHP through any function, if you can execute a program like ps, or if you can hit an Apache server-status page or something like that, you get the PID outright and remove the entire 32 bits. We’ve now reduced the 64 bits down to 20 bits in the PRNG.
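Tallying the reductions so far, using the bit counts given in the talk (these are the talk's estimates, not measurements):

```python
# Bits each input contributes to the PHP session ID, per the talk.
ledger = {"client IP": 32, "epoch seconds": 32, "microseconds": 32, "LCG value": 64}
assert sum(ledger.values()) == 160

# Each step: (input, bits removed, how)
steps = [
    ("microseconds", 12, "only 10**6 values, which fit in 20 bits"),
    ("epoch seconds", 32, "server Date header seen when the target comes online"),
    ("client IP", 32, "target's visit shows up in our Apache log"),
    ("LCG value", 12, "top 12 bits of s1 fixed by the server start window"),
    ("LCG value", 32, "s2 is the PID: 17 bits free by default, all 32 if leaked"),
]

remaining = sum(ledger.values())
for name, removed, why in steps:
    ledger[name] -= removed
    remaining -= removed
    print(f"{remaining:3d} bits left after {name}: {why}")

print(ledger)  # {'client IP': 0, 'epoch seconds': 0, 'microseconds': 20, 'LCG value': 20}
```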
We are now at a total of 40 bits of entropy from a 160-bit cookie, for every cookie in PHP. That’s awesome. But wait, there is more. Normally we’d think 20 bits and 20 bits make 40 bits. But we can actually calculate the PRNG’s unknown 20 bits, the LCG value, separately. Separately means we calculate those 20 bits, and then the other 20 bits, and 2^20 + 2^20 is only 2^21: that’s only 21 bits.
We can calculate it with a time-memory tradeoff6 and code I’ve created: we can solve those 20 bits in a matter of seconds. That eliminates them, and now it’s exactly 20 bits: the 20 bits of entropy from the microseconds at which the user authenticated and got the cookie. On average, we will be able to log in as him within about 500,000 requests, which we can easily do in a day. So I’ve done it. I’ve become RSnake!
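The gain from solving the two 20-bit unknowns separately rather than jointly, and the online work that remains, in a quick calculation. This covers only the arithmetic; it is not a sketch of the time-memory tradeoff itself.

```python
joint = 2 ** 20 * 2 ** 20     # guess LCG bits and microseconds together: 2**40
separate = 2 ** 20 + 2 ** 20  # solve the LCG bits offline first, then the rest: 2**21
print(joint == 2 ** 40, separate == 2 ** 21)  # True True

# Only the microsecond bits must be guessed against the live site.
online_space = 2 ** 20
expected_requests = online_space // 2   # on average the right value is halfway in
per_second = expected_requests / 86400  # spread over one day
print(expected_requests, round(per_second, 1))  # 524288 6.1
```

About six requests per second for a day is a modest rate, which is why the talk calls half a million requests easy.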
So what can we do at this point? Well, first let’s understand how to fix this. Make sure you are running a new version of PHP (PHP 5.3.2 or later). I sent this over to PHP, and they quickly released a patch that adds more entropy. Or create your own session values, using your own randomness. One of the great things about PHP is that it’s very fast; it’s meant to be fast. It’s basically cross-platform, so it doesn’t do much that’s very system-level; for example, it’s not gonna access /dev/random, because that doesn’t exist on Windows. So create your own session values or seed your own random number generator.
You don’t need to understand crypto; just use a strong seed that your system comes with. If you’re running Linux or BSD or something like that, use /dev/random for your seed. Don’t use the process ID. The attack is difficult to execute. It’s much easier on social networks, where I unfortunately spend most of my time.
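In the spirit of “use your own randomness”, here is a minimal sketch of a session identifier drawn straight from the operating system's CSPRNG. It uses Python's standard ‘secrets’ module (backed by ‘os.urandom’) purely to illustrate the idea; it is not the PHP fix itself, and the helper name is hypothetical.

```python
import secrets


def new_session_id(nbytes: int = 32) -> str:
    """Hypothetical helper: a session ID built only from OS-provided randomness.

    Nothing here depends on the client IP, the clock, or the process ID,
    so none of the side channels from this talk narrow the search.
    """
    return secrets.token_hex(nbytes)  # 32 bytes -> 64 hex characters, 256 bits


sid = new_session_id()
print(len(sid))  # 64
```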
One thing to note: Facebook is actually not vulnerable; this is not an attack on Facebook. If you’re familiar with Facebook, they created their own version of PHP called HipHop, which turns PHP into C++ that gets compiled, so it’s supposed to be much faster. I love Facebook. If you could plant some crops for me in my Farmville, I would appreciate that.
So at this point, what do we do? I am logged in as RSnake, how am I gonna meet this girl, you know? She is happily ‘boyfriended’, I don’t know what the word is. So using his cookie, I can now message her as him. So what do I say? Oh, here is the thing, why don’t I send her to a malicious URL: namb.la? We’re gonna attack her network now.
We’re gonna learn a little bit about networks and NAT here. A NAT, in a nutshell, is a system that allows you to run multiple systems behind one public IP. All of your computers behind the NAT will typically run in private IP space. Typically your cable modem provides only one IP, so you use a router that contains NAT software, and that allows all your network devices to run behind the NAT.
It also acts as a kind of firewall. It prevents people from accessing services and ports that you have running on your computer, whether you know about them or not. So when you’re behind the NAT and you’re running, let’s say, Apache on port 80, no one can connect to you except from inside your own network, unless you go to the NAT and enable port forwarding or DMZ7 or something else.
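A toy model of the NAT behaviour just described, with hypothetical addresses: outbound connections create a mapping from a public port back to a private host, and inbound packets are delivered only if a mapping or an explicit port forward exists. Real NATs also track remote endpoints, timeouts and protocol state; this sketch ignores all of that.

```python
class ToyNat:
    """Toy NAT: one public IP, a table of public ports mapped to LAN hosts."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.next_port = 40000
        self.table = {}  # public port -> (private ip, private port)

    def outbound(self, private_ip: str, private_port: int) -> tuple:
        """A LAN host opens a connection; allocate a public port for the replies."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port)

    def forward(self, public_port: int, private_ip: str, private_port: int) -> None:
        """Static port forward, as configured by the user (or by UPnP)."""
        self.table[public_port] = (private_ip, private_port)

    def inbound(self, public_port: int):
        """Return the LAN destination, or None if the packet is dropped."""
        return self.table.get(public_port)


nat = ToyNat("203.0.113.10")
print(nat.inbound(80))  # None: Apache on the LAN is unreachable
nat.forward(80, "192.168.1.20", 80)
print(nat.inbound(80))  # ('192.168.1.20', 80)
```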
Well, let’s talk about something that some of you may have heard of recently, called Cross-Protocol Scripting (XPS). The cool thing here is that HTTP servers can run on any port, which means the browser will let you communicate with HTTP servers on any port. HTTP is a newline-based protocol: each line carries some data, rather than, say, some weird XML-formatted data or a binary string of some sort. But there are other protocols that are also newline-based, so we can actually communicate with a different newline-based protocol, like IRC1.
IRC is a great place, it’s good people. Take a look at the image to see what an IRC connection looks like. I telnet to a reputable server like efnet.org, I log in with my username, Samy, I respond to a PING request, and then I join a channel and find out, you know, where I can get WinNuke2. Well, it doesn’t work anymore, so if anyone has a version that works, please send it to me. Now let’s see how we do an IRC client on the web. I create a malicious page that has this code running on it (see image). You visit my malicious web page, and your client connects to the IRC server while your web browser thinks it’s talking to an HTTP server. It sends an HTTP request with my IRC data as the POST data. And the IRC server says “Well, I don’t understand this HTTP request, I don’t understand this line, or this line, or this line. Oh, I understand this: I understand ‘join #hackers’, I know what that means, I’ll interpret that, and I’ll just ignore all your other stuff”.
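A sketch of what the IRC server actually receives. The HTTP headers are a plausible but hypothetical set for a browser form POST (host name and values are made up), and ‘parse’ is a deliberately simplified stand-in for an IRC server that skips lines whose command it does not recognize.

```python
KNOWN = {"NICK", "USER", "JOIN", "PRIVMSG", "PONG"}

irc_payload = "NICK samy\r\nUSER samy 0 * :samy\r\nJOIN #hackers\r\n"

# Hypothetical request the browser sends when the hidden form is auto-submitted.
request = (
    "POST / HTTP/1.1\r\n"
    "Host: irc.example.net:6667\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Content-Type: text/plain\r\n"
    f"Content-Length: {len(irc_payload)}\r\n"
    "\r\n"
    + irc_payload
)


def parse(stream: str):
    """Toy IRC server: act on lines it understands, silently skip the rest."""
    accepted = []
    for line in stream.split("\r\n"):
        command = line.split(" ", 1)[0].upper()
        if command in KNOWN:
            accepted.append(line)
    return accepted


print(parse(request))
# ['NICK samy', 'USER samy 0 * :samy', 'JOIN #hackers']
```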
At this point, I am making your IP address connect to the IRC server. This can be used for SMTP3 too, for example. Spammers have actually been using this for years and years, and it hasn’t really been well known: they’ve been turning people’s browsers into spam servers. You visit a page and, without your ever seeing it, there is a hidden form on the backend that connects to port 25 as though it were an HTTP server and auto-submits, and now you’re sending Viagra spam. Why Viagra, I don’t know, but it’s what they have on the backend. You can see in the image what an HTTP POST looks like: all the HTTP headers your browser is sending, and then the IRC data. Again, the IRC server ignores the data it doesn’t understand until it hits data it does understand. So that’s why I’m bringing this up.
Let me talk about something called NAT Pinning. It’s like XPS4 times 9000. So what is NAT Pinning? Well, here is the thing: your web browser was confused. It thought it was communicating with an HTTP server, but it was communicating with an IRC server. NAT Pinning takes this one step further: it also confuses the router, so the router thinks it’s seeing a different protocol.
So now your router thinks it’s seeing IRC, your browser thinks it’s speaking HTTP, and they start doing different things. What can we do with this? Let’s analyze the setup with a malicious server. You have your systems, your network devices, behind your NAT. And you have the malicious server: you visit a website, hitting a URL on that malicious server.
Now, if you’re familiar with IRC, there is something called DCC. It’s basically how porn is sent over IRC; it’s great, it’s a great protocol. DCC stands for Direct Client-to-Client: a direct connection between clients. So when you’re communicating through an IRC server, chatting with all the other really cool people, you say “You know what, I wanna send you this file, so connect directly to me; there is no point in the server bridging this file or this chat”.
The way that works is you send a message to that person saying “Hey, I want you to connect back to me on this IP address, on this port.” Now, years ago, routers didn’t understand this message; it was just TCP traffic. So if you didn’t have that port open or forwarded to yourself, the connection would never be established. People complained, because it broke all sorts of things: it broke IRC DCC, it broke FTP, it broke SIP. So routers got smart. They started shipping software that would actually watch the traffic and look for messages like this: ‘PRIVMSG samy : DCC CHAT samy’. And if they saw that message, they would say “Oh, a client on my network is trying to get a file sent, so I’ll forward that port back to them”.
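A rough sketch of both sides of this exchange. In the CTCP form of a DCC CHAT offer, the client advertises its IPv4 address as a single unsigned 32-bit integer in decimal, followed by the port. ‘alg_inspect’ is a hypothetical, much-simplified router helper that opens a forward when it sees such a message; real connection-tracking helpers (such as Linux's IRC helper) are more careful than this.

```python
import ipaddress
import re

CTCP = "\x01"


def dcc_chat(nick: str, ip: str, port: int) -> str:
    """Build a DCC CHAT offer: the IPv4 address is sent as one decimal integer."""
    ip_int = int(ipaddress.IPv4Address(ip))
    return f"PRIVMSG {nick} :{CTCP}DCC CHAT chat {ip_int} {port}{CTCP}\r\n"


DCC_RE = re.compile(r"PRIVMSG \S+ :\x01DCC CHAT \S+ (\d+) (\d+)\x01")


def alg_inspect(packet: str, sender_lan_ip: str, forwards: dict) -> None:
    """Toy router helper: if a LAN client offers DCC, forward that port back to it."""
    m = DCC_RE.search(packet)
    if m:
        port = int(m.group(2))
        forwards[port] = sender_lan_ip  # the hole NAT Pinning abuses


forwards = {}
msg = dcc_chat("samy", "192.168.1.20", 6000)
alg_inspect(msg, "192.168.1.20", forwards)
print(repr(msg))
print(forwards)  # {6000: '192.168.1.20'}
```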
So this is really cool. Now, once the XPS stuff became big, the browsers started blocking certain ports. They said “You know what, you shouldn’t be communicating on port 6667. If you are running a web server on there, you’re stupid; choose a different port”. So they started blocking ports. Now, a port is 16 bits; that’s the size of a TCP or UDP port number.
So the browser says “I am not gonna allow connections on port 6667.” Just overflow it. If you add another bit, adding 65536 to the port, you get a bigger number: 72203. Your browser asks, “Does this port match 6667?” No, so it goes through just fine. It gets sent to the TCP stack, which truncates it to 16 bits, and now you have 6667. I did not think of this; it was actually the respectable security group Goatse Security who came up with it. Very good people. Very awesome, very awesome.
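The overflow is a one-line calculation. This sketch contrasts a naive blocklist check that compares the full integer, as described above, with one that validates the 16-bit range first. The function names and one-entry blocklist are hypothetical, not any browser's actual code.

```python
BLOCKED = {6667}  # hypothetical blocklist; real browsers block many more ports


def naive_allowed(port: int) -> bool:
    # Compares the raw number; the 16-bit TCP stack later truncates it silently.
    return port not in BLOCKED


def strict_allowed(port: int) -> bool:
    # Reject anything that is not a valid 16-bit port before checking the list.
    return 0 < port <= 0xFFFF and port not in BLOCKED


port = 6667 + 65536
print(port, port & 0xFFFF)                        # 72203 6667
print(naive_allowed(port), strict_allowed(port))  # True False
```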
So at this point, Anna has clicked on my link. And now I have attacked her ports. So what did she have open? Well, a lot of OS X systems have a web server running by default. She was working on a website, so cute. So I connected back to her port 80, and saw she was making her own website about Team Jacob from ‘Twilight’. I love ‘Twilight’. Now I know how to get her. Here is the thing: I am actually on Team Edward, but I am not going to tell her that. So when I see her, I’m gonna say I am on Team Jacob.
So, how do you stop NAT Pinning? Well, there are so many ways that you wanna… well, you wanna have multiple layers. You want a strict firewall; make it as strict as you can. If you don’t expect people to use IRC on your network, block it off entirely. If you don’t expect people to be able to send stuff, block it off; you can also turn off UPnP5 and other protocols that allow this type of thing to happen. Client-side, run up-to-date browsers. WebKit was vulnerable to the port overflow; I believe that is resolved now. Other browsers might still be vulnerable, I am not sure. Make sure you’re running up-to-date browsers. Use NoScript if you are using Firefox; it will block all sorts of things. When I released NAT Pinning, NoScript added protection against it a day later. Run a local firewall if you can, like ‘Little Snitch’.
I mean, really we all understand security is not just one level of security. You basically have to use multiple layers of protection, like I would with Anna. So at this point, I know what she is into, I know how to win her over, I think. So I am gonna send her a message to get her on another malicious website (see image), and basically say: “You know what, this guy Samy, he is a really good friend of mine, he is gonna come over and take care of you, check out his Twitter”.
Alright, well, who cares, it’s a MAC address, what’s the big deal? So briefly, what is a MAC address? Every network device on your network has a MAC address, kind of like an IP address, but it’s in hardware; you can’t change it unless you’re spoofing it, and it’s how everything on your LAN communicates with each other.
So why the MAC address? Why do we want to acquire it? What’s so interesting about it? I’ll take you through the steps. Just Bing it: open your browser, type ‘www.bing.com’ in your URL bar. When the search box comes up, type in ‘Google’ and hit Enter. So why Google? Oh yeah, because they know everything, really. Some of you may be familiar with the ‘Google Street View’ service.
So what you see in the image is the Street View car. Some of you may have seen it; lots of you probably know what it is. It’s the car, it’s the one guy who drives around America taking pictures, on every single street. We understand Street View is really cool: you can go onto ‘Google Maps’ and see all the different streets, people flashing the cameras, marriage proposals, all sorts of awesome stuff. What you may not have known is that they are collecting data. Recently there’s been this big thing about Google collecting unencrypted Wi-Fi data. Well, this has nothing to do with that. And soon they won’t have any of that Wi-Fi data; they’ve already deleted it in a lot of places.
They are still collecting other Wi-Fi data. So what are they collecting? Well, as they are driving around, not only are they taking pictures, not only are they mapping GPS coordinates, but they are also looking at just Wi-Fi packets in general: not the data portion – they are looking at the headers.
Now, what’s interesting about the headers? They contain MAC addresses: the hardware MAC address of your router, the same MAC address that we acquired just with the XSS. Alright, so why is that interesting? Well, with Wi-Fi you can detect signal strength. As they are driving down the street, they actually detect my network at home. They say “Oh, I detected a network, it’s about 10 out of 100 strength-wise. It must be nearby”. They drive a little further, taking pictures: “Oh, it’s stronger now, it’s 50 out of 100”. It gets to its maximum, say, 85 out of 100, and then it starts to go down. Well, they’ve just triangulated my position.
Not only are they actually going on that street, but they are going on every other street around it, and now they get even more accurate data. They may see it actually goes up to strength 95 on the street parallel to mine. Now they know that I am actually closer to that street than I am to the other one that was at 85 strength. They are literally triangulating your network. It doesn’t matter if you are encrypted, it doesn’t matter if you are using WEP, WPA, WPA2. The packets are flying and the MAC address is in there unencrypted.
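Google has not published exactly how it turns these sightings into a location, so as a stand-in here is the simplest common technique, a signal-strength-weighted centroid. The readings are hypothetical, in local x/y metres with the talk's 0 to 100 strength scale; a real system would use latitude/longitude, path-loss models and many more samples.

```python
def weighted_centroid(readings):
    """Estimate an access point's position from (x, y, strength) sightings.

    Stronger readings pull the estimate toward where they were taken.
    """
    total = sum(s for _, _, s in readings)
    x = sum(px * s for px, _, s in readings) / total
    y = sum(py * s for _, py, s in readings) / total
    return x, y


# Hypothetical samples along one street (y = 0) and one on a parallel street (y = 60).
readings = [(0, 0, 10), (50, 0, 50), (100, 0, 85), (150, 0, 40), (100, 60, 95)]
print(weighted_centroid(readings))
```

The strong 95 reading from the parallel street pulls the estimate off the first street and toward the second, which is the refinement described above.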
So how is this interesting? We can’t really access it directly: they visit our malicious website, but we don’t want them to see that, and they are not gonna click it. Well, Firefox is making an HTTPS connection to Google and asking them for this information, so why don’t we make that connection ourselves? I can write a program on the backend, so when you visit my website, I use XSS to acquire your MAC address and send it back to my little program running in the background. It’s not running in your browser anymore. I then connect to Google, send your MAC address in this POST request, and it sends back your location, your coordinates. Now, to understand how accurate this is: this is an actual router I’ve exploited, and I knew the address beforehand, because Anna was there. So I went over there and did a ‘Google Maps’ request, saying “Take these coordinates and give me driving directions to the address”. The image shows how far it was. Driving directions: 30 feet. The router was 30 feet away. Seriously, that’s what it said: 30 feet. That’s how accurate the coordinates are for the router I was exploiting.
I think Mark Zuckerberg said it best: “Privacy is dead”. Thank you!