A Forensic Analysis of Android Network Traffic


Lake Missoula Group’s Director of Research Eric Fulton introduces his Defcon 19 talk about Android privacy risks and security vulnerabilities emanating from smartphone apps.

Eric Fulton

Hi there! My name is Eric Fulton, I work for a consulting firm called Lake Missoula Group, in beautiful Missoula, Montana. I know you might be thinking: “Do you guys have public transportation in Montana?” Yes, we do. And we also got hackers up there, which is a lot of fun. So we can hack in the morning and hike in the afternoon, as I like to say.

I also help run ForensicsContest.com. We actually run a Network Forensics Contest Puzzle during Defcon, which is pretty sweet.

During this talk, I really would like to say thank you to Sherri Davidoff and Jonathan Ham. They are absolutely amazing ninjas with packets, and they are actually writing a book, and I was able to use in advance a copy of their book to do some of the analyses that I am going to kind of show you guys today.

Presentation structure

So what I am going to show you today is I am going to start with some definitions and testing methodology, kind of going into how I analyzed the packets, what I was looking for, what I found, some fun findings I found through all of this; and then I’ll come to a conclusion (see preview).

But what I am trying to cover here is some distinct topics, privacy especially; I mean, obviously – it’s in the title. Privacy in our lives is important. I think that our privacy that we have is eroding, and we don’t exactly know what’s happening. Maybe at home you’ve got a smartphone and you don’t realize that every day it’s leaking your location, your apps, and some other interesting facts.

So that’s what I wanted to cover. I also want to touch a little bit on network forensics, because this is what I used to help discover what’s being shared over your Android phone.

Basics and Definitions

So what is network forensics? The Wikipedia article says: “Network forensics is a sub-branch of digital forensics relating to the monitoring and analysis of computer network traffic for the purposes of information gathering, legal evidence, or intrusion detection.” But basically it’s sniffing packets on the wire.

You’ve got traditional forensics, you know, we take your hard drive, you ‘dd’ it, you make an image, you analyze it, and then you try and say: “Hey, what’s on this hard drive?”

But there is also network forensics, and that’s where you’re going: “What is going over the wire? What is my computer leaking? What is going on?” I mean, with traditional forensics, unless you pull the memory, you won’t ever realize that there is something loaded in the memory leaking any number of things.

So listening on the wire gives you a definite perspective, and it lets you really understand what your phone, your laptop, your server, etc. is actually sharing with your network and the world. Or network forensics can be called: listening to the wire for fun, ???, profit and lulz.

Potential impact of network forensics

So, how does network forensics affect us? All of us use network devices: we use laptops, phones, etc. And everything is network based. I mean, back in the day before the advent of the Internet, and the beauty that is network communications, people just had a single computer that was not connected to anything else. Everything you did was on that terminal.

But now we send all sorts of things to everyone. We send usernames, passwords, hashes, URLs, lolcat pictures with your Grandma. But we send all of these amazing things over the Internet, and we think to ourselves: “Oh, I am sending my password to this service”. And a lot of people don’t think of all the third parties that affect that.

When you log into Twitter for example, most people think: “Oh, my computer, Twitter – that’s all that’s happening”, but they don’t realize that they are going from most likely their laptop, iPhone, iPad, iDevice etc., to probably a wireless router, which then connects to the ISP, which then is routed over the Internet to the Twitter service. And along there, whether or not I have access to your actual computer, I might have access to your network traffic, and through that, I have access to a lot of fascinating information.

I mean nobody wants to be handing out their usernames, passwords or anything else. And a lot of companies are really good at protecting that, that’s why they hash your passwords, but the simple fact that I may be able to look at your traffic is not only a huge privacy risk, but also a security vulnerability.

Some of our applications send licensing and registration data, update information, demographic data etc. And all this data can be filtered, logged, and analyzed by a third party. You don’t know what your ISP is doing. Most people just sign that contract and assume that the ISP has their best interest at heart. And I am not saying the ISPs are evil, but if they wanted to be, there is a good chance that they could do a lot of damage.

Or your roommate who is also on your wireless network can do a lot of damage, depending on how close you are with your roommate; or the guy next door if you are using WEP.

Phone-related private data

So essentially, what I am trying to say is that there are a lot of ways our phones could f**ck us (see image). What I am really specifically focusing on is Android application security. A lot of people have done a computer thing, a lot of people have done laptop forensic analysis etc., but something we don’t realize is we’ve got this essentially super computer in our pocket. I mean, when I got my first computer, and I cannot say exactly what that was, it had like a tenth of the processing speed of my current phone in my pocket.

And people don’t realize all the fascinating things that they have on their phone, and what their phone knows. Our phones have a lot of things. If you are doing GPG encryption and try to decrypt your emails – that’s assuming you are a naturally secure thinking person – you have to have your private key on your phone to decrypt your emails. If you are not a private thinking individual, or secure thinking individual, you are just sending your emails over your network connection. You have emails, usernames, GPS, etc., and more.

When I first got to this research I thought: “You know what would be really cool, let’s build an evil application”, which I think some of the other presenters at Defcon have done, which is awesome.

Report on Android trojan intercepting victim’s communication

But when you make that application, it’s ultimately silly because as long as you can get the user to press OK – you’re done. I mean, how many people with smartphones scroll through their phone and they’re like: “I wanna play this game”, scroll, scroll, scroll, OK.

I mean, I don’t know how many of you guys watch South Park, but there was a whole episode, HUMANCENTiPAD, where someone didn’t actually read the EULA. Oh, God forbid someone read 39 pages of dense legal text. That doesn’t happen.

So anyone can build an evil app, put it on the market and say: “Hey, you should download this”. And anyone could execute it and it could export a lot of bad information. And we know this is bad, right. There are a lot of companies out there that do a lot of great things trying to prevent malware, evil applications etc.

But then I got thinking: “Okay, we know evil applications are evil, but what about regular applications?” I mean, when you get your Android phone you think: “The first thing I wanna do is I wanna stream music through Pandora”. Right? I mean, it’s really awesome having an unlimited Internet radio station on your phone.

And you play with it a little bit longer, then you’re like: “Oh, sweet, I forgot about Angry Birds”. Who of you loves Angry Birds? I am not gonna lie, I use it. You are sitting in the meeting, you are sitting in your office, you are on the phone with your boss, and you are just playing. Not that they know you are playing it, but…

The fact of the matter is, you are thinking: “All of these apps, I’ve paid for and downloaded them from the Android application market, and it’s a game. What I’ve downloaded is a game”. But what you don’t realize is you’ve downloaded a little spy in your pocket.

Previous research on the subject matter

Now, some previous research has been done by the Wall Street Journal and by a man named Aldo Cortesi. And I’d like to meet this man and say a lot of great thanks to him. I was even going to call my original presentation ‘Android – the spy in your pocket’. And the Wall Street Journal has done a great research thing on what these applications are sharing, and how they are sharing it.

To get back to the privacy side of it, in terms of privacy we don’t realize how much we share about our lives. We think: “You know, all these companies have anonymized data”. And one of the best examples is with Apple, where you have UDID. It’s basically an anonymized number that says: “Oh, I am me, but I am not actually Eric Fulton. The company doesn’t know who I am, they just know my number”.

Well, that’s cool. But there are companies out there in the world whose whole idea is de-anonymizing who you are, like figuring out: “Oh, this guy that lives in Montana, that loves ‘Doritos’ and ‘Mountain Dew’, and also travels to these places – is this number”. And they can easily tag it as Eric Fulton, because I love ‘Mountain Dew’.

And so, as part of this I thought: “Alright, let’s start looking at these applications – these applications that I blindly trust, that I think, well, yes, I totally believe these people”.

Complete cycle of the research

So Scientific Method to the rescue: what I wanted to do was create a kind of reproducible project that someone else could look at, and I could do standards that would display what our applications are sharing.

So we’ve got all of the basic things there. And the question I asked was: “To what extent do participants in the cellular ecosystem (OS creators, app creators, carriers, etc.) respect user privacy?” Now, my research has only gone as far as Android, but I hope to get to Windows phones, and to BlackBerries, and to iPhones.

Hypothesis specification

But right now we are going to be focusing on Android phones. So my hypothesis was, in terms of respecting privacy, what do they do? And I thought, you know, the software applications and operating systems transmit your private information. I mean, to a certain extent it’s built-in, that’s what they are supposed to do. When you log into your Facebook, you kind of have to give Facebook your username and password.

But what do they give to third parties without your knowledge? What do they give to advertising partners? And more so, what are the advertising partners that these companies blindly trust, collecting about you?

And so I thought: “I bet they are sharing the standard data: you know, usernames and passwords, and things that personally identify you”. I mean it’s a part of the application. But when you think about it, why does Google need to know your location when you are searching for something on Google? To a certain extent, that’s being done for a business purpose, it’s helpful. They need to know that I am in Las Vegas right now when I search for Petiscos restaurant. They know: “Oh, restaurant in Las Vegas”. But at the same time I have no real option of turning that off. I mean, I know Google says: “Hey, if you want you can turn off your location data, your GPS etc., we won’t collect it”.

But what we don’t realize, and what I found out later was – they kind of do, maybe not to the GPS extent, but to a different extent.

Experiment workflow

So for this experiment I built a lab. And for this lab I want to install and use apps on an Android phone. I want to capture their packets, analyze these packets, and then profit, or at least give a Defcon presentation (see image).

So I built a lab, I thought to myself I’ve got this great idea, I’ve got this great hypothesis, what do I need? Well, I bought a Verizon Femtocell, an original A855 Motorola Droid, a WRT54GL wireless router with DD-WRT on it, a sniffing laptop, and an Internet connection. And it was like: “I am ready!”

Turns out you don’t need all that stuff to do this analysis. As I went along, I found out I could have done a lot of it in an emulator, which would have cost nothing. But it allowed me to buy some cool shit, using the office company card.

So I bought the Femtocell thinking: alright, when I am using my phone I want to collect the cellular network traffic in addition to the regular network traffic. Because, you know, if I am an app creator, I don’t want people tweaking with my stuff. And generally, I’d rather use the word ‘generally’, cellular networks are safe.

And so, if I were an app creator, I would be like: “Oh, no, no, no, I won’t send sensitive data over Wi-Fi. I’ll make sure it’s over the cellular network because it’s a lot harder to tap”. And so I thought I’m gonna buy a Femtocell, I’m gonna intercept that. And then I bought an Android phone because I believed this was cheap on eBay. And then I already had the router, and the laptop, and the Internet.

Well, it turns out, after doing a bit of research (and I didn’t dig too much into this), that app creators aren’t that shiesty yet. I needn’t register with the cell network, all I needed was just to be able to pop open my phone, turn on Wi-Fi, get to the Android Market, and start playing around, which was absolutely great.

Essence of the methodology applied

So I created this amazing testing methodology (see image), where I would take the applications, I would purchase and install them, I would have initial usage, regular usage, and then uninstalling the application – for each of the applications. Because then I would know what traffic is going on and at exactly what point.

And then during the operating system tests, I would have first usage (when you first install it on your phone), light usage, then regular idle time, and then I would reset the phone. And it seems I would cover just about every aspect of every application and OS, so that I would make sure I wouldn’t miss any shiesties that went on.

I thought to myself: you know, if I am an OS creator, every 30 minutes I would wanna know where you are at. Or if I am an application owner, I might want to know every 15 minutes who you’ve called in the last 15 minutes.

So this was my amazing original testing methodology. What actually happened was I just took a massive PCAP file for each app, SSLStripped it, TCP-dumped it, and made a drinking game out of it.

Apps to test within the experiment

And so for the apps that I tested, I thought, alright, I’m gonna do a mix. I’m gonna do Angry Birds, as I used it earlier; this really sketchy Chinese app – I don’t read Chinese, and I was like: well, that looks sketchy. It’s kind of my sketchy test. And then random applications that no one uses, I would scroll all the pages down. I got the main ones – Facebook; I got just browsing the Web on Google, and that actually happened by accident, but I found some fascinating things, so I decided to keep it in; Intelli Pilot, which is for airline pilots using log books; Mousetrap, which is a game; Pandora; Red Phone, which is an amazing application created by Moxie Marlinspike – and if you guys don’t know, it’s a little app for an Android phone, where you can have secure conversations with other people. And I thought, you know, I like Moxie but I kind of want to see if he is doing anything there – we will find out more about that later. And then Words With Friends and Zynga Poker, because I am absolutely addicted to Words With Friends, and if any of you guys play Scrabble, you’ll know.

So it’s obviously a work in progress. I have a lot of applications I’d like to test, I have a lot of different operating systems I’d like to test, and basically what I’ve been trying to work towards is a standard methodology, so that I could kind of hammer through it when I am not working, which seems to be rare.

Things to work with

So what I have to work with is a bunch of PCAP files and SSLStrip outputs (see image). And the reason I did this was because I figured if I am an attacker, it’s really easy just to run SSLStrip. So let’s just assume SSL is useless. And so I decided to take all the information that someone who would be attacking you would have.

Now, later on I want to see absolutely everything sent back to the company. I want to add a root certificate to the phone, and just collect all the information. But for right now, I’ve got a bunch of packet captures and SSLStrip outputs, and that alone has proved very interesting.

Analyzing packets with Wireshark

So let’s start analyzing. With each packet capture, I first peered around within Wireshark; I analyzed some of the conversations, some of the IPs being addressed; I ran strings, ran ‘grep’ – pretty easy Linux stuff; and then I did some DNS play, and I did some Argus flows.

So first – Wireshark (see image). If you guys haven’t done any network analysis, Wireshark is kind of the ‘de facto’ GUI tool. It’s really nice just to kind of poke around and scroll. And you can just visually look at and see: oh, this is HTTP traffic, DNS traffic etc. It’s a good starting point, kind of gives you a feel for the lay of the land.

Further analysis workflow with Tshark

But command line tools are more powerful, and so I moved to Tshark (see image). So I wanted to basically read the packet captures, look around, see what was happening, look at the conversations that were happening. And so I ran Tshark and I tried to see who these applications are talking to; what services they are using; who they are sharing it with. And then I Whois’ed like a mofo.
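To give a feel for what "looking at the conversations" means, here is a minimal Python sketch that tallies the IPv4 destinations in a capture. It is not what Tshark does internally, and it is not the exact workflow from the talk; it assumes a classic pcap file (not pcapng) with plain Ethernet framing and no VLAN tags, and the function name `count_ipv4_destinations` is made up for illustration.

```python
import struct
from collections import Counter

def count_ipv4_destinations(pcap_bytes):
    """Count IPv4 destination addresses in a classic Ethernet pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic pcap file")

    # The link type lives in the last 4 bytes of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: seconds, microseconds, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        # 14-byte Ethernet header; EtherType 0x0800 means IPv4.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            dst = frame[30:34]  # destination address sits at bytes 16-19 of the IP header
            counts[".".join(str(b) for b in dst)] += 1
    return counts
```

Feeding it the bytes of a capture file (for example `open("app.pcap", "rb").read()`, where the file name is a placeholder) and calling `.most_common()` on the result gives a rough list of who the phone talked to most, which is the list you would then look up with Whois.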

Servers Zynga communicates with

So if we would take one specific example, we could look at Zynga. How many people here know what Zynga is? Oh, nice, I should have assumed this is Defcon, you guys are smart. And for those who have not raised their hands, Zynga is kind of the new mogul, if you will, for Android games. Zynga makes a large number of those idle time games. You might need those games when you kind of sit down, as I stated earlier, you know, you haven’t got anything to do and you want a game that you can play for 5 minutes, or you are on a conversation that is really boring, and you can play for 5 minutes. And they are widely popular because people have a lot of idle time.

And so I took a look at Zynga, and I was like: “Who is Zynga talking to?” (see list above). Well, if you look at the image – and I am not gonna read each one out – there is a lot. There’s TapJoyAds, Mydas, Facebook, Macromedia, Adobe.

And when you look at this, there are a couple on there that you are curious about. I mean this was for Zynga poker. And so, you are playing poker on your phone, and you don’t really expect to call out to Mydas.mobi. What does this company do? What does Mkhoj do? What does TapJoyAds do? What information is being sent to these third parties that you have absolutely no idea about?
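Once you have a list of hostnames, a rough first pass at separating the app's own servers from everyone else can be sketched in Python. This is only an illustration: the hostnames in the usage note are placeholders, the function names are invented, and taking the last two labels as the "registered domain" is a naive assumption that breaks on suffixes like co.uk.

```python
def registered_domain(hostname):
    """Naive guess at the registered domain: the last two labels."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def split_parties(hostnames, first_party_domains):
    """Separate hosts owned by the app maker from every other host contacted."""
    first, third = set(), set()
    for host in hostnames:
        if registered_domain(host) in first_party_domains:
            first.add(host)
        else:
            third.add(host)
    return sorted(first), sorted(third)
```

For example, `split_parties(["api.game.example", "ads.tracker.example"], {"game.example"})` puts the first host in the first-party list and the second in the third-party list. The third-party list is where the questions above start.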

And this is where we get to the privacy element. When you downloaded that application for poker, did you really understand that you were gonna be sending your statistics, your Android version, your location potentially, to Zynga Poker? And why do they need to know it?

So this is kind of a really big question: what is being sent on your phone without you knowing? I brought this question up because I was thinking about what applications I have.

Using strings to analyze packet capture file

Well, one of the easiest, quick and dirty ways to look at a packet capture file and see where it goes is strings. Strings basically just outputs the text strings that are inside a packet capture file, or any file for that matter.
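If you don’t have the tool handy, the idea is easy to approximate. The Python sketch below pulls out runs of printable ASCII of at least four bytes, which matches the default minimum length of the Unix tool; the real `strings` has more options and treats a few more characters as printable, so take this as an approximation, and the name `extract_strings` is just for illustration.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    # Bytes 0x20-0x7e are space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Run over the raw bytes of a capture, this surfaces URLs, headers, and form fields that went over the wire in clear text, while anything encrypted or compressed stays noise.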

Basically what I did was I looked for interesting things. And you see on here, one of the first things I did was the HTTP, trying to see what websites are being contacted. And then I had a couple of key phrases. And I did this for a couple of reasons. One – I don’t wanna have to go through every packet capture file, trying to figure out what password was going through.

Apps exposing password and email

I made some basic things to look for. I made ‘w00tdefcon’ my password. I made my username droid.net.foren@gmail.com. And for those of you thinking: “Oh, he left the password, I’m gonna go log in” – yes, I did, I don’t care. I am not using it anymore.

Basically what I did was I put a kind of cookie within the packet capture files, so I could instantly grep for w00tdefcon and see where my password showed up. I could instantly see that, rather than trying to figure out what the password field is called, or whether it is in the GET parameter, the POST parameter, whatever. I just got w00tdefcon going over the wire. I also did it for my email address.
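The same marker trick can be sketched in a few lines of Python instead of grep. The function name and the amount of context shown are arbitrary choices for illustration; note that a plain byte search like this misses values that were URL-encoded (an email address sent as `%40` instead of `@`, for instance) or otherwise transformed before sending.

```python
def find_markers(data, markers, context=20):
    """Return (marker, offset, surrounding text) for every cleartext hit."""
    hits = []
    for marker in markers:
        needle = marker.encode("utf-8")
        start = data.find(needle)
        while start != -1:
            lo = max(0, start - context)
            hi = start + len(needle) + context
            # latin-1 maps every byte to a character, so decoding never fails.
            snippet = data[lo:hi].decode("latin-1")
            hits.append((marker, start, snippet))
            start = data.find(needle, start + 1)
    return hits
```

Calling `find_markers(capture_bytes, ["w00tdefcon", "droid.net.foren@gmail.com"])` on each app’s capture shows at a glance which ones leaked the planted password or email, along with enough surrounding text to see what field it was sent in.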

Well, when you look at it, w00tdefcon is definitely going over Facebook, obviously. I mean you have to log into your Facebook to actually get the alerts that you wanna see about your best friend and the update.

But what we did realize was that Facebook, Words With Friends and Zynga Poker also know my email – again, that can be assumed. But beyond that, any attacker can capture this. And this is why I am really focused on the privacy element. And this is where privacy kind of intersects with what I am doing.

So we have it to where as an attacker, or as a ‘man in the middle’, I now know potentially your password for your Facebook, your Facebook URL domain name, etc. – all because you are playing poker. I now know potentially where you are located and potentially what you are doing.

User data collected by 'Words With Friends'

And when we dealt with Words With Friends, we could see very interesting things (see image). And so this is an output that I got from running strings on Words With Friends. And again, this is all very simple stuff. I mean, I am not doing extremely advanced packet analysis. This is quite simple. If you have a Linux VM, or Linux box, you can all do this. So I ran strings on the capture file that I had, and I found this: I found Words With Friends is sending a couple of interesting things. One – they are sending the network that I am on. So they know whether I am using AT&T, T-Mobile, Verizon, etc. So now they know that my phone is Verizon. And they know that I am a Millennial, which I found was kind of weird. I think they are guessing. But they also know what my build version is for my Android, they know what apps I am using. Some of these are hypotheses, and some of these are facts. And I am guessing they know where I am located based on my distance to the ad server. They know what screen resolution I am using, what language I am using, etc.
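Much of this kind of data travels as ordinary query-string parameters, so once strings has surfaced a request URL you can break it into fields with Python’s standard library. The URL and the parameter names in the usage note below are invented for illustration; they are not the ones from the actual capture.

```python
from urllib.parse import urlsplit, parse_qs

def request_fields(url):
    """Split a captured request URL into its host and query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value for readability.
    fields = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.hostname, fields
```

For a hypothetical request such as `http://ads.example.com/req?carrier=Verizon&os=2.2.1&res=480x854&lang=en`, this returns the ad server’s hostname plus a dictionary holding the carrier, OS version, screen resolution and language, which makes it easy to compare what different apps send.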

And for my testing some of this didn’t quite show up because I hadn’t fully set up the phone, so they were not able to send a couple of things because it didn’t have anything in there. But it definitely let you know what they are sharing.

Other data 'Words With Friends' knows

And so I continued on. Okay, they have got my email, they also have my device ID. They also know that my last word was ‘about’, and I got 18 points for it. But that’s pretty obvious. But in any case, they know the time I was accessing it, they know my email, they know my device ID etc. (see image).

What’s important about this? Well, I can only assume that my device ID is unique to my device. I can also assume, or I feel safe assuming, that Zynga has a number of different applications, and in every application, they know that my device is using it. They know that I am using their game 1, 2, 3, and 4.

But then we tie this to some larger eco-system issue – advertising, and it is one of the largest eroders of privacy, because they want to know as much as possible. They want to know exactly who you are so they could market directly to you.

So we take it from Zynga, and we move to a higher level of the advertising agencies that Zynga cooperates with. Now that they have my device ID from Zynga, they can also tie it to their other partners’ sites if they can pull my device ID. And then they can tie all these separate pieces of information, that I never really thought someone else would be collecting, and they are starting to put it all together.
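To make the "putting it all together" step concrete, here is a small Python sketch of how records from unrelated apps collapse into one profile once they share a device ID. The record layout and field names are hypothetical; this only shows how little work the join takes, not how any particular ad network does it.

```python
from collections import defaultdict

def merge_by_device(records):
    """Group everything observed about each device ID into a single profile."""
    profiles = defaultdict(dict)
    for rec in records:
        device = rec["device_id"]
        for key, value in rec.items():
            if key != "device_id":
                # Collect every distinct value seen for this field.
                profiles[device].setdefault(key, set()).add(value)
    return dict(profiles)
```

Given one record from a poker game with an email address and another from a word game with a carrier name, both carrying the same device ID, the result is a single profile containing both facts plus the list of apps that reported them.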

Nearby Wi-Fi access points determined

Continuing on the theme of strings – and this is the one that I had no idea, and I really do not appreciate – when you on your Android phone go to Google, and if your Wi-Fi is on, Google instantly knows, and it sends back to home all of the Wi-Fi access points around you. These are the people that live around me; they are creative people (see image).

I mean, how many of you knew that every time a user is going: “Oh, what’s around me? I wanna Google for something” – boom, opening up a web browser, – “Oh, my Wi-Fi is on, Google actually knows all the Wi-Fi access points that are beaconing”. No one really thinks of this.

And I think: “Oh, well, that’s fine, what’s up with Wi-Fi access points?” But if you heard of Skyhook, what Skyhook basically does is it uses Wi-Fi to geolocate people. And Google is trying to essentially squeeze Skyhook out of the market, or at the very least not pay them, because they are going: “Okay, if you are at this location, and these wireless access points are around you; if you are using an application and someone else is using this application, and they can see this Wi-Fi, they also know where you are at”. In the meanwhile, you don’t even have your GPS on.
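One simple way to turn a list of visible access points into a position is a signal-weighted centroid, sketched below in Python. This is a textbook technique chosen for illustration, not a description of how Google or Skyhook actually compute location; the lookup table of known access point coordinates and the use of signal strength as a weight are both assumptions.

```python
def estimate_position(seen, known_aps):
    """Estimate (lat, lon) from visible access points.

    seen: {bssid: signal strength in dBm}
    known_aps: {bssid: (lat, lon)} from a previously built database
    """
    total = lat_sum = lon_sum = 0.0
    for bssid, rssi_dbm in seen.items():
        if bssid not in known_aps:
            continue  # an access point nobody has mapped yet
        weight = 10 ** (rssi_dbm / 10.0)  # dBm to linear power, so stronger counts more
        lat, lon = known_aps[bssid]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total += weight
    if total == 0:
        return None  # no known access points in range
    return lat_sum / total, lon_sum / total
```

The point is that the database can be built from other users who did have GPS on: once their phones have reported which access points are visible at which coordinates, your phone only needs to report the access points it sees.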

Let’s say you are a super paranoid person, and you’re like: “No, no, no, my GPS is off, Google will not find me”. Well, now they know which wireless APs are around you. They also know exactly, because of those wireless APs, where you are located. That’s kind of scary.

Google determining device location

What is also sent to Google (and I totally anonymized this, the X’s are me) is my exact address (see image). I was looking through the captures, and I was like: “DevLock, what’s that? Is that, like, my phone is locked?” It’s actually device location. And when I have my GPS on, and I am just browsing to Google, they instantly know where exactly I am pinpointed to a dot. I’m not kidding. I mean, I remember back when GPS was kind of sketchy, they could figure out you are in this area, but now they know you are standing right here, where those X’s are the latitudinal and longitudinal lines.

They are also sending a bunch of other information that I haven’t decoded yet but I plan on looking through. But at the same time, there is a lot of easy stuff to be picked up right away. I mean, why does Google need to know my specific exact location when I am browsing? And again, you could say it’s useful because they need to know when you search for pizza – what pizza is nearby. Completely agreeable, but then we have to move a layer higher in terms of privacy.

Well, because Google is collecting my location, who are they sharing it with? Who else knows where I am located when I am browsing for pizza? Do they share it with their advertisers? Do they share the time that I search for it? And then you start to think this is getting a little creepier.

Advertisers now know when I’ve got a hankering for food, or whenever I search something. They potentially know where I search for that. They know what time I search for it. And they can start to build a profile about you.

In terms of privacy, I personally think that we shouldn’t have advertisers that know your most intimate details without you even understanding what you are sharing. Google does not instantly say: “Hey, if you don’t have your GPS on, we’ll just send the Wi-Fi access points around you. If you turn off GPS location assistance for web applications, we’re gonna figure out your Wi-Fi to try to guess where you are at”. They don’t allow you to turn that off.

Other data harvested by Google

We also continue through, and we get a little bit more interesting information as well (see image). We have the lan_mac address, the wan_mac address, the wl_mac address, and the lan_ip, what type of wireless you are using, what type of protocol it’s using, what the active wireless is, and I could keep reading through it. And just for no reason Google knows how long the uptime has been on my device, the actual IP of it, the load average, etc. They know all of this just because I popped open my web browser. It’s quite crazy and it’s insanely disturbing in terms of privacy because you think: “Why do they need to know this?”

Reasons for collecting data

Now we’re gonna look at why data is collected, and I am hypothesizing here. We’ve got advertising. We’ve got statistics, because obviously they want to know whether you are using an application, what you are using it for etc. – we have advertising. We have legitimate business purposes, so maybe an application needs to know what version of Android you are using, so it’s effective – we have advertising again. We have things that can increase the value of a service, so it’s helpful when you search for pizza, where you get results for pizza near you – we have advertising… I hope I’ve made my point here, I’m repeating advertising over and over because advertising is, again, the number one reason why they collect this information. And maybe they could collect it without advertising, but it’s the number one reason that they use.

Why do they need to know where you are located? To give you the correct ads. Why do they need to know the Wi-Fi around you? Well, it helps find your location, which helps you get proper ads. Why do they need to know your device version? Well, if they’re gonna run an ad on your screen, they need to know the resolution. It’s creepy.

So in terms of this, what about man-in-the-middle attacks? Traffic can be intercepted. You can use SSLStrip, exploits, etc. And so just from sniffing your traffic from you hopping on my Wi-Fi point, I know you have applied your latest carrier upgrade. I know you decided to root your phone and put Gingerbread on it from a certain community. I know exactly what device you have, where you’ve been, etc.

This is all very fascinating information. If I know that you are using a phone that your carrier decided not to upgrade and that there are active vulnerabilities in it, I also know that I can screw you. I know that if I have one of the exploits probably released at Defcon or that I made myself, targeting Gingerbread, I know I’m gonna have 100% effective rate.

And I know this just because you are playing Angry Birds on my wireless network or, not that I have done this yet, you happen to be within a certain range of my Femtocell, and are on my cellular network. But that’s a whole other talk.

And so I am going to go back to the original question I asked: to what extent do participants in the cellular ecosystem (OS creators, app creators, carriers, etc) respect user privacy? My answer is – not very much.

And the reason for this is that no one has really called them out on it. I mean, we are at Defcon, I think a lot of people here really believe in privacy. We’ve got the Electronic Frontier Foundation, who fight for our privacy. And yet, for convenience we sacrifice our privacy. For the ability to Google something out of your pocket, to run a little GPS location on your phone and find out where you are going, to do any of these things – you are sacrificing your privacy.

And that’s fine. If that’s something you wanna do and you are comfortable with, that’s fine. But myself, I don’t like Google knowing my neighbors have very creative wireless access point names. I don’t like Google knowing exactly where I am located when I browse a website. I don’t like it that when I use turn-by-turn navigation, Google knows exactly when I am taking those turns. I don’t mean to pick on Google, they just happen to have the phone that I was able to obtain. You can only postulate what’s on an Apple iPhone, what’s on a BlackBerry etc.

Let’s assume a beautiful perfect world where all of these companies believe in your privacy, which is definitely false. But let’s say they do. Well, aside from that, what about the people that have access to your traffic? As I stated before, I did all of this on my own network: I collected all these packets a while ago and ran strings on them, and I was able to analyze this. But how many people are able to write filters, put out a Wi-Fi access point, put out a Femtocell? And as soon as you walk by, you’ve instantly shared so much information about yourself. If you just happen to walk by a store, and they happen to know certain details about you, they could change their advertising.

All this private information is available, and companies are not protecting it, this is all sent in clear text. And I hope I am not giving these people ideas, this is just from my own head: imagine an idea where your Android is sharing all this information.

You happen to wander past a supermarket. And all of a sudden you are saying: “Oh, I really do feel hungry for Mountain Dew; I do really want some chips”. And all of a sudden I see an advertisement that says: “Mountain Dew, really cool!” And I think to myself: “Oh, perfect timing. I’m gonna get myself some Mountain Dew”. But is that exactly right? I mean, I may have bought a Mountain Dew beforehand, but it’s almost an abuse of trust and an abuse of your privacy – to take a look into your private thoughts and your phone and share it out with the world.

Let’s apply this towards politics. All of this information is bought, sold, and traded. All of this information is being shared on your phone, which you don’t quite realize. And so I am into politics, and I am like: “You know, I want to be the perfect politician”. Well, I’ve got these advertisers over here that allow it on all these applications that you download and use. And you’re like: “Oh, sweet! I wanna use the free version, it’s ad supported rather than paying $1.99”. But when you do that, you give away a little bit of privacy. It’s not just that you are giving away “Oh, I’m gonna ignore that ad” – you are giving away your privacy.

And they take this information. They take what device version you have, where you are located, where you’ve been, what you like to buy, what you like to search for – and they correlate it all together. It’s their goal to find out who you actually are. These companies might say: “Hey, we don’t collect your real name”. But when someone else buys this data that’s tied to your unique ID, they correlate it with other public data, and they kind of jumble it all together. They know who you are, they know where you live, they know your favorite color.

So taking this along the political idea, imagine the future where politicians know every constituent in their district. They know this because their cell phones are in that district. They know this because all of those cell phones have exposed what everyone does. They know what people search for, they know whether they read ‘The Huffington Post’ or ‘Fox’. They know a percentage of people who do this. They know what grocery stores you shop at.

And they can take this data as it correlates along all these different avenues, they can combine it, and they can go: “Oh, hey, my district is 68% likely to vote Democrat, or Republican”.

Let’s get into this a little closer. It’s like: “Oh, my district is 55% likely to vote Republican. But some of those people, like 10%, are not likely to vote, which means I probably need to pitch myself towards the Democratic side. Well, okay, if I am pitching myself towards the Democratic side, I see that most of the people on this side are value shoppers. They like to shop for the value brands. Well, now I shop for the value brands. I talk about value when I talk to my constituents. I make them think: ‘Oh my gosh, this politician is me, I believe in him, I can affiliate with this person, I am going to vote for them!’” But what they don’t realize is that whoever this person is, they have tailored themselves meticulously to look exactly like the person that these people are, that these people want to see. This is the power that correlating data has. This is what just using these applications on your cell phone, by sharing out your device ID, your locations, your Wi-Fi access points, can give away.

So, kind of going back to my hypothesis, I said software applications and operating systems transmit private user information to the author or third parties without the user’s knowledge and consent. Throughout this talk, I’ve stated personal data / identifying data is sent. Whether it’s encrypted or not – it can be SSLStripped.

I promised you a little bit about some of the applications I tested. I did test Red Phone. I did take a look at “Hey, I know Moxie believes in privacy, but does he walk the steps that he talks?” And I actually couldn’t intercept his traffic. Fascinating… And I was like: “Well, let’s look into this”. Apparently, Moxie, having broken SSL, knows how to secure shit. And he does.

So it’s definitely doable. These companies can make your information private, they can make it so that I cannot intercept it on the wire. But the problem is that they don’t. They view this data as, maybe not quite ‘not important’, but not sensitive. They don’t take the time to protect it. They don’t want to invest in servers that can encrypt it over the wire. And so I thought, okay, even if they do, it is still exploitable. For a Facebook application you can use SSLStrip, username and password – boom, done! Applications send usernames, passwords, contact lists, location data, usage statistics, timing of activities, and other content.

Privacy breach hypothesis confirmed

Were we right? Yes. We were right on all of those counts, all of them (see image). And this is only using very basic packet analysis on these applications. And when I say basic, I didn’t want to make this talk overly technical because I was hoping to build a bridge between the more technical field of network forensics and the non-technical field of privacy, and kind of merge them together, so that there is a little bit for both sides.

But if you are a privacy advocate, I would highly recommend taking a look at network forensics, being able to look and see what the applications are sharing, what these operating systems are sharing. When I go to Google.com, do I know my Wi-Fi access point is showing, do I know my IP address is showing, etc? And that is all with very basic testing.

Concluded facts

And so to kind of conclude, I don’t think a lot of people realize – your smartphone erodes your privacy, and you agreed to it. And that’s the worst part – you agreed to it, it’s allowed. And until people start saying: “Hey, companies, we don’t want information shared, you don’t need to know the wireless access points around me when I am trying to look for something, specifically you don’t need location data sharing” – this will keep happening. But the problem is you agreed to it. You scrolled through the pages and pages of stuff and said ‘OK’.

And even beyond that, a lot of people don’t understand the importance of the data they are sharing. They don’t understand that when they are sharing this information, they are sharing it with everyone.

Essentially, what I wanted to say is that what can be seen as benign information that companies collect, can be intercepted, it can be correlated, it can be tied to you, and it can be used for nefarious purposes. And you should be aware of this.

Future research to carry out

And if you are curious about more applications, what I am trying to do is build out from my original research. Essentially, what I did was a very manual, time-intensive process. I am working on automating that process. I would like to have an emulator that downloads and installs every application on the Android Market, runs it, and analyzes its packet capture data for passwords and other shiesty-looking, important information. And that’s what I’m gonna be working on.
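The analysis step of that automation could look something like the following Python sketch, which flags sensitive-looking parameter names in captured request URLs and form bodies. This is only an assumed shape for such a tool, not the author's implementation: the key list `SENSITIVE_KEYS`, the function `flag_sensitive`, the app names, and the sample captures are all hypothetical, and pulling requests out of an actual packet capture is left out.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameter names worth flagging; a real tool
# would need a much broader and app-specific set.
SENSITIVE_KEYS = {
    "password", "passwd", "pwd", "username", "email",
    "imei", "lat", "lon", "device_id",
}


def flag_sensitive(url: str, body: str = ""):
    """Return the sorted sensitive-looking parameter names found in a request."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    params += parse_qsl(body, keep_blank_values=True)
    return sorted({key.lower() for key, _ in params if key.lower() in SENSITIVE_KEYS})


# Made-up cleartext requests, keyed by a made-up app name: (URL, form body).
captures = {
    "example.game": ("http://ads.example.com/req?lat=46.87&lon=-113.99&device_id=abc123", ""),
    "example.social": ("http://api.example.com/login", "username=alice&password=hunter2"),
    "example.notes": ("http://sync.example.com/ping?v=2", ""),
}

report = {app: flag_sensitive(url, body) for app, (url, body) in captures.items()}
for app, hits in report.items():
    print(app, hits)
```

Matching on parameter names is a deliberately crude heuristic; it misses data sent under obscure names or in binary formats, which is part of why the manual process described above was so time intensive.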

What I’m also gonna be working on is advertising. It’s kind of another region, and maybe it is just me, but I don’t think people quite realize the fact that there are tons and tons of ad networks on every page, looking at everything you do. And you might think when you browse from Engadget over to Slashdot, those are two separate websites. But what you don’t realize is that one advertising company has a cookie or an ad on both of those websites. And they are able to see: “Oh, when he was done on Engadget, he hopped over to Slashdot, this guy is a nerd. I am gonna advertise to him nerd products”. And it’s effective, there is a reason they do it. They do it because it’s more effective, and they make money out of it. And to a certain extent, having targeted advertising is useful. But to another extent, it just gets creepy because of the way that information can be used.
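The cross-site tracking described here comes down to grouping ad requests by a shared identifier. As a rough Python sketch, assuming the ad network logs a cookie ID together with the site that embedded the ad (the function `build_profiles`, the IDs, and the log entries are all invented for illustration):

```python
from collections import defaultdict


def build_profiles(ad_requests):
    """Group the sites seen for each cookie ID, in order of first visit."""
    profiles = defaultdict(list)
    for cookie_id, site in ad_requests:
        if site not in profiles[cookie_id]:
            profiles[cookie_id].append(site)
    return dict(profiles)


# Hypothetical ad-server log: (cookie ID, site that embedded the ad).
ad_log = [
    ("uid-42", "engadget.com"),
    ("uid-77", "example-news.com"),
    ("uid-42", "slashdot.org"),
]

profiles = build_profiles(ad_log)
print(profiles["uid-42"])
```

The two sites never talk to each other; the browsing history only comes together because the same third party is present on both pages and sees the same cookie.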

And so in terms of all this, what I also would like to do is map out these ad networks. I would like to find out who is talking to whom, where the servers are located, who is accessing what information, and what can happen from that. So that’s where I am hoping to go.