The second part of Peter Eckersley’s Defcon talk called “How unique is your browser?” is dedicated to describing the ‘Panopticlick’ experiment set up by the Electronic Frontier Foundation. The speaker describes different sets of browser fingerprint data, outlines the methods applied for measuring those values and analyzes some of the results obtained.
It will turn out that the 2 most problematic ones – they are all kind of problematic – but the 2 most problematic ones of these plugins here are these fonts (Flash/Java) at the bottom.
So there are a lot of things we didn’t collect but which you could use to make these fingerprints even nastier. And in fact, it turns out we’ve seen some companies in the private sector that will sell you a fingerprinting system that doesn’t just do the kind of 8 things we did but actually also it does a lot of other stuff.
One particularly nasty thing is you can measure the clock skew of the quartz crystal – that is, how much faster or slower than another clock your computer is. And that’s very hard to hide and it’s unique to your hardware rather than just to your software. You can measure the characteristics of the operating system’s TCP/IP implementation. You can measure the order in which the headers show up.
And some of these things we didn’t collect because we just didn’t have time to implement them. Some we didn’t collect because we didn’t know about them. Once we put up the site, we got a lot of emails saying: “Hey, you could collect these things as well”. And lastly, some things like the CSS history, we were not sure would be stable enough to include in a fingerprint without some kind of fuzzy matching code that we didn’t have.
So, the point here is that all of our results should be taken as a kind of optimistic story about how much privacy you have: with a really good fingerprint, the fingerprint is more powerful and more revealing.
So the way we handled the data of our site is we set up a 3-month persistent cookie and we stored an encrypted IP address with a key that we threw away. And we used those primarily to avoid counting you twice. If you came back we wanted to know if you were the same person with the same fingerprint or another person with the same fingerprint, because those two are very different things.
And we had an exception for that, which is if at a particular IP address we saw one cookie A and then cookie B, and then cookie A again, we thought that’s probably, or at least potentially, evidence that there are more than two computers behind that IP address that pass behind the NAT firewall, that have the same fingerprint. And that could be important because it could be a sign that there is a corporate network there and that there is some sysadmin who clones all the same code out to all the machines everyday. And so there are genuinely multiple machines with identical fingerprints there, and that gives the people in that office some protection. So we decided not to treat those as repeat visits from the same browser if they were interleaved.
One thing to note is a lot of people are confused by the numbers we have on our site. Those use the cookies but not the other methods of avoiding double counting, because we had to compute that stuff on the fly for millions of visitors. The data set I am presenting in this talk has more fancy control for that stuff.
So we got a pretty big data set. We had 2 million hits, about a million distinct browser instances there. The people that we were measuring were not representative of the entire web user base. People who come to EFF site are mostly people who know about and care about privacy. However, as I said before, you have to jump through three hoops to not be tractable on the web. So we think this is kind of a relevant data set to ask. The people who block cookies and know about IP addresses and how to hide them and so forth – end up being tractable by their fingerprints instead. And another point is that data in this talk is all based on the first 470,000 instances – the first half of the data set.
83.6% had completely unique fingerprints
(entropy: 18.1 bits, or more)
94.2% of “typical desktop browsers” were unique
(entropy: 18.8 bits, or more)
There is an interesting statistical question you might ask, which is, okay, sure you saw 94% or 84% uniqueness in your data set, but that was only 500,000 people. Would people be less unique if could get data for the whole 1 to 2 billion people who use the web? And this is an interesting statistical question. I have a theory about how to solve it involving Monte Carlo simulations1: you try a hypothesis probability distribution, you run it through a simulation, you see if it produces a graph that looks like the one I showed.
But we didn’t try to do this because in a sense our data set, which is just a measurement of privacy, is not meaningfully representative of 1 to 2 billion browsers in existence. So if someone else has a less biased data set, you could do this statistical question. We didn’t try.Now, any graph that tries to show you everything that’s going on in this data set is gonna be really complicated because it’s half a million data points, but I’m gonna try with a couple of them. This one shows for each category of browsers (see graph), so Firefox, MSIE, Opera, Chrome, Android, iPhone, Konqueror, BlackBerry, Safari, and then I lump together a collection of Lynx and other text mode browsers. For each of those, how good or bad was it from a uniqueness trackability point of view?
All of the desktop browsers aside from Firefox are like that, but without the little tail of non-unique people. So generally desktop browsers are bad. One of the browsers that did well is iPhone. The iPhone does very well. It’s not very fingerprintable, it’s this purple line, and there are quite a lot of iPhones that are not unique. That’s perhaps not surprising because there aren’t yet plugins and font variation on iPhones. Really, all you are talking about is what time zone you’re in, what language you have and maybe which version of the iPhone OS you have. But there’s not very much to fingerprint iPhone with.
Android does almost as well, not quite as well because there are more iPhones than Androids, but those phone browsers look pretty good. In practice of course, they have really bad cookie settings, so people who use them probably get tracked by the cookies. But this was a good result for the phones.Now, if we look at the variables that we measured, we had these 8 measurements (see image). So which ones were the problematic ones? This table measures the first order which they were. So User Agent is pretty bad, it’s 10 bits of information. Every time you log in and the web server logs your User Agent, you’ll expect on average that you are narrowing the population down to one thousandth of what it could have been if someone wants to browse anonymously. The things that were worse were plugins – 15.4 bits, fonts – around 14 bits. So these things that your browser publishes are very revealing. If you wanna ask, okay, what does the distribution of different values for all for these things looks like, you end up with this crazy graph (see graph). It’s in our paper as well, I can try to explain it. It says how many people, for each of these measurements separately (there are 8 different measurements), fell into an anonymity set size of k for each of the measurements.
Anonymity set size of 1 means you are completely unique because of your fingerprints or your font, or your plugins. So you see up here (in the top left section) there are a lot of people who are unique: 200,000 – 250,000 people who are unique just because of the plugins they have installed on their browsers. There were 200,000 (see vertical axis from the top down) who were unique just because of their fonts. 25,000 were unique just because of their User Agent etc.
As you go from left to right, these are less identifying values. And then in the top right part of the graph, we see that having cookies enabled wasn’t a very revealing fact.
1 – Monte Carlo simulations (or Monte Carlo methods) are a class of computational algorithms that rely on repeated random sampling to compute their results.
2 – Torbutton is a Firefox browser extension that enables anonymizing one’s web-surfing.