Peter Eckersley moves on with his instructive talk on browser uniqueness. The final part of his speech explicates the issue of browser fingerprints that are constant over time, lists the browsers best coping with this trackability, explains the difference between fingerprintability and debuggability and outlines some defensive measures for avoiding trackability.
Another really interesting question you might have is: “Sure, you can identify people, but don’t these fingerprints change over time? Are they a stable way to track someone if they could upgrade their browser or install a new font and certainly the fingerprint would be different?” And so we decided to check this.This graph shows the set of people who visited Panopticlick exactly twice. So we wanted to throw away people who might have been playing with the site, trying to optimize their uniqueness or tweak things. We just wanted people who came exactly twice, with at least an hour or two in between the two visits they did. So they didn’t just hit ‘Reload’, they came back later.
And then we said, as the function of how much later they came back, what was the probability that their fingerprint had changed? And you can see as more time has passed, the likelihood that the fingerprint was different when they came back goes up. We measured this with cookie, so there’s a cookie that you could reliably use to see the same person, and then you can see if the fingerprint changes.
So actually, fingerprints don’t last very long, the half-life of these things is 4 – 5 days. So perhaps that’s actually really good time. Perhaps fingerprints, while they are instantaneously identifying, aren’t a stable way to track people over time. Unfortunately this turned out not to be true. So the way we did that is we said: “Okay, your fingerprint has changed, can we do some kind of fuzzy matching algorithm that will see if your fingerprint later, after the change, was uniquely tieable to your fingerprint beforehand?”
And I implemented a really hacky algorithm to do this. It just says: if only 1 of those 8 measurements has changed, and it hasn’t changed very much, and that maps to a unique fingerprint from beforehand, then let’s guess that it’s you. And it only tried to do this if you had something quite revealing like Flash or Java installed. So this algorithm guesses about two thirds of the time. But when it does guess – it’s 99% accurate, so it has a 99% chance of correctly guessing which fingerprint you changed from, and less than 1% chance of getting it wrong.
guessed 2/3 of the time
(99.1% correct; 0.9% false-positive)
If you use Torbutton, it zaps the plugin list. The Torbutton developers knew about a lot of these attacks and anticipated them in various ways, so Torbutton – you haven’t use Tor, you can just use the little Torbutton Firefox extension, and you are in pretty good shape.
If you use an iPhone or an Android and you manage the cookie problem – you are in pretty good shape.
And lastly, you know, this small percentage of systems that were behind firewalls and appeared to have the same fingerprints: we saw about 3% of IP addresses that had multiple visitors coming from them exhibiting that kind of behavior. So that 3% of systems maybe has some kind of anonymity, although it’s a bit hard to distinguish that from the browser’s private browsing mode. And it would also be the case that if you implemented the clock skew hardware based fingerprinting, you could probably tell people apart even if they have a firewall and a clone fingerprint.
So currently there aren’t very many web browsers that do well. We also saw some other really interesting things. One interesting thing was that sometimes privacy enhancing technologies are the opposite. Something that’s designed to hide your identity turns out to be the unique thing that tracks you. If you install a Flash blocker for instance, that has a unique signature that you can tell, okay, this browser has Flash installed but we’re not gonna get an answer back from the Flash plugin when we ask it for fonts. So people who have done that were all pretty much unique.
The noteworthy exceptions to this problematic rule about privacy enhancing stuff to feeding your privacy were NoScript and Torbutton which both are fingerprintable, but the amount you gain from having them turned on outweighs the amount you lose from them.
Another lesson here is that if you are designing an API1 that’s gonna run inside the browser, you should never ever offer some call that returns a gigantic list of system information about the machine you are running on. So this was true both for the plugin list where you just ask navigator for plugins and you get back a list of all of them, and the version numbers of all of them. That’s gonna make a lot of people unique.
Similarly, don’t return the list of all the fonts. If you really need to show people a particular font, make them ask about the specific font rather than being able to ask about all fonts at once. Perhaps an even better solution to this would be to not have your system fonts display in your browser at all. Perhaps if a website wants to render some rare font like ‘Frankenstein’, it should have to give you the TTF file along with the website.
The problem here is that even if we block the bits of Java and Flash that give font lists back, there are some nasty websites out there showing that you can detect fonts using CSS which is almost unblockable. You just render the font inside an invisible box and then measure how wide it is, so if the user has the font installed the box will be width A, and if they don’t have it installed it will be width B. And there is a little cute website called flippingtypical.com that demonstrates this. So this stuff is hard to block.Another lesson is that fingerprintability trades off against debuggability. So if you look at the User Agent string, just a hypothetical one here, this is typical stuff – it has in there my operating system, I am running X particular hardware platform, my language, precise date that my Gecko2 was built on all this stuff. Why is that in there? Like, why does every website I go to need to know what date my browser was compiled on? The answer is some people thought that maybe one day we’d want to debug something. And when we do, it would be awesome if we’d already logged on the server side all the stuff we would possibly want for debugging a client side issue. And, okay, fair enough – maybe occasionally there is some glitch somewhere where having this version information is useful. But there is a trade-off between privacy and debuggability going on here. And right now browsers are all configured right up at the extreme debuggability and extreme non-privacy end of this spectrum.
And at least, perhaps when you enter private browsing mode in your browser, it should be making that trade-off the other way around. The same is true for plugin list. Right now, if you look in the plugin list and say what version of Flash do you have, it’s not Flash or Flash 10, it’s Flash 10.1 r53. And so all these little facts, you know, you have 10 plugins and you get this version information about all of them – it adds up.
Right back at the start I said there were 4 kinds of attacks you could use fingerprinting for. One is global uniqueness, and in a lot of cases it looks like browsers are globally unique, not all cases but a lot of them. The second case where you have an IP address plus a fingerprint – then you are almost guaranteed to be able to track someone. And you can definitely undelete cookies that people have deleted, and you can definitely link these things across websites.This is a serious privacy problem. Right now, the only things you can do are things the power users are gonna do: you can do NoSript, you can do Torbutton. But you can’t tell your Grandmother or whoever else, like just people who aren’t really comfortable with complicated user interfaces, to use these plugins.
So everyone else is gonna need to wait for the browsers to find a solution to this. Fortunately they’ve sort of started. We have been talking to the Mozilla people and they were interested in attempting to fix some of the stuff, at least in private browsing mode. Google is maybe a step behind but also there’s someone interested in trying to tackle this. So perhaps we can come back in a year or two and say we’ve made a small amount of progress on the problem.
1 – API (application programming interface) is a source code-based specification intended to be used as an interface by software components to communicate with each other.
2 – Gecko is a free and open source layout engine used in many applications developed by Mozilla Foundation and the Mozilla Corporation (notably the Firefox web browser), as well as in many other open source software projects.