Read previous: Faces of Facebook: face recognition technologies
In this part of the presentation, Alessandro Acquisti explains facial taxonomy components, analyzes DB types in statistical re-identification, and introduces the experiments that were held in this domain.
– Face detection
– Face recognition
– Facial identification
– Facial ‘inferences’
Imagine that there are, as often in literature on statistical re-identification, two databases: an un-identified database and an identified database. In Latanya Sweeney’s example, the identified database was the voter registration list for Massachusetts voters. The un-identified database was a sensitive database of medical discharged data – obviously un-identified because hospitals wanted to share that information but they didn’t want to share it with the names of the people suffering from the different diseases. Latanya showed that you could connect the two.
So in our story the un-identified database could be images that you find on maybe match.com – a dating site; or maybe AdultFriendFinder – a dating ‘plus’ website; or prosper.com – a financial site where people look for micro funding, micro loans and often use photos because some researchers showed that profiles with photos are more likely to get funding, but they don’t use their names because they often reveal sensitive information about themselves such as their credit scores. Of course un-identified faces are also yours and mine when we walk in the street, to a stranger we are an anonymous face.
Then there are of course the identified databases, and I would argue that for most of you in this room, for most of us, there are already somewhere up in the Internet identified facial images. They may come from Facebook, if you have joined Facebook and put a photo of yourself and your real name. They could come from LinkedIn; they could come from organizations and governmental databases and so forth.
Face recognition finds a match between two images of possibly the same person, and because of this we can use identified database to give the name to a record in an un-identified database. But not only that, the story we are telling is that with the personal information that you find online, once you have a name, you can look for more information such as maybe the social security number, or maybe the sexual orientation. The paper which came out a couple of years ago showed how you could predict sexual orientation of Facebook users based on the orientation of their friends, or maybe credit scores, and because of this, if you are able to infer information for the identified database, and if you are able to find the match through recognition to the un-identified database – you close the circle, you end up connecting the sensitive data to the un-identified up till then anonymous person.
So this is the story, in a nutshell. The story is that your face is truly the veritable link between your offline persona and your online persona or personas, your many online identities. Eric Schmidt said that maybe when we turn eighteen we should be allowed to change our names to recreate from scratch our reputation. Well, but the problem is it is much harder to change your face. Your face is a constant which connects these different personas, and as I mentioned, for most of us there are already identified facial images online.
So the question is whether the combination of the technologies which are described is going to allow these linkages, linkages where it is not just that we give name to an anonymous face, but we actually blend together this online and offline data. And I want to stress this point because the SSN application that we will talk about in a few seconds is just one example, which will be shown here because we thought it would be effort to drive home the point. But it is really just one example of what is going to happen, this blending of offline and online data, this emergence of ‘personally predictable’ information that in a way is written on your face, even if you may not be aware of it.
This may democratize surveillance, and I am not saying this with a good sense, I am saying this with concern. We are not talking simply of constricted and restrained Web 2.0 applications that are limited to consenting/opt-in users, such as maybe Picasa or currently Facebook tagging. We are talking about the world where anyone in the street could in fact recognize your face and make these inferences, because the data is really already out there, it is already publicly available.
So, what will our privacy mean in this kind of future, in this augmented reality future? And have we already created a de facto ‘Real ID’ infrastructure? Although Americans are against Real IDs, we have already created one through the market place.
As I move on to describing the actual experiments that we ran to debate these questions I put here, I want to stress some of the themes I have already highlighted, because I really hope that what will remain after this talk will not be simply the numeric results of these experiments but what they imply for the future. Because I feel that what we are presenting is in a way a prototype, is a proof of concept, but five or ten years out this is really what is going to happen.
So key themes are – your face is a conduit between online and offline world, the emergence of personally predictable information, the rise of visual and facial searches where search engines allow the search for faces. Google recently started allowing pattern based searches for images, but not faces yet. Next key themes are the democratization of surveillance, social network profiles as Real IDs, and indeed the future of privacy in a world of augmented reality.
So let me talk about experiments. We did three experiments. The first one was an experiment in online-to-online re-identification. It was about taking pictures from an online database, which was anonymous or, let’s say, pseudonymous, and comparing it to an online database which was ostensibly identified – and seeing what we get from the combination of two. The second was offline-to-online: we started from an offline photo and we compared it to an online database. And the last one, the last step was to see if you can go to sensitive inferences starting from offline image.
Experiment 1: Online-to-Online Re-Identification
Experiment 2: Online-to-Offline Re-Identification
Experiment 3: Online-to-Offline Sensitive Inferences
And in fact, most users do use primary profile photos which include the photos of themselves. Not only that: many users, according to our estimate – 90% use their real first and last name on Facebook profile. So you see where I am going, the story of the Real ID, de facto Real ID.