During his USENIX talk “How Does Your Password Measure Up? The Effect of Strength Meters on Password Creation”, Blase Ur, a computer security and privacy researcher at Carnegie Mellon University, presents a thorough study of password strength meters and their effect on the password creation process.
Hi, I’m Blase Ur from Carnegie Mellon University, and I’ll be telling you about password meters. We define a password meter as any kind of visual feedback on password strength given while the password is being created. So, for instance, if I go to PayPal and start making an account, they’ll ask me to make a password. And as I start typing a password, a little orange indicator comes up and tells me: “Your password is just fair” (see left-hand image). The idea of these password meters is to encourage users to create stronger passwords.
Password meters are widely used: sites from Google to WordPress to USENIX’s own site show password meters when you’re making an account. The open question here is that, as a community, we don’t really know what, if anything, these password meters do.
Furthermore, password meters come in all different shapes and sizes (see right-hand image); these are just a couple of screen grabs from Alexa’s Top 100 global websites. You can see there’s really a wide variety of visual appearances of password meters. Is one of these better than the others? Are some just outright not very good? So, we really wanted to look into this.
In particular, our two main research questions were:
1) How do password meters affect the composition, guessability, creation process, and memorability of passwords, in addition to user sentiment?
2) What elements of meter design are important?
So, we present the first large-scale experiment on how the visual and scoring aspects of meters affect password properties.
Then, 48 hours after part 1, participants received an email asking them to return for the second part of the study, in which they would re-enter the password and then answer a survey about how they remembered the password, or, as the case may be, didn’t remember the password.
Now I’d like to go into our 15 different conditions. The easiest way to think about them is actually in 4 main groups, and I’ll go through all of these in detail in a moment. First, we had our control conditions. Then we had conditions with visual differences, some with scoring differences, and some with both visual and scoring differences.
Let me start with our two control conditions, which are the conditions to which we compared all of our others. Our first control was having no meter, that is, no feedback on password strength. And our second control condition was having a baseline meter, a standard password meter.
We created this standard password meter, the baseline, based on meters used in the wild on Alexa’s Top 100 global websites. I’ll show you exactly how this meter looks and works in a few moments. We designed this meter so that a password consisting of 8 lowercase letters, that is, one that just barely meets our stated requirements, would fill only one-third of the meter.
Higher scores were possible in two main ways. First, you can have a longer password: in many of our conditions, a password with 16 characters and no further restrictions would fill the password meter. Second, you can have a password with more character classes: for instance, in many of our conditions an 8-character password with an uppercase letter, a lowercase letter, a digit, and a symbol would also fill the meter.
Here is how our baseline password meter looked (see left-hand image): this is the page on which participants created their passwords, and it’s based on Windows Live’s account creation page. As they start typing in the password, you’ll notice there is a visual bar, and this bar is non-segmented, that is, the bar can be filled to any degree, and after every key press there can be a big change or a small change in the bar. You’ll also notice that above the bar we have a word corresponding to how much of the bar is filled. This word goes from “bad” to “poor”, to “fair”, to “good”, and then to “excellent”.
Next to the word we have a suggestion for making the password stronger, for instance: “Consider adding an uppercase letter or making your password longer” (see right-hand image). Going a little bit further, we perform a dictionary check against OpenWall’s mangled wordlist, which is a cracking dictionary, and if the password is in this cracking dictionary, we tell them: “Your password is in our dictionary of common passwords” (see left-hand image).
And you’ll notice by now that the meter has been changing color. It gradually changes from red to orange, to yellow, and eventually to green, to the point where they’ll eventually fill up the meter and receive the word “excellent” (see right-hand image).
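The scoring behavior described above can be sketched in Python. This is an illustration only, not the study’s actual scoring code: the talk gives just three anchor points (8 lowercase letters fill one-third of the bar; 16 characters fill it; 8 characters with all four character classes fill it). The linear interpolation between those points, the per-class bonus of 2/9, the thresholds for the five words, and all function names are assumptions chosen so that the anchor points hold. The small `wordlist` set is a hypothetical stand-in for the OpenWall cracking dictionary.

```python
from fractions import Fraction

WORDS = ["bad", "poor", "fair", "good", "excellent"]


def count_classes(password: str) -> int:
    """Count which of lowercase, uppercase, digit, symbol appear."""
    has_lower = any(c.islower() for c in password)
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    has_symbol = any(not c.isalnum() for c in password)
    return has_lower + has_upper + has_digit + has_symbol


def meter_fill(password: str) -> Fraction:
    """Fraction of the bar that is filled, between 0 and 1.

    Assumed model: length ramps linearly to 1/3 at 8 characters and
    to a full bar at 16; each extra character class adds 2/9.
    """
    n = len(password)
    if n <= 8:
        length_part = Fraction(n, 8) * Fraction(1, 3)
    else:
        length_part = Fraction(1, 3) + Fraction(min(n, 16) - 8, 8) * Fraction(2, 3)
    class_bonus = max(count_classes(password) - 1, 0) * Fraction(2, 9)
    return min(Fraction(1), length_part + class_bonus)


def meter_word(fill: Fraction) -> str:
    """Map the fill level to a word (thresholds are assumed)."""
    if fill >= 1:
        return "excellent"  # only a full bar earns the top word
    return WORDS[int(fill * 4)]


def feedback(password: str, wordlist: set) -> dict:
    """Bundle the fill level, word, and dictionary warning."""
    fill = meter_fill(password)
    warning = None
    if password in wordlist:
        warning = "Your password is in our dictionary of common passwords"
    return {"fill": fill, "word": meter_word(fill), "warning": warning}


if __name__ == "__main__":
    common = {"password", "letmein"}  # hypothetical tiny wordlist
    for pw in ["abcdefgh", "abcdefghijklmnop", "Abcdef1!", "password"]:
        print(pw, feedback(pw, common))
```

Exact fractions are used instead of floats so that the three anchor points from the talk come out as exactly one-third or a full bar. The sketch leaves the score unchanged when the dictionary check matches, since the talk only says that a message is shown in that case.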