Participating in the USENIX Security Symposium, software engineer and security researcher Marti Motoyama presents an in-depth study of automated and human-based CAPTCHA-solving services on the market.
Good afternoon, ladies and gentlemen. My name is Marti Motoyama, and the title of my talk is Understanding CAPTCHA-Solving Services in an Economic Context. This is work I did with my co-authors and advisors at the University of California, San Diego.
A number of Internet services today are offered for free, in the hope that the advertising revenue generated by the sites will produce a profit. But the key assumption behind the business models of these sites is that human eyes are viewing those advertisements.
Unfortunately, the free services offered by these sites can also be monetized by unscrupulous means (see left-hand image). For example, web-based email accounts can be used for spamming, and social engineering attacks can be mounted via identity hijacking. Attackers have even used cloud services for botnet command and control (C&C).
To make these attacks profitable, attackers must abuse these services at scale. The 2008 Spamalytics study observed that over 100,000 spam emails must be sent before achieving even $1 in revenue. To reach this scale, attackers generally rely on automation.
In response to automation, web service providers typically employ CAPTCHAs. The goal of our work is to evaluate CAPTCHAs as a security mechanism by looking at the CAPTCHA-solving ecosystem. Our approach is to explore a range of data sources. First, we briefly look at automated CAPTCHA solvers and evaluate their role in the solving ecosystem.
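The Spamalytics figure implies an average of $1 / 100,000 = $0.00001 (a thousandth of a cent) per spam email, which is why scale and automation matter so much. The minimal Python sketch below works through that arithmetic and adds one illustrative step of our own: a break-even price per CAPTCHA solve, under the purely hypothetical assumptions that each solved CAPTCHA yields one account, that the account sends a chosen number of messages, and that the attacker has no other costs. The function names and the messages-per-account values are hypothetical and are not figures from the study.

```python
# Figure cited in the talk (Spamalytics, 2008): roughly 100,000 spam emails per $1 of revenue.
EMAILS_PER_DOLLAR = 100_000


def revenue_per_email(emails_per_dollar: int = EMAILS_PER_DOLLAR) -> float:
    """Average revenue, in dollars, attributable to a single spam email."""
    return 1.0 / emails_per_dollar


def max_cost_per_solve(emails_per_account: int,
                       emails_per_dollar: int = EMAILS_PER_DOLLAR) -> float:
    """Hypothetical break-even price for one CAPTCHA solve.

    Assumes (for illustration only) that each solve yields one account,
    that the account sends `emails_per_account` messages, and that the
    attacker has no other costs.
    """
    return emails_per_account * revenue_per_email(emails_per_dollar)


if __name__ == "__main__":
    print(f"Revenue per email: ${revenue_per_email():.5f}")
    # Hypothetical messages-per-account values, chosen only to show the trend.
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} emails/account -> break-even ${max_cost_per_solve(n):.2f} per solve")
```

Under these assumptions, an account that sends 1,000 messages can justify spending only about a cent on the CAPTCHA that protects it. The price of a solve therefore bounds which abuse campaigns remain profitable.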
Next, we characterized third-party human solving services from two perspectives: as a customer and as a solver employee. As customers, we purchased CAPTCHA solves from the human solving services; as solver employees, we solved CAPTCHAs for several services as part of our study. Most of our work is concentrated here, because this solving methodology seems to be taking hold as the preferred means of bypassing CAPTCHAs.
Lastly, we interviewed the operator of a human-based solving service, whom I’ll refer to as Mr. “E” for the remainder of this talk. Mr. “E” was not a core component of this study, but we used much of his feedback to confirm our findings.
So, first let me introduce CAPTCHAs. If you have any sort of web presence, you have probably solved a CAPTCHA at some point, and shown here are various examples of them (see right-hand image).
CAPTCHAs have several properties that make them useful to web service providers: they can easily be solved by humans, meaning that you’re not going to scare away your legitimate users; they can easily be generated and evaluated; and lastly, they cannot easily be solved by a program, which would presumably prevent automated attacks against these websites.
CAPTCHAs were conceived around 2000, but attackers are not simply going to go away just because you’ve erected a barrier in the form of a CAPTCHA. As an initial response, people wishing to abuse these services typically turned to software-based CAPTCHA solvers. Because the text in early CAPTCHAs was not heavily obfuscated, attackers could use various computer vision techniques to identify it (see left-hand image). This appears to have been one of the few approaches in use until roughly 2006.
We looked at several software solvers during the course of our study, and more details can be found in the paper. Here I’ll briefly talk about our experiences with XRumer, a market-leading forum spamming tool. To post on forums, guest books and bulletin boards, the tool often needs to create an account, which requires solving a CAPTCHA. Thus the authors of this tool have built in significant CAPTCHA-solving abilities. Shown here (see right-hand image) are many examples of CAPTCHAs that the tool claims it can break.
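The property that CAPTCHAs are cheap to generate and evaluate is easiest to see from the server’s side. The minimal Python sketch below shows only that bookkeeping: a random answer is issued under an unguessable token, and each guess is checked once, with a constant-time comparison. It is an illustrative assumption rather than how any particular site works. The class name `CaptchaStore`, the answer alphabet, the default length and the case-insensitive matching are all hypothetical choices, and rendering the answer into a distorted image, which is the part that actually resists software solvers, is deliberately omitted.

```python
import hmac
import secrets
import string

# Hypothetical answer alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


class CaptchaStore:
    """Hypothetical server-side bookkeeping for text CAPTCHAs (no image rendering)."""

    def __init__(self, length: int = 6) -> None:
        self.length = length
        self._pending: dict[str, str] = {}  # token -> expected answer

    def generate(self) -> tuple[str, str]:
        """Create a challenge; the answer would be drawn into a distorted image."""
        answer = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        token = secrets.token_urlsafe(16)  # unguessable challenge identifier
        self._pending[token] = answer
        return token, answer

    def verify(self, token: str, guess: str) -> bool:
        """Check a guess once; the token is consumed whether or not it matches."""
        expected = self._pending.pop(token, None)
        if expected is None:  # unknown or already-used token
            return False
        normalized = guess.strip().upper()  # hypothetical case-insensitive policy
        return hmac.compare_digest(expected.encode(), normalized.encode())
```

Both operations are a handful of cheap steps, which is the asymmetry the talk describes: producing and checking a challenge costs the defender almost nothing, while the attacker must recover the text from the rendered image.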
We purchased a legitimate copy of this tool for $540, along with several popular forum software products, and tested different versions of the forum software against XRumer. We discovered that XRumer was in fact capable of solving the default CAPTCHAs in contemporaneous versions of the most widely used forum software products.
In response, however, newer versions of the forum software changed their default CAPTCHAs so that XRumer could no longer solve them. With version 5.0.9, released in August 2009, XRumer added support for third-party human solving services, in particular Antigate and Captchabot, both of which we study in our paper.
Why did the authors of XRumer seemingly throw up their hands? To answer that, I’d like to begin with an analogy to traditional security. In traditional security, the bad guys obfuscate and the good guys recognize. This dichotomy exists between virus writers and AV vendors, and between spammers and email providers, and in each case recognition is inherently the harder task.
Contrast this with CAPTCHAs, where the roles are reversed: the good guys have the easier task of producing harder CAPTCHAs, while the bad guys must use advanced vision techniques to decipher the text on each new CAPTCHA (see right-hand image). And today, as far as we know, there doesn’t seem to be a general-purpose CAPTCHA solver on the market.