Having singled out the key metrics, Marti Motoyama now proceeds to evaluate the 8 major human-based CAPTCHA-solving services by these criteria. We went ahead and signed up as a customer of each of those 8 human solver services, and then we submitted a CAPTCHA every 5 minutes over the course of 4 months. We rotated among the various CAPTCHA types between submissions, meaning that at minute 0 we submitted a Microsoft CAPTCHA, and at minute 5 we submitted a Google CAPTCHA.
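The rotation described above can be sketched as a simple round-robin schedule. This is a minimal illustration, not the study's actual harness: the type list and the function name are our own, and the real rotation covered more sites than the three shown here.

```python
import itertools

# Illustrative CAPTCHA types; the actual study rotated among the
# CAPTCHA types of several major sites (the exact list is an assumption).
CAPTCHA_TYPES = ["microsoft", "google", "yahoo"]

def submission_schedule(n_submissions, interval_minutes=5):
    """Yield (minute, captcha_type) pairs: one submission every
    interval_minutes, cycling through the types round-robin."""
    types = itertools.cycle(CAPTCHA_TYPES)
    for i in range(n_submissions):
        yield i * interval_minutes, next(types)
```

This reproduces the example in the text: a Microsoft CAPTCHA at minute 0, a Google CAPTCHA at minute 5, and so on around the rotation.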
The services in general had fairly high availability, meaning that they provided an answer to the CAPTCHAs we submitted. In particular, BypassCaptcha and Antigate had close to 100% availability. During this time period, two of the services we studied actually went offline; we suspect this was due to increasing competition among services. So the end takeaway from this slide (see image above) is that the services have fairly high availability: across all services, they were able to process our requests over 80% of the time. They are also very competitively priced.
But the answers to the CAPTCHAs are pretty useless unless they're correct, and we do not in fact know the correct answers for the CAPTCHAs we submitted. Thus, to assess correctness, we took the answers we received for the same CAPTCHA and assumed that the correct answer is the most popular solution (the unique plurality), if one exists; otherwise we assume that all the answers we got were wrong. For example, if we got 3 answers back for one CAPTCHA and 2 of them agreed, we assume the agreed-upon solution is the correct answer. However, if we got 3 solutions back and they were all different, we just assume all the answers are wrong. We validated our methodology by randomly selecting about 1,025 CAPTCHAs, labeling them by hand, and comparing our hand labels to the answers produced by this rule. We saw a roughly 7% error rate, which suggests that our methodology is a reasonable approximation of the correct answer (see image).
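The plurality rule described above can be sketched as follows (a minimal sketch under our reading of the methodology; the function name is ours):

```python
from collections import Counter

def plurality_answer(answers):
    """Return the unique plurality answer among the solver responses,
    or None when no answer is strictly more popular than the rest
    (in which case all answers are treated as wrong)."""
    if not answers:
        return None
    counts = Counter(answers).most_common()
    # A unique plurality exists if there is only one distinct answer,
    # or if the top answer strictly outnumbers the runner-up.
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    return None
```

For instance, `plurality_answer(["w7gx", "w7gx", "w7qx"])` returns `"w7gx"` (two of three agree), while three distinct answers yield `None`, matching the examples in the text.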
Using this methodology, we can then begin to break down how accurate each of these services is. We can also see whether price affects the quality of the service, that is, if we pay more money, are we getting better responses?
Next, we measured capacity by subjecting each service to a varying load of Yahoo CAPTCHAs. Each thread we started would submit a CAPTCHA, wait for a response, and then immediately submit another. We assumed that a service was maxed out capacity-wise when we started to receive a large volume of error messages, which is what services typically return when they're overloaded. Using this methodology, we were unable to max out Antigate, which processed our requests at a rate of 41 solutions per second. If we extrapolate by assuming that a CAPTCHA takes anywhere between 10 and 13 seconds to solve, then Antigate has somewhere between 400 and 500 workers. And the numbers shown here represent the capacity available to us at off-peak hours (see right-hand image).
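The worker extrapolation above is essentially Little's law: concurrent workers ≈ throughput × per-CAPTCHA solve time. A quick sketch with the figures from the talk (the helper function is ours):

```python
def estimated_workers(solutions_per_second, solve_time_seconds):
    """Estimate the number of concurrent workers needed to sustain a
    given throughput, assuming each worker handles one CAPTCHA at a time
    (Little's law: L = lambda * W)."""
    return solutions_per_second * solve_time_seconds

# Antigate: 41 solutions/sec, assuming 10-13 s per CAPTCHA.
low = estimated_workers(41, 10)   # 410 workers
high = estimated_workers(41, 13)  # 533 workers
```

This gives roughly 410 to 530 concurrent workers, in the same ballpark as the 400-to-500 range quoted in the talk.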
Combined, as a conservative estimate, all the services together can solve over a million CAPTCHAs per day. And what's scary as well is that this capacity can grow at the drop of a hat: Mr. "E" says that if he gets a large volume of CAPTCHAs, he'll just call more workers to jump online and start solving them for him.
So, what we have done so far is characterize the services, showing that they’re capable of solving a large volume of CAPTCHAs accurately and within a reasonable amount of time. Now we’ll start to take a look at the people involved in the solving.