Understanding CAPTCHA-Solving Services in an Economic Context 3: Evaluation of the Human-Based Services

Having singled out the key metrics, Marti Motoyama now proceeds with an evaluation of the 8 major human-based CAPTCHA-solving services by these criteria.

Service availability stats

We signed up as a customer on each of those 8 human solver services and submitted a CAPTCHA every 5 minutes over the course of 4 months. We rotated among the various CAPTCHA types from one submission to the next: for example, at minute 0 we submitted a Microsoft CAPTCHA, and at minute 5 we submitted a Google CAPTCHA.
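The rotation described above can be sketched in a few lines. This is only an illustration: the function name `build_schedule`, the start date, and the two-entry type list are assumptions made for the example, and the talk does not say how the actual submission scheduler was built.

```python
from datetime import datetime, timedelta
from itertools import cycle


def build_schedule(start, captcha_types, count, interval=timedelta(minutes=5)):
    """Return (submission time, CAPTCHA type) pairs, rotating types each step."""
    if not captcha_types:
        raise ValueError("at least one CAPTCHA type is required")
    rotation = cycle(captcha_types)
    return [(start + i * interval, next(rotation)) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical start date and type list, used only for illustration.
    for when, kind in build_schedule(datetime(2010, 1, 1), ["Microsoft", "Google"], 4):
        print(when.strftime("%H:%M"), kind)
```

With two types in the list, the schedule alternates Microsoft, Google, Microsoft, Google at minutes 0, 5, 10 and 15, matching the example in the text.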

Service Availability

The services in general had fairly high availability, where a service counts as available if it provided an answer to a CAPTCHA that we submitted. In particular, BypassCaptcha and Antigate had close to 100% availability. During this time period two of the services we studied actually went offline; we suspect this was due to increasing competition between services. So the takeaway from this slide (see image above) is that availability is fairly high: across all services, they were able to process our requests over 80% of the time. The services are also very competitively priced.

Response Time

Evaluating response time

The next metric we looked at was response time: we wanted to learn how long the services take to return responses, so that we could estimate how many CAPTCHAs one worker can solve per day. Shown here is the CDF of the response times from our study (see left-hand image). We see that the median response time across the 8 services is 20 seconds or better. For most of these services, over 80% of the responses are returned within 30 seconds. Using these numbers, we can estimate that one worker can solve roughly between 1000 and 1500 CAPTCHAs in an 8-hour workday.
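The per-worker estimate is simple arithmetic, sketched below. It assumes, as the estimate in the talk implicitly does, that a worker spends the full response time on each CAPTCHA and takes no breaks; the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def captchas_per_shift(seconds_per_captcha, shift_hours=8):
    """Upper bound on CAPTCHAs one worker completes in a shift, ignoring breaks."""
    if seconds_per_captcha <= 0:
        raise ValueError("seconds_per_captcha must be positive")
    return int(shift_hours * SECONDS_PER_HOUR // seconds_per_captcha)


if __name__ == "__main__":
    print(captchas_per_shift(20))  # 1440, at the 20-second median
    print(captchas_per_shift(30))  # 960, at the 30-second mark
```

The two results, 960 and 1440, bracket the rough range of 1000 to 1500 quoted above.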

Accuracy and Quality of CAPTCHA Solves

But the answers to the CAPTCHAs are pretty useless unless they're correct, and we do not in fact know the correct answers for the CAPTCHAs that we submitted. Thus, to assess correctness, we took all the answers we received for the same CAPTCHA and assumed that the correct answer is the most popular solution, or unique plurality, if one exists. Otherwise we assume that all the answers we got were wrong.

Validating the answers

So, for example, if we got 3 answers back for one CAPTCHA and 2 of those agreed, then we assume that the agreed-upon solution is the correct answer. However, if we got 3 solutions back and they're all different, we assume all the answers are wrong. We validated our methodology by randomly selecting about 1025 CAPTCHAs, labeling them by hand, and comparing our answers to the ones produced by our methodology. The error rate was roughly 7%, which suggests that our methodology is a reasonable approximation of the correct answer (see image).
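A minimal sketch of the unique-plurality rule follows. The function names are hypothetical, and answers are compared as exact strings; whether the study normalized case or whitespace before comparing is not stated in the talk.

```python
from collections import Counter


def plurality_answer(answers):
    """Return the single most common answer, or None if there is a tie or no answers."""
    if not answers:
        return None
    top = Counter(answers).most_common(2)
    # A tie between the two most common answers means no unique plurality exists.
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None
    return top[0][0]


def mark_correct(answers):
    """Label each answer True if it matches the plurality; all False if none exists."""
    winner = plurality_answer(answers)
    return [winner is not None and a == winner for a in answers]


if __name__ == "__main__":
    print(mark_correct(["x7kq", "x7kq", "x7kg"]))  # two of three agree
    print(mark_correct(["x7kq", "x7kg", "x1kq"]))  # all different
```

In the first call the two matching answers are treated as correct and the third as wrong; in the second, all three are treated as wrong, mirroring the two cases described above.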

Using this methodology, we can then begin to break down how accurate each of these services is. Also, we can see whether price affects the quality of the service, that is, if we pay more money, are we getting better responses?

Accuracy and quality rates

Shown here (see left-hand image) are the median error rates and response times among the CAPTCHA types for each service. We can see that the services all had error rates below 20%. BypassCaptcha had the worst error rate at close to 20%, while the defunct service CaptchaGateway had the worst response time at about 21 seconds. Antigate and CaptchaBot were both cheap, fast and accurate, while ImageToText was expensive, fast and accurate, which suggests that the quality of a service does not really depend on its cost. BeatCaptchas and Decaptcher have suspiciously similar characteristics, and we have further evidence to suggest that BeatCaptchas, which costs roughly three times as much as Decaptcher, just resells Decaptcher solves.

Service Capacity

Next, we measured capacity by subjecting each of the services to a varying load of Yahoo CAPTCHAs. Each thread we started would submit a CAPTCHA, wait for a response, and then immediately submit another. We considered a service to be at maximum capacity when we started to receive a large volume of error messages, which is what services typically return when they're overloaded.
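The load pattern described above, where each thread submits a CAPTCHA, waits for the answer, and immediately submits another, might look roughly like the sketch below. Everything here is an assumption for illustration: `solve` stands in for whatever client call a given service exposes, `fake_solve` is a local stub rather than a real service, and any exception is counted as an overload error.

```python
import threading
import time


def run_load(solve, image, threads, duration_s):
    """Drive `solve` from several threads and count answers versus errors."""
    counts = {"ok": 0, "error": 0}
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        # Submit, wait for the response, then submit again right away.
        while time.monotonic() < deadline:
            try:
                solve(image)
                outcome = "ok"
            except Exception:
                outcome = "error"
            with lock:
                counts[outcome] += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    counts["rate_per_s"] = counts["ok"] / duration_s
    return counts


def fake_solve(image):
    """Stub solver: pretends every CAPTCHA takes 10 ms and always succeeds."""
    time.sleep(0.01)
    return "answer"


if __name__ == "__main__":
    print(run_load(fake_solve, b"", threads=4, duration_s=0.5))
```

Raising the thread count until the error count climbs sharply would correspond to the saturation point the speaker describes.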

Average processing capacity

Using this methodology, we were unable to max out Antigate, which processed our requests at a rate of 41 solutions per second. If we extrapolate by assuming that CAPTCHAs take anywhere between 10 and 13 seconds to solve, then Antigate has somewhere between 400 and 500 workers. The numbers shown here represent the capacity available to us at off-peak hours (see right-hand image).
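The extrapolation multiplies throughput by the time each solve occupies a worker, which is the same reasoning as Little's law. A small sketch, with a hypothetical function name and assuming every worker is busy the whole time:

```python
def estimate_workers(solves_per_second, seconds_per_solve):
    """Workers needed to sustain a given throughput if each solve ties up one worker."""
    return solves_per_second * seconds_per_solve


if __name__ == "__main__":
    print(estimate_workers(41, 10))  # 410
    print(estimate_workers(41, 13))  # 533
```

The results, 410 and 533, are in line with the rough figure of 400 to 500 workers quoted in the talk.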

As a conservative estimate, all the services combined can solve over a million CAPTCHAs per day. What's scary as well is that this capacity can grow at the drop of a hat: Mr. "E" says that if he gets a large volume of CAPTCHAs, he'll just call more workers to jump online and start solving them for him.

So, what we have done so far is characterize the services, showing that they’re capable of solving a large volume of CAPTCHAs accurately and within a reasonable amount of time. Now we’ll start to take a look at the people involved in the solving.
