CAPTCHA does more than you think

Even if you have never heard the term CAPTCHA, you have probably used one. CAPTCHA stands for ‘Completely Automated Public Turing test to Tell Computers and Humans Apart’. Companies use it to make sure that transactions on their websites are carried out by a person and not by a computer-generated spam bot. The whole purpose of a CAPTCHA is to confirm that a real person is interacting with your site.

reCAPTCHA is Google’s free CAPTCHA program. It is considered effective for two reasons: it shows two distorted words that only humans can read, and it records the IP address of the person solving the CAPTCHA. If too many CAPTCHAs are solved from the same IP address in a short period of time, that IP address is flagged for review. But it turns out that reCAPTCHA also has an audio service for visually impaired users, and instead of drawing on millions of words like the visual CAPTCHA, the audio version uses only 58 words. Hackers created a program called Stiltwalker that was able to get through the audio CAPTCHA over 99 percent of the time (reference 2). Google is constantly improving reCAPTCHA, however, and the Stiltwalker hack was quickly shut down.
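To make the IP-address check concrete, here is a minimal sliding-window sketch in Python. Google has not published how reCAPTCHA actually decides when an IP address has solved "too many" CAPTCHAs, so the class name `IpSolveMonitor` and the threshold and window values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


class IpSolveMonitor:
    """Flags an IP address that solves too many CAPTCHAs in a short window.

    Hypothetical sketch: max_solves and window_seconds are illustrative
    values, not reCAPTCHA's actual (unpublished) rules.
    """

    def __init__(self, max_solves=20, window_seconds=60.0):
        self.max_solves = max_solves
        self.window_seconds = window_seconds
        self._solves = defaultdict(deque)  # ip -> timestamps of recent solves
        self.flagged = set()               # IPs marked for review

    def record_solve(self, ip, timestamp):
        """Record one solved CAPTCHA; return True if the IP is flagged."""
        times = self._solves[ip]
        times.append(timestamp)
        # Forget solves that have slid out of the time window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_solves:
            self.flagged.add(ip)
        return ip in self.flagged


# Usage: four solves within ten seconds exceeds a limit of three.
monitor = IpSolveMonitor(max_solves=3, window_seconds=10.0)
for t in range(4):
    flagged = monitor.record_solve("203.0.113.7", float(t))
print(flagged)  # True
```

A sliding window is used here because it only looks at recent activity: a person who solves a CAPTCHA now and then is never flagged, while a bot solving many in quick succession is.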

The most interesting thing about reCAPTCHA is that it is also used to digitize books! I know that doesn’t make sense at first, but Google was brilliant in doing two things at once with this project. About 200 million CAPTCHAs are solved every day, which adds up to approximately 150,000 hours of work, so Google decided to put that effort to good use. Old printed books and newspapers are being digitized: the pages are scanned, and OCR (Optical Character Recognition) software turns them into digital text. But because the books are old, the OCR cannot correctly recognize many of the words. These unreadable words are passed to reCAPTCHA as images, so each reCAPTCHA shows a person two words: one word the system already knows and one word from an old book. If the person types the known word correctly, the system assumes they also typed the word from the book correctly, and now we have humans helping to digitize books! Each unknown word is shown in a number of reCAPTCHAs, so that there is a high level of confidence that the transcription truly reflects the original word in the book. Currently the reCAPTCHA system is helping to digitize older editions of the New York Times as well as Google Books (reference 1).
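The two-word scheme can be sketched in a few lines of Python. This is a hypothetical illustration, not Google’s code: the names `WordDigitizer` and `submit`, the case-insensitive comparison, and the rule that three matching answers are enough to accept a transcription are all assumptions standing in for "a number of reCAPTCHAs".

```python
from collections import Counter, defaultdict


class WordDigitizer:
    """Hypothetical sketch of the known-word / unknown-word scheme.

    votes_needed is an assumed agreement threshold; the real system's
    rule for "high confidence" is not described in the post.
    """

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed
        self._votes = defaultdict(Counter)  # unknown image id -> guess counts
        self.transcribed = {}               # unknown image id -> accepted text

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Grade one response; return True if the person passes the CAPTCHA."""
        # Pass/fail depends only on the word the system already knows.
        passed = control_typed.strip().lower() == control_answer.lower()
        if passed and unknown_id not in self.transcribed:
            # Count the book-word guess only when the known word was right.
            guess = unknown_typed.strip().lower()
            self._votes[unknown_id][guess] += 1
            if self._votes[unknown_id][guess] >= self.votes_needed:
                self.transcribed[unknown_id] = guess
        return passed


# Usage: the same scanned word shown to four different people.
digitizer = WordDigitizer(votes_needed=3)
digitizer.submit("morning", "morning", "img-42", "overlooks")  # pass, 1 vote
digitizer.submit("morning", "mornin", "img-42", "overlooks")   # fail, ignored
digitizer.submit("upon", "upon", "img-42", "Overlooks")        # pass, 2 votes
digitizer.submit("harbor", "harbor", "img-42", "overlooks")    # pass, 3 votes
print(digitizer.transcribed)  # {'img-42': 'overlooks'}
```

The known word acts as a filter: answers from anyone who gets it wrong (or from a bot guessing) never count toward the transcription of the book word.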

I have used the reCAPTCHA system many times when ordering products or filling in forms online, but I had no idea I was also helping to digitize books! A simple solution that solves two problems at once is just plain smart. That is my opinion and a survey of one.

References:
(1) http://www.google.com/recaptcha
(2) http://www.h-online.com/security/news/item/Google-s-reCAPTCHA-briefly-cracked-1586689.html

About surveyof1

CIO with a doctorate in Computer Science, an MBA with an emphasis in Information Technology and an M.S. in Physical Education, so my interests range from technology to fitness. 'Survey of One' is my opinion as I research topics that interest me.
This entry was posted in Business.
