First Use - AltaVista
In 1997 AltaVista sought ways to block or discourage the automatic submission of URLs to its search engine. This free "add-URL" service is important to AltaVista because it broadens the engine's search coverage. Yet some users were abusing the service by automating the submission of large numbers of URLs in an effort to skew AltaVista's importance-ranking algorithms. Andrei Broder, Chief Scientist of AltaVista, and his colleagues developed a filter. Their method randomly generates an image of printed text that machine vision (OCR) systems cannot read but humans still can. In January 2002 Broder stated that the system had been in use for "over a year" and had reduced the number of "spam add-URL" submissions by "over 95%." A U.S. patent was issued in April 2001.

Yahoo's Chat Room Problem
In September 2000, Udi Manber of Yahoo! described this "chat room problem" to researchers at CMU: 'bots' were joining online chat rooms and irritating the people there by pointing them to advertising sites. How could all 'bots' be refused entry to chat rooms? CMU's Prof. Manuel Blum, Luis A. von Ahn, and John Langford articulated some desirable properties of such a test.

CMU's CAPTCHA Research
The CMU team developed a 'hard' GIMPY CAPTCHA that picked English words at random and rendered them as images of printed text under a wide variety of shape deformations and image occlusions, with the word images often overlapping. The user was asked to transcribe some number of the words correctly. A simplified version of GIMPY (EZ GIMPY), using only one word image at a time, was installed by Yahoo! and is currently in use in its chat rooms to restrict access to human users only.

Pioneering CAPTCHA Research at PARC
PARC's research builds on its pattern and image analysis competencies to create reading-based CAPTCHAs. Principal Scientist Henry Baird, an expert on computer vision and document image analysis, also organized the first NSF-funded International Workshop on Human Interactive Proofs, held at PARC in January 2002. Baird also collaborated with Richard Fateman and Allison Coates of UC Berkeley to develop PessimalPrint, a CAPTCHA that uses a model of document image degradations approximating ten aspects of the physics of machine-printing and imaging of text. This model included spatial sampling rate and error, affine spatial deformations, jitter, speckle, blurring, thresholding, and symbol size. Their paper, "PessimalPrint: a Reverse Turing Test," was the first refereed technical publication on CAPTCHAs.

Bracing for the Arms Race
Most CAPTCHA research to date has been limited to academic applications. Far more powerful algorithms will be required for commercial CAPTCHAs. As CAPTCHAs become more prevalent, bot programmers are expected to unleash armies of bots bent on breaking them. Most research programs focus either on building CAPTCHAs or on breaking them, for example through dictionary and computer-vision attacks. PARC research is unique in that it does both: we play both offense and defense. By exploring how to break CAPTCHAs, researchers are discovering new techniques for building ones that are less vulnerable. For example, BaffleText uses non-English pronounceable character strings to defend against dictionary-driven attacks, and Gestalt-motivated image-masking degradations to defend against image restoration attacks.

User-focused studies
PARC’s user-focused approach makes BaffleText algorithms more commercially viable by ensuring they are not too frustrating for people to use. Drawing on PARC’s long tradition of workplace studies that merge insights from both social and computer sciences, researchers have conducted usability studies to confirm the human legibility and user acceptance of BaffleText images. PARC is seeking corporate partners interested in using PARC CAPTCHA technology inside their own products and applications. To learn more, please contact Julie Chen, Business Development, 650-812-4758.
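
As a rough illustration of the "non-English pronounceable character strings" idea mentioned for BaffleText above, the sketch below builds strings from alternating consonant-vowel syllables and rejects any that appear in a word list. This is only an assumption-laden toy: the syllable scheme, the tiny DICTIONARY set, and the function names are hypothetical and are not taken from PARC's implementation, whose generation method is not described here.

```python
import random

CONSONANTS = "bcdfghjklmnprstvwz"
VOWELS = "aeiou"

# Tiny stand-in word list (hypothetical); a real system would load a
# full dictionary so that genuine words can be filtered out.
DICTIONARY = {"banana", "motive", "tuna", "sofa", "hero", "data"}


def pronounceable_string(rng, syllables=3):
    """Build a string of consonant-vowel syllables (pattern CVCV...)."""
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


def challenge_string(rng, syllables=3, dictionary=DICTIONARY):
    """Return a pronounceable string that is not in the dictionary."""
    while True:
        candidate = pronounceable_string(rng, syllables)
        if candidate not in dictionary:
            return candidate


if __name__ == "__main__":
    # random.Random is fine for a demo; random.SystemRandom() would be a
    # better source of unpredictability for a real challenge.
    print(challenge_string(random.Random()))
```

Because the resulting string is not a dictionary word, an attacker cannot narrow the OCR search to a fixed vocabulary, while a human can still read it as something word-like. Rendering the string as a degraded image is a separate step that this sketch does not cover.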