(Lehigh U.) on image understanding, web security and CAPTCHAS

Download Report

Transcript (Lehigh U.) on image understanding, web security and CAPTCHAS

Image Understanding
& Web Security
Henry Baird
Joint work with:
Richard Fateman, Allison Coates, Kris Popat,
Monica Chew, Tom Breuel, & Mark Luk
1
A fast-emerging research topic
Human Interactive Proofs (definition later):
–
–
–
–
first instance in 1999
research took hold in CS security theory field first
intersects image understanding, cog sci, etc etc
fast attracting researchers, engineers, & users
This talk:
–
–
–
–
–
A brief history of HIPs
Professional activities, so far -- incl. the 1st Int’l Workshop
Existing systems -- w/ my critiques
Next steps for the field
In detail: PARC’s PessimalPrint & BaffleText
H. Baird & K. Popat, “Web Security & Document Image Analysis,”
in J. Hu & A. Antonacopoulos (Eds.), Web Document Analysis,
World Scientific, 2003 (in press).
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
2
Early rumblings…

90’s: spammers trolling for email addresses
– in defense, people disguise them, e.g.
“baird at parc dot com”

1997: abuse of ‘Add-URL’ feature at AltaVista
– some write programs to add their URL many times
– skewed the popularity rankings

Andrei Broder et al (then at DEC SRC)
– a user action which is legitimate when performed once
becomes abusive when repeated many times
– no effective legal recourse
– how to block or slow down these programs …
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
3
The first known instance:
Altavista’s AddURL filter
An image of
text, not ASCII

1999: “ransom note filter”
– randomly pick letters, fonts, rotations – render as an image
– every user required to read and type it in correctly
– reduced “spam add_URL” by “over 95%”

Weaknesses: isolated chars, filterable noise, affine deformations
M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for
Selectively Restricting Access to Computer Systems,” U.S. Patent
No. 6,195,698, Issued February 27, 2001.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
4
Yahoo!’s “Chat Room Problem”
September 2000
Udi Manber asked Prof. Manuel Blum’s group at CMS-SCS:
– programs impersonate people in chat rooms,
then hand out ads – ugh!
– how can all machines be denied access to a Web site
without inconveniencing any human users?
I.e., how to distinguish between machines and people on-line
… some variation on ‘Turing tests’ !
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
5
Alan Turing (1912-1954)
1936
a universal model of computation
1940s helped break Enigma (U-boat) cipher
1949
first serious uses of a working computer
including plans to read printed text
(he expected it would be easy)
1950
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
proposed strong test for machine intelligence
6
Turing Tests
How to judge that a machine can ‘think’:
– play an ‘imitation game’ conducted via teletypes
– a human judge & two invisible interlocutors:

a human

a machine `pretending’ to be human
– after asking any questions (challenges) he/she
wishes, the judge decides which is human
– failure to decide correctly would be convincing
evidence of machine intelligence (Turing asserted)
Modern GUIs invite richer challenges than teletypes….
A. Turing, “Computing Machinery & Intelligence,” Mind, Vol.
59(236), 1950.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
7
“CAPTCHAs”:
Completely Automated Public Turing Tests
to Tell Computers & Humans Apart
(M. Blum, L. A. von Ahn, J. Langford, et al, CMU SCS)




challenges can be generated & graded automatically
(i.e. the judge is a machine)
accepts virtually all humans, quickly & easily
rejects virtually all machines
resists automatic attack for many years
(even assuming that its algorithms are known?)
NOTE: the machine administers, but cannot pass the test!
L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using
Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw,
Poland, May 4-8, 2003 [to appear].
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
8
CMU’s ‘Gimpy’ CAPTCHA

Randomly pick:
English words, deformations, occlusions, backgrounds, etc



Challenge user to type in any three of the words
Designed by CMU team: tried out by Yahoo!
Problem: users hated it --- it was withdrawn
L. Von Ahn, M. Blum, N. J. Hopper, J. Langford, The CAPTCHA
Web Page, http://www.captcha.net.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
9
Yahoo!’s present CAPTCHA:
“EZ-Gimpy”

Randomly pick:
one English word, deformations, degradations, occlusions,
colored backgrounds




Better tolerated by users
Now used on a large scale to protect various services:
Well tolerated by users
Weaknesses: a single typeface, English lexicon
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
10
PayPal’s CAPTCHA



Nothing published
Seems to use one typeface
Picks, at random:
letters, overlain pattern

Weaknesses: single typeface, simple grid,
no image degradations, spaced apart
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
11
Cropping up everywhere…

In use today, defending against:
–
–
–
–
–

…have you seen others?
Coming up over the horizon: they can discourage…
–
–
–
–

skewing search-engine rankings (Altavista, 1999)
infesting chat rooms, etc (Yahoo!, 2000)
gaming financial accounts (PayPal, 2001)
robot spamming (SpamArrest, MailBlock, 2002)
also: Overture, Chinese website, CD-rebate, TicketMaster, …
password guessing
denial-of-service attacks
ballot stuffing
…many others
Similar problems w/ scrapers; also, likely on Intranets.
D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing
Office, Stanford Graduate School of Business, Stanford Univ., 2001.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
12
The Known Limits of
Image Understanding Technology
There remains a large gap in ability
between human and machine vision systems,
even in reading printed text
The performance of OCR machines has been systematically studied:
7 year olds can consistently do better!
Researchers have developed
stochastic models of document image degradation:
so we can generate challenging
word images pseudorandomly
S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the
Frontier, Kluwer Academic Publishers: 1999.
H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke,
& K. Yamamoto (Eds.), Structured Document Image Analysis,
Springer-Verlag: New York, 1992.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
13
Can You Read
These Degraded Images?
Of course you can ….
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
but OCR machines cannot!
14
Experiments by PARC & UCB-CS


Pick words at random:
–
70 words commonly used on the Web
–
w/out ascenders or descenders (cf. Spitz)
Vary physics-based image degradation parameters:
blur, threshold, x-scale -- within certain ranges

Pick fonts at random from a large set:
Times Roman (TR), Times Italic (TI),
Palatino Roman (PR), Palatino Italic (PI),
Courier Roman (CR), Courier Oblique (CO), etc

Test legibility on:
– ten human volunteers (UC Berkeley CS Dept grad students)
– three OCR machines:
Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I)
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
15
Results:
OCR Accuracy, by machine
Times R
Times I
Courier O
Palatino R
Palatino I
total
1
0.9
0.8
0.7
fraction 0.6
of words 0.5
correct 0.4
0.3
0.2
0.1
0
Expervis'n
ABBYY
IRIS
OCR machine
Each machine has its peculiar blind spots
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
16
OCR Accuracy:
varying blur & threshold
They share some blind spots
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
17
PessimalPrint:
exploiting image degradations
Three OCR machines fail when:
– blur = 0.0
& threshold  0.02 - 0.08
OCR outputs
~~~.I~~~
~~i1~~
N/A
N/A
– threshold = 0.02
& any value of blur
… but people find these easy to read
A. Coates, H. Baird, R. Fateman, “PessimalPrint: A Reverse Turing Test,” Proc. 6th
IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
N/A
~~I~~
18
Jan 2002:
High Time for a Workshop!
Manuel Blum proposes it, rounds up some key speakers
Henry Baird offers PARC as venue; Kris Popat helps run it
Goals:
Invite known principals: theory, systems, engineers, users
Describe the state of the art
Plan next steps for the field
Organization:
–
–
–
–
–
–
~30 attendees
abstracts only, 1-5 pages, no refereeing, no archival publication
100% participation: everyone gives a (short) talk
“mixing it up”: panel & working group discussions
2-1/2 days, lots of breaks for informal socializing
plenary talk by John McCarthy ‘Father of AI’
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
19
NSF 1st Int’l HIP Workshop
Jan 9-11, 2002, Palo Alto, CA
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
20
HIP’2002 Participants
CMU - SCS, Aladdin Center
Manuel Blum, Lenore Blum, Luis von
Ahn, John Langford, Guy Blelloch,
Nick Hopper, Ke Yang, Brighten
Godfrey, Bartosz Przydatek, Rachel
Rue
PARC - SPIA/Security/Theory
Henry Baird, Kris Popat, Tom Breuel,
Prateek Sarkar, Tom Berson, Dirk
Balfanz, David Goldberg
UCB - CS & SIMS
Richard Fateman, Allison Coates,
Jitendra Malik, Doug Tygar, Alma
Whitten, Rachna Dhamija, Monica
Chew, Adrian Perrig, Dawn Song
RPI
George Nagy
Stanford
John McCarthy
Altavista
Andrei Broder
Yahoo!
Udi Manber
Bell Labs
Dan Lopresti
IBM T.J. Watson
Charles Bennett
InterTrust Star Labs
Stuart Haber
City Univ. of Hong Hong
Nancy Chan
Weizmann Institute
Moni Naor
RSA Security Laboratories
Ari Juels
Document Recognition Techs, Inc
NSF
Robert Sloan
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
Larry Spitz
21
Variations & Generalizations

CAPTCHA
Completely Automatic Public Turing test to tell Computers and
Humans Apart

HUMANOID
Text-based dialogue which an individual can use to authenticate
that he/she is himself/herself (‘naked in a glass bubble’)

PHONOID
Individual authentication using spoken language
Human Interactive Proof (HIP)
An automatically administered challenge/response protocol
allowing a person to authenticate him/herself as belonging to a
certain group over a network without the burden of passwords,
biometrics, mechanical aids, or special training.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
22
Highlights

Theory
– some text-based CAPTCHAs are provably breakable

Ability Gaps
–
–
–
–

vision: gestalt, segmentation, noise immunity, style consistency
speech: noise of many kinds, clutter (cocktail party effect)
intelligence: puzzles, analogical reasoning, weak logic
gestures, reflexes, common knowledge, …
Applications
– subtle system-level vulnerabilties
– aggressive arms race with shadowy enemies
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
23
Disciplines
Participating





Cryptography
Security
Document Image Analysis
Computer Vision
Artificial Intelligence
Needed





Cognitive Science
Psychophysics (esp. of Reading)
Biometrics
eCommerce, Business
….?
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
25
Weaknesses of Existing
Reading-Based CAPTCHAs

English lexicon is too predictable:
– dictionaries are too small
– only 1.2 bits of entropy per character (cf. Shannon)

Physics-based image degradations vulnerable
to well-studied image restoration attacks, e.g.


Complex images irritate people
– even when they can read them
– need user-tolerance experiments
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
26
Strengths of
Human Reading
Literature on the psychophysics of reading is relevant:




familiarity helps, e.g. English words
optimal word-image size (subtended angle)
is known (0.3-2 degrees)
optimal contrast conditions known
other factors measured for the best performance:
to achieve and sustain “critical reading speed”
BUT gives no answer to:
where’s the optimal comfort zone?
G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading:
I. normal vision,” Vision Research 25(2), 1985.
AJ. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word
Recognition,’ Perception & Psychophysics 47, 1990..
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
27
Designing a Stronger CAPTCHA:
BaffleText principles

Nonsense words.
– generate ‘pronounceable’ – not ‘spellable’ – words
using a variable-length character n-gram Markov model
– they look familiar, but aren’t in any lexicon, e.g.
ablithan

wouquire
quasis
Gestalt perception.
– force inference of a whole word-image
from fragmentary or occluded characters, e.g.
– using a single familiar typeface also helps people
M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T
Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
28
Mask Degradations
Parameters of pseudorandom mask generator:
– shape type: square, circle, ellipse, mixed
– density: black-area / whole-area
– range of radii of shapes
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
29
BaffleText Experiment at PARC


Goal: map the margins of accurate & comfortable
human reading on this family of images
Metrics:
–
–
–
–

objectiive difficulty: accuracy
subjective difficulty: rating
response time
exit survey: how tolerable overall
Participation:
– 41 individual sessions
– >1200 challenge/response trials
– 18 exit surveys
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
30
BaffleText challenge webpage
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
31
BaffleText user rating
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
32
User Acceptance
% Subjects who say they’re willing to solve a BaffleText…
17% every time they send email
39% … if it cut spam by 10x
89% every time they register for an e-commerce site
94% … if it led to more trustworthy recommendations
100% every time they register for an email account
Out of 18 responses to the exit survey.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
33
Subjective difficulty
tracks objective difficulty
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
34
How to engineer BaffleText

When we generate a challenge,
– need to be able to estimate its difficulty
– throw away if too easy or too hard

Apply an idea from the psychophysics of reading:
– image “complexity” metric: how hard to read
– simple to compute:
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
perimeter** / black-area
35
Image complexity
predicts objective difficulty
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
36
Image complexity
predicts subjective difficulty
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
37
Engineering guidelines

For high performance, image complexity
should fall in the range 50-100; e.g.
50

100
Within this regime, BaffleText performs well:
–
–
–
–
–
100% human subjects willing to try to read it
89% accuracy by humans
0% accuracy by commercial OCR
3.3 difficulty rating, out of 10 (on average)
8.7 seconds / trial on average
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
38
G. Mori & J. Malik, “Recognizing Objects in
Adversarial Clutter,” submitted to CVPR’03,
Madison, WI, June 16-22, 2003.
The latest serious
(known or published) attack…

Greg Mori & Jitendra Malik (UCB-CS)
– Generalized Shape Context CV method
– requires known lexicon – else, fails completely
– expects known font (or fonts) – else, does worse
Results of Mori-Malik attacks (Dec 2002) with
foreknowledge of both lexicon and font:
CAPTCHA
EZ-GIMPY
Attack success rate
83%
Yahoo! + CMU
PessimalPrint
40%
PARC + UCB
BaffleText
25%
PARC + UCB
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
39
BaffleText:
the strongest known CAPTCHA?

Resists many known attacks:
– physics-based image restoration
– recognizing into a lexicon
– typeface targeting
– segmenting then recognizing

Exploits hard-to-automate human cognition powers:
– Gestalt perception
– “semi-linguistic” familiarity
– style consistency
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
40
PARC’s Leadership Role

Published 1st refereed paper on CAPTCHAs:
A. L. Coates, H. S. Baird, R. Fateman, “Pessimal Print: a Reverse Turing Test,” Proc., 6th
IAPR Int’l Conf. On Document Analysis & Recognition, Seattle, WA, Sept. 10-13, 2001.

Hosted first professional event:
1st NSF Int’l Workshop on HIPs, Jan. 9-11, 2002, Palo Alto, CA.

Plays both offense & defense:
– attacks CAPTCHAs; builds high-performance OCR systems
– builds strong CAPTCHAs

Validates using human-factors research:
– human-subject trials measuring accuracy & tolerance
– PARC’s interdisciplinary tradition: social + computer sciences
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
41
The Arms Race

Will serious technical attacks be launched?
– ‘spam kings’ make $$ millions
– two spam-blocking e-commerce firms use CAPTCHAs

How long can a CAPTCHA stand against attack?
– especially if its algorithms are published or guessed

Keep a pipeline of defenses in reserve:
– a long partnership between research & users
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
42
Lots of Open Research Questions

What are the most intractable obstacles to machine vision?
segmentation, occlusion, degradations, …?

Under what conditions is human reading most robust?
linguistic & semantic context, Gestalt, style consistency…?

Where are ‘ability gaps’ located?
quantitatively, not just qualitatively

How to generate challenges strictly within ability gaps?
fully automatically
an indefinitely long sequence of distinct challenges
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
43
HIP Research Community

HIP Website at Aladdin Center, CMU SCS
www.captcha.net

Volunteers for a CAPTCHA usability test?

PARC CAPTCHA experimental software tools:
– FreeType-based, C++, C for Linux etc (T. Breuel)
– Doc. image degradation generator (H. Baird)
– New Gestalt-inspired degradations (M. Chew, UCB)
– PHP4 code for CAPTCHA test web site (M. Chew, M. Luk)
… would a free GPL license be acceptable?

A 2nd HIP Workshop soon?
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
44
Alan Turing might have
enjoyed the irony …
A technical problem – machine reading –
which he thought would be easy,
has resisted attack for 50 years, and
now allows the first widespread
practical use of variants of
his test for artificial intelligence.
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
45
Contact
Henry S. Baird
[email protected]
www.parc.com/baird
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
46
47
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
48
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
49
OCR Accuracy: varying blur
Performance for threshold = 0.7
1
fraction correct
0.8
E TR
0.6
A TR
0.4
0.2
0
0
0.1
0.2
Blur parameter
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
50
OCR Accuracy: varying blur
Performance for threshold = 0.7
1
0.8
fraction 0.6
correct 0.4
E TI
A TI
0.2
0
0
0.1
0.2
Blur parameter
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
51
OCR Accuracy: varying blur
Performance for threshold = 0.7
1
E TR
A TR
0.8
fraction 0.6
correct 0.4
E TI
A TI
0.2
0
0
0.1
0.2
Blur parameter
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
52
Yahoo!’s current CAPTCHA

Randomly pick:
one English word, typeface, distortions, occlusions, background


More tolerable to users
Used on a large scale to protect various services
CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB)
53