Document Scanner
A Document Skimmer
Overcoming the soda-straw effect
Alex Krstic
Kelly Van Busum
Suzanne Vogel
Outline
Problem Overview
Prior Work (briefly)
Our Work
Demo
Study
Follow up
Overview: Problem
Listening is slower than reading, but speeding up decreases comprehension
Speed up only by increasing reading rate, with NO scanning or skimming
Skip ahead only by one line or one page
Overview: Goal
Identify features to increase speed
Enable the user to adjust these features
Trade off speed and comprehension
Prior Work: Features
Scan at levels of detail (LODs): Speech Skimmer [1] & Aster [2]
Skip 1 segment within a level: Speech Skimmer [1]
Refs
1. Speech Skimmer (Arons, 1993)
2. Aster (Raman, 1994)
Prior Work: Implementation
Segment document, semantically
Speech divisions: Long pauses [1]
Text divisions: Structure boundaries [2]
Filter out words or sounds within segments
Spaces [1]
The last P words or seconds of each segment [1]
Detailed (lower-level) info [2]
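The segment-then-filter idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Speech Skimmer or Aster implementation: paragraphs and sentences stand in for the structure boundaries, and the names `segment`, `drop_trailing_words` and `skim` are hypothetical.

```python
import re

def segment(text):
    """Split text at structure boundaries: blank lines separate paragraphs,
    sentence-ending punctuation separates sentences (an assumed heuristic)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [re.split(r"(?<=[.!?])\s+", p) for p in paragraphs]

def drop_trailing_words(sentence, p):
    """Filter out the last p words of a segment, always keeping at least one."""
    words = sentence.split()
    return " ".join(words[:max(len(words) - p, 1)])

def skim(text, level, p=0):
    """Level 0 keeps only the first sentence of each paragraph (coarse);
    level 1 keeps every sentence (detailed). p trailing words are dropped."""
    out = []
    for sentences in segment(text):
        chosen = sentences[:1] if level == 0 else sentences
        out.extend(drop_trailing_words(s, p) for s in chosen)
    return out
```

Calling `skim(text, 0)` gives the coarse level; `skim(text, 1, p=1)` gives every sentence with its final word removed.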
Our Work: Features
Hierarchy
Dropping Words/Phonemes
Spatial Sound
Our Work: LOD Hierarchy
Our Work: Dropping Words/Sounds
Dropping common words
Change text to phonemes
Remove phonemes without lexical stress
toz, suhn
computing → mpyootng
Blending phonemes (Drop spaces)
what up → whuhtuhp
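A minimal Python sketch of the three steps on this slide, under loud assumptions: the stop-word list and the tiny ARPAbet-style lexicon are hypothetical, and only unstressed vowels (stress digit 0) are removed. The slide's own "computing" example drops more than that, so this sketch does not reproduce its output exactly.

```python
# Hypothetical stop-word list; the slides do not say which words were dropped.
COMMON_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and"}

# Hypothetical mini-lexicon in ARPAbet-style notation: vowels carry a stress
# digit (0 = unstressed, 1 = primary stress); consonants carry none.
LEXICON = {
    "what": ["W", "AH1", "T"],
    "up": ["AH1", "P"],
    "computing": ["K", "AH0", "M", "P", "Y", "UW1", "T", "IH0", "NG"],
}

def drop_common_words(words):
    """Remove words found in the common-word list (case-insensitive)."""
    return [w for w in words if w.lower() not in COMMON_WORDS]

def drop_unstressed(phonemes):
    """Remove vowels marked as unstressed (trailing '0')."""
    return [p for p in phonemes if not p.endswith("0")]

def to_blended_phonemes(words):
    """Look up each word, drop unstressed vowels, and concatenate the
    phonemes with no gap between words (the 'drop spaces' step)."""
    out = []
    for w in words:
        out.extend(drop_unstressed(LEXICON[w.lower()]))
    return out
```

For instance, `to_blended_phonemes(["what", "up"])` yields one unbroken phoneme run, which is the effect the "what up" example illustrates.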
Our Work: Spatial Sound
Hearing more than one sound source at the same time
2, 3 or 4
Each source plays different segments of the file
Some sources dominant over the others
Spatial orientation
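The idea of several simultaneous sources, some dominant, each at its own position, can be illustrated with plain stereo panning. This is a stand-in sketch only: the project itself lists OpenAL among its resources, whereas the code below uses constant-power panning in pure Python, and `pan_gains` and `mix` are hypothetical names.

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for an azimuth in [-90, 90] degrees
    (-90 = hard left, 0 = centre, 90 = hard right)."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

def mix(sources):
    """Mix (samples, azimuth_deg, gain) mono sources into stereo frames.
    A larger gain makes a source dominant over the others."""
    length = max(len(samples) for samples, _, _ in sources)
    frames = [[0.0, 0.0] for _ in range(length)]
    for samples, azimuth, gain in sources:
        left, right = pan_gains(azimuth)
        for i, s in enumerate(samples):
            frames[i][0] += gain * left * s
            frames[i][1] += gain * right * s
    return frames
```

A two-source setup might place the dominant segment hard left at full gain and a secondary segment hard right at half gain: `mix([(seg_a, -90, 1.0), (seg_b, 90, 0.5)])`, where `seg_a` and `seg_b` are lists of mono samples.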
Our Work: Screenshot
Copyright 2003, ASK (Alex, Suzanne, Kelly)
User Evaluations
3 informal, 4 systematic
Users were asked questions, then navigated to the answers
Users heard text in various forms, then were asked questions
User Evaluations, 2
Hierarchy
Difficult to explain “hierarchy concept”, underused
Sound (Word) Removal
Removing common words was liked (29% of words)
Either really liked or hated phonemes (29%, 10%)
Spatial Sound
2 sounds worked ok, 3 or more didn’t
*Lots of different perspectives!
New Questions…
How much does voice selection matter?
How much would training help?
What is the relationship between phonemes and speed?
What is the role of prior knowledge?
How does this relate to Ctrl-F?
Acknowledgements
Peter Parente
Pointed us to programming resources (BATS; wxPython, Python Numeric 22.0, Win32 libraries)
Gave us Python sample code for speech synthesis and spatial sound
Experiment participants
(Informed consent requires confidentiality)
Programming Resources
BATS NCDemo – http://www.sourceforge.net
OpenAL.dll, MSVRTD.dll, pyTTS.py, pyOpenAL.py (I think)
Python – http://www.python.org/
Win32 library for Python – http://starship.python.net/crew/mhammond/
Python Numeric 22.0 library – http://www.pfdubois.com/numpy/
wxPython GUI library – http://www.wxpython.org/