Document Scanner

Transcript Document Scanner

A Document Skimmer
Overcoming the soda-straw effect
Alex Krstic
Kelly Van Busum
Suzanne Vogel
Outline






Problem Overview
Prior Work (briefly)
Our Work
Demo
Study
Follow up
Overview: Problem

Listening is slower than reading, but speeding up decreases comprehension

Speed up only by increasing reading rate, with NO scanning or skimming
Skip ahead only by one line or one page
Overview: Goal


Identify features to increase speed
Enable the user to adjust these features

Trade off speed and comprehension
Prior Work: Features

Scan at levels of detail (LODs)
Speech Skimmer [1] & Aster [2]

Skip 1 segment within a level
Speech Skimmer [1]

Refs
1. Speech Skimmer (Arons, 1993)
2. Aster (Raman, 1994)
Prior Work: Implementation

Segment document, semantically
Speech divisions: Long pauses [1]
Text divisions: Structure boundaries [2]

Filter out words or sounds within segments
Spaces [1]
Latter P number of words or seconds [1]
Detailed (lower-level) info [2]
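
The text-side ideas above (segment at structure boundaries, scan at a level of detail, skip one segment within a level) can be sketched roughly as follows. This is an illustration only, not the Speech Skimmer, Aster, or Document Skimmer implementation. It assumes blank lines mark paragraphs and sentence-ending punctuation marks sentences; the function names segment, level_of_detail, and skip are hypothetical.

```python
import re

def segment(text):
    """Split text into paragraphs (blank-line boundaries), each a list of sentences.

    Assumption: blank lines and sentence punctuation stand in for the
    'structure boundaries' named on the slide.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    hierarchy = []
    for paragraph in paragraphs:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        hierarchy.append(sentences)
    return hierarchy

def level_of_detail(hierarchy, level):
    """Level 0: first sentence of each paragraph (coarse). Level 1: every sentence."""
    if level == 0:
        return [sentences[0] for sentences in hierarchy]
    return [s for sentences in hierarchy for s in sentences]

def skip(position, segments, step=1):
    """Move `step` segments within the current level, clamped to the valid range."""
    return max(0, min(len(segments) - 1, position + step))
```

A reader at level 0 hears only the opening sentence of each paragraph, and skip(position, segments) moves to the next segment at that same level; dropping to level 1 restores the detail that level 0 filtered out.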
Our Work: Features

Hierarchy

Dropping Words/Phonemes

Spatial Sound
Our Work: LOD Hierarchy
Our Work: Dropping Words/Sounds

Dropping common words

Change text to phonemes

Remove phonemes without lexical stress
toz, suhn
computing → mpyootng

Blending phonemes (Drop spaces)
what up → whuhtuhp
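
Two of the reductions on this slide, dropping common words and blending phonemes by dropping spaces, can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code. The COMMON_WORDS list is an assumed stand-in, since the slides do not give the actual list. The phoneme spellings come only from the slide's "what up" example, and the function names are hypothetical. Removing unstressed phonemes is left out because it would need a pronunciation dictionary with stress marks.

```python
# Assumed stop list; the slides do not say which words were treated as common.
COMMON_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "it", "that"}

def drop_common_words(text, common=COMMON_WORDS):
    """Return text with common words removed, keeping the remaining words in order."""
    kept = [w for w in text.split() if w.strip(".,;:!?\"'").lower() not in common]
    return " ".join(kept)

# Phoneme spellings taken from the slide's example; any other word passes through unchanged.
PHONEME_SPELLINGS = {"what": "whuht", "up": "uhp"}

def blend(words, lexicon=PHONEME_SPELLINGS):
    """Spell each word as phonemes and join them with no spaces in between."""
    return "".join(lexicon.get(w.lower(), w.lower()) for w in words)
```

For instance, blend(["what", "up"]) gives "whuhtuhp", matching the slide, and drop_common_words("The cat sat in the hat.") leaves "cat sat hat.".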
Our Work: Spatial Sound

Hearing more than one sound source at the same time
2, 3 or 4
Each source plays different segments of the file
Some sources dominant over the others

Spatial orientation
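
The idea of several simultaneous sources, some dominant and each with its own spatial position, can be approximated with plain stereo panning, as sketched below. Note the swap: the project used OpenAL for spatial sound, while this sketch substitutes simple constant-power panning in pure Python. The names pan_gains and mix, and the (samples, pan, gain) tuple layout, are assumptions made for illustration.

```python
import math

def pan_gains(pan):
    """Constant-power pan: pan in [-1 (left), +1 (right)] -> (left_gain, right_gain)."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

def mix(sources):
    """Mix mono sources into stereo frames.

    sources: list of (samples, pan, gain); a larger gain makes a source dominant.
    Returns a list of (left, right) tuples as long as the longest source.
    """
    length = max(len(samples) for samples, _, _ in sources)
    frames = [[0.0, 0.0] for _ in range(length)]
    for samples, pan, gain in sources:
        left, right = pan_gains(pan)
        for i, sample in enumerate(samples):
            frames[i][0] += gain * left * sample
            frames[i][1] += gain * right * sample
    return [tuple(frame) for frame in frames]
```

Two segments of the same file could then be played together as mix([(segment_a, -1.0, 1.0), (segment_b, 1.0, 0.5)]), with the left source dominant and the right source at half gain.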
Our Work: Screenshot
Copyright 2003, ASK (Alex, Suzanne, Kelly)
User Evaluations

3 informal, 4 systematic

Asked questions, then navigated to the answer

Heard text in various forms, then answered questions
User Evaluations, 2

Hierarchy
Difficult to explain “hierarchy concept”, underused

Sound (Word) Removal
Removing common words was liked (29% of words)
Either really liked or hated phonemes (29%, 10%)

Spatial Sound
2 sounds worked ok, 3 or more didn’t

*Lots of different perspectives!
New Questions…

How much does voice selection matter?

How much would training help?

What is the relationship between phonemes and speed?

What is the role of prior knowledge?

How does this relate to Ctrl-F?
Acknowledgements

Peter Parente
Pointed us to programming resources (BATS; wxPython, Python Numeric 22.0, Win32 libraries)
Gave us Python sample code for speech synthesis and spatial sound

Experiment participants
(Informed consent requires confidentiality)
Programming Resources

BATS NCDemo – http://www.sourceforge.net
OpenAL.dll, MSVRTD.dll, pyTTS.py, pyOpenAL.py (I think)
Python – http://www.python.org/
Win32 library for Python – http://starship.python.net/crew/mhammond/
Python Numeric 22.0 library – http://www.pfdubois.com/numpy/
wxPython GUI library – http://www.wxpython.org/
