emailviz - UC Berkeley School of Information

Email Viz
Future Directions
Marti Hearst
UC Berkeley
Outline
• Important InfoViz Principle
• Tough Data Mining Problem
– The infrequent important thing
• Interfaces tailored to user goals
– Intelligence Analysts
– Investigative Reporters
• Promising Future Directions
– Integration of task, viz, and content analysis
– Mixed-Initiative Interaction
Important InfoViz Principle
Distinguish between:
PRESENTATION
ANALYSIS
Tough Data Mining Problem
• It’s easy to see the main trends
• But often we want the rare, unexpected, and important event:
– Russian oil company example
– Schwarzenegger and Enron
– Cigarettes and kids
– Person on the periphery who is working stealthily to influence things
• Deep Throat
Intelligence Analysts
• Interviews with active counterterrorism analysts
• Great diversity in
– Goals
– Computing environments
• Biggest problems are social/systemic
• Many mundane IT problems as well
Mundane IT Problems
• System incompatibilities
• Data reformatting
• Data cleaning
• Documenting sources
• Archiving materials
Intelligence Analysts: Problem 1
• Look at a series of reports, images, and communication patterns
• Try to build a model of what is going on
– Follow leads
– Compare to previous situations
• Recent problem:
– Groups are changing their behavior patterns
quickly
• Very little use of sophisticated software tools
Intelligence Analysts: Problem 2
• Given a large collection
• “Roll around” in the data
– See what has been “touched”
• Tools should indicate which parts of the
collection have been examined and which have
yet to be looked at, and by whom
– View data in several different ways
• Data reduction methods such as MDS, SVD,
and clustering often hide important trends.
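The "touched" indicator described above needs little more than a record of who opened what. A minimal sketch in Python, assuming documents have string IDs and that viewing events are reported to the tracker; `CoverageTracker` and its methods are hypothetical names, not part of any existing tool:

```python
class CoverageTracker:
    """Hypothetical helper: records which analysts have examined which documents."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        self.viewed_by = {}  # doc_id -> set of analysts who opened it

    def mark_viewed(self, doc_id, analyst):
        if doc_id not in self.doc_ids:
            raise KeyError(f"unknown document: {doc_id}")
        self.viewed_by.setdefault(doc_id, set()).add(analyst)

    def examiners(self, doc_id):
        """Who has looked at this document (empty list if nobody)."""
        return sorted(self.viewed_by.get(doc_id, ()))

    def unexamined(self):
        """Documents that no analyst has touched yet."""
        return sorted(d for d in self.doc_ids if d not in self.viewed_by)

    def coverage(self):
        """Fraction of the collection examined at least once."""
        if not self.doc_ids:
            return 0.0
        return len(self.viewed_by) / len(self.doc_ids)


tracker = CoverageTracker(["r1", "r2", "r3", "r4"])
tracker.mark_viewed("r1", "alice")
tracker.mark_viewed("r1", "bob")
tracker.mark_viewed("r3", "alice")
print(tracker.unexamined())     # ['r2', 'r4']
print(tracker.examiners("r1"))  # ['alice', 'bob']
print(tracker.coverage())       # 0.5
```

A visualization would then color each document by `examiners()` and highlight the `unexamined()` set.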
Intelligence Analysts: Problem 2 (continued)
– Don’t show the obvious
• e.g., Cheney is president
– Don’t show what you’ve already shown
– Only show the most recent version
– Show which info is not present
• Changes in the usual pattern
• Something stops happening
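Several of these display rules reduce to simple filters. The sketch below is one possible reading, assuming each item carries an ID and a version number and that activity is logged as a list of dates per source; the function names, record shapes, and thresholds are all illustrative assumptions:

```python
from datetime import date, timedelta


def items_to_show(items, already_shown):
    """Keep only the newest version of each item, minus versions already shown.

    `items` is a list of (item_id, version, payload) tuples and
    `already_shown` a set of (item_id, version) pairs: both are assumed shapes.
    """
    latest = {}
    for item_id, version, payload in items:
        if item_id not in latest or version > latest[item_id][0]:
            latest[item_id] = (version, payload)
    return {
        item_id: payload
        for item_id, (version, payload) in latest.items()
        if (item_id, version) not in already_shown
    }


def stopped_activity(events, today, quiet_days=30, min_events=3):
    """Flag sources that were regularly active but have recently gone quiet.

    `events` maps a source name to the dates it was observed. The
    `quiet_days` and `min_events` thresholds are illustrative defaults.
    """
    cutoff = today - timedelta(days=quiet_days)
    return sorted(
        name
        for name, dates in events.items()
        if len(dates) >= min_events and max(dates) < cutoff
    )


items = [("memo-7", 1, "draft"), ("memo-7", 2, "final"), ("memo-9", 1, "note")]
print(items_to_show(items, already_shown={("memo-9", 1)}))  # {'memo-7': 'final'}

events = {
    "channel-A": [date(2004, 1, 2), date(2004, 1, 9), date(2004, 1, 16)],
    "channel-B": [date(2004, 3, 1), date(2004, 3, 15), date(2004, 3, 29)],
}
print(stopped_activity(events, today=date(2004, 3, 31)))  # ['channel-A']
```

`stopped_activity` surfaces exactly the "absence" signal that a plot of current activity would not show: channel-A has nothing recent to draw.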
Intelligence Analysts: Problem 3
• Prepare a very short executive summary
for the purposes of policy making
– Really the culmination of a cascade of
summaries
– Reps from different agencies meet and
“pow-wow” to form a view of the situation
– Rarely, but crucially, must be able to refer
back to original sources and reasoning
process for purposes of accountability
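Referring back through a cascade of summaries amounts to keeping provenance links from each summary to its inputs. A minimal sketch, assuming every summary records its author, a short rationale, and the sources or lower-level summaries it was built from; all class, field, and example names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    source_id: str
    agency: str


@dataclass
class Summary:
    text: str
    author: str
    rationale: str  # reasoning recorded when the summary was written
    inputs: list = field(default_factory=list)  # Sources and/or lower-level Summaries


def trace_sources(summary):
    """Follow provenance links down to the original source documents."""
    found = []
    for item in summary.inputs:
        if isinstance(item, Source):
            found.append(item)
        else:
            found.extend(trace_sources(item))
    return found


def reasoning_chain(summary, depth=0):
    """List (depth, author, rationale) for each summary, top level first."""
    chain = [(depth, summary.author, summary.rationale)]
    for item in summary.inputs:
        if isinstance(item, Summary):
            chain.extend(reasoning_chain(item, depth + 1))
    return chain


agency_summary = Summary(
    "Group X appears to be moving funds", "analyst-1",
    "cable and intercept agree on dates",
    [Source("cable-112", "Agency A"), Source("intercept-48", "Agency B")],
)
exec_summary = Summary(
    "Recommend continued monitoring", "interagency-panel",
    "consensus at joint meeting",
    [agency_summary, Source("report-3", "Agency C")],
)
print([s.source_id for s in trace_sources(exec_summary)])
# ['cable-112', 'intercept-48', 'report-3']
for depth, author, why in reasoning_chain(exec_summary):
    print("  " * depth + f"{author}: {why}")
```

The executive summary itself stays short; the links are only followed in the rare case where accountability requires it.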
Investigative Reporter Example
• Looking for trends in online literature
• Create, support, refute hypotheses
Investigative Reporter Example
• What are the current main topics?
– Clustering
• What are the new popular terms?
– Corpus-level statistics, co-occurrence statistics
• How do they track with the news?
– Contrasting collection statistics
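The corpus-level, co-occurrence, and contrasting-collection statistics can be illustrated with plain word counts. The sketch below scores each term by the log ratio of its add-one-smoothed relative frequencies in a target collection versus a reference collection (e.g., news), and counts co-occurrence at the document level; the tokenizer, smoothing, and scoring are simple stand-ins chosen for illustration, and the function names are hypothetical:

```python
import math
import re
from collections import Counter


def tokenize(doc):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", doc.lower())


def term_counts(docs):
    """Corpus-level statistics: total occurrences of each term."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def contrast_terms(target_docs, reference_docs, top_n=5):
    """Terms most over-represented in the target relative to the reference.

    Score = log of the ratio of add-one-smoothed relative frequencies.
    """
    tgt, ref = term_counts(target_docs), term_counts(reference_docs)
    vocab = set(tgt) | set(ref)
    tgt_total = sum(tgt.values()) + len(vocab)
    ref_total = sum(ref.values()) + len(vocab)
    scores = {
        t: math.log((tgt[t] + 1) / tgt_total) - math.log((ref[t] + 1) / ref_total)
        for t in vocab
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def cooccurring(docs, term):
    """Document-level co-occurrence counts for words appearing with `term`."""
    counts = Counter()
    for doc in docs:
        words = set(tokenize(doc))
        if term in words:
            counts.update(words - {term})
    return counts


stories = ["The Borg attack the ship", "Borg cube sighted"]
news = ["The ship docks", "The crew rests"]
print(contrast_terms(stories, news, top_n=4))  # ['borg', 'attack', 'cube', 'sighted']
print(cooccurring(stories, "borg")["cube"])     # 1
```

Smoothing keeps terms that never appear in the reference collection from producing a division by zero, while still ranking them highly.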
Investigative Reporter Example
• How long after a new Star Trek series comes on the air do characters from the series appear in stories?
• How often do Klingons initiate attacks against Vulcans, vs. the converse?
Techniques:
• Named-entity recognition
• Creating a list of terms
• Applying the list to a subcollection
• Creating regex rules with POS information
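Two of these techniques can be sketched briefly: applying a term list to a dated subcollection to find first mentions, and a regex rule over part-of-speech-tagged text to count who attacks whom. The sketch assumes the text has already been tagged in `word/TAG` form with Penn Treebank tags (the tagger itself is not shown); the character list, dates, and helper names are hypothetical:

```python
import re
from datetime import date

# Hypothetical term list: names of characters from a new series.
CHARACTERS = ["Picard", "Riker", "Worf"]


def first_mentions(stories, names):
    """Apply a term list to (date, text) stories; return each name's first date."""
    first = {}
    for story_date, text in sorted(stories):
        for name in names:
            if name not in first and re.search(rf"\b{re.escape(name)}\b", text):
                first[name] = story_date
    return first


def attack_rule(attacker, target):
    """Regex over `word/TAG` text: ATTACKER (proper noun), a verb form of
    "attack" (any VB* tag), an optional determiner, then TARGET."""
    return re.compile(
        rf"\b{re.escape(attacker)}s?/NNPS?\s+attack\w*/VB\w*\s+"
        rf"(?:\w+/DT\s+)?{re.escape(target)}s?/NNPS?\b"
    )


def count_attacks(tagged_sentences, attacker, target):
    rule = attack_rule(attacker, target)
    return sum(1 for s in tagged_sentences if rule.search(s))


stories = [
    (date(2001, 2, 5), "Fans debate Picard's choices."),
    (date(2001, 1, 30), "Riker appears in a new tale."),
    (date(2001, 3, 2), "Picard and Worf share a scene."),
]
premiere = date(2001, 1, 8)  # hypothetical air date for the example
first = first_mentions(stories, CHARACTERS)
for name, seen in first.items():
    print(name, (seen - premiere).days, "days after premiere")

tagged = [
    "The/DT Klingons/NNPS attacked/VBD the/DT Vulcans/NNPS ./.",
    "A/DT Klingon/NNP attacks/VBZ Vulcans/NNPS ./.",
    "The/DT Vulcans/NNPS attacked/VBD a/DT Klingon/NNP outpost/NN ./.",
]
print(count_attacks(tagged, "Klingon", "Vulcan"))  # 2
print(count_attacks(tagged, "Vulcan", "Klingon"))  # 1
```

The rule only catches active-voice sentences such as "Klingons attacked the Vulcans"; passive constructions ("Vulcans were attacked by Klingons") would need a second pattern with the roles reversed.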
Integration
• TAKMI, by Nasukawa and Nagano, IBM Systems Journal 40(4), 2001
• The system integrates:
– Real tasks (CRM, patent analysis)
– Content analysis
– Information Visualization
TAKMI, by Nasukawa and Nagano, 2001
[Screenshot: docs containing “windows 98”]
[Screenshot: CRM]
[Five further TAKMI screenshots; no text recoverable]
Mixed-Initiative Interaction
• Balance control between user and agent
– In the Spotfire demo, the system adjusts the axes after the “other” category is hidden
– Exploratory data analysis (EDA):
• User selects a subset of the data based on an interesting-looking grouping
• System then computes statistics on this subset in the background while the user continues to work
• Finally, the system notifies the user of interesting trends
• See the AIDE system:
– St. Amant, R., Dinardo, M. D., and Buckner, N. (2003).
Balancing Efficiency and Interpretability in an Interactive
Statistical Assistant. Proceedings of IUI.
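The EDA loop described above can be approximated with a background worker and a completion callback. This is a minimal sketch, not the AIDE design: it assumes numeric data, uses a crude standardized mean difference as the "interestingness" test, and the `Assistant` class and its threshold are hypothetical:

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def compare_subset(subset, baseline):
    """Background task: standardized difference between subset and baseline means."""
    sd = statistics.stdev(baseline)  # needs at least two baseline values
    diff = statistics.mean(subset) - statistics.mean(baseline)
    return {"n": len(subset), "effect": diff / sd if sd else 0.0}


class Assistant:
    """Hypothetical mixed-initiative helper; runs statistics off the UI thread."""

    def __init__(self, threshold=1.0):
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.threshold = threshold  # illustrative "interesting" cutoff
        self.notifications = []

    def on_selection(self, subset, baseline):
        # Returns immediately so the user can keep working.
        future = self.pool.submit(compare_subset, subset, baseline)
        future.add_done_callback(self._maybe_notify)
        return future

    def _maybe_notify(self, future):
        # Runs when the background job finishes; only surfaces notable results.
        result = future.result()
        if abs(result["effect"]) >= self.threshold:
            self.notifications.append(
                f"Selected group (n={result['n']}) stands out: "
                f"effect size {result['effect']:.2f}"
            )

    def close(self):
        self.pool.shutdown(wait=True)


baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
assistant = Assistant()
assistant.on_selection([13, 14, 12], baseline)  # notable group
assistant.on_selection([10, 10, 11], baseline)  # unremarkable group
assistant.close()  # wait for background jobs (and their callbacks) to finish
print(assistant.notifications)
```

The threshold is where control is balanced: a lower value makes the agent more talkative, a higher one leaves more of the initiative with the user.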