Measuring Information Architecture



Interfaces for Intense Information Analysis
Marti Hearst
UC Berkeley
This research funded by ARDA
1
Outline
• A contrast
– Search vs. Analysis
• Goals for three user groups
– Intelligence Analysts
– Biomedical Researchers
– Investigative Reporters
• Our current interface design
2
Search vs. Analysis
Search:
Finding hay in a haystack
Analysis:
Creating new hay
3
UIs for Search vs. Analysis
• Search:
– A necessary but undesirable step in a larger task
– UI should not draw attention to itself
– UI should be very easy to use for everyone
• Analysis:
– The larger task
– UI can be more of a “science project”
– But UI should have “flow”
4
General Goals
• Support hypothesis formation / refutation
• Flow
– Easy creation, destruction, and cataloging of connections and coverage
– Easy movement between multiple views
• Represent:
– Multiple supporting clues
– Conflicting evidence
– Uncertainty
– Timeliness
– Non-monotonicity
5
Intelligence Analysts
6
Intelligence Analysts
• I have recently interviewed several active counter-terrorist analysts
• Great diversity in
– Goals
– Computing environments
• Biggest problems are social/systemic
• Many mundane IT problems as well
7
Mundane IT Problems
• System incompatibilities
• Data reformatting
• Data cleaning
• Documenting sources
• Archiving materials
8
Intelligence Analysts: Problem 1
• Look at a series of reports, images, and communication patterns
• Try to build a model of what is going on
– Follow leads
– Compare to previous situations
• Recent problem:
– Groups are changing their behavior patterns quickly
• Very little use of sophisticated software tools
9
Intelligence Analysts: Problem 2
• Given a large collection
• “Roll around” in the data
– See what has been “touched”
• Tools should indicate which parts of the collection have been examined and which have yet to be looked at, and by whom
– View data in several different ways
• Data reduction methods such as MDS, SVD, and clustering often hide important trends.
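The "touched" requirement above could be backed by a very small bookkeeping structure. The following is only a sketch under assumptions: the class `CoverageLog` and its method names are hypothetical and do not come from any tool described in the talk.

```python
from collections import defaultdict


class CoverageLog:
    """Hypothetical record of which documents each analyst has examined."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        self.viewers = defaultdict(set)  # document id -> set of analyst names

    def touch(self, doc_id, analyst):
        """Note that `analyst` has looked at `doc_id`."""
        if doc_id not in self.doc_ids:
            raise KeyError(doc_id)
        self.viewers[doc_id].add(analyst)

    def touched_by(self, doc_id):
        """Return the analysts who examined a document, in sorted order."""
        return sorted(self.viewers.get(doc_id, ()))

    def untouched(self):
        """Return the documents that nobody has examined yet."""
        return sorted(d for d in self.doc_ids if not self.viewers.get(d))
```

An interface could use `untouched()` to highlight unexplored regions of the collection and `touched_by()` to show who has already covered a report.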
10
Intelligence Analysts: Problem 2
– Don’t show the obvious
• e.g., Cheney is president
– Don’t show what you’ve already shown
– Only show the most recent version
– Show which info is not present
• Changes in the usual pattern
• Something stops happening
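One way to read "something stops happening" is as a set difference between what is usually observed and what was observed most recently. The sketch below assumes observations are already grouped into time periods; the function name `stopped_events` and the event labels in the usage note are made up for illustration.

```python
def stopped_events(periods):
    """Return events seen in every earlier period but missing from the last one.

    `periods` is a list of collections of event labels, oldest first.
    """
    if len(periods) < 2:
        return []
    *earlier, latest = periods
    # "Usual" events are those present in all earlier periods.
    usual = set.intersection(*(set(p) for p in earlier))
    return sorted(usual - set(latest))
```

For example, `stopped_events([{"call A-B", "wire X"}, {"call A-B", "wire X", "visit"}, {"wire X"}])` reports `"call A-B"` as the activity that has gone quiet. A real system would need a softer notion of "usual" than strict presence in every period.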
11
Intelligence Analysts: Problem 3
• Prepare a very short executive summary for the purposes of policy making
– Really the culmination of a cascade of summaries
– Reps from different agencies meet and “pow-wow” to form a view of the situation
– Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability
12
BioInformatics Researchers
13
BioInformatics Example 1
• How to discover new information …
• … As opposed to discovering which statistical patterns characterize occurrence of known information.
• Method:
– Use large text collections to gather evidence to support (or refute) hypotheses
– Make Connections
– Gather Evidence
14
Etiology Example
• Don Swanson example, 1991
• Goal: find cause of disease
– Magnesium-migraine connection
• Given
– medical titles and abstracts
– a problem (incurable rare disease)
– some medical expertise
• find causal links among titles
– symptoms
– drugs
– results
15
Gathering Evidence
[Diagram: migraine; stress, CCB, SCD, and PA, each linked to magnesium]
16
Gathering Evidence
[Diagram: migraine and magnesium, with CCB, PA, SCD, and stress]
Swanson’s Linking Approach
• Two of his hypotheses have received some experimental verification.
• His technique
– Only partially automated
– Required medical expertise
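The idea of finding links among titles can be illustrated with a toy co-occurrence search. This is a simplified sketch, not Swanson's actual procedure (which, as noted above, was only partially automated and relied on medical expertise). The function name `linking_terms`, the whitespace tokenization, and the example titles in the usage note are all assumptions made for illustration.

```python
def linking_terms(titles, source, target, vocabulary):
    """Return vocabulary terms that co-occur with `source` in at least one
    title and with `target` in at least one title."""
    vocabulary = set(vocabulary)
    near_source, near_target = set(), set()
    for title in titles:
        words = set(title.lower().split())  # naive single-word tokens
        candidates = words & vocabulary
        if source in words:
            near_source |= candidates
        if target in words:
            near_target |= candidates
    # A linking term must appear on both sides; drop the endpoints themselves.
    return sorted((near_source & near_target) - {source, target})
```

With the invented titles `"Stress as a trigger of migraine"`, `"Magnesium loss under stress"`, and `"Platelet changes in migraine"`, and the vocabulary `{"stress", "platelet", "calcium"}`, the call `linking_terms(titles, "migraine", "magnesium", vocabulary)` yields `["stress"]`: it is the only candidate that touches both ends. Each suggested link is a hypothesis to be checked by an expert, not a finding.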
18
BioInformatics Example 2:
• How to find functions of genes?
– Have the genetic sequence
– Don’t know what it does
– But …
• Know which genes it coexpresses with
• Some of these have known function
– So … infer function based on function of coexpressed genes
• This is a problem suggested by Michael Walker and others at Incyte Pharmaceuticals
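The inference step above can be sketched as a simple vote among the coexpressed genes whose function is known. This is a minimal illustration under assumptions: `infer_function` is a hypothetical helper, and the gene names and function labels in the usage note are placeholders, not real annotations.

```python
from collections import Counter


def infer_function(coexpressed, known_functions):
    """Rank candidate functions for an unknown gene.

    `coexpressed` lists the genes it is coexpressed with;
    `known_functions` maps some of those genes to a function label.
    Returns (function, votes) pairs, most common first.
    """
    votes = Counter(
        known_functions[gene] for gene in coexpressed if gene in known_functions
    )
    return votes.most_common()
```

Given `known = {"geneA": "protease", "geneB": "protease", "geneC": "transport"}`, the call `infer_function(["geneA", "geneB", "geneC", "geneD"], known)` ranks `"protease"` first with two votes. The ranking only suggests a hypothesis; it says nothing about how strong the coexpression evidence is.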
19
Gene Co-expression:
Role in the genetic pathway
[Diagram: pathway positions of Kall., PSA, and PAP, with candidate positions for the unknown genes g? and h?]
Other possibilities as well
20
Make use of the literature
• Look up what is known about the other genes.
• Different articles in different collections
• Look for commonalities
– Similar topics indicated by Subject Descriptors
– Similar words in titles and abstracts
adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...
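Looking for commonalities can be sketched as counting how many genes share each descriptor or title word. The helper `common_terms` and the assignment of terms to genes in the usage note are hypothetical; only the terms themselves are taken from the slide.

```python
from collections import Counter


def common_terms(terms_by_gene, min_genes=2):
    """Return terms attached to the literature of at least `min_genes` genes.

    `terms_by_gene` maps a gene name to the descriptors or words found in
    articles about that gene.
    """
    counts = Counter()
    for terms in terms_by_gene.values():
        counts.update(set(terms))  # count each term once per gene
    return sorted(term for term, n in counts.items() if n >= min_genes)
```

For instance, with `{"gene1": ["prostate", "tumor markers"], "gene2": ["prostate", "antibodies"], "gene3": ["neoplasm", "prostate", "antibodies"]}` the shared terms are `["antibodies", "prostate"]`, which is the kind of overlap that could prompt a hypothesis about the unknown gene.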
21
22
Formulate a Hypothesis
• Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer
• New tack: do some lab tests
– See if mystery gene is similar in molecular structure to the others
– If so, it might do some of the same things they do
23
Investigative Reporter Example
• Looking for trends in online literature
• Create, support, refute hypotheses
24
Investigative Reporter Example
• What are the current main topics?
– Clustering
• What are the new popular terms?
– Corpus-level statistics, co-occurrence statistics
• How do they track with the news?
– Contrasting collection statistics
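Finding new popular terms by contrasting collection statistics can be illustrated with a smoothed frequency ratio between an older and a newer collection. This is one simple choice among many, not the method used in the talk; the function name `rising_terms`, the add-one smoothing, and the `min_ratio` threshold are assumptions.

```python
from collections import Counter


def rising_terms(old_docs, new_docs, min_ratio=2.0):
    """Return words whose relative frequency in `new_docs` is at least
    `min_ratio` times their relative frequency in `old_docs`."""
    old = Counter(w for doc in old_docs for w in doc.lower().split())
    new = Counter(w for doc in new_docs for w in doc.lower().split())
    old_total, new_total = sum(old.values()), sum(new.values())
    vocab_size = len(set(old) | set(new))
    scores = {}
    for word, count in new.items():
        # Add-one smoothing keeps unseen words from causing division by zero.
        p_new = (count + 1) / (new_total + vocab_size)
        p_old = (old[word] + 1) / (old_total + vocab_size)
        ratio = p_new / p_old
        if ratio >= min_ratio:
            scores[word] = ratio
    return sorted(scores, key=scores.get, reverse=True)
```

Words that are equally frequent in both collections score near 1 and are filtered out, so common function words drop away without a stop list.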
25
Investigative Reporter Example
• How long after a new Star Trek series comes on the air before characters from the series appear in stories?
• How often do Klingons initiate attacks against Vulcans, vs. the converse?
• Techniques:
– Named-entity recognition
– Creating a list of terms
– Apply the list to a subcollection
– Create regex rules with POS information
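A regex rule with POS information could look roughly like the sketch below. It assumes the sentences have already been tokenized and tagged with Penn Treebank style tags by some other tool; the pattern `ATTACK`, the verb list, and the function `count_attacks` are hypothetical and only cover simple active-voice sentences.

```python
import re

# Matches "<group>/NNS [adverbs] <attack verb>/VB* [determiner] <group>/NNS"
# over text rendered as space-separated word/TAG pairs.
ATTACK = re.compile(
    r"(?P<agent>Klingons|Vulcans)/NNS\s+"
    r"(?:\w+/RB\s+)*"                        # optional adverbs
    r"(?:attack\w*|ambush\w*)/VB[DPZ]?\s+"   # finite forms of the verbs
    r"(?:\w+/DT\s+)?"                        # optional determiner
    r"(?P<target>Klingons|Vulcans)/NNS"
)


def count_attacks(tagged_sentences):
    """Count (agent, target) pairs in sentences given as (word, tag) lists."""
    counts = {}
    for sentence in tagged_sentences:
        text = " ".join(f"{word}/{tag}" for word, tag in sentence)
        for match in ATTACK.finditer(text):
            key = (match.group("agent"), match.group("target"))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Comparing `counts[("Klingons", "Vulcans")]` with `counts[("Vulcans", "Klingons")]` over a subcollection gives a rough answer to the second question. Passive constructions and pronouns would need additional rules.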
26
LINDI
[Screenshot of the LINDI interface: File and Help menus; labels Summary, Query, and Analysis; a Term Set panel (New, Merge) listing "All terms: *" and "Diseases: emphysema cancer hypertension …"; a Document Set panel listing "All documents: *" and "WHO: organization = world health organization"]
Thank you!
For more information:
bailando.sims.berkeley.edu/lindi.html
28