Visualization of AAG Paper Abstracts

André Skupin
Dept. of Geography
University of New Orleans
AAG Pittsburgh, April 5, 2000
AAG Conference Abstracts
Web Search Engine Interface
Research Motivation I
Methodology
• Geography’s role in information visualization
– geographic concepts
• regions
• scale
– cartographic techniques
• generalization
• labeling
– GIS technology
• data integration
Research Motivation II
Application
• Developments in Academic Geography
– based on geography’s written output
– generalizable to any corpus of documents
Data Capture & Pre-Processing
• Source Data:
– abstracts submitted to AAG 1999 Hawaii
– complete abstracts as text file
– 2220 abstracts
• Pre-Processing:
– Separation into three parts:
• author information
• abstract text
• keywords chosen by authors
Keyword Component Indexing
• (1) extract keywords chosen by authors
• (2) break keywords into components
• (3) match components against content of all abstracts
• result:
– all abstracts indexed
– overall richer index than author-chosen keywords alone
– vector-space model with 2220 docs & 741 terms
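
The three indexing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the presentation: the function names, the toy abstracts, the single-word tokenization, and the binary (0/1) weighting are all assumptions, since the slides do not state how components were split or weighted.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(abstracts, author_keywords):
    """Return (terms, matrix); matrix[d][t] is 1 if term t occurs in abstract d."""
    # Steps 1-2: collect author keywords and break them into single-word components.
    components = set()
    for keywords in author_keywords:
        for keyword in keywords:
            components.update(tokenize(keyword))
    terms = sorted(components)
    # Step 3: match every component against the text of every abstract.
    matrix = []
    for text in abstracts:
        words = set(tokenize(text))
        matrix.append([1 if term in words else 0 for term in terms])
    return terms, matrix

# Hypothetical toy data standing in for the 2220 abstracts.
abstracts = [
    "Urban growth modeled with GIS and remote sensing.",
    "A visualization of migration flows in urban regions.",
]
author_keywords = [["urban growth", "GIS"], ["visualization", "migration"]]
terms, matrix = build_index(abstracts, author_keywords)
```

In this toy case the second abstract is indexed under "urban" even though its author did not choose that keyword, which is the sense in which the resulting index is richer than the author-chosen keywords.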
Spatialization
• projection of elements of a high-dimensional information space into a low-dimensional representation (Skupin & Buttenfield 1997)
– -> project document/keyword matrix into 2D
• Technique: Self-Organizing Map (SOM)
– input: raw document/keyword matrix
– output: two-dimensional grid of neurons, each with a weight for each keyword
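
To make the input and output of the technique concrete, here is a small pure-Python sketch of SOM training. It is an illustration under assumptions, not the SOM_PAK implementation: the linear decay of learning rate and radius, the Gaussian neighborhood, the parameter names, and the toy data are all chosen for brevity, and the loop is only practical for very small grids.

```python
import math
import random

def best_matching_unit(weights, vec):
    """Return (row, col) of the neuron whose weight vector is closest to vec."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(vec, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols grid; weights[r][c] holds one weight per keyword."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for vec in data:
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
            br, bc = best_matching_unit(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * radius * radius))
                    w = weights[r][c]
                    # Pull the neuron toward the document, scaled by grid proximity.
                    for k in range(dim):
                        w[k] += lr * h * (vec[k] - w[k])
            step += 1
    return weights

# Hypothetical document/keyword rows (3 keywords).
data = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
weights = train_som(data, rows=2, cols=3)
```

Each entry `weights[r][c]` is a list with one value per keyword, matching the "grid of neurons with weight for each keyword" described above.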
Base Map Creation
• Implementation: SOM_PAK & C++
• 1. Choose SOM Dimensions
– e.g. 85 x 115 neurons
• 2. Train Grid of Neurons
– each neuron gets weight for each keyword
– preservation of high-dim. document topology
• 3. Apply SOM to Data Set
– documents assigned to single neurons
• 4. Assign unique locations to documents
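
Steps 3 and 4 can be sketched as follows. The slides do not say how unique locations were derived, so the sub-grid placement inside each neuron's cell is purely an assumption for illustration, as are the function names and the toy weights.

```python
import math
from collections import defaultdict

def nearest_neuron(weights, vec):
    """Return (row, col) of the neuron whose weight vector is closest to vec."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(vec, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def place_documents(weights, docs):
    """Return one unique (x, y) map location per document."""
    # Step 3: assign every document to a single neuron.
    by_neuron = defaultdict(list)
    for i, vec in enumerate(docs):
        by_neuron[nearest_neuron(weights, vec)].append(i)
    # Step 4 (assumed scheme): spread documents sharing a neuron over a
    # k x k sub-grid inside that neuron's unit cell.
    locations = [None] * len(docs)
    for (r, c), members in by_neuron.items():
        k = math.ceil(math.sqrt(len(members)))
        for j, i in enumerate(members):
            locations[i] = (c + (j % k + 0.5) / k, r + (j // k + 0.5) / k)
    return locations

# Hypothetical trained 1 x 2 grid and three documents (2 keywords).
weights = [[[1.0, 0.0], [0.0, 1.0]]]
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
locations = place_documents(weights, docs)
# locations == [(0.25, 0.25), (0.75, 0.25), (1.5, 0.5)]
```

The first two documents share a neuron but still receive distinct coordinates, so each abstract can be drawn as its own point on the base map.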
Base Map of AAG Abstracts
• Complexity
– -> Generalization?
– -> Scale?
• Labeling
– -> Weighted Index?
• Visualization
– -> GIS Software?
High-Dimensional Clusters Projected onto Map
Hierarchical
Coarse SOM
K-Means
Multi-Scale Spatialization w/ Labels
Map Design for 2D Spatialization
Visual Hierarchies
Geographic Space
Information Space
Research Directions I
Applications
• visualize trends in geography
– author trajectories through time
– subject emergence
– geography of geography
Papers by ZIP Code
Research Directions II
Techniques
• Cluster Solutions
– U-matrix (-> contiguous clusters in 2D)
– AutoClass (-> with optimized cluster numbers)
– quantify performance of cluster solutions
• Visualization
– multi-band thematic visualization
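
As a rough illustration of the U-matrix item above, the sketch below computes one common simplified form: for each neuron, the mean distance between its weight vector and those of its four grid neighbors. High values suggest boundaries between clusters that are contiguous in 2D. The function name and toy weights are assumptions, and the full U-matrix as originally defined also includes cells between neurons, which this sketch omits.

```python
import math

def u_matrix(weights):
    """Mean weight-space distance from each neuron to its 4-connected neighbors."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# Hypothetical 1 x 3 grid: two similar neurons, then a very different one.
weights = [[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]]
u = u_matrix(weights)
# u == [[0.0, 2.5, 5.0]]
```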
SOM Plane “GIS”
SOM Plane “visualization”
SOM Plane “urban”
Color Composite “GIS” “urban” “visualization”: Full Extent
Color Composite “GIS” “urban” “visualization”: Zoom-In
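
A color composite of this kind can be sketched by treating each keyword's SOM plane as one band of an RGB image. The channel assignment (which keyword goes to red, green, or blue), the min-max stretch to 0-255, and all names below are assumptions; the slides do not specify them.

```python
def component_plane(weights, k):
    """Extract the SOM plane for keyword index k: one value per neuron."""
    return [[w[k] for w in row] for row in weights]

def scale_to_byte(plane):
    """Min-max stretch a plane to integers in 0..255 (constant planes become 0)."""
    flat = [v for row in plane for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[round(255 * (v - lo) / span) if span else 0 for v in row]
            for row in plane]

def color_composite(weights, red, green, blue):
    """Combine three keyword planes into a grid of (R, G, B) tuples."""
    bands = [scale_to_byte(component_plane(weights, k)) for k in (red, green, blue)]
    rows, cols = len(weights), len(weights[0])
    return [[(bands[0][r][c], bands[1][r][c], bands[2][r][c])
             for c in range(cols)] for r in range(rows)]

# Hypothetical 1 x 2 grid with weights for ["GIS", "urban", "visualization"].
weights = [[[0.0, 0.2, 1.0], [1.0, 0.2, 0.0]]]
image = color_composite(weights, red=0, green=1, blue=2)
# image == [[(0, 0, 255), (255, 0, 0)]]
```

Neurons where two or three of the keywords are strong together appear as mixed colors, which is what makes the composite readable as a multi-band thematic map.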