200409ECDLCitiViz - Edward A. Fox


CitiViz:
A Visual User Interface to the
CITIDEL System
ECDL 2004, Bath, England,
September 2004
Nithiwat Kampanya, Rao Shen,
Seonho Kim, Chris North, and
Edward A. Fox
http://fox.cs.vt.edu
Acknowledgements (Selected)

Sponsors: ACM, NLM, NSF (esp. grants CDA-9303152,
9312611; DUE-0121679, 0136690; IRI-9116991)

Faculty/Staff: Lillian Cassel, Debra Dudley, C. Lee Giles,
Lenwood Heath, John Impagliazzo, Deborah Knox, JAN
Lee, Manuel Perez, Naren Ramakrishnan, …

VT (Former) Students: Abhishek Agrawal, Supriya
Angle, Guillermo Averboch, Anil Bazaz, Dennis Brueni,
Robert France, Debby Hix, Marcos Goncalves, Aaron
Krowne, Paul Mather, Kate McDevitt, Fernando Das
Neves, Lucy Nowell, Durgesh Rao, Ryan Richardson,
Hussein Suleman, Bill Wake, Jun Wang, Baoping Zhang,
Jianxin Zhao
Outline
• Envision
• CITIDEL
• Other Related Works
• Research Questions
• CitiViz Homepage, Architecture
• Visualization Strategies, Examples
• Evaluation
• Conclusions, Future Work
ENVISION
• NSF “A User-Centered Database from the Computer Science Literature” (1991-93)
  • With ongoing support from ACM
  • Collected bib/typesetter data, converted to SGML
  • Scanned thousands of page images
  • MARIAN search engine –
    • also applied to the Virginia Tech library catalog
    • used as part of a prototype object-based DL
    • with tailored visualization interface (L. Nowell dissertation)
Envision Results Window
Envision – Newer Version
Envision – Newer Version – w. clusters
Computing and Information
Technology Interactive Digital
Educational Library (CITIDEL)

Domain: computing / information technology

Genre: one-stop-shopping for teachers &
learners: courseware (CSTC, JERIC), leading
DLs (ACM, IEEE-CS, DB&LP, CiteSeer),
PlanetMath.org, NCSTRL (technical reports), …

Submission & Collection: sub/partner collections -> www.citidel.org -> www.nsdl.org
www.CITIDEL.org

• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)
• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)
CITIDEL Technology Features
• Component architecture (Open Digital Library – Hussein Suleman)
  • Re-use and compose re-deployable digital library components
• Built using open standards and technologies
  • OAI: used to collect DL resources and for DL interoperability
  • XSL and XML: interface rendering with multilingual, community-based translation of screens and content (Spanish, …)
  • Perl: component integration
  • ESSEX: search engine functionality
    • Fast, in-memory processing, snapshots for persistence
• Multi-scheming
  • Integrates multiple classifications / views through maps, closure
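The OAI item above refers to harvesting metadata over the OAI-PMH protocol. As a rough illustration only, the sketch below builds a ListRecords request URL and extracts identifiers and Dublin Core titles from a response. The base URL, the sample record, and the helper names (list_records_url, parse_records) are assumptions made for this example; the slides do not show how CITIDEL's harvester is written.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# A hand-written sample response, used here instead of a live repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return a list of {identifier, title} dicts from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        ident = rec.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = rec.findtext(".//{%s}title" % DC_NS)
        records.append({"identifier": ident, "title": title})
    return records
```

A real harvester would also follow resumptionToken elements to page through large result sets; that step is omitted here.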
Related Works
• 1st type of visualization techniques
  • Predefined document attributes (e.g., author, date, …)
    • Envision
  • Semantic information (e.g., categories assigned to each document)
    • Cougar, Cat-a-Cone, Map.net ……
  • Document-query relevance
    • TileBars, VIBE
Related Works
• 2nd type of visualization techniques
  • Automatically derive a collection overview via the use of text mining
  • Based on inter-document similarities
    • Scatter/Gather
    • Grouper
    • Galaxy of News
    • Vivisimo
    • Kartoo
    • ……
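To make "based on inter-document similarities" concrete, here is a minimal sketch that uses bag-of-words cosine similarity with a simple leader (single-pass) clustering. This is a stand-in chosen for brevity, not the algorithm used by any of the systems listed above; the threshold value and function names are assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass_cluster(docs, threshold=0.3):
    """Leader clustering: each document joins the first cluster whose
    leader (first member) is similar enough, else starts a new cluster.
    Returns clusters as lists of document indices."""
    vectors = [term_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

The result depends on document order and on the threshold, which is one reason production systems usually prefer more robust clustering methods.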
Research Questions

How to combine the two different types of
visualization techniques for CITIDEL?

What text mining technology to use for post-retrieval analysis?

What are the key insights, how to support them?

What interaction and navigation strategies should
be used to facilitate visual browsing and analysis?
Addressing the Questions
1. Developed clustering components to discover document relationships and to identify subject categories for retrieved documents.
2. Developed a new visual interface:
CitiViz HomePage
http://feathers.dlib.vt.edu/CitiViz/index.html
System Architecture


• Component-based design
• Communication between components is XML-based.
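As an illustration of XML-based communication between components, the sketch below serializes a result set to XML on the producing side and parses it back on the consuming side. The element and attribute names (results, document, title, author, year) are hypothetical; the slides do not specify the CitiViz message schema.

```python
import xml.etree.ElementTree as ET


def results_to_xml(query, documents):
    """Producer side: serialize retrieved documents to an XML message.
    The schema used here is an assumption for illustration."""
    root = ET.Element("results", {"query": query})
    for doc in documents:
        el = ET.SubElement(root, "document", {"id": doc["id"]})
        ET.SubElement(el, "title").text = doc["title"]
        ET.SubElement(el, "author").text = doc["author"]
        ET.SubElement(el, "year").text = str(doc["year"])
    return ET.tostring(root, encoding="unicode")


def xml_to_results(xml_text):
    """Consumer side: parse the XML message back into Python data."""
    root = ET.fromstring(xml_text)
    docs = []
    for el in root.findall("document"):
        docs.append({
            "id": el.get("id"),
            "title": el.findtext("title"),
            "author": el.findtext("author"),
            "year": int(el.findtext("year")),
        })
    return root.get("query"), docs
```

Exchanging plain XML text keeps the components loosely coupled: a data source, a clustering step, and a visualizer only need to agree on the message format, not on each other's implementation language.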
System Architecture
[Diagram components: Visualizing Components, Java Servlets, Data Source Components, Clustering Components]
CitiViz Visualization Strategies
• Overview strategy
  • Aggregation by document clustering to show all the retrieved documents
• Navigation strategies
  • Overview + detail
  • Focus + context (fish-eye view: hyperbolic tree)
  • Combine tree graphs with scatter plot graphs.
  • Integrate 2D scatter plot graph with a network of citations.
  • Apply the aggregate towers technique to solve occlusion problems of documents visualized in the scatter plot graph.
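The occlusion problem arises when several documents map to the same point in the scatter plot. A minimal sketch of the aggregation step behind a tower display, assuming documents that share both axis values are stacked into one tower; the field names and the build_towers helper are hypothetical and not taken from the CitiViz code.

```python
from collections import defaultdict


def build_towers(documents, x_key, y_key):
    """Group documents that fall on the same scatter-plot position.

    Returns a dict mapping (x, y) to the list of document titles stacked
    at that position; the list length is the tower height."""
    towers = defaultdict(list)
    for doc in documents:
        towers[(doc[x_key], doc[y_key])].append(doc["title"])
    return dict(towers)
```

For example, with the publication year on one axis and the topic on the other, two papers from the same year and topic end up as one tower of height two instead of two overlapping glyphs.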
CitiViz Display of Detailed
Information for a Selected
Document:
A Tower of Cylinders
(to solve occlusion problem)
CitiViz initial interface
1. Show me retrieved results from ACM DL
2. “algorithm analysis”, by “Donald Knuth”
Clustering results
3. “data compression”
Evaluation Tasks
1. Given an author and a topic, find a document published by that author and belonging to that topic.
2. Given an author and a publication year, find a document published by that author in that year.
3. Given a title, find a document having that title.
4. Find the most recently published paper.
Evaluation Results
Discussion
• Users performed chosen tasks faster with CitiViz than with the standard interface for CITIDEL.
• No significant difference for tasks 1 and 3 between CitiViz when using clustering versus when using ACM classification.
• Possible explanation of differences observed with tasks 2 and 4:
  • the clustering yields one-level towers, and
  • some users were confused about the multi-level towers resulting from the ACM classification
Conclusions
• Text mining + information visualization
• Document clustering provides insights for users.
• Overview of document attributes in the 2D scatter plot
• Overview of hierarchical concept map displayed as a hyperbolic tree supports “focus+context” navigation.
• Integrated the 2D scatter plot space with a network of citations.
• Online tutorial and system – also animation.
Future Work
• Add more Data Source Components (DSC)
  • Current DSC for CITIDEL = DSC for all its member DLs
  • DSC: send query, parse HTML to XML, cluster result data
  • Develop other DSCs for different DLs (e.g., NDLTD)
• Improve clustering component (S. Kim)
• Extend CITIDEL content
• Test usability of CitiViz with broad base of users
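The "parse HTML to XML" step of a Data Source Component can be sketched as follows. The input markup (result links marked with class="result") and the output element names are assumptions made up for this example; an actual DSC would need a parser tailored to each digital library's result pages.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ResultParser(HTMLParser):
    """Collect links marked class="result" (a hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # set while inside a result link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("class") == "result":
                self._href = attrs.get("href", "")
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.results.append({"url": self._href, "title": title})
            self._href = None


def html_to_xml(html_text):
    """Convert a result page to an XML document of <document> elements."""
    parser = ResultParser()
    parser.feed(html_text)
    parser.close()
    root = ET.Element("results")
    for r in parser.results:
        doc = ET.SubElement(root, "document", {"url": r["url"]})
        ET.SubElement(doc, "title").text = r["title"]
    return ET.tostring(root, encoding="unicode")
```

Screen-scraping of this kind breaks whenever the source site changes its page layout, which is one practical reason to prefer harvesting interfaces where a digital library offers them.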
Summary
• Envision
• CITIDEL
• Other Related Works
• Research Questions
• CitiViz Homepage, Architecture
• Visualization Strategies, Examples
• Evaluation
• Conclusions, Future Work