Transcript Notes
Document Collections
cs5984: Information Visualization
Chris North
Where are we?
• Multi-D
• 1D
• 2D
• Hierarchies/Trees
• Networks/Graphs
• Document collections
• 3D
• Design Principles
• Empirical Evaluation
• Java Development
• Visual Overviews
• Multiple Views
• Peripheral Views
Structured Document Collections
• Multi-dimensional
• author, title, date, journal, …
• Trees
• Dewey Decimal
• Networks
• web, citations
Envision
• Ed Fox, et al.
• Multi-D
• similar to Spotfire
Unstructured Document Collections
• Focus on Full Text
• Examples:
• digital libraries, encyclopedia
• Web, homepages, photo collections
• Tasks:
• Search, keyword
• Browse
• Themes, subjects, topics, library coverage
• Size, distributions
Visualization Strategies
• Cluster Maps (today)
• Keyword Query (today)
• Relationships
• Reduced representation
• User controlled layout
Cluster Map
• Create a “map” of the document collection
• Similar documents near each other
• Dissimilar documents far apart
• “Grocery store” concept
Document Vectors
        “aardvark”   “banana”   “chris”   …
Doc1    1            2          0
Doc2    2            1          0
Doc3    0            0          3
…
• Similarity between pair of docs =
• Lay out documents in 2-D map by similarity
• similar to spring model for graph layout
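The similarity formula itself did not survive on this slide. As a minimal sketch, assuming cosine similarity (a common choice for term-count vectors, not necessarily the one used in the lecture), the Doc1 to Doc3 counts on this slide can be compared as follows. The names `doc_vectors` and `cosine_similarity` are illustrative only.

```python
import math

# Term counts for Doc1..Doc3 (columns: "aardvark", "banana", "chris").
doc_vectors = {
    "Doc1": [1, 2, 0],
    "Doc2": [2, 1, 0],
    "Doc3": [0, 0, 3],
}

def cosine_similarity(a, b):
    """Assumed similarity measure: dot product divided by the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

# Doc1 and Doc2 share terms, so they score high (about 0.8).
sim_12 = cosine_similarity(doc_vectors["Doc1"], doc_vectors["Doc2"])
# Doc1 and Doc3 share no terms, so they score 0.
sim_13 = cosine_similarity(doc_vectors["Doc1"], doc_vectors["Doc3"])
```

Under this assumption, Doc1 and Doc2 would land near each other on the map and Doc3 would land far from both.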
Cluster Algorithms
• Partition clustering:
Partition into k subsets
• Pick k seeds
• Iteratively attract nearest neighbors
• Hierarchical clustering:
Dendrogram
• Group nearest-neighbor pair
• Iterate
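Both strategies can be sketched in a few lines. This is an illustration under assumptions, not a specific published algorithm: the partition version moves each seed to the mean of the points it attracts (k-means style), and the hierarchical version measures the distance between groups by their closest members (single link). Function names and the sample data are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two document vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition_cluster(points, seeds, rounds=10):
    """Partition clustering: each round, every point joins its nearest seed,
    then each seed moves to the mean of the points it attracted."""
    centers = [list(s) for s in seeds]
    assignment = []
    for _ in range(rounds):
        assignment = [
            min(range(len(centers)), key=lambda c: distance(p, centers[c]))
            for p in points
        ]
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if nothing was attracted
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def hierarchical_cluster(points):
    """Hierarchical clustering: repeatedly merge the two closest groups and
    record each merge; the merge list describes a dendrogram."""
    groups = [[i] for i in range(len(points))]
    merges = []
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(distance(points[a], points[b])
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(groups[i]), sorted(groups[j]), d))
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return merges

docs = [(1, 2, 0), (2, 1, 0), (0, 0, 3)]
labels = partition_cluster(docs, seeds=[docs[0], docs[2]])  # k = 2
merges = hierarchical_cluster(docs)
```

With these three sample vectors, the first two documents end up in the same partition, and they are also the first pair merged in the dendrogram.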
Kohonen Maps
• Xia Lin, “Document Space”
• samal, ying
• http://faculty.cis.drexel.edu/sitemap/index.html
Themescapes, Cartia
• PNL
• Mountain height = Cluster size
WebSOM
• http://websom.hut.fi/websom/
Map.net
• http://maps.map.net/start
Cluster Map
• Good:
• Map of collection
• Major themes and sizes
• Relationships between themes
• Scales up
• Bad:
• Where to locate documents with multiple themes?
» Both mountains, between mountains, …?
• Relationships between documents, within documents?
• Algorithm becomes (too) critical
Keyword Query
• Keyword query, Search engine
• Rank ordered list
• “Information Retrieval”
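A rank-ordered result list can be sketched as follows. The scoring rule here (sum of query-term counts per document) is an assumption chosen for brevity; real search engines use more elaborate weighting. `rank_documents` and the sample collection are hypothetical.

```python
def rank_documents(query_terms, docs):
    """Return document names ordered by descending score.
    Assumed score: total occurrences of the query terms in the document."""
    results = []
    for name, counts in docs.items():
        score = sum(counts.get(term, 0) for term in query_terms)
        if score > 0:  # documents with no matching keyword are not hits
            results.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [name for score, name in results]

docs = {
    "Doc1": {"aardvark": 1, "banana": 2},
    "Doc2": {"aardvark": 2, "banana": 1},
    "Doc3": {"chris": 3},
}
ranked = rank_documents(["banana"], docs)
no_hits = rank_documents(["zebra"], docs)
```

A query for "banana" ranks Doc1 ahead of Doc2 and omits Doc3, while a keyword that appears nowhere returns an empty list.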
Tilebars
• Hearst, “Tilebars”
• reenal, xueqi
• http://elib.cs.berkeley.edu/tilebars/
VIBE
• Korfhage, http://www.pitt.edu/~korfhage/interfaces.html
• Documents located between query keywords using spring model
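One way to picture the spring model: pin each query keyword to a fixed point on the screen and attach every document to each keyword with a spring whose strength is the document's weight for that keyword. As a minimal sketch, assuming linear springs with zero rest length, the document comes to rest at the weighted average of the keyword positions. This is an illustration of the idea, not VIBE's actual implementation; `vibe_position` and the anchor coordinates are hypothetical.

```python
def vibe_position(weights, anchors):
    """Equilibrium position of a document pulled toward keyword anchors.
    weights: keyword -> relevance weight for this document
    anchors: keyword -> (x, y) screen position of the keyword
    Returns None when the document matches no keyword (no springs attached)."""
    total = sum(weights.values())
    if total == 0:
        return None
    x = sum(w * anchors[k][0] for k, w in weights.items()) / total
    y = sum(w * anchors[k][1] for k, w in weights.items()) / total
    return (x, y)

# Assumed layout: three query keywords placed at the corners of a triangle.
anchors = {"aardvark": (0.0, 0.0), "banana": (1.0, 0.0), "chris": (0.5, 1.0)}

# Doc1 from the document-vector slide: pulled twice as hard toward "banana".
pos = vibe_position({"aardvark": 1, "banana": 2, "chris": 0}, anchors)
unmatched = vibe_position({"aardvark": 0, "banana": 0, "chris": 0}, anchors)
```

Here the document sits on the line between "aardvark" and "banana", two thirds of the way toward "banana".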
VR-VIBE
Keyword Query
• Good:
• Reduces the browsing space
• Map according to user’s interests
• Bad:
• What keywords do I use?
• What about other related documents that don’t use these keywords?
• No initial overview
• Mega-hit, zero-hit problem
Assignment
• Thurs: Document Collections
• Bederson, “Image Browsing”
» Rui, anusha
• Card, “Web Book and Web Forager”
» mrinmayee, ming
• Demo your hw3: tues or thurs
Next Week
• Tues: 3-D data
• Kniss, “Interactive Volume Rendering with Direct Manip”
» xueqi, mahesh
• Thurs: Workspaces
• Robertson, “Task Gallery”
» supriya, varun
• Upson, “AVS”
» christa, jun
• Thanksgiving break
• Tues 27: Debates
• Kobsa, “Empirical comparison of comm infovis systems”
» kunal, zhiping
Upcoming Sched
• Tues: 3-D data
• Thurs: Workspaces
• Thanksgiving break
• Tues 27: Debates
• Thurs 29: How (not) to lie with visualization
• Dec: project presentations
• Dec 7: CHI 2-pagers due, student posters due