Webarc-slides-archiving-09-v2 - UMIACS Wiki

Search and Access Strategies
for Web Archives
Sangchul Song and Joseph JaJa
Institute for Advanced Computer Studies
Department of Electrical and Computer Engineering
University of Maryland, College Park, Maryland, USA
May 5, 2009
Archiving 2009
Background
• Web archives present unique resources of
complex, dynamic, and linked information at an
unprecedented scale and covering large temporal
contexts.
• Designing fast and cost-effective search and access
strategies for web archives is quite challenging,
especially when search and browsing are
conducted within a temporal context.
• Our work: A prototype system that enables users
to easily search and access web archives using
high level queries.
Full-text Search of a Web Archive
(in a better way)
Q1. Search web archives using a combination of terms and time spans.
Our approach provides an intuitive interface, coupled with effective
storage organization and indexing schemes.
Q2. How to handle time-constrained search (“Sept 11” before 2001)?
Our approach provides much higher efficiency than the typical “Search all
first, then filter out” approach. We use a ranking function that is time
dependent.
Q3. How are the returned results grouped and ranked?
We develop a ranking scheme that groups similar (either spatially or
temporally) results together and scores each group separately.
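
As an illustration of Q2 and Q3, the following is a minimal sketch and not the authors' implementation. It assumes a toy in-memory index whose postings are sorted by capture year, so a time-constrained query can binary-search the span instead of scanning every hit and filtering afterwards. Grouping here is by URL only, and each group is scored by its best hit. The names INDEX, YEARS, search, and group_by_url, the example URLs, and the fixed scores are all hypothetical; the slides do not specify the index layout or the time-dependent ranking function.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical toy index: term -> postings (capture_year, url, score),
# kept sorted by capture_year. Scores are fixed placeholders, not a
# real time-dependent ranking function.
INDEX = {
    "sept": [
        (1999, "a.example/news", 0.4),
        (2000, "a.example/news", 0.5),
        (2000, "b.example/cal", 0.7),
        (2002, "c.example/memorial", 0.9),
    ],
}

# Parallel list of years per term, built once, so each query can
# binary-search without rescanning the postings.
YEARS = {term: [p[0] for p in posts] for term, posts in INDEX.items()}


def search(term, start_year, end_year):
    """Return postings for term captured within [start_year, end_year]."""
    years = YEARS.get(term, [])
    lo = bisect_left(years, start_year)
    hi = bisect_right(years, end_year)
    return INDEX.get(term, [])[lo:hi]


def group_by_url(hits):
    """Group hits by URL and rank groups by their best-scoring capture."""
    groups = defaultdict(list)
    for year, url, score in hits:
        groups[url].append((year, score))
    return sorted(
        groups.items(),
        key=lambda item: max(score for _, score in item[1]),
        reverse=True,
    )


if __name__ == "__main__":
    # "sept" restricted to captures up to and including 2000.
    for url, captures in group_by_url(search("sept", 1990, 2000)):
        print(url, captures)
```

In this sketch the time restriction costs two binary searches per term rather than a pass over all postings, which is the general idea behind avoiding "search all first, then filter out"; a real archive index would need a disk-based layout and multi-term queries.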
Screenshots