Is Metasearching Really Better Searching?

STM Innovations Seminar
London, Friday 2 December 2005
Pete Johnston
Research Officer, UKOLN, University of Bath
UKOLN is supported by:
www.bath.ac.uk
A centre of expertise in digital information management
www.ukoln.ac.uk
Is Metasearching Better Searching?
• What is metasearch?
• Making metasearch work
– The NISO Metasearch Initiative
• Metasearch today
– Metasearch and Google
– Metasearch and "social bookmarking"
What is metasearch?
“Metasearch, parallel search, federated search,
broadcast search, cross-database search, search
portal are a familiar part of the information
community's vocabulary.
They speak to the need for search and retrieval
to span multiple databases, sources,
platforms, protocols, and vendors at one time.”
NISO MetaSearch initiative
http://www.niso.org/committees/MS_initiative.html
The search problem
• User wants to find, access, and use items
made available by multiple content
providers
• Content providers make their collections
available through their own separate
“presentation services”
• User interacts with multiple services in
succession, e.g.
– Query Resource Discovery Network (RDN) for
Web resources
– Query Zetoc for journal articles
– etc
[Figure: The search problem (diagram label: Web Sites)]
The search problem
• User has to
– Discover different services
– Manage different authentication/access
requirements
– Use different user interfaces for search
– Interpret different result sets
• different metadata
– Manipulate different result sets
• human-readable (HTML)
• but difficult to merge, reuse
• May still not have access to (appropriate
copy of) resource
The metasearch solution
• The provision of "metasearch" services that
– enable user to search across the metadata
databases of multiple content providers from a
single interface
– manage multiple result sets and present to user
– manage authentication/access
– (etc!)
• Seamless (to the user) discovery of and
access to heterogeneous, distributed
resources!
Approaches to metasearch (1):
cross-searching
• Metasearch service accepts user query
• Sends query to multiple content provider
search targets
• Receives responses from targets
• Presents result sets to user
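The four steps above can be sketched in Python. This is a minimal illustration, not a real metasearch service: the two target functions and their records are invented stand-ins, where a real service would speak Z39.50, SRW, SRU or similar to each content provider.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote search targets (assumption: invented data).
def search_target_a(query):
    records = ["Metasearch in libraries", "Digital preservation"]
    return [r for r in records if query.lower() in r.lower()]

def search_target_b(query):
    records = ["Metasearch standards", "Collection description"]
    return [r for r in records if query.lower() in r.lower()]

TARGETS = {"target-a": search_target_a, "target-b": search_target_b}

def cross_search(query, targets=TARGETS):
    """Send one query to every target in parallel; return a result set per target."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        # Waiting on every future means the slowest target sets the response time.
        return {name: future.result() for name, future in futures.items()}

results = cross_search("metasearch")
```

Because the service waits for every target before presenting results, one slow content provider can delay the whole response.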
[Figure: Cross-search. Labels: Web Site; Metasearch: Cross-search; Z39.50, SRW, SRU, etc; Search Targets]
Approaches to metasearch (2):
harvesting
• Metasearch service periodically gathers
metadata records from content provider
repositories into local database
• Metasearch service accepts user query
• Executes query on local database
• Presents result sets to user
• Some harvesting services may also
harvest/index copy of resource
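A rough sketch of the harvesting approach, assuming in-memory stand-ins for the repositories. The repository names and records are invented, and in practice the gathering step might run over a protocol such as OAI-PMH.

```python
# Hypothetical repositories exposing metadata records (assumption: invented data).
REPOSITORIES = {
    "repo-a": [{"id": "a1", "title": "Metasearch in libraries"}],
    "repo-b": [
        {"id": "b1", "title": "Harvesting metadata"},
        {"id": "b2", "title": "Metasearch standards"},
    ],
}

def harvest(repositories):
    """Gather records from every repository into one local database."""
    local_db = {}
    for repo_name, records in repositories.items():
        for record in records:
            # Key on (repository, id) so identifiers from different sources cannot clash.
            local_db[(repo_name, record["id"])] = record
    return local_db

def search_local(local_db, query):
    """Run the user's query against the local database only."""
    q = query.lower()
    return sorted(r["title"] for r in local_db.values() if q in r["title"].lower())

local_db = harvest(REPOSITORIES)  # run periodically, not once per query
hits = search_local(local_db, "metasearch")
```

The user's query never touches the remote repositories, so it is only as current as the last call to `harvest`.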
[Figure: Harvesting. Labels: Web Site; Metasearch: Harvester; OAI-PMH; Repositories]
Cross-searching & harvesting
• Metasearch service may use both in
combination!
• Cross-search
– Latest results returned
– Content provider controls searches available
– May slow overall performance
• Harvesting
– Better performance for user query
– Options for normalisation etc by harvester
– Only as up-to-date as last harvest
A hospitable climate for metasearch?
• Metasearch service depends on access to metadata
• Web Services
– Standards for providing machine interfaces to applications on Web
– Based on HTTP and XML
– SOAP (messaging protocol), WSDL (service description), WS-* (!!)
– WS not just for search!
– Service-oriented approaches, modular applications
– Google and Amazon provide Web Services
• "Web 2.0"
– "The Web as platform"
– Recombining data and services from multiple sources
The problems with metasearch
• User requires/expects resources from
increasing range of content providers
• What if content provider doesn't implement
standard search/harvest interface?
• Some proprietary APIs, "XML Gateways"
– Scalability
• Some "screen-scraping"
– Parsing of HTML pages to obtain metadata
– Rights issues
– Scalability, volatility
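To illustrate why screen-scraping is volatile, here is a minimal sketch using Python's standard `html.parser`. The page and its `DC.*` meta tags are made up for the example, and the scraper only works for as long as the provider keeps this exact markup.

```python
from html.parser import HTMLParser

class MetaTagScraper(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]

# Made-up page (assumption); a real provider can change its markup at any time.
PAGE = """<html><head>
<meta name="DC.title" content="Metasearch: an overview">
<meta name="DC.creator" content="A. N. Author">
</head><body><p>Result 1 of 1</p></body></html>"""

scraper = MetaTagScraper()
scraper.feed(PAGE)
metadata = scraper.metadata
```

If the provider renames the tags or moves the metadata into the page body, the scraper silently returns less, which is the fragility noted above.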
The problems with metasearch
• Metasearch services work, but…
• For service provider
– complex, laborious
– fragile, susceptible to change by content
provider
– duplication of effort by service providers
• For content provider
– concerns over efficiency
– concerns over access management
– rights, branding, results presentation/ranking
Making metasearch work
• Effective metasearch requires agreements between
content providers and service providers
– Transport protocol(s)
– Query language(s)
• syntax and semantics
– Metadata schemas
• syntax and semantics
– Metadata quality
• presence of values, formats of literals etc
– Intellectual property rights issues
• how metadata records and resources are presented, used
– Authorisation / authentication
– Disclosure / discovery of collections and services
Andy Powell, "Metasearching: an overview",
Presentation to BCS EPSG Seminar, July 2004
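The schema and quality agreements listed above can be illustrated with a small sketch. The provider names, field names and the three-field common schema are all hypothetical; the point is only that without an agreed schema the service provider must maintain a mapping per content provider and check for missing values itself.

```python
# Hypothetical per-provider field names mapped to one common schema (assumption).
CROSSWALKS = {
    "provider-a": {"ti": "title", "au": "creator", "yr": "date"},
    "provider-b": {"Title": "title", "Author": "creator", "Published": "date"},
}
REQUIRED = ("title", "creator", "date")

def normalise(provider, record):
    """Rename a provider's fields to the common schema and tidy the literal values."""
    mapping = CROSSWALKS[provider]
    return {mapping[k]: str(v).strip() for k, v in record.items() if k in mapping}

def missing_fields(record):
    """List required fields that are absent or empty (a simple quality check)."""
    return [f for f in REQUIRED if not record.get(f)]

a = normalise("provider-a", {"ti": "Metasearch ", "au": "Smith, J.", "yr": 2004})
b = normalise("provider-b", {"Title": "Harvesting", "Author": ""})
```

Here `a` is complete after normalisation, while `missing_fields(b)` reports the empty creator and the absent date.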
The NISO Metasearch Initiative
• Response to concerns of librarians, systems
vendors, content providers
• Aims to enable
– metasearch service providers to offer more
effective and responsive services
– content providers to deliver enhanced content and
protect their intellectual property
– libraries to deliver services that distinguish their
services from Google and other free web services
NISO MetaSearch initiative
http://www.niso.org/committees/MS_initiative.html
Task Group 1: Access Management
• Conducted survey of authentication
methods in use
• Developed use cases for authentication in
metasearch context
• Ranked methods by ability to satisfy needs
of use cases
• Recommends either:
– IP-Authentication with a Proxy Server, or
– Username/Password authentication
• Liaison with Shibboleth community
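As a sketch of the first recommended method: with IP authentication the content provider checks the source address of each request against the ranges registered for a subscribing institution, and routing requests through the institution's proxy server makes them arrive from a registered address. The ranges and addresses below are reserved documentation addresses chosen for illustration, not taken from the Task Group's report.

```python
import ipaddress

# Hypothetical ranges registered by a subscribing institution (assumption).
# 192.0.2.0/24 and 198.51.100.0/24 are reserved for documentation.
REGISTERED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

def is_authorised(client_ip, ranges=REGISTERED_RANGES):
    """Return True if the request's source address falls in a registered range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ranges)

proxy_ip = "192.0.2.10"        # request relayed by the institution's proxy server
off_site_ip = "198.51.100.7"   # the same user connecting directly from elsewhere
```

`is_authorised(proxy_ip)` succeeds while `is_authorised(off_site_ip)` fails, which is why the proxy is part of the recommendation.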
Task Group 2: Collection Description
• Metasearch service needs information
about targets available for search/harvest
– Discover collections of potential interest
– Obtain sufficient information to identify a
collection
– Select one or more collections from amongst a
number of discovered collections
– Discover the services that provide access to
the collection
– Select a service with which to interact
– Interact with service
[Figure: Labels: Metasearch 1 with Collection/Service Knowledge Base 1; Metasearch 2 with Collection/Service Knowledge Base 2; Shared Collection/Service Registry; Collection description; Service description]
Task Group 2: Collection Description
• Collection Description Specification
– Metadata schema for collection-level
description
– Closely aligned with DCMI Collection
Description Application Profile
– Title, Subject, Size, Language, Item Type,
Owner, Collector, Audience, Rights etc
– Whole/Part relationships
– Collection/Catalogue relationships
– Collection/Service relationships
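A collection-level description of this kind might be modelled as below. This is an illustrative data structure only: the attribute names loosely follow the properties listed above, and the class, function and sample collections are invented rather than taken from the NISO specification.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionDescription:
    title: str
    subject: list
    language: str
    item_type: str
    owner: str
    is_part_of: str = ""                          # whole/part relationship
    services: list = field(default_factory=list)  # collection/service relationship

def select_collections(collections, subject, language):
    """Pick the collections of potential interest for a subject and language."""
    return [c.title for c in collections
            if subject in c.subject and c.language == language]

# Invented sample descriptions (assumption).
COLLECTIONS = [
    CollectionDescription("Physics archive", ["physics"], "en", "text",
                          "Example University",
                          services=["http://example.org/sru"]),
    CollectionDescription("Physics archive: theses", ["physics"], "de", "text",
                          "Example University", is_part_of="Physics archive"),
    CollectionDescription("Botanical images", ["botany"], "en", "image",
                          "Example Museum"),
]

selected = select_collections(COLLECTIONS, subject="physics", language="en")
```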
Task Group 2: Collection Description
• Information Retrieval Service Description
Specification
– Describe those digital services that provide
access to collections
– ZeeRex
• Indicates protocol used
• Describes access point(s) for service
• Describes authentication/authorization requirements
• Lists operations/queries supported
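The four kinds of information in a service description are enough for a metasearch service to decide which targets it can talk to. The sketch below uses a plain Python class with hypothetical field names and sample services; it is not the actual record format of the specification.

```python
from dataclasses import dataclass

@dataclass
class ServiceDescription:
    protocol: str         # protocol used
    access_point: str     # where to reach the service
    authentication: str   # authentication/authorization requirement
    operations: tuple     # operations/queries supported

def usable_services(services, supported_protocols, operation):
    """Return access points the metasearch service can use for an operation."""
    return [s.access_point for s in services
            if s.protocol in supported_protocols and operation in s.operations]

# Invented sample descriptions (assumption).
SERVICES = [
    ServiceDescription("SRU", "http://example.org/sru", "none",
                       ("searchRetrieve", "explain")),
    ServiceDescription("Z39.50", "z3950.example.org:210", "username/password",
                       ("search", "present")),
    ServiceDescription("OAI-PMH", "http://example.org/oai", "none",
                       ("ListRecords",)),
]

chosen = usable_services(SERVICES, {"SRU", "OAI-PMH"}, "searchRetrieve")
```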
Task Group 3: Search/Retrieve
• Result Set Metadata
– Metadata schema to describe result set
and record within result set
– To support ranking, branding etc
• Citation Metadata
– Metadata schema for citation components
(based on subset of OpenURL)
Task Group 3: Search/Retrieve
• NISO XML Gateway
– Based on SRU ("non-conformant subset")
– Query encoded in URI, transmitted in HTTP GET,
response as XML document
– Three levels of implementation
• Level 0: Any query grammar
• Level 1: Provide description record for database
• Level 3: Support CQL
– Liaison with A9 OpenSearch
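The "query in a URI, XML back" pattern can be sketched as follows. The parameter names and response namespace are those of SRU 1.1; whether the NISO gateway uses exactly this subset is an assumption, and the endpoint URL and sample response are invented. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
import xml.etree.ElementTree as ET

BASE_URL = "http://gateway.example.org/search"  # hypothetical endpoint
SRW_NS = "{http://www.loc.gov/zing/srw/}"       # SRU/SRW response namespace

def build_request(query, max_records=10):
    """Encode the query in a URI suitable for an HTTP GET."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": max_records,
    }
    return BASE_URL + "?" + urlencode(params)

# Invented, heavily trimmed response document (assumption).
SAMPLE_RESPONSE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>2</numberOfRecords>
</searchRetrieveResponse>"""

def count_records(xml_text):
    """Read the hit count out of the XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext(SRW_NS + "numberOfRecords"))

url = build_request('title = "metasearch"')
```

The query string here happens to be CQL, but the same request shape works with any query grammar the target accepts.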
Metasearch today
Metasearch and Google
• Google
– Harvests full-text of Web pages by following links
– Makes indexes available for search
– Result ranking based on number of links to page
• Index coverage limited to "visible Web"
– Problems with
• Authentication controls
• Non-persistent URIs
• Non-textual resources
• Even if indexed, low ranking if few links
• No fielded searching
Metasearch and Google
• "Success is as much about what you don’t
search as what you do"
Roy Tennant, "Is Metasearch Dead?"
http://www.niso.org/news/events_workshops/OpenURL-05-Agen-FINAL.html
• Selection is important
• Relevance of results not determined only
by links, citations
• e.g. often useful/vital to select/filter by
audience, purpose of resource
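The contrast between link-based ranking and metadata-based selection can be shown with a toy example. The records, the `audience` field and the link counts are invented, and real link-based ranking is far more elaborate than a simple count.

```python
# Invented records with a fielded "audience" value and a link count (assumption).
RECORDS = [
    {"title": "Intro to cell biology", "audience": "undergraduate", "inbound_links": 950},
    {"title": "Cell signalling review", "audience": "researcher", "inbound_links": 40},
    {"title": "Cells for kids", "audience": "school", "inbound_links": 3000},
]

def rank_by_links(records):
    """Order results purely by how many pages link to them."""
    ordered = sorted(records, key=lambda r: r["inbound_links"], reverse=True)
    return [r["title"] for r in ordered]

def select_for_audience(records, audience):
    """Filter on a metadata field instead of relying on link counts."""
    return [r["title"] for r in records if r["audience"] == audience]

by_links = rank_by_links(RECORDS)
for_researchers = select_for_audience(RECORDS, "researcher")
```

The record a researcher needs comes last when ranked by links but is the only result when selected by audience.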
Metasearch and Google
• Google interest in indexing "hidden Web"
– Collaborations with repository providers, OCLC etc
– Google Scholar
• Google interest in metadata-based approach?
– Google Base
• Google and Metasearch as complementary
approaches to discovery
Metasearch and "Social bookmarking"
[Screenshot: del.icio.us, http://del.icio.us/]
[Screenshot: Connotea, http://www.connotea.org/, annotated "Bibliographic metadata added to item by Connotea"]
Metasearch and "Social Bookmarking"
• Simple user-generated metadata
– Typically description plus "tags"
– Capture user perceptions of resources
– Some services adding richer metadata
• Social: merging of personal collections
– Bookmarking services as discovery services
• Connotea as "community-driven recommendation
system" (Lund et al.)
• Metadata available via RSS or simple API
– Can metasearch services use/integrate
metadata from bookmarking services?
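One way a metasearch service might pick up this metadata is to read a bookmarking service's feed. The sketch below parses a made-up RSS 2.0 style feed in which tags appear as `category` elements. Real services may use a different RSS version or different elements, so the feed layout is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up feed (assumption); real feeds may structure titles and tags differently.
FEED = """<rss version="2.0"><channel>
<title>Bookmarks</title>
<item><title>NISO Metasearch Initiative</title>
<link>http://example.org/a</link>
<category>metasearch</category><category>standards</category></item>
<item><title>Social tagging</title>
<link>http://example.org/b</link>
<category>folksonomy</category></item>
</channel></rss>"""

def bookmarks_from_feed(feed_xml):
    """Turn each feed item into a simple record with title, link and tags."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "tags": [c.text for c in item.findall("category")],
        }
        for item in root.iter("item")
    ]

def with_tag(bookmarks, tag):
    """Select bookmarks that users have labelled with a given tag."""
    return [b["title"] for b in bookmarks if tag in b["tags"]]

bookmarks = bookmarks_from_feed(FEED)
tagged = with_tag(bookmarks, "metasearch")
```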
Is Metasearching Better Searching?
• Technical components for metasearch available
• User expectations of coverage mean metasearch
is a cross-domain problem
• However, quality of metasearch dependent on
– metadata quality
– metadata consistency
– …across multiple providers
• Metasearch can complement other approaches
• Metasearch as "enabler"
– supporting construction of many different services