No slide title - Eric Sieverts

searching in Google and/or/not databases
Eric Sieverts
University Library Utrecht, IT Department
Institute for Media & Information Management (Hogeschool van Amsterdam)
• why people use search engines
• search engine functionality & technology
• what people miss with search engines
• why people prefer google to databases
• the library’s perspective
Eric Sieverts | [email protected] | http://www.library.uu.nl/medew/it/eric | IP lecture, 3 October 2002
success of web search engines
• ease of use / simple interface
• fast and good results
(quality of relevance ranking)
• introduction of new technology
• very large "collections"
search engine technology
• improved relevance ranking
– probabilistic techniques
– use of popularity / quality as a relevance measure
• suggestions to use the correct term
• automatic categorisation of result sets
• boolean, citation, similarity etc. search
• ......
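The idea of using popularity as a relevance measure can be sketched as follows. This is a minimal illustration, not the method of any particular search engine: a plain tf-idf score stands in for the text-matching part (it is not one of the probabilistic techniques named above), and the `popularity` values, the `weight` parameter and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def text_score(query_terms, doc_tokens, doc_freq, n_docs):
    """Plain tf-idf score of one document for the query terms."""
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        tf = counts[term]
        if tf == 0:
            continue
        # Smoothed inverse document frequency: rarer terms weigh more.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        score += tf * idf
    return score


def rank(query, docs, popularity, weight=0.5):
    """Order matching documents by a mix of text score and popularity.

    docs:       {name: text}
    popularity: {name: value in [0, 1]}, e.g. a normalised link count
                (hypothetical input, assumed to be precomputed)
    weight:     share of the final score that comes from popularity
    """
    query_terms = query.lower().split()
    tokenized = {name: text.lower().split() for name, text in docs.items()}

    # Number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {
        name: text_score(query_terms, tokens, doc_freq, len(docs))
        for name, tokens in tokenized.items()
    }
    top = max(scores.values(), default=0.0) or 1.0

    combined = {}
    for name, score in scores.items():
        if score == 0:
            continue  # no query term matched: not part of the result set
        combined[name] = (1 - weight) * (score / top) + weight * popularity.get(name, 0.0)
    return sorted(combined, key=combined.get, reverse=True)


docs = {
    "a": "library databases search",
    "b": "search engines search the web",
    "c": "cooking recipes",
}
popularity = {"a": 0.1, "b": 0.9, "c": 1.0}
text_only = rank("library search", docs, popularity, weight=0.0)
with_popularity = rank("library search", docs, popularity, weight=0.5)
```

With `weight=0.0` the document that matches both query terms comes first; with `weight=0.5` the more popular document overtakes it. A document that matches no query term is never returned, however popular it is.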
what (many) search engines miss
• to-be-paid-for or licensed information
(bibliographic databases, full-text scientific journals, ....)
• all information hidden in searchable databases
(spiders cannot fill out database search forms)
• non-HTML documents: flash, office files, pdf
(no fundamental problem, as google, fast and others demonstrate)
• "real-time" data (too difficult to keep track of)
• dynamically generated, database-driven pages
(fear of spider traps; but some engines seem to index those as well)
• all information not (yet) discovered by the spider
• all information not (yet) on the web
do users miss so much with google ?
• google also indexes .PDF , .DOC , .PPT , .XLS , .RTF
• fast-alltheweb also indexes flash
• the web also contains preprints, reports, projects etc. that
are NOT in databases
• many scientists (and others) put copies of their published
articles on their personal websites
that seems fine, but you still get low recall, because:
• the web remains a very fragmented incomplete mess
(behind that simple google screen)
• it is not indexed consistently and in a controlled way
but for many users lousy recall is no problem at all .....
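To make "low recall" concrete: recall is the share of all relevant documents that a search actually retrieves, while precision is the share of retrieved documents that are relevant. The small sketch below uses made-up document identifiers purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical case: 10 relevant documents exist, the search returns 4,
# of which 3 are relevant.
relevant = [f"doc{i}" for i in range(10)]
retrieved = ["doc0", "doc1", "doc2", "other"]
precision, recall = precision_recall(retrieved, relevant)
```

Here precision is 0.75 but recall is only 0.3: the user sees mostly good results and has no indication that seven relevant documents were missed.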
why people wouldn't use professional databases
• apparent simplicity of search engine interface
• too many separate other search systems to address
• overwhelming choice of databases
• overwhelming choice of digital primary sources
• plethora of different database system interfaces
• interfaces crowded with "functionality"
what would you use?
– if you didn't know what the difference is
– if you didn't know what you'd miss
the provider's perspective
• offer one-stop shopping / single interface
– ovid/silverplatter/erl
– elsevier sciencedirect / scirus
– highwire / ebsco / .....
• integration with customer's information landscape
– factiva
– lexis/nexis
– ....
the library's perspective
• realise integrated access & single interface to all its precious
(and expensive) professional information resources
• realise more advanced retrieval possibilities while keeping the
advantages of controlled indexing as well
central index solution
– own choice of advanced local search engine / retrieval software
– problems with indexing remotely stored data
– problems with non-uniform controlled indexing
meta-search / portal solution
– many remote and locally available retrieval systems addressed in a single query (via Z39.50, http, etc.)
– restricted to common denominator of classical boolean functionality
– problems with non-uniformity of fields & controlled indexing
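The meta-search approach, one query sent to several retrieval systems and the answers merged, can be sketched roughly as below. This is an illustration under stated assumptions, not a description of any real portal product: `Target`, `generate_query`, `metasearch` and the two fake connectors are hypothetical names, the connectors stand in for real Z39.50 or http clients, and merging on a normalised title is a deliberately naive choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Target:
    name: str
    and_operator: str  # how this target spells boolean AND
    search: Callable[[str], List[Dict[str, str]]]  # stand-in for a Z39.50 / http connector


def generate_query(terms: List[str], target: Target) -> str:
    """Query generator: express a plain AND query in the target's own syntax."""
    return f" {target.and_operator} ".join(terms)


def metasearch(terms: List[str], targets: List[Target]) -> List[Dict]:
    """Result collector: query every target and merge records with the same title."""
    merged: Dict[str, Dict] = {}
    for target in targets:
        query = generate_query(terms, target)
        for record in target.search(query):
            key = record["title"].strip().lower()  # naive duplicate detection
            entry = merged.setdefault(key, {"title": record["title"], "sources": []})
            entry["sources"].append(target.name)
    return list(merged.values())


# Fake connectors returning canned records, for illustration only.
def fake_catalogue(query: str) -> List[Dict[str, str]]:
    return [{"title": "Information Retrieval"}, {"title": "Digital Libraries"}]


def fake_journal_db(query: str) -> List[Dict[str, str]]:
    return [{"title": "information retrieval "}]


targets = [
    Target("catalogue", "AND", fake_catalogue),
    Target("journals", "&", fake_journal_db),
]
results = metasearch(["information", "retrieval"], targets)
```

Only a plain AND query is translated here, which mirrors the limitation named above: the portal can offer no more than the boolean functionality that every target supports. The title-based merge also shows why non-uniform fields are a problem, since it breaks as soon as two systems record the same title differently.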
integrated system: local central index solution
[diagram; labels: search, central index, indexer, indexing rules for targets, internet, full-text links, document text files]
integrated system: metasearch / portal solution
[diagram; labels: search, query-generator / result-collector, configuration data for targets, Z39.50, internal api, http, http xml, internet, and per target: search, index, files]
.... and how about the future ....
library based search systems will improve
- improved software solutions
- automatic methods of uniform classification and controlled keyword indexing
- more flexible xml-based methods for metasearch-solutions (srw, sru)
- improved access to remote data to be locally indexed
performance of web search engines will improve as well
- xml, rdf metadata & the semantic web will improve concept- and meaning-based retrieval on the web
- ever more information will be available on the web
- newest technologies will continue to be tested on the web first
- open-archive initiative makes more scientific information accessible
competition between “ ” and "our databases" will continue
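As an illustration of the xml-based metasearch methods mentioned under srw/sru: an SRU search is an ordinary http GET request whose parameters carry a CQL query, with the result returned as XML. The sketch below only builds such a URL. The parameter names follow the SRU specification as later published; the base URL, the CQL index name and the function name are assumptions made for the example.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10, version="1.1"):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server address and index name, for illustration only.
url = sru_search_url(
    "http://sru.example.org/catalogue",
    'dc.title = "google"',
    maximum_records=5,
)
```

Because the request is a plain URL and the response is XML, a portal can address such a target with standard http and XML tooling instead of a dedicated Z39.50 client.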