Finding Resources On Your Web Site
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
Email: [email protected]
URL: http://www.ukoln.ac.uk/
Aims of Talk:
• Review approaches taken by UK HE and Public Library communities to indexing web sites
• Discussion of findings
• Describe future developments
UKOLN is funded by the Library and Information Commission, the Joint
Information Systems Committee (JISC) of the Higher Education Funding Councils,
as well as by project funding from the JISC and the European Union.
UKOLN also receives support from the University of Bath where it is based.
UKOLN and UK Web Focus
UKOLN:
• UK Office for Library and Information Networking
• Small research and advisory group based at
University of Bath
• Funded by JISC and LIC (MLAC from 1 April) to
advise Higher Education and Library (and Museums
& Archives from 1 April) communities on digital
networking issues
UK Web Focus:
• JISC-funded post to advise HE community on web
matters
Contents
• Background
• A Survey of Two Communities
• Comparisons
• Interesting Examples
• Other Developments
• Conclusions
Importance of Indexing
Design and browsing tend to be given priority
But:
• Users will search as well as browse
• Users may not understand navigation structure /
metaphors which are obvious to members of
organisation
• Searching becomes more important as web site
grows
Which To Choose?
Indexing software (from <http://searchtools.com/tools/tools.html>):
• Alkaline (Vestris)
• AltaVista - Search Intranet
• ASTAWare SearchKey
• atomz Search (remote)
• BooleanSearch
• BBDBot
• BRS/Search (Dataware)
• Compass Server (Netscape)
• Cybotics
• DataWare BRS/Search
• DocFather (formerly SiteSearch)
• dtSearch Web
• Excalibur RetrievalWare
• EWS (Excite)
• Excerpt (Obsolete)
• Extense
• FAST Search Server
• Findex (code library)
• Folio siteDirector
• FreeFind (remote)
• Fulcrum
• Glimpse
• Harvest
• ht://Dig
• ICE
• iHound (ICATT)
• Index Search (Xavatoria)
• Index Server (Microsoft)
• IndexMySite (remote)
• Infoseek - Ultraseek
• Intermediate Search
• intraSearch (remote)
• I-Search
• Isearch
• ITMS
• Isys:web
• Java Applets
• JHLSearch
• JObjects QuestAgent
• Lycos / InMagic
• Magnifi Enterprise Server
• Matt's SimpleSearch
• Microsoft Index Server
• Microsoft Site Server
• MiniSearch (remote)
• MondoSearch
• Muscat
• NetResults (now SearchKey Plus)
• Netscape - Compass Server
• OpenText - LiveLink
• Perl Scripts
• Perlfect Search
• Phantom (Maxum)
• PicoSearch (remote)
• Etc.
Can choose by reading reviews, web sites, etc. or by looking at usage in community
Which to choose? What software may be obsolete? What does remote mean?
Two Surveys
Two surveys have been carried out:
• Summer 1999: a survey of search engines
used on institutional UK University web
sites (updated recently)
• January 2000: a survey of search engines
used on UK Public Library web sites
Characteristics of HE Community
The UK Higher Education community:
• Long-standing involvement in Internet and Web
• Much technical expertise available (e.g. PhD
students)
• Early involvement in web by enthusiasts
• Initially little finance available, so interest in public
domain and open source software
• More financial resources becoming available as
senior managers become aware of strategic
importance of Web
Findings: UK HE Web Sites
Main findings of two surveys:

Software     Nos. (Jul)   Nos. (Mar)
ht://Dig         25           32
eXcite           19           17
Microsoft        12           15
Harvest           8            6
Ultraseek         7            8
Other            29           29
None             60           51
Totals          160          163

• Article published in Ariadne issue 21 <http://www.ariadne.ac.uk/issue21/webwatch/>
• Results (including update on survey) available from: <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
[Bar chart: Nos. of sites using ht://Dig, eXcite, Microsoft, Harvest, Ultraseek, Other, None]
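The movement between the two surveys is easier to see as a per-product difference. The following is a minimal sketch in Python that uses only the counts from the table; the function names are illustrative and not part of the survey.

```python
# Counts copied from the survey table: (Jul, Mar) for each row.
counts = {
    "ht://Dig": (25, 32),
    "eXcite": (19, 17),
    "Microsoft": (12, 15),
    "Harvest": (8, 6),
    "Ultraseek": (7, 8),
    "Other": (29, 29),
    "None": (60, 51),
}


def changes(table):
    """Return {row name: Mar count minus Jul count}."""
    return {name: mar - jul for name, (jul, mar) in table.items()}


def biggest_movers(table, n=3):
    """Return the n row names with the largest absolute change, largest first."""
    delta = changes(table)
    return sorted(delta, key=lambda name: abs(delta[name]), reverse=True)[:n]


if __name__ == "__main__":
    for name, diff in changes(counts).items():
        print(f"{name:10s} {diff:+d}")
```

Calling `biggest_movers(counts, 2)` picks out the two rows that changed most between the surveys.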
Popular Products: ht://Dig
ht://Dig
See <http://www.htdig.org/>
• Now used at 32 (up from
25) UK HEIs
• Freely available
• New version released in
December 1999
• Own domain with well-designed web site
• Robot to index multiple
servers
Oxford Case Study
131 servers
438,500 resources
Indexes MS Office, PDF, etc. files (external parser)
Case Studies produced by Helen Sargan (Cambridge)
Popular Products: eXcite
eXcite
• Now used at 17 (down
from 19) UK HEIs
• By-product of the eXcite
Internet search engine
• Bug announced in
January 1998. Notice
not updated since!
Time to change?
See <http://www.excite.com/navigate/>
Popular Products: Microsoft
Microsoft
• Several Microsoft indexing tools available (FrontPage,
Index Server, SiteServer, …)
• Most powerful is the
SiteServer indexer
• Now used at 15
(up from 12) UK HEIs
Essex Case Study
16 servers indexed
11,500 resources
Constrained searches possible
Indexes MS Office, PDF, etc. files
Popular Products: Ultraseek
Ultraseek:
• Used at 8 (up from 7)
UK HEIs
• Powerful but
expensive
• See <http://software.infoseek.com/>
Cambridge Case Study
232 servers
188,000 resources
Weightings given to meta tags
Useful logs and reports
Popular Products: Harvest
Harvest:
• Now used at 6 UK HEIs (down from 8)
• For IR research use?
• See <http://www.tardis.ed.ac.uk/harvest/>
Other Popular Products
Output from SWISH
SWISH / SWISH-E
• Used at 5 HEIs
• Dated?
Webinator
• Used at 4 HEIs
• Useful functionality
• See <http://www.thunderstone.com/webinator/>
Output from Webinator
Use of Third Party Services
Small usage of third parties to provide indexes:
FreeFind (Used at 2 HEIs) and
AltaVista (Used at 1 HEI)
Why not more use by 50+ institutions with no search facility?
Benefits from services provided by popular large-scale search engine
Low cost (free?)
Incomplete coverage?
Loss of control, advertising, …
Characteristics of Public Library Community
Public Library Community:
• Relatively new to Internet and Web
• Less technical expertise available
• Large OPACs available
• Often part of Council's web site
Note: "Well Connected: A Snapshot of Local Authority
Websites" (Society of Information Technology
Management report) found that in 1999 69% of local
authority websites did not have a search facility
Results
Survey carried out on 4-5th January 2000
Results for 137 web sites:
• 49% have no search facility?!
• Of those that do:
– 45% (18) use Microsoft
– 7.5% (3) use Domino
– 7.5% (3) use Muscat
– 40% (16) another solution
[Bar chart of search software used: None, Microsoft, Muscat, Domino, ht://Dig, eXcite, Other]
Comments
• Some sites use the general Council search facility and in some sites the Council search facility can be used to search areas (e.g. Library)
• Some sites very small (1 page with opening hours)
• See <http://www.ukoln.ac.uk/web-focus/surveys/pub-lib-search-jan-2000/survey.html>
Popular Products: Microsoft
Microsoft:
• Several Microsoft options
available
• Used in 18 public libraries
• Sometimes can
restrict searches to
selected areas
• Popularity indicative
of use of Windows NT
in public libraries
Popular Products: Muscat
Muscat Empower:
• Powerful licensed
product
• Agent technology
• Email alerting of
changed resources
• Foreign language
support
• Used in 2 Public
Libraries (full Council
web site only)
• Muscat FX also used
(1 site)
• See <http://www.muscat.com/>
Popular Products: Domino
Lotus Domino (Notes):
• Powerful, licensed web server system
• Used at 3 Public Libraries
• See <http://www.lotus.com/home.nsf/welcome/domino>
Home-Grown Solution
A small number of Public Libraries have developed their
own indexing software. Leeds Public Library have a good
example:
• Various areas can be
searched
• Multiple search terms
• Boolean operators
• Attractive interface
Software:
• Written in C++
• Interrogates files when they are live
• Directories can be
excluded
• Operational for 3 years
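The features listed for the home-grown approach (searching live files, multiple terms, Boolean operators, excluded directories) can be sketched in a few lines. This is an illustrative Python sketch, not the Leeds software, which the slide says is written in C++. The file extensions, function names and parameters are all assumptions.

```python
import os


def iter_files(root, excluded=()):
    """Yield paths of .html/.htm/.txt files under root, skipping excluded directory names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            if name.lower().endswith((".html", ".htm", ".txt")):
                yield os.path.join(dirpath, name)


def matches(text, terms, operator="AND"):
    """True if text contains all terms (AND) or any term (OR), ignoring case."""
    text = text.lower()
    hits = [term.lower() in text for term in terms]
    if operator == "AND":
        return all(hits)
    if operator == "OR":
        return any(hits)
    raise ValueError(f"unsupported operator: {operator}")


def search(root, terms, operator="AND", excluded=()):
    """Read the live files at query time and return sorted matching paths relative to root."""
    found = []
    for path in iter_files(root, excluded):
        with open(path, encoding="utf-8", errors="ignore") as handle:
            if matches(handle.read(), terms, operator):
                found.append(os.path.relpath(path, root))
    return sorted(found)
```

Passing a subdirectory as `root` restricts the search to one area of the site. Reading files at query time needs no index to maintain, but it becomes slower as the site grows.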
Try Them For Yourself
• Interfaces to UK University
search engines are
available providing a single
location for evaluation
• The page also
provides a link to
organisational
search pages
• The resources
are grouped in
alphabetical order
and by search
engine
What does Aberdeen's
search facility provide?
What functionality do libraries
using Domino provide?
See <http://www.ukoln.ac.uk/web-focus/surveys/>
Other Developments
What else is happening with indexing in these communities?
• eLib Hybrid Libraries
• National search engines
• Local initiatives
eLib Hybrid Libraries
eLib Phase 3 includes "Hybrid Library" projects:
• Help users find electronic (web, OPAC, etc.) and
"real world" resources
• Includes regional and subject-specific approaches
MusicOnline search of
Music Catalogues
BUILDER search of eLib
Phase 3 web sites
National Search Engines
ACDC (Academic Directory)
• (Unfunded) pilot of index of
ac.uk domain based on
distributed approach using
Harvest
• Set up in March 1996
• Lack of development effort
resulted in degraded service
(e.g. indexer not aware of
JavaScript code)
• No longer being developed?
http://acdc.hensa.ac.uk/
Institutional Developments
Maestro robot (Dundee):
• Indexes Scottish resources
• Volunteer effort
North East Universities (UNIS4NE):
• Appearance of cross-searching
• Actually interface to HotBot / AltaVista
Other Possibilities
What other developments may we expect?
• Increased indexing in institutions of other web
sites (opposition / friends)
• Development of a HE (or public sector?) national
search engine
• "Surface-scraping" of institutional search engines
• Leave it to commercial sector
• European developments
• New developments (XML / RDF / etc.)
Indexing Remote Sites
May see increased indexing of remote sites within
institutions:
Examples provided by Dundee and BUILDER
Feeling of ownership
Easily done
Can develop enhancements locally
Increased server load locally
Increased server load remotely
Increased network load
Not scalable
Unnecessary duplication
"Meta-Search" Possibility
A collection of interfaces to search engines for UK HEIs is available
This could be used as the basis of a "meta-searcher":
Indexes aren't duplicated
Local site responsible for content of its index
A hack
Problems with maintenance
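A meta-searcher of this kind could work roughly as follows: send the same query to each institution's own search interface and merge what comes back. The Python sketch below is an illustration only. The site names, hosts and query parameters are placeholders, and the `fetch` function that retrieves and parses one site's result page is left to the caller, since that step differs for every search engine.

```python
from urllib.parse import quote_plus

# Hypothetical registry: site name -> URL template of its local search interface.
# The hosts and parameter names are placeholders, not real services.
SEARCH_INTERFACES = {
    "Site A": "http://search.site-a.example/htdig?words={query}",
    "Site B": "http://www.site-b.example/search.asp?q={query}",
}


def build_requests(query, interfaces=SEARCH_INTERFACES):
    """Return {site: URL} with the encoded query substituted into each template."""
    encoded = quote_plus(query)
    return {site: template.format(query=encoded) for site, template in interfaces.items()}


def meta_search(query, fetch, interfaces=SEARCH_INTERFACES):
    """Send the query to every site via fetch(url) and merge the results.

    fetch is supplied by the caller and must return a list of result URLs
    for one site. A site that raises OSError is skipped so that one
    unavailable server does not break the whole search.
    """
    merged = []
    for site, url in build_requests(query, interfaces).items():
        try:
            results = fetch(url)
        except OSError:
            continue
        merged.extend((site, result) for result in results)
    return merged
```

No index is duplicated, because each site answers from its own index. The maintenance problem shows up in the registry and in `fetch`: every change to a local search interface or its result page has to be tracked by hand.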
Commercial Solutions
Could leave searching to commercial world:
No costs to institution / HE community
Results too broad
Distracting interface
Little scope for tailoring
Not integrated with non-Web services
European Developments (1)
DESIRE project:
• EU-funded project with resource discovery component
• Nordic Web Index provides index across Nordic countries
(but partly discontinued due to lack of funding)
• See <http://www.desire.org/html/services/resourcediscovery/indexing/>
REIS:
• Pilot project on Research & Education Indexing Service for Europe
• See <http://www.terena.nl/projects/reis/>
European Developments (2)
Surfnet:
• Dutch Research
network service
• Use of AltaVista
search software
for national index
• But how widely
used is it?
• Is there a user
demand for this
type of service?
http://www.surfnet.nl/en/surfnet-searchtools/
What About Metadata?
Metadata can:
• Improve search results
• Provide structured information (for automated
processing) which can provide richer services:
– Fielded searches
– Limit searches (e.g. only Library pages on
Council web site)
– Web site administration
– Alternative browsing interfaces
Tools, standards, etc. becoming available
Expected growth area
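Fielded and limited searches can be illustrated with a small sketch: each page carries a set of metadata fields, and a search may require particular field values in addition to the search term. The Python below is a minimal illustration. The field names (`section`, `type`) and the sample pages are invented for the example and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    text: str
    metadata: dict = field(default_factory=dict)


def fielded_search(pages, term, **required):
    """Return URLs of pages whose text contains term and whose metadata
    matches every required field=value pair."""
    term = term.lower()
    hits = []
    for page in pages:
        if term not in page.text.lower():
            continue
        if all(page.metadata.get(name) == value for name, value in required.items()):
            hits.append(page.url)
    return hits


# Invented sample data for a Council web site that includes Library pages.
PAGES = [
    Page("/library/hours.html", "Library opening hours",
         {"section": "Library", "type": "Information"}),
    Page("/council/tax.html", "Council tax opening hours for enquiries",
         {"section": "Finance", "type": "Information"}),
    Page("/library/news.html", "New opening at the central library",
         {"section": "Library", "type": "News"}),
]
```

For example, `fielded_search(PAGES, "opening", section="Library")` limits the search to Library pages, which a plain full-text search cannot do unless the directory structure happens to match.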
Example
Exploit Interactive web magazine (www.exploit-lib.org)
is using metadata to provide enhanced searching:
Search for foo in:
• Issue 2, or in issues 2 and 4 (this is possible using directory structure)
• Feature Articles (needs metadata)
• Articles about EU-funded projects
• Etc.
• Combinations of above
Also provides
alternative browsing
structures
JISC Developments
DNER (Distributed National Electronic Resource):
• Seamless access to national resources
• What about local resources?
• Need for "institutional portals"
RDN
• Resource Discovery
Network
• Builds on work of
eLib subject gateways
• Based on standards
(Z39.50, whois++,
LDAP, etc.)
• Lessons for institutions
Conclusions
To conclude:
• No clear "best buy" for indexing software
• Probably some to avoid
• In 2 years' time are you likely to:
– Still be using same software?
– Have changed software / architecture?
• If changes likely, need to think about change migration strategies, interoperability issues, etc.
• Need for user studies (not covered)
Questions welcome
Useful Resources
http://SearchTools.com/
http://www.searchenginewatch.com/
http://www.builder.com/Servers/AddSearch/