HKUL Digital Library


Digital Library: The HKU Libraries’ experiences
Kam-ming Ku
HKUL
The presentation is about:
How can we deliver the right information to the right person, at the right time, anywhere?
1. HKUL resources/projects
2. Going to do…
3. Challenges
4. Overcome the challenges
5. Discussion
1. HKUL resources/projects
1.1 Staffing
1.2 Networking
1.3 Hardware
1.4 Software
1.5 DL initiatives
1.1 Systems Staff
• Systems Librarian
• 2 Computer Officers
• Assistant Librarian
• Assistant Computer Officer
• Senior Library Assistant
• 5.5 Technicians
1.2 Networking
• From 10 → 100 → 1000 Mbps → wireless → Bluetooth??
• Gigabit Ethernet backbone and Fast
Ethernet running to users. About 1000
network points.
• ACENet connection (Access Everywhere
Network; plug-in network for roaming
users); ~450 fixed points; 18 wireless
access points.
1.2 Networking (cont.)
• Libraries within Campus are connected to
Campus Backbone by Gigabit Ethernet
link or Fast Ethernet link.
• 2 remote sites, the Dental and Medical
Libraries, are each connected to the Main Campus
by a 10 Mbps link.
• Gigabit Firewall (Cisco PIX Firewall)
• Packeteer network traffic shaper
1.3 Hardware
• Compaq AlphaServer GS60E (for library
catalogue)
• Sun Enterprise 4000 and 10000
• 3 Linux, 5 Windows and 3 Novell Servers
1.3 Hardware (cont.)
• 10 CD-ROM towers
– 4 towers for staff
– 2 towers in the Medical Library
– 4 towers for the network
• 3 WinFrame servers & 1 thin-client server
– 1 Network CD-ROM MetaFrame server
– 1 Standalone CD-ROM MetaFrame server
– 1 Network CD-ROM WinFrame server
– 1 Dell server for 6 thin clients
1.3 Hardware (cont.)
Equipment | Office/Staff | Counter | Student
PC        | 289          | 35      | 342
Printer   | 107          | -       | 27
Mac       | 6            | -       | 7
Scanner   | 17           | -       | 12
1.4 Software
• Sun Solaris 8, DEC UNIX, Windows 2000/NT,
Novell NetWare, Linux
• III Innopac library management system
• Oracle 9i database, 9iAS (Web) and Context
(full-text indexing/searching)
• ERL server for SilverPlatter databases
• WinFrame server for legacy and network CD-ROM
databases
• Apache Web servers
1.4 Software (cont.)
• TRS 4.0 server
• CJN server hosting 6000+ Chinese full-text journals
• Proxy server, Samba server
• Pcounter server
• Tamino XML server
• VOD server (IBM VideoCharger)
• EZproxy server
1.4 Software (cont.)
• ILLiad server (interlibrary loan)
• Taiwan Newspaper database
• Chinese database server: Sibucongkan
(四部叢刊); Sikuquanshu (四庫全書);
ekangxi dictionary (康熙字典)
1.5 HKUL DL initiatives
[Figure: Imaging database]
• 1.5.1 Digitization projects
e.g. ExamBase
– First in-house developed database
– Imaging database of past exam papers
– Released in 1996
– Uses a DMS, client-server model
– To move to a web-based system soon
– TIFF only (converted on the fly to GIF/JPG), no PDF!!!
1. Hardware
– High-speed flatbed scanner (36 ppm)
2. Software
– Kofax Capture 3.0
– Sophisticated software that includes scanning, OCR
and verification.
3. Logistics
a. Scanning
b. Automatic indexing
c. Verification and manual inputting
d. Data publishing: publish data to the Oracle database
a. Scanning
– Papers are scanned in batch mode (~200 pages per
batch)
– Separation sheets are used to separate different
documents
(Each separation sheet is printed with a barcoded index
(e.g. department, course code) and fixed-size font
text. The separation sheets can be reused.)
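The separation-sheet step can be pictured as splitting one scanned batch into documents wherever a barcoded sheet appears. The following is a minimal sketch, not the Kofax Capture implementation. It assumes each page arrives already tagged with its decoded barcode if it is a separation sheet. The `Page`/`Document` types and the "department|course code" payload are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """One scanned page; `barcode` is set only on separation sheets."""
    image_id: str
    barcode: Optional[str] = None


@dataclass
class Document:
    """One exam paper: the index from its separation sheet plus its pages."""
    index: str
    pages: List[str] = field(default_factory=list)


def split_batch(pages: List[Page]) -> List[Document]:
    """Split one scanned batch into documents at each separation sheet.

    A separation sheet starts a new document and supplies its index; the
    sheet image itself is not kept. A page that arrives before the first
    sheet has no index, so it is rejected.
    """
    documents: List[Document] = []
    for page in pages:
        if page.barcode is not None:
            documents.append(Document(index=page.barcode))
        elif not documents:
            raise ValueError(f"page {page.image_id} precedes the first separation sheet")
        else:
            documents[-1].pages.append(page.image_id)
    return documents


# Hypothetical batch; the barcode payload layout is invented for illustration.
batch = [
    Page("p001", barcode="CHEM|CHEM2002"),
    Page("p002"),
    Page("p003"),
    Page("p004", barcode="MATH|MATH1001"),
    Page("p005"),
]
for doc in split_batch(batch):
    print(doc.index, doc.pages)
```

Because the sheets only mark boundaries and are never filed themselves, the same printed sheet can go back into the next batch. This matches the slide's note that separation sheets are reusable.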
b. Automatic indexing
– Recognizes the barcoded indexes and the text
printed on each separation sheet
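After the barcode is decoded, its payload still has to be turned into named index fields. The sketch below assumes a hypothetical "department|course code" layout, because the slides do not describe the real ExamBase barcode format. Any payload that does not fit is rejected so an operator can look at it.

```python
import re
from typing import Dict

# Hypothetical payload layout "<department>|<course code>", e.g. "CHEM|CHEM2002".
# The slides do not describe the real ExamBase barcode format.
PAYLOAD = re.compile(r"(?P<department>[A-Z]{2,6})\|(?P<course_code>[A-Z]{2,6}\d{4})")


def parse_index(payload: str) -> Dict[str, str]:
    """Turn one decoded separation-sheet barcode into named index fields."""
    match = PAYLOAD.fullmatch(payload.strip())
    if match is None:
        # Anything malformed goes to an operator instead of being filed.
        raise ValueError(f"unrecognized separation-sheet barcode: {payload!r}")
    return match.groupdict()


print(parse_index("CHEM|CHEM2002"))
```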
c. Verification and manual inputting
– No need to verify the barcoded indexes, as
their recognition accuracy is > 99.999%
– In-doubt OCR text is marked in red, so it is
easy to verify
– Other indexes are input manually (e.g. exam
date)
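An accuracy above 99.999% means fewer than 1 misread per 100,000 barcodes. That is why barcode fields can skip verification, while OCR text is checked only when in doubt. The sketch below shows that routing rule. The `IndexField` type and the 0.90 confidence cut-off are hypothetical, and the slides do not say how Kofax decides what to mark in red.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off: OCR results below this confidence count as "in doubt"
# and would be shown in red for the operator.
OCR_THRESHOLD = 0.90


@dataclass
class IndexField:
    name: str
    value: str
    source: str        # "barcode", "ocr" or "manual"
    confidence: float  # recognition confidence from 0.0 to 1.0


def fields_to_verify(fields: List[IndexField], threshold: float = OCR_THRESHOLD) -> List[str]:
    """Names of the fields an operator still has to look at.

    Barcode fields are trusted outright; low-confidence OCR text is
    flagged for checking, and manual fields are flagged until filled in.
    """
    flagged = []
    for f in fields:
        if f.source == "ocr" and f.confidence < threshold:
            flagged.append(f.name)
        elif f.source == "manual" and not f.value:
            flagged.append(f.name)
    return flagged


record = [
    IndexField("course_code", "CHEM2002", "barcode", 1.0),
    IndexField("title", "Organic Chemistry", "ocr", 0.97),
    IndexField("paper", "Paper 2", "ocr", 0.62),
    IndexField("exam_date", "", "manual", 0.0),
]
print(fields_to_verify(record))
```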
1.5 HKUL DL initiatives (cont.)
• e.g. Newspaper clippings
– Full-text imaging database
– Outsource: scanning/indexing/OCR
– Oracle Context cartridge as the full-text search
engine (does not support Chinese!)
– Decision: keep using it, or buy third-party
full-text software??
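Chinese text has no spaces between words, so a word-based full-text engine has nothing to split on. The slides do not say which product was eventually chosen. One common approach in CJK-capable engines is to index overlapping character bigrams. The toy in-memory sketch below uses that technique; it is an illustration and not the Oracle Context design.

```python
from collections import defaultdict
from typing import Dict, List, Set


def _chars(text: str) -> List[str]:
    """Characters of `text` with whitespace removed."""
    return [c for c in text if not c.isspace()]


def _bigrams(chars: List[str]) -> Set[str]:
    """Overlapping two-character tokens, e.g. 大學圖書 -> 大學, 學圖, 圖書."""
    return {chars[i] + chars[i + 1] for i in range(len(chars) - 1)}


class BigramIndex:
    """Toy in-memory inverted index over character unigrams and bigrams."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.texts: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.texts[doc_id] = text
        chars = _chars(text)
        # Index single characters too, so one-character queries still work.
        for token in set(chars) | _bigrams(chars):
            self.postings[token].add(doc_id)

    def search(self, query: str) -> List[int]:
        chars = _chars(query)
        if not chars:
            return []
        tokens = _bigrams(chars) if len(chars) > 1 else set(chars)
        # Candidates contain every query token...
        candidates = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # ...then an exact phrase check removes false positives.
        phrase = "".join(chars)
        return sorted(d for d in candidates if phrase in "".join(_chars(self.texts[d])))


idx = BigramIndex()
idx.add(1, "香港大學圖書館")
idx.add(2, "香港中文大學")
print(idx.search("大學"), idx.search("港大"))
```

The phrase check matters because two bigrams can both occur in a text without being adjacent. Checking the phrase keeps results exact at the cost of rereading the candidate texts.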
1.5 HKUL DL initiatives (cont.)
• 1.5.2 Value-added Bibliographic databases
– Subset of library catalogue
– e.g. TOC, Thesis Online, AV materials, …
– Debate:
• a single-point source or a number of subsets??
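One way to picture the "single-point source" side of the debate: each value-added database is just a named filter over the one catalogue, so a record is maintained in one place. In the sketch below, the subset names follow the slide, but the `Record` fields and selection rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    bib_id: str
    material_type: str      # e.g. "book", "thesis", "video"
    has_toc: bool = False   # TOC data added to the record


# One catalogue, several named views. The subset names follow the slide;
# the selection rules are hypothetical.
SUBSETS: Dict[str, Callable[[Record], bool]] = {
    "TOC": lambda r: r.has_toc,
    "Thesis Online": lambda r: r.material_type == "thesis",
    "AV materials": lambda r: r.material_type in {"video", "audio"},
}


def subset(catalogue: List[Record], name: str) -> List[str]:
    """IDs of the catalogue records that belong to the named subset."""
    rule = SUBSETS[name]
    return [r.bib_id for r in catalogue if rule(r)]


catalogue = [
    Record("b1", "book", has_toc=True),
    Record("b2", "thesis", has_toc=True),
    Record("b3", "video"),
]
print(subset(catalogue, "TOC"), subset(catalogue, "Thesis Online"))
```

Here record b2 shows up in two subsets but is stored only once. Separately maintained subset databases would hold a copy in each place, and those copies would have to be kept in step.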
1.5 HKUL DL initiatives (cont.)
e.g. Table of Contents
• To automate the input of TOC data into
bibliographic records
1. Hardware
– Overhead book scanner (~4 sec per image)
2. Software
– Kofax Capture 3.0
– Sophisticated software that includes scanning, OCR
and verification.
3. Techniques
a. Scanning
b. Chinese OCR
c. Proofreading
d. Data publishing: publish data to the Catalogue
a. Scanning
– Use a book scanner to scan the book’s TOC
– Benefits:
  – no need to flip the book over for scanning
  – can scan two facing pages at one time
  – increases the speed of scanning
b. Chinese OCR
– A plug-in module was written to interface
Kofax Capture with Chinese OCR (TH-OCR 7.5)
c. Proofreading
– Use MS Word (Chinese) to do the proofreading
– A macro was written to ease the step of
assigning MARC subfields
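The subfield-assignment step can be sketched as converting proofread TOC lines into an enhanced MARC 21 505 (formatted contents note) field. That field uses $t for titles and $r for statements of responsibility; indicators 0 0 mean complete contents at the enhanced level. This sketch is written in Python, not the Word macro the slide mentions. The "Title / Responsibility" line convention is a hypothetical input format, and "$" stands in for the subfield delimiter.

```python
from typing import List


def toc_to_505(lines: List[str]) -> str:
    """Build an enhanced 505 field from proofread TOC lines.

    Each non-blank line is assumed to be "Title" or
    "Title / Responsibility"; entries are joined with " -- ".
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from proofreading
        title, sep, resp = line.partition(" / ")
        entry = f"$t {title.strip()}"
        if sep:
            entry += f" / $r {resp.strip()}"
        entries.append(entry)
    return "505 00 " + " -- ".join(entries)


print(toc_to_505(["緒論", "宋代刻書 / 王明", "Index"]))
```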
d. Publish data to Catalogue
– Done at night in batch mode
– A Tcl/Tk Expect script automates the upload
process
1.5 HKUL DL initiatives (cont.)
• 1.5.3 Subject-based e-resources
– Redesigned tag 996 (a local MARC field)
– Holds a range of useful information on e-resources
– Grouping materials by subject fulfils users’ needs
– Makes it easier to extend further DL projects (e.g. a portal)
– See the HKUL homepage (Databases, EJ, E-books & ENews)
• 1.5.4 Internet resources
• 1.5.5 Electronic delivery (ILLiad)
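Grouping e-resources by subject amounts to building a subject → resource type → titles listing from the subject values carried in each record. The slides do not give the subfield layout of the redesigned tag 996. The sketch below therefore assumes records have already been reduced to hypothetical "title", "type" and "subjects" keys, and the titles are made up.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_subject(records: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Build a subject -> resource type -> titles listing.

    "subjects" stands in for the subject values taken from tag 996
    (hypothetical layout); a record may appear under several subjects.
    """
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for subject in rec["subjects"]:
            groups[subject][rec["type"]].append(rec["title"])
    # Convert to plain, sorted dicts for display.
    return {
        subject: {rtype: sorted(titles) for rtype, titles in sorted(by_type.items())}
        for subject, by_type in sorted(groups.items())
    }


records = [
    {"title": "Medical Database A", "type": "Databases", "subjects": ["Medicine"]},
    {"title": "Journal of Examples", "type": "EJ", "subjects": ["Medicine", "Science"]},
    {"title": "Law Database B", "type": "Databases", "subjects": ["Law"]},
]
print(group_by_subject(records))
```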
1.5 HKUL DL initiatives (cont.)
• 1.5.6 Virtual services
– E-forms (e.g. BRO)
– Online reference
• 1.5.7 Automation
– Increase efficiency
– e.g. amend thousands of records in batch
– Electronic submission
– Staff intranet
– Innoface
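Amending thousands of records in batch is safest as a dry run first, followed by the real change. Here is a minimal sketch of that pattern. Records are simplified to hypothetical {tag: [values]} dicts, and the example rewrites URLs in the 856 (Electronic Location and Access) field. The slides do not describe the actual Innopac batch tools.

```python
import re
from typing import Dict, List, Tuple

Records = Dict[str, Dict[str, List[str]]]  # record id -> tag -> field values


def batch_amend(records: Records, tag: str, pattern: str, replacement: str,
                dry_run: bool = True) -> Tuple[int, List[Tuple[str, str, str]]]:
    """Apply one regex substitution to every `tag` field in every record.

    Returns (number of records changed, list of (record id, old, new)).
    With dry_run=True nothing is modified, so the changes can be reviewed.
    """
    regex = re.compile(pattern)
    changes = []
    changed_ids = set()
    for rec_id, fields in records.items():
        values = fields.get(tag, [])
        for i, old in enumerate(values):
            new = regex.sub(replacement, old)
            if new != old:
                changes.append((rec_id, old, new))
                changed_ids.add(rec_id)
                if not dry_run:
                    values[i] = new
    return len(changed_ids), changes


# Hypothetical records and hosts (example.edu / example.org are reserved names).
records = {
    "b1": {"856": ["http://old.example.edu/db1"]},
    "b2": {"856": ["http://other.example.org/x"]},
    "b3": {"245": ["A title"]},
}
print(batch_amend(records, "856", r"^http://old\.example\.edu/", "http://new.example.edu/"))
```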
1.5 HKUL DL initiatives (cont.)
• 1.5.8 Collaboration
– Union catalogue with Jinan University
• 1.5.9 Authentication: proxy, EZproxy, IP
control
• 1.5.10 Others…: for accessing legacy
CD-ROM databases
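IP control combined with a proxy usually comes down to one decision per request: is the client inside a campus address range? If it is, access is direct; if not, the user must first authenticate through the proxy. This sketch uses Python's standard `ipaddress` module. The campus ranges are hypothetical RFC 5737 documentation prefixes, not HKU's actual networks.

```python
import ipaddress

# Hypothetical on-campus ranges, using RFC 5737 documentation prefixes;
# the real HKU address ranges are not given in the slides.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def access_route(client_ip: str) -> str:
    """Decide how a request for a licensed e-resource is handled.

    On-campus addresses pass straight through (IP control); any other
    address is sent to the proxy login page first.
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in CAMPUS_NETWORKS):
        return "direct"
    return "proxy-login"


for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
    print(ip, access_route(ip))
```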
2. Going to do…
1. Storage Area Network (SAN)
2. Abundance of servers
3. One-stop search
4. Alert service
5. Wireless applications
2.1 SAN
Problem a: Storage
– Large data size of our hosted databases
– High monthly data growth rate
– Databases are hosted on different
hosts/OSs
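A small capacity calculation shows why scattered storage hurts when growth is uneven. With separate disks, the first host to fill up forces an upgrade even while other hosts still have room. A single pool lasts until the total capacity is used. All figures below are hypothetical, not HKUL's actual data, and growth is assumed to be a constant number of GB per month.

```python
import math
from typing import Dict, Tuple

Host = Tuple[float, float, float]  # (used GB, capacity GB, growth GB per month)


def months_until_full(used: float, capacity: float, growth: float) -> int:
    """Whole months of headroom at a constant monthly increase."""
    if growth <= 0:
        raise ValueError("growth must be positive")
    return max(0, math.floor((capacity - used) / growth))


def separate_vs_pooled(hosts: Dict[str, Host]) -> Tuple[int, int]:
    """Months until the first host's own disks fill up, vs. months until a
    single shared pool with the same total capacity fills up."""
    separate = min(months_until_full(*h) for h in hosts.values())
    used = sum(h[0] for h in hosts.values())
    capacity = sum(h[1] for h in hosts.values())
    growth = sum(h[2] for h in hosts.values())
    return separate, months_until_full(used, capacity, growth)


# Hypothetical figures, not HKUL's actual data.
hosts = {
    "oracle": (400, 500, 20),
    "web": (50, 200, 2),
    "cdrom": (100, 300, 5),
}
print(separate_vs_pooled(hosts))  # → (5, 16)
```

The fast-growing database host runs out in 5 months on its own disks. The same total capacity, pooled, lasts 16 months.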
2.1 SAN (cont.)
Problem b: Backup
– A backup drive for every machine
– A backup software license for every machine
– A lot of backup tapes to handle
2.1 SAN (cont.)
Solution – (SAN)
– Put all data storage into a single large,
expandable storage device
– The storage device is connected to the hosts by
high-speed Fibre Channel links
– A Fibre Channel loop connects each
host in order to ensure high availability
– Backup can be done on a single device
2.2 Abundance of servers
Problem :
– Hard to monitor the status and activities
of each server
– Time wasted tuning the performance of
each server
2.2 Abundance of servers (cont.)
Solution – Server consolidation
– Buy several powerful servers instead of many
cheap mid-range servers
– Keep the number of servers to a minimum
– Save space and UPS power ratings, i.e. $$
saving
– Save the manpower needed to administer/maintain server
performance, i.e. cost saving
2.3 One-stop search
• Before searching, one needs to know
which database suits one’s needs
• To search multiple databases
simultaneously
– e.g. OAI (http://www.openarchives.org/)
– e.g. CDL SearchLight
(http://www.cdlib.org/cgi-bin/searchlight)
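Searching multiple databases simultaneously can be sketched as a fan-out: send the same query to every source concurrently, then collect each source's hits. This illustrates the idea only; it does not implement OAI or SearchLight. The sources are hypothetical in-memory stand-ins for real database connectors, and the 10-second timeout is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A "source" is any function that takes a query and returns matching titles.
Source = Callable[[str], List[str]]


def one_stop_search(query: str, sources: Dict[str, Source],
                    timeout: float = 10.0) -> Dict[str, List[str]]:
    """Send one query to every source at the same time and collect the hits.

    A source that raises, or does not answer within `timeout` seconds of
    being waited on, is reported with an empty list.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = []
    return results


# Hypothetical stand-ins for real database connectors.
def catalogue(q: str) -> List[str]:
    return [t for t in ["Digital libraries", "Library automation"] if q.lower() in t.lower()]


def ejournals(q: str) -> List[str]:
    return [t for t in ["D-Lib Magazine"] if q.lower() in t.lower()]


def broken(q: str) -> List[str]:
    raise ConnectionError("database unavailable")


print(one_stop_search("lib", {"catalogue": catalogue, "ejournals": ejournals, "broken": broken}))
```

Leaving the `with` block still waits for any slow thread to finish. A production version would need network calls that can be cancelled or that time out on their own.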
2.4 Alert service
To alert users to new information
SDI (Selective Dissemination of Information)
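At its core, SDI matches each user's standing interest profile against newly added records and notifies users who have hits. A minimal sketch follows. It assumes profiles are simple keyword sets and matches on whole words in titles only; both choices are illustrative, and real SDI profiles usually cover subjects, authors and call-number ranges too.

```python
import re
from typing import Dict, List, Set


def sdi_alerts(profiles: Dict[str, Set[str]],
               new_records: Dict[str, str]) -> Dict[str, List[str]]:
    """Match each user's interest profile against newly added records.

    A record matches when any profile keyword appears as a whole word in
    its title (case-insensitive). Users with no matches are left out.
    """
    alerts: Dict[str, List[str]] = {}
    for user, keywords in profiles.items():
        wanted = {k.lower() for k in keywords}
        hits = sorted(
            rid for rid, title in new_records.items()
            if wanted & set(re.findall(r"\w+", title.lower()))
        )
        if hits:
            alerts[user] = hits
    return alerts


# Hypothetical profiles and this week's new records.
profiles = {"amy": {"metadata"}, "ben": {"OCR", "scanning"}, "cat": {"astronomy"}}
new_records = {
    "r1": "Metadata standards for digital libraries",
    "r2": "Chinese OCR in practice",
    "r3": "Scanning: a guide",
}
print(sdi_alerts(profiles, new_records))
```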
2.5 Wireless Application
A study on mobile and PDA applications in the
Library
3. Challenges
• Changes
• New technologies
• Competitors
• What are the (future) standards?
• Contents
• Digital vs printed
• Information overload
• Lifelong education
3.1 The causes of changes
• Development of I.T.
– Network, telecommunications, digitization, storage
format, access model, …
• Economy
– Online, e-commerce, smart card, …
• Learning environment
– Life-long learning
• Mode of communication
– Email, ICQ
3.2 New technologies
• Changing … so fast
• Acronyms
– Help: http://www.webopedia.com
• Who knows what the future will be?
– Reluctant to change
• Don’t be afraid to dig in
– See:
Editor’s notes, Computers in Libraries, vol. 22, no. 8, p. 6
3.3 Competitors
• Who?
– See: OCLC White Paper on the Information Habits of
College Students
(http://www2.oclc.org/oclc/pdf/printondemand/informationhabits.pdf)
• 79% use a search engine for every or most
searches!!
Technology Adoption Life Cycle
[Figure: Innovators → Early Adopters → Early Majority → Late Majority → Laggards]
Source: Crossing the Chasm, Geoffrey Moore
Crystal Ball??
• Number of visits
• Usage of physical materials
• Training for users & real-time support
• Demand for subject knowledge
• Competitors
• Fast services & high productivity
• Information provider and producer
• Cost-effectiveness
• Library workflow moves to an e-business model
• Partnership
• Provide services that lead to income
4. Overcome the challenges
• What business are we in?
• What are our major strengths & weaknesses?
• Who are our competitors?
• Who are our customers? What are their needs?
• What factors are affecting the Library?
• Do we have the skills?
4. Overcome the challenges – how?
• Training - to keep abreast of new technologies
• Human resources - partners
• Value-added services
• User-oriented mindset
• Automation
• Improve the social image of librarians
• Co-operation
• Talk with people in different areas in order to
understand the technology
• Research
4. Overcome the challenges (cont.)
• Skills?
– Librarianship & IT knowledge
– Teamwork, Commitment
– Thinking methodology – creativity, use of
knowledge
– Outlook of the world
– Interpersonal skills
– Health!!
Principles for building DL
• Expect change
• Know your content
• Involve the right people
• Design usable systems
• Ensure open access
• Beware of data rights
• Automate whenever possible
• Adopt and adhere to standards
• Ensure quality
• Be concerned about persistence
McCray, A. & Gallagher, M. (2001). Principles for Digital Library Development.
Communications of the ACM, 44(5), pp. 49-54.
THE END
THANK YOU!