ALEG Project
Technical Overview
Presentation to the NLA
Kent Fitch
11 May 2001
Agenda
• Introduction to AUSTLIT/ALEG
• Overview of the data model
• Overview of the implementation
– demo
• Problems
• What’s next
• Discussion
Introduction to AUSTLIT/ALEG
• AUSTLIT
– Australian Literature database run at the ADFA library
since the late 1980s
– built on the URICA library software
– delivered by CD to subscribers
– web interface 1996
Introduction to AUSTLIT/ALEG
• ALEG
– Australian Literature Electronic Gateway
– UNSW, UQ, Monash, Sydney, Flinders, UWA,
Deakin, plus NLA, plus RIEF grant
– Aim:
To provide national & international users with a
single entry point to scholarly print and electronic
resources on Australian writers and their writing,
and the wider Australian literary culture
Introduction to AUSTLIT/ALEG
• ALEG has produced the “new” AUSTLIT
– AUSTLIT: The Australian Literary Database (UNSW)
– The Bibliography of Australian Literature (UQ / Monash)
– The List of Australian Writers (UQ / Monash)
– From Page to Stage (UQ / Monash)
– A Checklist of Australian Literary Pseudonyms (UQ / Monash)
– The Lu Rees Archive of Australian Children’s Literature (UC/CBC)
– Western Australian Writers (UWA)
– Australian Multicultural Writers Databases (Deakin)
– South Australian Writers (Flinders)
– Australia’s Literary Responses to ‘Asia’ (Flinders)
Overview of the data model
• Background
• Choices
Background
• Library systems are facing challenges
– recognition that existing data models are not
adequate, especially given:
• declining budgets
• growth in resources (especially electronic)
• “challenge” from web-based resource discovery
systems (eg Google)
Background
• “The web changes everything”
• users expect everything, all the time, for free
– charging model
– user interface expectations
– interconnectedness of all things…
• metadata / RDF/ Topic Maps, XLink
Background
• New data models:
– Dublin Core: simple 15 element metadata set
– IFLA’s FRBR: Functional Requirements for Bibliographic Records
– INDECS: INteroperability of Data in E-Commerce Systems
– CIDOC’s CRM: object oriented reference model for cultural heritage
– Harmony ABC proposal: common cross-realm entities (work, event, agent)
…more on these later
Background
• Where does ALEG fit in all of this?
– about literary resources, but not a library catalogue
– emphasis on resources which help interpret and
understand:
• reviews/criticisms
• setting & subject classification
• biographical material
• archival items
• linking, relationships
…continued
Background
• Where does ALEG fit in all of this? (continued)
– central database, but decentralised operation
and separate, customisable views
– web based maintenance and access
– linking to full text
• via SETIS & others for large works
• scanned reviews/articles (copyright issues)
• electronic journals and other resources
…continued
Background
• Where does ALEG fit in all of this? (continued)
– integration with external electronic resources
• as a user:
– holdings (Kinetica?)
– manuscripts (RAAM/son-of-RAAM?)
• as a provider
– Z39.50 target
– support specialised views, deliver data in XML
format
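As a rough illustration of delivering data in XML format, the sketch below serialises one record with Python's standard library. The element and attribute names (`work`, `title`, `creator`, `id`) and the sample values are hypothetical; the actual ALEG XML format is not described in these slides.

```python
import xml.etree.ElementTree as ET


def work_to_xml(work_id, title, creators):
    """Serialise one work record as an XML string (hypothetical element names)."""
    work = ET.Element("work", {"id": str(work_id)})
    ET.SubElement(work, "title").text = title
    for name in creators:
        ET.SubElement(work, "creator").text = name
    return ET.tostring(work, encoding="unicode")


# Sample values are for illustration only.
xml_text = work_to_xml(42, "Voss", ["White, Patrick"])
```

A consumer can parse the string back with `ET.fromstring(xml_text)` and read the title with `findtext("title")`.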
Background
• For ALEG to be relevant
– it must be designed to integrate with and
take advantage of the systems of the
present and the future, not those of the past
– it must ‘play nicely’ in the ‘semantic web’
Background
• “The systems of the present and the future”
– web based
– common metadata based
– common data model, supporting
• unambiguous identification
• rich relationships, explicit and implied
• interoperability with other systems
Background
• Being explicit:
“Metadata not descriptions”
– identifiers, not words
– relationships, not labels
– events, not things
Godfrey Rust, Technical Coordinator, INDECS project
Metadata 2010, presentation to British Library Seminar, Sept 99
http://www.bl.uk/information/news/2709rust.ppt
Choices
• IFLA’s FRBR basic model
• plus INDECS/Harmony style events
• plus Topic Map style associations
IFLA FRBR
Work → realized through → Expression → embodied in →
Manifestation → exemplified by → Item
From: Deconstructing the Library Catalogue,
Tom Delsey, National Library of Canada
Presentation to British Library Seminar, Sept 99
http://www.bl.uk/information/news/2709frbr.ppt
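The Work / Expression / Manifestation / Item chain can be pictured as nested records. The following is only a minimal sketch of that nesting: the field names (`title`, `form`, `publisher`, `location`) and all sample values are assumptions made for illustration and do not come from FRBR or from ALEG.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single exemplar, e.g. one library's copy."""
    location: str


@dataclass
class Manifestation:
    """A physical embodiment, e.g. a particular edition."""
    publisher: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realisation of a work, e.g. a particular text or translation."""
    form: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Walk Work -> Expression -> Manifestation -> Item and count exemplars."""
    return sum(
        len(m.items)
        for e in work.expressions
        for m in e.manifestations
    )


# Made-up sample data: one work, one expression, one edition, two copies.
example = Work("Example Novel", [
    Expression("English text", [
        Manifestation("Example Press", [Item("Library A"), Item("Library B")]),
    ]),
])
```

Here `count_items(example)` walks all four levels to count the copies held.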
IFLA FRBR
Person / Corporate body:
– creates Work
– realizes Expression
– produces Manifestation
– owns Item
From: Deconstructing the Library Catalogue,
Tom Delsey, National Library of Canada
Presentation to British Library Seminar, Sept 99
http://www.bl.uk/information/news/2709frbr.ppt
IFLA FRBR
Each of the following can be the subject of a Work:
– Concept, Object, Event, Place
– Person, Corporate Body
– Work, Expression, Manifestation, Item
From: Deconstructing the Library Catalogue,
Tom Delsey, National Library of Canada
Presentation to British Library Seminar, Sept 99
http://www.bl.uk/information/news/2709frbr.ppt
Choices
• Topic Maps
– “A thesaurus on steroids”
– “The Global Positioning System for the Web”
– Charles Goldfarb, SGML
– ISO standard, influenced by HyTime (SGML)
– a framework for defining topics, associations,
scopes, occurrences and attributes (facets)
separate from an underlying database
– lots of overlap with RDF - possible unification?
Topic Maps
• Traditional approach
• data bases are self contained
• others can’t ‘point in’ and make assertions and build
relationships about your data
• but enter:
• common markup, metadata and semantics
• universal addressability
• “the web”
• and then...
• relationships and assertions can be separated from
the base data
Topic Maps
• So what?
– Many different views and organisations of the
base data can be supported
– Building and maintaining topics and
associations can become a specialist task
• divide and conquer
• plurality of interpretations and values
– Base data can be easily reused, reinterpreted,
combined with other databases
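To make the separation of assertions from base data concrete, here is a small sketch in which two independently maintained sets of associations point at the same base records by identifier. The identifiers (`w1`, `a1`, ...), the relationship labels and the `describe` helper are hypothetical and are not taken from the Topic Maps standard or from ALEG.

```python
# Base data: records addressed by identifier only.
base_data = {
    "w1": "Voss (novel)",
    "w2": "Voss (opera)",
    "a1": "White, Patrick",
    "a2": "David Malouf",
}

# Two separately maintained sets of assertions about the same base data.
# Each association is (from_id, relationship, to_id).
authorship_map = [
    ("a1", "author of", "w1"),
    ("a2", "librettist of", "w2"),
]
adaptation_map = [
    ("w2", "adaptation of", "w1"),
]


def describe(associations, data):
    """Render associations as readable statements using the base data."""
    return [f"{data[s]} {rel} {data[o]}" for s, rel, o in associations]
```

Either map can be edited, replaced or combined with another without touching `base_data`, which is the point made on the slide.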
Topic Maps
[Diagram: linked example topics: David Malouf; White, Patrick;
Voss (novel); Voss (opera); Tree of Man; Fly away Peter;
Flaws in the Glass]
Topic Maps
[Diagram: linked example topics: Victoria; 20th Century;
Acme Armaments Factory; WW2; Melbourne; St Kilda; Richmond; WW1;
Geelong; NSW; Blue Mtns; Richmond; Dubbo; Leura; Katoomba]
Topic Maps
[Diagram: linked example topics: England; Australia; White, Patrick;
NSW; London; Knightsbridge; Voss; Longmans; Sydney; isolation; death;
grief; with “subject” as an association label]
ALEG data model
[Diagram: FRBR, then FRBR + INDECS]
A typical set of topics/events/relationships:
• “I conceived it”: CREATION event → Work
• Work realised through... Expression
• “I did it”: REALISATION event → Expression
• Expression embodied in... Manifestation
• “I produced it”: EMBODIMENT event → Manifestation
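The pairing of events with FRBR entities can be sketched as data. This is an illustrative sketch only: the `Event` class, its field names and the sample agents are assumptions, while the three event types and the entities they give rise to follow the slide.

```python
from dataclasses import dataclass

# Which FRBR entity each event type gives rise to, following the slide.
EVENT_OUTPUT = {
    "CREATION": "Work",
    "REALISATION": "Expression",
    "EMBODIMENT": "Manifestation",
}


@dataclass(frozen=True)
class Event:
    event_type: str   # one of the keys of EVENT_OUTPUT
    agent: str        # who "conceived", "did" or "produced" it
    output_id: str    # hypothetical identifier of the resulting entity

    @property
    def output_kind(self) -> str:
        return EVENT_OUTPUT[self.event_type]


# Made-up sample events.
events = [
    Event("CREATION", "Author A", "work-1"),
    Event("REALISATION", "Translator B", "expression-1"),
    Event("EMBODIMENT", "Publisher C", "manifestation-1"),
]


def agents_by_kind(events):
    """Group agent names by the kind of entity their event produced."""
    result = {}
    for e in events:
        result.setdefault(e.output_kind, []).append(e.agent)
    return result
```

Modelling the event as its own record leaves room for several agents, with different roles, to be attached to the same Work, Expression or Manifestation.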
Overview of the implementation
• Database
– Oracle 8i with interMedia
• Server
– Apache Project Tomcat Java servlet container
– Apache Project Xerces XML DOM
– Apache Project Xalan XSLT
• Client
– Microsoft IE5.5 for maintenance
– any web browser for view
Database
• Topics
TopicId
TopicType
TopicName
• TopicRelationships
FromTopicId
ToTopicId
TopicRelationshipType
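The two tables can be tried out with a small in-memory database. The sketch below uses SQLite as a stand-in for Oracle; the table and column names follow the slide, but the column types, the topic type values, the relationship label and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (
    TopicId   INTEGER PRIMARY KEY,
    TopicType TEXT NOT NULL,
    TopicName TEXT
);
CREATE TABLE TopicRelationships (
    FromTopicId           INTEGER NOT NULL REFERENCES Topics(TopicId),
    ToTopicId             INTEGER NOT NULL REFERENCES Topics(TopicId),
    TopicRelationshipType TEXT NOT NULL
);
""")

# Hypothetical sample rows: one agent, one work, one relationship.
conn.executemany("INSERT INTO Topics VALUES (?, ?, ?)", [
    (1, "agent", "White, Patrick"),
    (2, "work", "Voss"),
])
conn.execute(
    "INSERT INTO TopicRelationships VALUES (?, ?, ?)",
    (1, 2, "creator of"),
)


def related(conn, topic_id):
    """Return (relationship type, target name) pairs for one topic."""
    rows = conn.execute(
        """
        SELECT r.TopicRelationshipType, t.TopicName
        FROM TopicRelationships r
        JOIN Topics t ON t.TopicId = r.ToTopicId
        WHERE r.FromTopicId = ?
        ORDER BY t.TopicName
        """,
        (topic_id,),
    )
    return rows.fetchall()
```

Every relationship is one row, so adding a new relationship type needs no schema change, only a new value in `TopicRelationshipType`.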
Database
• …almost
• For some topics, a name is meaningless, or so
complicated that it is a separate topic in its
own right, so:
• “Topics” is “subclassed” into specialised
topic tables:
– date
– text
– title
– name...
Database
– Some values are so simple as to not warrant a
separate topic. These are stored as identifiers
TopicId
IdType
IdValue
• the topicId is a foreign key to the Topics table
• E.g. ISBN, AUSTLIT ID, NLA Kinetica ID
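An identifier lookup against this structure might look as follows. This is a sketch with SQLite standing in for Oracle: the table name `Identifiers`, the column types, the index and the placeholder identifier values are all assumptions; only the three column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (
    TopicId   INTEGER PRIMARY KEY,
    TopicType TEXT NOT NULL,
    TopicName TEXT
);
CREATE TABLE Identifiers (
    TopicId INTEGER NOT NULL REFERENCES Topics(TopicId),
    IdType  TEXT NOT NULL,
    IdValue TEXT NOT NULL
);
CREATE INDEX IdentifiersByValue ON Identifiers (IdType, IdValue);
""")

conn.execute("INSERT INTO Topics VALUES (1, 'manifestation', 'Example edition')")
conn.executemany("INSERT INTO Identifiers VALUES (?, ?, ?)", [
    (1, "ISBN", "0-000-00000-0"),     # placeholder, not a real ISBN
    (1, "AUSTLIT ID", "EXAMPLE-1"),   # placeholder value
])


def find_topics(conn, id_type, id_value):
    """Return (TopicId, TopicName) for every topic carrying this identifier."""
    rows = conn.execute(
        """
        SELECT t.TopicId, t.TopicName
        FROM Identifiers i
        JOIN Topics t ON t.TopicId = i.TopicId
        WHERE i.IdType = ? AND i.IdValue = ?
        """,
        (id_type, id_value),
    )
    return rows.fetchall()
```

The sketch returns a list rather than a single row because nothing on the slide says an identifier value is unique to one topic.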
Database
– Some relationships are best stored as a
thesaurus hierarchy, so we have a hierarchy
table
• places
• concepts
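A thesaurus-style hierarchy for places could be queried as in the sketch below, again with SQLite standing in for Oracle. The `Hierarchy` table layout (`ChildTopicId`, `ParentTopicId`) is an assumption, as the slides do not give its columns, and the recursive query is simply one way to walk such a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Topics (
    TopicId   INTEGER PRIMARY KEY,
    TopicType TEXT NOT NULL,
    TopicName TEXT
);
CREATE TABLE Hierarchy (
    ChildTopicId  INTEGER NOT NULL REFERENCES Topics(TopicId),
    ParentTopicId INTEGER NOT NULL REFERENCES Topics(TopicId)
);
""")

# Sample place topics, broadest first.
conn.executemany("INSERT INTO Topics VALUES (?, 'place', ?)", [
    (1, "Australia"), (2, "NSW"), (3, "Blue Mtns"), (4, "Katoomba"),
])
conn.executemany("INSERT INTO Hierarchy VALUES (?, ?)", [(2, 1), (3, 2), (4, 3)])


def broader_terms(conn, topic_id):
    """Return the names of all broader terms, nearest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(TopicId, depth) AS (
            SELECT ParentTopicId, 1 FROM Hierarchy WHERE ChildTopicId = ?
            UNION ALL
            SELECT h.ParentTopicId, up.depth + 1
            FROM Hierarchy h JOIN up ON h.ChildTopicId = up.TopicId
        )
        SELECT t.TopicName
        FROM up JOIN Topics t ON t.TopicId = up.TopicId
        ORDER BY up.depth
        """,
        (topic_id,),
    )
    return [name for (name,) in rows]
```

A search on a broad place can then be expanded to its narrower places (or the reverse) without storing every implied relationship as a separate row.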
Maintenance
[Architecture diagram]
• Clients
– IE5.5 (JavaScript, DOM), connecting over HTTP
– Z39.50 origins
• Server
– Apache, Tomcat
– Custom ALEG code: presentation/formatting (Xerces, Xalan, XML)
and business logic
– Z39.50 services based on YAZ toolkit
– Z39.50 targets
• Database
– Oracle 8i with interMedia Text, accessed via JDBC
Demo
So far, so good...
• UI seems to be working well
– 24 active data maintainers
– 20,000 additions, deletions, updates in the first 2
months
• Database performance generally OK
– 350,000 works
– 55,000 agents
– 3.2 million “topics” of all types
– 4.6 million topic relationships
– 360,000 identifier relationships
But...
Problems/Issues
• FRBR Issues
• Data loading disasters
• UI too closely coupled with our XML
representation of the FRBR?
• Visualisation of large amounts of data
• Scalability of using RDF-style tuples
– (are big simple tables the antithesis of effective SQL
query optimisation?)
• RDF vs Topic Maps
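The tuple scalability question can be illustrated with a query that combines two criteria. In the sketch below (SQLite standing in for Oracle; relationship labels and sample rows are hypothetical), finding works by a given agent on a given subject takes one self-join of the relationship table, and each further criterion would add another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TopicRelationships (
    FromTopicId           INTEGER NOT NULL,
    ToTopicId             INTEGER NOT NULL,
    TopicRelationshipType TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO TopicRelationships VALUES (?, ?, ?)", [
    (1, 10, "creator of"),    # agent 1 created work 10
    (1, 11, "creator of"),    # agent 1 created work 11
    (10, 20, "has subject"),  # work 10 is about concept 20
    (11, 21, "has subject"),  # work 11 is about concept 21
])


def works_by_agent_about(conn, agent_id, subject_id):
    """Works created by agent_id that have subject_id as a subject."""
    rows = conn.execute(
        """
        SELECT created.ToTopicId
        FROM TopicRelationships AS created
        JOIN TopicRelationships AS about
          ON about.FromTopicId = created.ToTopicId
        WHERE created.FromTopicId = ?
          AND created.TopicRelationshipType = 'creator of'
          AND about.ToTopicId = ?
          AND about.TopicRelationshipType = 'has subject'
        ORDER BY created.ToTopicId
        """,
        (agent_id, subject_id),
    )
    return [work_id for (work_id,) in rows]
```

Because every criterion reads the same large table, the optimiser has little beyond the relationship type value to distinguish one join from another, which is the concern raised on the slide.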
What’s next
• Stage 2: May 2001-Dec 2001
– end user interface
– customised views
– more maintenance functions
– concept thesaurus
– link to Kinetica for holdings
– full text
– more and richer relationships
– public launch (July)
What’s next
• Beyond
– browsing/ranking/visualisation of relationships
– copyright issues
– customised services
– sustainability
– wider interoperability
Discussion