Information Network
Overlay Architecture
Adding Value to Digital Content
Carl Lagoze
CS 431 – May 4, 2005
Cornell University
Overview of the Talk
Digital Libraries for search & access
Beyond Access: Adding value to digital
content
Information Network Overlay Architecture
Implementing the Architecture
Digital Libraries – Ingest Focus
Input Phase Research Questions
Indexing and search


non-textual
cross language
Preservation
Scale issues

everything becomes hard at mega-scale
OCR

especially non-Roman
Workflow

getting stuff in cheaply/reliably
Intellectual property

hard enough at intra-national level
Description

Metadata issues
Digital Libraries – Federation Phase
Z39.50
Dienst
SDLIP
OAI-PMH
SRW/SRU
Federation Phase Research
Questions
Heterogeneity
State Maintenance
Reliability


Network level
Management level
Ranking
We have been very successful!
So, are we done?
The primary goal of digital libraries has often
been misconstrued as providing accessibility to a
massive volume of resources. The real opportunity
is to reestablish the library as a collaborative place
where people learn from each other and organize
around ideas and knowledge.
Opportunities:
Not the same old information flow
Suppliers
(Publishers)
Intermediaries
(Librarians)
Consumers
…Towards a participatory
information environment
Consumers
Shared
Information
Context
modeling
Wisdom
Knowledge
Information
Data
description
IP
preservation
Digital Libraries:
Beyond Search and Access
Build on foundation of near universal access
Provide context for:





Content aggregation: combining information entities in
novel ways
Knowledge integration: capturing semantic
relationships between information entities
Information reuse: allowing secondary, tertiary
products
Information transformation: combining information
entities with computational services
Collaboration and contribution: blurring the line
between authors, publishers, users, experts…
Value-add,
customized
Projections
Information Foundation
NSDL Context
A bit of NSDL background
Mission: “Improve Science, Math, Engineering
education through digital libraries”
Original NSDL solicitation in 1999
Over 180 projects funded
Core integration (Columbia, Cornell, UCAR)
charged with providing organizational, technical
infrastructure
Funding through 2006
http://www.nsdl.org
Existing Metadata-Centric Approach
Services
OAI-PMH
Users
Metadata
repository
OAI-PMH
Collections
The metadata repository
is a resource for service
providers.
It holds information
about every collection
and item known to the
NSDL.
Characteristics of the Metadata
Repository
Oracle database
Qualified Dublin Core
Item records with collection association
OAI-PMH ingest and exposure
Current collection ~ 800,000
Metadata quality issues
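A minimal sketch of what an OAI-PMH harvest request against such a repository looks like. The verb and parameter names (ListRecords, metadataPrefix, set, from) come from the OAI-PMH protocol; the base URL and the helper name list_records_url are hypothetical, and oai_dc is used here only because it is the prefix every OAI-PMH repository must support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # restrict the harvest to one set
    if from_date:
        params["from"] = from_date      # incremental harvest since this date
    return f"{base_url}?{urlencode(params)}"


# The base URL below is a placeholder, not a real NSDL endpoint.
url = list_records_url("https://repository.example.org/oai", from_date="2005-01-01")
print(url)
```

The same request shape serves both directions named on the slide: the repository issues it to collections when ingesting, and service providers issue it to the repository when it exposes records.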
Problems in this approach
Mere access does not equate to value

Reeves Impact of Media and Technology in Schools
Static metadata records don’t capture changing
and multiple contexts of use and applicability

Recker and Wiley Designing Instruction with Learning
Objects
Patterns of use, informal opinions, descriptions
often more useful than taxonomic classification.

Collis and Strijker Technology and Human Issues in
Reusing Learning
Requirements of a New Approach
Represent (directly or by reference) multiple entities,




standards
taxonomies
agents (user profiles and roles)
curricula
that are contributed by multiple parties,


users as actors
reuse of primary resources for secondary, tertiary products
that are inter-related to express context,



applicability to standards
usage in curricula
usage patterns by particular groups/people
and can be integrated with services and simulations
Information Network Overlay
Client Layer
Network API
Base Web Graph
Network Representation Layer
NSDL Selections
Descriptive Metadata
Annotations
Branding
Collection (Semantic)
People and Organizations
Equivalence
Source Layer
Document Repositories
Data Stores
Publisher Repositories
Web Resources
Databases
Information Network Instance
[Figure: example graph of an information network instance.]
Node types: agent, standard, metadata, resource, service
Edge labels: contributes, metadataFor, appliesTo, annotates, derivedFrom, transformedBy
Access labels: API, http, oai, SOAP
Translate to Technical Requirements
Rich information objects


Integration of local and remote sources
Mixed genre
Dynamic information objects

Integration with local and distributed services
Graph-based information model


Nodes are information objects
Edges are relationships among those objects
Access and management API

exposing full functionality for programmatic access
Fine granularity access management
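The graph-based information model can be sketched as follows. This is an illustration under stated assumptions, not the NSDL implementation: the class InfoNetwork, its method names, and the node identifiers are hypothetical, and the edge directions (metadata metadataFor resource, agent contributes metadata, standard appliesTo resource) are one reading of the labels in the network instance diagram.

```python
from collections import defaultdict


class InfoNetwork:
    """Nodes are information objects; edges are typed relationships."""

    def __init__(self):
        self.nodes = {}                   # node id -> kind (agent, resource, ...)
        self.edges = defaultdict(set)     # (subject, predicate) -> set of objects

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def relate(self, subject, predicate, obj):
        for node_id in (subject, obj):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.edges[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        """Follow an edge forward from a subject."""
        return sorted(self.edges.get((subject, predicate), set()))

    def subjects(self, predicate, obj):
        """Follow an edge backward to every subject pointing at obj."""
        return sorted(s for (s, p), objs in self.edges.items()
                      if p == predicate and obj in objs)


net = InfoNetwork()
net.add_node("r1", "resource")
net.add_node("m1", "metadata")
net.add_node("a1", "agent")
net.add_node("s1", "standard")
net.relate("m1", "metadataFor", "r1")
net.relate("a1", "contributes", "m1")
net.relate("s1", "appliesTo", "r1")

print(net.subjects("metadataFor", "r1"))   # metadata describing r1
print(net.subjects("appliesTo", "r1"))     # standards that apply to r1
```

Keeping the predicate on the edge, rather than in the node, is what lets several parties add context to the same resource without editing its record.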
Fedora History
Cornell Research (1997-present)




DARPA and NSF-funded research
First reference implementation developed
Distributed, Interoperable Repositories (experiments with CNRI)
Policy Enforcement
First Application (1999-2001)



University of Virginia digital library prototype
Technical implementation: adapted to web; RDBMS storage
Scale/stress testing for 10,000,000 objects
Open Source Software (2002-present)




Andrew W. Mellon Foundation grants
Technical implementation: XML and web services
Fedora 1.0 (May 2003)
Fedora 2.0 (Jan 2005)
Fedora Features
Digital Object Model



Container for content and metadata
Aggregate local and remote content
Associate behaviors with objects (integrate
content and web services)
Relationships

Define and query object-to-object relationships
Repository web service


Digital object storage
Web service APIs (SOAP and REST) to manage,
access, search
Objects, Representations, Relationships
[Figure: three digital objects and their representations, linked by hasRep and hasMember edges.]
Objects:
info:fedora/demo:10
info:fedora/demo:11
info:fedora/demo:12
Representations:
info:fedora/demo:10/bdef:1/MEMBERS
info:fedora/demo:11/DC
info:fedora/demo:11/THUMB
info:fedora/demo:11/HIGH
info:fedora/demo:11/bdef:2/ZPAN
info:fedora/demo:12/DC
info:fedora/demo:12/THUMB
Fedora Digital Object Model
Component View
Persistent ID (PID)
Digital object identifier
Relations (RELS-EXT)
Reserved Datastreams
Dublin Core (DC)
Key object metadata
Audit Trail (AUDIT)
Datastream
Datastream
Default Disseminator
Disseminator
Datastreams
Set of content or metadata items
Disseminators
Pointers to service definitions to
provide service-mediated views
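The component view can be approximated with a few data classes. This is a sketch for orientation only: the class and method names are hypothetical and do not reproduce Fedora's actual schema or API. The reserved datastream IDs (DC, RELS-EXT, AUDIT) and the identifiers demo:11 and bdef:2 are taken from the slides.

```python
from dataclasses import dataclass, field

RESERVED_IDS = ("DC", "RELS-EXT", "AUDIT")   # reserved datastreams named on the slide


@dataclass
class Datastream:
    dsid: str
    mime_type: str
    content: bytes = b""


@dataclass
class Disseminator:
    name: str
    service_definition: str    # pointer to a service definition, e.g. "bdef:2"


@dataclass
class DigitalObject:
    pid: str                                    # persistent ID
    datastreams: dict = field(default_factory=dict)
    disseminators: list = field(default_factory=list)

    def add_datastream(self, ds):
        self.datastreams[ds.dsid] = ds

    def reserved(self):
        return [d for d in RESERVED_IDS if d in self.datastreams]

    def content_datastreams(self):
        return [d for d in self.datastreams if d not in RESERVED_IDS]


obj = DigitalObject(pid="demo:11")
obj.add_datastream(Datastream("DC", "text/xml"))
obj.add_datastream(Datastream("RELS-EXT", "text/xml"))
obj.add_datastream(Datastream("THUMB", "image/gif"))
obj.add_datastream(Datastream("HIGH", "image/jpeg"))
obj.disseminators.append(Disseminator("ZPAN", "bdef:2"))

print(obj.reserved())
print(obj.content_datastreams())
```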
Simple Fedora model for
aggregating static content
Representations map to datastreams
Datastreams may be local or surrogates
(redirect) to remote data
REST (or SOAP) URLs provide uniform
client access to representations
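A minimal sketch of the idea that a client addresses every representation the same way whether the datastream is stored locally or is a surrogate for remote data. The info:fedora/PID/DSID identifier form follows the examples on the earlier slide; the Datastream fields, the resolve function, and the remote URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datastream:
    dsid: str
    mime_type: str
    local_bytes: Optional[bytes] = None   # content held in the repository
    remote_url: Optional[str] = None      # surrogate: redirect to remote data


def representation_uri(pid, dsid):
    """Identifier for one representation, in the info:fedora form."""
    return f"info:fedora/{pid}/{dsid}"


def resolve(ds):
    """Return ('content', bytes) for local data or ('redirect', url) for remote."""
    if ds.local_bytes is not None:
        return ("content", ds.local_bytes)
    if ds.remote_url is not None:
        return ("redirect", ds.remote_url)
    raise ValueError(f"datastream {ds.dsid} has no content source")


thumb = Datastream("THUMB", "image/gif", local_bytes=b"GIF89a")
# Placeholder URL standing in for content held by another repository.
high = Datastream("HIGH", "image/jpeg", remote_url="http://images.example.org/high.jpg")

print(representation_uri("demo:11", thumb.dsid), resolve(thumb)[0])
print(representation_uri("demo:11", high.dsid), resolve(high)[0])
```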
Simple Content Aggregation
Datastreams
hasRep URL1: DC (text/xml)
hasRep URL2: THUMB (image/gif)
hasRep URL3: HIGH (image/jpeg)
Aggregating local and remote
content
Datastreams
hasRep URL1: DC (text/xml)
hasRep URL2: THUMB (image/gif)
hasRep URL3: HIGH (image/jpeg), remote via HTTP
Dynamic Content
Take advantage of computational services to
process content
Representations map to service-based
transforms of static data
Opaque at the access level (client sees only
representations, not how they are produced)
Motivating examples



Canonical XML metadata format – XSLT to Dublin
Core
Document source in TeX, programmatic transform to
PDF, PS, HTML, etc.
Linkage of data to analysis tools
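The opacity of dynamic representations can be sketched as follows, assuming a hypothetical DigitalObject class whose get method is the only thing a client sees. The transform here is a trivial stand-in (upper-casing bytes) for a real service such as an XSLT or a TeX-to-PDF conversion; the PID demo:20 and the representation names are invented for the example.

```python
from typing import Callable, Dict, Tuple


class DigitalObject:
    def __init__(self, pid: str) -> None:
        self.pid = pid
        self._static: Dict[str, bytes] = {}
        # representation name -> (source datastream ID, transform service)
        self._dynamic: Dict[str, Tuple[str, Callable[[bytes], bytes]]] = {}

    def add_datastream(self, dsid: str, data: bytes) -> None:
        self._static[dsid] = data

    def add_dynamic(self, name: str, source_dsid: str,
                    transform: Callable[[bytes], bytes]) -> None:
        if source_dsid not in self._static:
            raise KeyError(f"no such datastream: {source_dsid}")
        self._dynamic[name] = (source_dsid, transform)

    def representations(self):
        """Clients see one flat list; static and dynamic look the same."""
        return sorted(set(self._static) | set(self._dynamic))

    def get(self, name: str) -> bytes:
        if name in self._static:
            return self._static[name]
        source_dsid, transform = self._dynamic[name]
        return transform(self._static[source_dsid])   # computed on request


def shout(data: bytes) -> bytes:
    """Stand-in for a real computational service."""
    return data.upper()


obj = DigitalObject("demo:20")
obj.add_datastream("SOURCE", b"hello")
obj.add_dynamic("SHOUT", "SOURCE", shout)

print(obj.representations())
print(obj.get("SHOUT"))
```

Because get returns bytes in both cases, a service can be swapped in or out behind a representation without any change on the client side.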
Dynamic Representations
Datastreams
hasRep URL1: DC (text/xml)
hasRep URL2: THUMB (image/gif)
hasRep URL3: HIGH (image/jpeg)
hasRep URL4: service call
Expressing Relationships
Between Objects
Object-to-object Relationships


Ontology of common relationships (RDF schema)
Relationships stored in special datastream (RELS-EXT)
Resource Index (RI)

RDF-based index of repository (Kowari triple-store)
RI Search



Powerful querying of graph of inter-related objects
REST-based query interface (using RDQL or iTQL)
Can be used in dynamic disseminations
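To make the Resource Index idea concrete, here is a small in-memory stand-in for a triple store with pattern matching. It is not Kowari and does not use RDQL or iTQL syntax; the ResourceIndex class and its match method are hypothetical. The object identifiers come from the earlier diagram, and the assumption that demo:10 is the collection holding demo:11 and demo:12 is a reading of that figure.

```python
class ResourceIndex:
    """Set of (subject, predicate, object) triples with wildcard matching."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


ri = ResourceIndex()
ri.add("info:fedora/demo:10", "hasMember", "info:fedora/demo:11")
ri.add("info:fedora/demo:10", "hasMember", "info:fedora/demo:12")
ri.add("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/DC")
ri.add("info:fedora/demo:11", "hasRep", "info:fedora/demo:11/THUMB")

# All members of the collection object demo:10
members = [o for _, _, o in ri.match("info:fedora/demo:10", "hasMember")]
print(members)

# Which object owns a given representation?
owners = [s for s, _, _ in ri.match(predicate="hasRep", obj="info:fedora/demo:11/DC")]
print(owners)
```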
Uses of Object Relationships
Define collections (e.g., collection objects)
Assert semantic relationships among
objects
Enable network overlay



Surrogate objects referring to external entities
Assert relationships among them
Assert other relationships (e.g., annotations)
Fedora Relationship Ontology
(RDFS)
isPartOf / hasPart
isMemberOf / hasMember
isDescriptionOf / hasDescription
hasEquivalent
… others
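The slash-separated pairs above read as inverse relationships, so asserting one direction implies the other. A minimal sketch of that inference, assuming the pairs are strict inverses; the function with_inverses is hypothetical, and hasEquivalent is left untouched because the slide does not say whether it is symmetric.

```python
# Inverse pairs as listed on the slide, in both directions.
INVERSE = {
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
    "isMemberOf": "hasMember",
    "hasMember": "isMemberOf",
    "isDescriptionOf": "hasDescription",
    "hasDescription": "isDescriptionOf",
}


def with_inverses(triples):
    """Return the input triples plus the inverse of each one that has an inverse."""
    closed = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE.get(predicate)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


asserted = {
    ("info:fedora/demo:11", "isMemberOf", "info:fedora/demo:10"),
    ("info:fedora/demo:11", "hasEquivalent", "info:fedora/demo:12"),
}
closed = with_inverses(asserted)
for triple in sorted(closed):
    print(triple)
```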
Deployment Plans
Production release Phase 1 – July 2005

black box replacement for metadata
repository
Future releases


API available at public level
Relationship building
Example 1 – Branding
Provenance of Data and Metadata
[Figure: graph tracing the provenance of content and metadata for branding.]
Nodes: hdl:1 (agent), hdl:2 (agent), pid:3 (metadata provider), pid:4 (aggregator), pid:5 (metadata), hdl:6 (content)
Edge labels: hasRole, memberOf, providedBy, metadataFor, representedBy
Operations shown on nodes: listRoles, listMetadata, showBrand, listResources, getProvider, getRecord, showContent, getMetadata, getMembership
Example 2 – Aggregations
Semantic, Management, etc.
[Figure: graph of content and metadata grouped under an aggregator.]
Nodes: hdl:1 (content), hdl:2 (agent), hdl:6 (content), hdl:8 (content), pid:4 (aggregator), pid:5 (metadata), pid:7 (metadata), pid:9 (metadata)
Edge labels: hasRole, representedBy, metadataFor, memberOf
Operations shown on nodes: showContent, getMetadata, listMembership, listRoles, getRepresentation, getMembers, getProvider, getRecord
Some open questions
Scalability of this model
Management
Control – trusted actors
Cross-ontology relationships
Exposing to the user - visualization
Concluding Goals
Exploit the increasing ubiquity of digital
content
Provide the architecture for adding value
to underlying content



Aggregation
Reuse
Integration with computational services