Building a Distributed Geospatial Library

Alexandria Digital Library Project
Building a
Distributed Geospatial Library
where we are now
where we’re going
what we’re facing
Greg Janée
Goals

Digital library for georeferenced information
  • distributed, autonomous nodes
  • heterogeneous
  • rich services
  • scalable
    – many providers
    – collections, large and small

Standard components, interfaces
Greg Janée • ADEPT retreat • November 8, 2002
The big picture
[Diagram: the ADL services and what each provides; the library is drawn as collections, each containing items]
  • collection registry: collection-level search
  • thesaurus: shared vocabularies
  • library: item-level search, metadata management
  • content: data access
  • gazetteer: maps placenames to locations
  • map: background imagery, layering capability
*many interconnections between services*
Library server
[Diagram: layers of the library server, listed top to bottom]
Clients: user interface, metadata mapper, harvest loader, item tracker
client interface (XML / Java, HTTP, RMI)
middleware
  • access control; query fan-out; query result caching & ranking
  • collection referencing & registration
collection interface (XML / Java)
Behind the collection interface: internal collections, generic database driver, Z39.50 driver, proxy driver, collection aggregator
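The middleware's query fan-out, ranking, and result caching can be pictured with a small sketch. Everything in it is hypothetical and only illustrative: the `ListDriver` and `Middleware` classes, the in-memory items, and the score-based ranking are assumptions, not the actual ADL implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class ListDriver:
    """Hypothetical collection driver backed by a list of (id, score, text)."""

    def __init__(self, name, items):
        self.name = name
        self.items = items

    def search(self, term):
        # Return (collection, item id, score) for items whose text contains the term.
        return [(self.name, item_id, score)
                for item_id, score, text in self.items
                if term.lower() in text.lower()]


class Middleware:
    """Fans a query out to every driver, merges and ranks the hits, caches them."""

    def __init__(self, drivers):
        self.drivers = drivers
        self.cache = {}

    def query(self, term):
        if term in self.cache:              # query result caching
            return self.cache[term]
        with ThreadPoolExecutor() as pool:  # query fan-out, one task per collection
            per_driver = list(pool.map(lambda d: d.search(term), self.drivers))
        merged = [hit for hits in per_driver for hit in hits]
        merged.sort(key=lambda hit: hit[2], reverse=True)  # rank by score
        self.cache[term] = merged
        return merged


maps = ListDriver("maps", [("m1", 0.9, "Santa Barbara coastal map"),
                           ("m2", 0.4, "Goleta street map")])
photos = ListDriver("photos", [("p1", 0.7, "Aerial photo of Santa Barbara")])
node = Middleware([maps, photos])
results = node.query("santa barbara")
print(results)
```

A second call with the same term returns the cached list without touching the drivers. A real node would also need access control and cache invalidation, which this sketch omits.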
Issues
1. Finding the right participation model
I have a collection o’ stuff, how do I join ADL?
2. Providing a complete solution
I’m a map library, I want a library-in-a-box
3. Gaining adoption
How do I add spatial searching to my DL?
4. Simple, effective spatial searching
I want spatial search but I’m cheap and lazy
Participation via database mapping



Assumes a relational database of metadata
Collection described as a view of the database
ADL provides
  • template-based report generator
  • mapping language
  • extensible library of composable mapping components (“paradigms”)
  • offline software package to generate collection statistics
[Diagram labels: ADL node, config, view, RDBMS, provider]
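To make "collection described as a view of the database" concrete, here is a minimal sketch using SQLite. The `holdings` table, its columns, and the `adl_collection` view name are all assumptions for illustration; a real provider would have its own schema and RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The provider's own table, in whatever shape the provider already has.
    CREATE TABLE holdings (
        id INTEGER PRIMARY KEY, title TEXT, kind TEXT,
        north REAL, south REAL, east REAL, west REAL);
    INSERT INTO holdings VALUES
        (1, 'Santa Barbara Channel chart', 'map', 34.5, 33.8, -119.0, -120.5),
        (2, 'Staff phone list', 'internal', NULL, NULL, NULL, NULL),
        (3, 'Goleta topographic sheet', 'map', 34.5, 34.4, -119.7, -119.9);
    -- The collection is only a view: just the georeferenced maps are exposed.
    CREATE VIEW adl_collection AS
        SELECT id, title, north, south, east, west
        FROM holdings WHERE kind = 'map';
""")
titles = [row[0] for row in conn.execute(
    "SELECT title FROM adl_collection ORDER BY id")]
print(titles)
```

The data never leaves the provider's database; the node only needs to know which view to query and how its columns map to search concepts.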
Sample paradigms

Spatial
  • Informix Geodetic blade
  • 4 box coordinates

Textual
  • SQL LIKE substring matching
  • Verity text engine
  • IIT SIRE

Temporal
  • begin, end dates
  • single integer year

Hierarchical
  • integer codes w/ code ancestor relationships
  • constant

Numeric, Identification, ...

Field adaptors
  • qualification
  • union
  • concatenation
  • constant
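A paradigm can be thought of as a small component that turns a typed query into a SQL fragment for one storage layout, and an adaptor as a component that combines paradigms. The sketch below illustrates the idea for "4 box coordinates", "SQL LIKE substring matching", and "union". The class names, method names, and column names are hypothetical and are not the ADL paradigm library's API.

```python
class FourBoxSpatial:
    """Hypothetical spatial paradigm: footprints stored as four box columns."""

    def __init__(self, north, south, east, west):
        self.cols = (north, south, east, west)

    def overlaps(self, q_north, q_south, q_east, q_west):
        n, s, e, w = self.cols
        # Two boxes overlap when neither lies wholly to one side of the other.
        # (Boxes crossing the 180-degree meridian are not handled here.)
        sql = f"({s} <= ? AND {n} >= ? AND {w} <= ? AND {e} >= ?)"
        return sql, [q_north, q_south, q_east, q_west]


class LikeSubstring:
    """Hypothetical textual paradigm: case-insensitive substring match via LIKE."""

    def __init__(self, column):
        self.column = column

    def contains(self, text):
        return f"(UPPER({self.column}) LIKE ?)", [f"%{text.upper()}%"]


class Union:
    """Hypothetical field adaptor: a hit in any underlying paradigm counts."""

    def __init__(self, *paradigms):
        self.paradigms = paradigms

    def contains(self, text):
        clauses, params = [], []
        for paradigm in self.paradigms:
            sql, args = paradigm.contains(text)
            clauses.append(sql)
            params.extend(args)
        return "(" + " OR ".join(clauses) + ")", params


spatial = FourBoxSpatial("north", "south", "east", "west")
textual = Union(LikeSubstring("title"), LikeSubstring("abstract"))
where_s, args_s = spatial.overlaps(35.0, 34.0, -119.0, -120.0)
where_t, args_t = textual.contains("chart")
print(where_s, args_s)
print(where_t, args_t)
```

The sketch does not escape `%` or `_` in the search text, which a production translator would have to do.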
A bucket mapping
"subject-related-text" : UT.Bucket("textual",
    UT.standardTextualOperators,
    P.Adaptor_Concatenation(
        { "tag:sio.ucsd.edu:sioexplorer/nsdl_mif_dbc/subject" :
              P.Textual_LikeSubstring(
                  "nsdl.nsdl_mif_dbc",
                  "identifier",
                  "subject",
                  UT.Cardinality("1"),
                  P.TextUtils.mappings.uppercaseAlphanumericOthersToWhitespace,
                  P.TextUtils.deleteLists.keepAll,
                  "UPPER"),
          "tag:sio.ucsd.edu:sioexplorer/subject-keywords" :
              P.Textual_Constant(
                  "nsdl.nsdl_mif_dbc",
                  "identifier",
                  UT.Cardinality("1"),
                  ["oceanographic data", "Stephen’s baby"])
          ...
Database mapping: an assessment

What’s good
  • data stays close to provider
  • collection-as-DB-view parallels real-world funding situation
    – nobody is paid to be an ADL node

What’s bad
  • high bar
    – must have database, good metadata, reasonable data modeling, appropriate indexes
  • complex configuration
    – multiple, different representations of same info
    – requires superhuman diligence
  • complex software
    – generic query translator is, in effect, a compiler
Participation via metadata transfer


Database is internal to ADL

“Universal” schema
  • supports all buckets, bucket types
  • automates all indexing, bucket mappings, collection statistics
  • enforces collection policies

Provider supplies metadata
  • entire XML documents
  • via OAI or otherwise

Mapping to ADL metadata views (bucket, browse, access) still required, but...
  • simpler, higher-level
  • no duplication

[Diagram labels: ADL node, RDBMS, config, mapper, metadata, provider]
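As an illustration of metadata transfer, the sketch below parses an OAI-PMH `ListRecords` response carrying unqualified Dublin Core and maps each record to a few buckets. The sample record, the `to_buckets` function, and every bucket key other than "subject-related-text" are assumptions made for the example; fetching over HTTP and resumption tokens are left out.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response of the kind a provider's OAI endpoint might return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Santa Barbara Channel chart</dc:title>
          <dc:subject>oceanographic data</dc:subject>
          <dc:coverage>Santa Barbara Channel</dc:coverage>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def to_buckets(record):
    """Map one harvested record to a few hypothetical bucket values."""
    dc = record.find("oai:metadata/oai_dc:dc", NS)

    def texts(tag):
        return [el.text for el in dc.findall(f"dc:{tag}", NS)]

    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "title": texts("title"),
        "subject-related-text": texts("subject"),
        "coverage": texts("coverage"),  # still a placename, not coordinates
    }


root = ET.fromstring(SAMPLE)
records = [to_buckets(r) for r in root.findall("oai:ListRecords/oai:record", NS)]
print(records)
```

Because the node holds the harvested documents itself, the mapping works on whole records rather than on a provider's table layout, which is where the "simpler, higher-level" claim comes from.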
Issue 2: providing a complete solution

ADL provides:
  • discovery

Missing:
  • ingest, editing tools
  • management of...
    – metadata
    – data
    – data services
  • ...and synchronization of the above
  • workflow

A reasonable goal (?):
  • ADL provides complete map library solution
Issue 3: gaining adoption

Adoption by other DLs has been difficult
  • features (spatial search, buckets) not separable from architecture
  • nobody understands buckets anyway

The world speaks Dublin Core
  • we don’t
  • close doesn’t count
Adoption strategies

New, compelling reasons to use ADL!
  • harvesting automates collection building
  • metadata mapping will support qualified Dublin Core

Our proposal to NSDL/CI:
  • “search semantics” profile for qualified DC
  • generic search framework that supports
    – typed searches
    – over federated search services
Issue 4: design philosophy

“The right thing”
  1: interface simplicity, correctness, consistency
  2: implementation simplicity, completeness

“Worse is better”
  1: implementation simplicity
  2: interface simplicity
  3: correctness, consistency
  4: completeness
  exemplified by Unix, C

(Richard Gabriel, early ’90s)
Our approach

We have the “right” interfaces
  • searching based on continuous geodetic coordinates
  • complex spatial representations (polygons, polylines, ...)
  • gazetteer (content & protocol) provides mapping to names
  • simple!

But... implementation is very difficult
  • polygons, etc. make life difficult at all levels
  • polygons require $$$ 3rd-party software
  • client integration with gazetteer is difficult
  • still don’t have a usable gazetteer
Other approaches

We pay a big price for our approach
  • spatial search was motivator for typed metadata
  • typed metadata is responsible for much of complexity

Might other approaches be equally effective?
  • simplified spatial models, e.g., boxes only
  • other coordinate systems (discrete, coded, ...)
  • cataloging against fixed gazetteer w/ topological relationships
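One way to picture the "boxes only" and "discrete, coded" alternatives together: reduce every footprint to a bounding box, code the box as the set of grid cells it touches, and answer spatial queries by plain set lookup with no geometry engine. The 1-degree grid, the `r<row>c<col>` cell codes, and the function names below are assumptions for illustration, not a scheme the project proposed.

```python
import math


def cells_for_box(north, south, east, west, size=1.0):
    """Return the codes of all grid cells a box touches (assumes west <= east)."""
    rows = range(math.floor(south / size), math.floor(north / size) + 1)
    cols = range(math.floor(west / size), math.floor(east / size) + 1)
    return {f"r{r}c{c}" for r in rows for c in cols}


def build_index(items, size=1.0):
    """Map each cell code to the ids of the items whose boxes touch that cell."""
    index = {}
    for item_id, box in items.items():
        for cell in cells_for_box(*box, size=size):
            index.setdefault(cell, set()).add(item_id)
    return index


def search(index, box, size=1.0):
    """Return ids of items sharing at least one cell with the query box."""
    hits = set()
    for cell in cells_for_box(*box, size=size):
        hits |= index.get(cell, set())
    return hits


# Boxes are (north, south, east, west) in decimal degrees.
items = {
    "chart": (34.5, 33.8, -119.0, -120.5),
    "goleta": (34.5, 34.4, -119.7, -119.9),
}
index = build_index(items)
near = search(index, (34.2, 34.1, -119.5, -119.6))
far = search(index, (36.5, 36.1, -118.2, -118.8))
print(near, far)
```

The trade-off shows up in the first query: "goleta" is returned because it shares a cell with the query box even though the two boxes do not actually overlap. Coded coordinates buy implementation simplicity at the cost of precision, which is the "worse is better" bargain in miniature.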
Summary

Future directions
  • simpler participation model
  • collection-level discovery
  • remote deployment
  • NSDL/CI

Legacy
  • production-quality software
    – copiously documented
    – no known bugs, omissions, or bottlenecks
  • in step with MIL
Cast of characters

Dave Valentine
  • client, databases, testing, deployment

Catherine Masi
  • MIL collection development

Rudolf Nottrott
  • outreach, software development

Greg Janée
  • overall design, core software development

Jim Frew
  • guru