Full implementation of GUIDs at
SERNEC institutions: A strategy that
accommodates institutions of
varying sizes and complex resource
relationships
Steven J. Baskauf – Vanderbilt
University
Thomas Sasek - University of Louisiana
at Monroe
Globally Unique Identifiers
(GUIDs),
a.k.a. Persistent Identifiers
Properties of GUIDs:
1. Globally unique (no two
alike!)
2. Persistent (lasts forever!)
3. Actionable (explains itself
to you and web crawlers
on demand!)
.
= technical detail warning
GUIDs
Good
for what
ails you
My grant got funded!
Identifiers that are persistent
should be scalable
http://lod.geospecies.org/ses/4XSQO
• This URI could represent a passive file delivery system where ses is
the name of a directory on the server and 4XSQO the name of a file
in that directory (no illegal file characters)
• ses/4XSQO could also represent an identifier passed to a server-side script that generates a file on the fly from a database
• In accordance with the principle of REST (representational state
transfer), the client (i.e. user with a web browser) doesn’t need to
know how the server produces the file it sends; the method could
change over time as needed.
• Other nice things about this style of URI
– could correspond to a user’s hierarchy (e.g.
collectionCode/catalogNumber)
– relatively short
– no characters that need to be escaped in XML
Thanks for the example, Pete DeVries .
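A minimal sketch of how a URI of this style could be assembled from a hierarchy such as collectionCode/catalogNumber. The function name make_guid_uri and the allowed-character rule are assumptions made for illustration; they do not come from the slides.

```python
import re

# Assumption: limit each path segment to letters, digits, and hyphens so the
# URI can double as a file path and needs no escaping in XML.
SAFE_SEGMENT = re.compile(r"^[A-Za-z0-9-]+$")


def make_guid_uri(base, collection_code, catalog_number):
    """Join a base URI with a collectionCode/catalogNumber style hierarchy."""
    for segment in (collection_code, catalog_number):
        if not SAFE_SEGMENT.match(segment):
            raise ValueError(f"unsafe characters in segment: {segment!r}")
    return f"{base.rstrip('/')}/{collection_code}/{catalog_number}"


print(make_guid_uri("http://lod.geospecies.org", "ses", "4XSQO"))
```

Because the URI says nothing about how the file is produced, the same string could later be answered by a static file or by a script.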
My grant ran out.
Identifiers that are persistent
should be able to survive the
apocalypse
Grants end.
People quit.
People lose interest.
http://lsid.tdwg.org/urn:lsid:gdb.org:GenomicSegment:GDB132938
How can we provide actionability?
We can do this easily
with a mod_rewrite
accessing a php
script that uses our
MySQL database!
Server
Man
If this is so easy, why aren’t people using
actionable GUIDs with occurrence data???
“Adoption of Persistent Identifiers for
Biodiversity Informatics” GBIF, 2009.
The Chicken and Egg Problem of
Actionability
• Nobody is going to go to the trouble of making
their GUIDs actionable if the metadata that
the GUIDs return aren’t ever going to be used
for anything.
• Nobody is going to build a system that gleans
data from actionable GUIDs if there aren’t any
GUIDs from which to harvest metadata.
(Just like the early Internet where little content
was available for users!)
Economics of investing in GUIDs
• The use of GUIDs for occurrences will
increase when the benefits outweigh the
costs of implementation.
• If no one uses the metadata from actionable
GUIDs, then in order for them to be adopted
either:
– the cost of implementation must be very low
– there must be other benefits
– or both!
Economics 101
SERNEC (Southeast Regional Network of
Expertise and Collections): Representing
herbaria in the Southeast USA
• 125 member herbaria
• 53 survey respondents
• 43% of institutions have negligible to
no IT support.
• 40% have web pages (most are
rudimentary)
• 3-4 serve data
Data courtesy of Zack Murrell of SERNEC
Databasing technology in SERNEC
• 75% are databasing
• approximately 35% are using Excel or nothing
• Some are institutions with
significant budgets and IT
support, while others are
one-person operations with
no budgets and no IT staff
These
people
need a
lot of
help
These
people
don’t
need
help
Data courtesy of Zack Murrell of SERNEC
Costs:
1. Risk: depending on someone else’s
complicated solutions that may result in
disaster.
Costs:
2. You may invest time in something that
never happens.
Cost:
3. Unavailability of a template for generating
RDF/XML
• The TDWG, GBIF, and Linked Data guidelines say
we must use Resource Description Framework
(RDF) in XML format to describe metadata.
• What is it? RDF describes metadata properties in
a way that can be understood by computers.
• It looks like this:
<dcterms:description>Field individual of Arborus rarus</dcterms:description>
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
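To illustrate "understood by computers", here is a small sketch that reads the two properties above with Python's standard library. The rdf:RDF and rdf:Description wrapper and the example.org subject URI are placeholders added so the fragment parses; they are assumptions, not part of the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# The two property lines from the slide, wrapped in the elements a complete
# file would need. The subject URI is a placeholder (assumption).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/specimens/0001">
    <dcterms:description>Field individual of Arborus rarus</dcterms:description>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
</rdf:RDF>"""


def read_properties(rdf_xml):
    """Return (subject URI, description text, type URI) from a one-resource file."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF_NS}}}Description")
    subject = desc.get(f"{{{RDF_NS}}}about")
    description = desc.findtext(f"{{{DCTERMS_NS}}}description")
    type_uri = desc.find(f"{{{DCTERMS_NS}}}type").get(f"{{{RDF_NS}}}resource")
    return subject, description, type_uri


print(read_properties(DOC))
```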
Summary:
Users having few IT resources need a simple
system:
– that requires little or no help to implement
– that can use existing database output
– that requires the least possible maintenance on
the server
The cost of complex systems is too high for small
users to implement without a very large
benefit.
Methods for lowering the cost of
implementing actionable GUIDs for
small-scale users: RAX and REJAX
Review of Linked Data rules
1. URIs of physical or conceptual (non-information)
resources must differ from the URLs of
documents that describe them, e.g.:
http://bioimages.vanderbilt.edu/vanderbilt/7-314
is an oak tree
http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf
is a metadata file describing the oak tree
2. Content negotiation for actionable non-information resource URIs should produce:
A. a web page for humans to see
B. an RDF/XML file for semantic clients (i.e. computers)
EXtensible Stylesheet Language
Transformation (XSLT)
RDF/XML metadata
in the file
0134.rdf
XSLT stylesheet
in the file
guid-o-matic.xsl
XHTML web page
as seen by a
human being
RDF And XSLT (RAX) method
1. Client requests extension-less URI.
2. Server concatenates “.rdf” to the URI.
3. RDF/XML file delivered to client regardless of
requested content-type.
4. Web browsers use an XSLT stylesheet to
create an XHTML web page for humans from
the RDF/XML.
5. Semantic clients just use the RDF/XML.
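Steps 1 to 3 amount to a single path mapping, sketched here in Python. On a real server this would be done by the minor rewrite rule mentioned elsewhere in the slides; the function name rax_target is hypothetical and the code only illustrates the mapping.

```python
import posixpath


def rax_target(request_path):
    """Map a requested URI path to the static file that RAX would serve.

    Extension-less paths get ".rdf" appended no matter what the client asked
    for; paths that already name a file (or a directory) pass through.
    """
    if request_path.endswith("/"):
        return request_path
    _, extension = posixpath.splitext(request_path)
    if extension:
        return request_path
    return request_path + ".rdf"


print(rax_target("/specimens/lsu000/0134"))
```

For step 4, the RDF/XML file has to tell the browser which stylesheet to apply, typically through an xml-stylesheet processing instruction at the top of the file.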
RAX Content Negotiation
“I am a computer. Send me
http://www.cyberfloralouisiana.com/specimens/lsu000/0134”
I cannot send a
specimen!
GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: application/rdf+xml
RDF/XML file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
web server
RAX Content Negotiation
“I am a human. Send me
http://www.cyberfloralouisiana.com/specimens/lsu000/0134”
Duh, what’s that mean?
He gets RDF anyway.
GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: text/html
RDF/XML file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf web server
what the web
browser shows
Static file structure for RAX
nlu000
9506.rdf
nlu009
0505.rdf
0134.rdf
http://www.cyberfloralouisiana.com/specimens
lsu000
0435.rdf
guid-o-matic.xsl
The specimen having barcode
LSU0000134
is identified by the URI
http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Its RDF formatted metadata is in the file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
0532.rdf
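A sketch of how a barcode might be turned into its URI. The splitting rule (three letters lower-cased plus the first three digits as the directory, the last four digits as the file name) is inferred from the single LSU0000134 example above and is an assumption, as is the function name barcode_to_uri.

```python
BASE = "http://www.cyberfloralouisiana.com/specimens"


def barcode_to_uri(barcode, base=BASE):
    """Turn a barcode such as LSU0000134 into an extension-less specimen URI."""
    prefix, digits = barcode[:3], barcode[3:]
    # Assumed barcode shape: three letters followed by seven digits.
    if not (prefix.isalpha() and digits.isdigit() and len(digits) == 7):
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    directory = prefix.lower() + digits[:3]
    filename = digits[3:]
    return f"{base}/{directory}/{filename}"


uri = barcode_to_uri("LSU0000134")
print(uri)           # the identifier of the specimen
print(uri + ".rdf")  # the static metadata file behind it
```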
Asynchronous JavaScript And XML
(AJAX)
JavaScript
in the file
metadata.htm
retrieves metadata from
RDF/XML metadata
in the files
vanderbilt/4-145.rdf (the tree)
baskauf/79687.rdf (an image)
baskauf/79695.rdf (another image), etc.
XHTML web page created
using those metadata
as seen by a
human being
Redirection, Javascript, and XSLT
(REJAX) method
1. Client requests extension-less URI.
2. Server does content negotiation based on
requested content-type.
3. Semantic clients are sent the RDF/XML.
4. Web browsers are sent a text/html web page
which uses JavaScript (i.e. AJAX) to open
RDF/XML files and obtain the metadata required
to construct the web page. The JavaScript can
also retrieve blocks of XSLT-formatted RDF data.
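Step 2 could be sketched as follows. In HTTP a client states the content type it wants in the Accept request header, so the sketch inspects that header. The function name rejax_target is hypothetical, and ignoring q-values is a simplifying assumption; a production server would normally do this with its own negotiation or rewrite rules.

```python
def rejax_target(request_path, accept_header):
    """Choose the static file for an extension-less URI from the Accept header.

    Simplification (assumption): q-values are ignored. Any client that lists
    application/rdf+xml gets the RDF file; everyone else gets the web page.
    """
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    if "application/rdf+xml" in media_types:
        return request_path + ".rdf"
    return request_path + ".htm"


print(rejax_target("/vanderbilt/4-145", "application/rdf+xml"))
print(rejax_target("/vanderbilt/4-145", "text/html,application/xhtml+xml"))
```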
REJAX Content Negotiation
“I am a computer. Send me
http://bioimages.vanderbilt.edu/vanderbilt/4-145”
I cannot send a tree! I’ll send
information about the tree.
GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: application/rdf+xml
RDF/XML file
web server
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
REJAX Content Negotiation
“I am a human. Send me
http://bioimages.vanderbilt.edu/vanderbilt/4-145”
Got it. I’ll send XHTML.
GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: text/html
XHTML file
web server
http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
web page created by JavaScript
Static file structure for REJAX
66920.rdf
66920.htm
ind-baskauf
70905.rdf
70905.htm
http://bioimages.vanderbilt.edu
4-145.rdf
The tree identified by the URI
http://bioimages.vanderbilt.edu/vanderbilt/4-145
has RDF metadata in the file
vanderbilt
4-145.htm
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
while the file
7-314.rdf
metadata.htm
http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
etc.
passes information to the javascript in
http://bioimages.vanderbilt.edu/metadata.htm? vanderbilt/4-145/metadata/ind/etc.
Comparison of RAX and REJAX
Similarities
Differences
Both use static files.
RAX uses metadata from a single RDF file
while REJAX inputs metadata from several
RDF files.
Both will work offline with at least some
browsers.
Both require modification of only a single
file to change the appearance of the web
page.
RAX simply displays the metadata for one
or more closely related resources while
REJAX allows the user to interact with
many resources in complex ways.
• RAX and REJAX are not programs or languages.
• They are simple content-negotiation methods
that make use of the RDF/XML required by the
Linked Data concept to create web pages.
Back to economics… Cost reduction
• Risk is lowered because they can operate on a
generic web server with no server-side scripting.
No maintenance required once set up (although a
minor server rewrite rule is required).
• Little time must be invested – existing database
can be used to provide metadata and
implementation can be immediate.
• Scalable: URIs are such that static files can be
replaced at any time by server-side scripting.
What about the RDF?
RAX (specimen record) single RDF file using hash URIs
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
<rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
… [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b">
<rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
… [metadata about the determination] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
<rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
<dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen" />
… [metadata about the specimen] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img">
etc.
What about the RDF?
REJAX (live plant image records) using multiple RDF files
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
<rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
… [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314#19287">
<rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
… [metadata about the determination] …
</rdf:Description>
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/79651">
<rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
<dwc:basisOfRecord>DigitalStillImage</dwc:basisOfRecord>
… [metadata about the image] …
</rdf:Description>
The importance of separation of
resources in the RDF
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
… [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
… [metadata about the specimen] …
<dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>
This file is served from the herbarium’s website
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/12345">
… [metadata about the image] …
<dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>
This file is served from the image repository’s website
See Biodiversity Informatics 7:17-44 for much more on this.
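A sketch of why the separation matters to a semantic client: two files from different websites can be linked because both point at the same individual URI. The documents below are cut down to the dwc:individualID link, and the helper names make_doc and group_by_individual are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
INDIVIDUAL = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"

TEMPLATE = """<rdf:RDF xmlns:rdf="{rdf}" xmlns:dwc="{dwc}">
  <rdf:Description rdf:about="{subject}">
    <dwc:individualID rdf:resource="{individual}"/>
  </rdf:Description>
</rdf:RDF>"""


def make_doc(subject):
    """Build a stripped-down RDF/XML file for one resource (illustration only)."""
    return TEMPLATE.format(rdf=RDF, dwc=DWC, subject=subject, individual=INDIVIDUAL)


# One file as served by the herbarium, one as served by the image repository.
HERBARIUM_DOC = make_doc("http://www.cyberfloralouisiana.com/specimens/lsu000/0428")
REPOSITORY_DOC = make_doc("http://bioimages.vanderbilt.edu/baskauf/12345")


def group_by_individual(documents):
    """Map each individual URI to the resources that claim to document it."""
    groups = defaultdict(list)
    for text in documents:
        root = ET.fromstring(text)
        for desc in root.iter(f"{{{RDF}}}Description"):
            subject = desc.get(f"{{{RDF}}}about")
            for link in desc.findall(f"{{{DWC}}}individualID"):
                groups[link.get(f"{{{RDF}}}resource")].append(subject)
    return dict(groups)


print(group_by_individual([HERBARIUM_DOC, REPOSITORY_DOC]))
```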
Guid-O-Matic
1. Create CSV export
containing terms that
vary among specimens.
2. Download guid-o-matic.exe (200 kB) from
http://bioimages.vanderbilt.edu/guid-o-matic
(no installation required).
3. Create a directory to
hold the RDF files.
4. Enter (one time) the
stuff about your
institution that doesn’t
change.
5. Click this button and poof!
the RDF files appear in the
directory you created.
6. Re-publish your website using WinSCP or whatever.
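The general idea of steps 1 to 5 (a CSV export in, one RDF file per record out) can be sketched in a few lines of Python. This is not the Guid-O-Matic program itself; the column names directory, file, and description, and the function name csv_to_rdf, are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)


def csv_to_rdf(csv_text, base_uri):
    """Return {relative file name: RDF/XML text}, one entry per CSV row."""
    files = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        relative = f"{row['directory']}/{row['file']}"
        root = ET.Element(f"{{{RDF}}}RDF")
        desc = ET.SubElement(
            root,
            f"{{{RDF}}}Description",
            {f"{{{RDF}}}about": f"{base_uri}/{relative}"},
        )
        ET.SubElement(desc, f"{{{DCTERMS}}}description").text = row["description"]
        files[relative + ".rdf"] = ET.tostring(root, encoding="unicode")
    return files


EXPORT = "directory,file,description\nlsu000,0134,Example specimen\n"
for name, text in csv_to_rdf(EXPORT, "http://www.cyberfloralouisiana.com/specimens").items():
    print(name)
    print(text)
```

Writing each entry to disk under the directory created in step 3 would give the static file structure that the server publishes.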
What’s the point???
• Appropriate design of the RDF structure allows
for both
– simple methods of generating a representation for
humans
– semantic clients drawing correct inferences about the
relationships among resources
• The human end user doesn’t care about this and
doesn’t have to know about it (they’ll just see the
web page).
• The raw data provider shouldn’t have to worry
about what RDF is or how to use it (They just
need some simple software to map their data
correctly!).
Economics: benefits to small users
• Serving the files from the user’s own web server
allows the users to brand their GUIDs by including
their own domain name rather than that of an
external host.
• Clickable attribution on websites
• Reference link in PDF publication citations.
• Instant iPhone “app” to access collection
metadata.
• XSLT can easily be modified to meet the needs of
the users, e.g. QR codes on displays.
QR code on a museum display
Try these on your portable device
(iPhone=yes, others=?)
Juncus diffusissimus specimen at the LSU herbarium
The “Bicentennial Oak” in Vanderbilt’s arboretum
http://www.cyberfloralouisiana.com/specimens/lsu000/0428
http://bioimages.vanderbilt.edu/vanderbilt/7-314
RAX example
REJAX example
Summary
• It is possible for GUIDs of the HTTP URI form to be
implemented right now, even by users with very few IT
resources.
• Restricting the format of the URIs to a simple structure
(no weird characters, short, slashes to indicate
hierarchy) prevents dependence on a particular
delivery method (you can change your mind later).
• Making HTTP URI GUIDs actionable (i.e. resolvable in
XHTML) in a simple way provides immediate benefits
to the issuer even if the RDF is never used by a
semantic client.
• Making it practical to implement resolvable GUIDs on a
large scale increases the likelihood that semantic web-based
databases will evolve because the economics are
shifted in their favor (solution to chicken and egg
problem).
References
Note: this PowerPoint will be linked from the first URL below
(QR code at right loads the URL).
• Links from Bioimages GUID page
http://bioimages.vanderbilt.edu/pages/guid.htm
• TDWG GUID/LSID applicability statement
http://www.tdwg.org/stdtrack/article/download/150/51
• Cool URIs don't change (Tim Berners-Lee)
http://www.w3.org/Provider/Style/URI
• Cool URIs for the Semantic Web
http://www.w3.org/TR/cooluris/
• Recommendations for implementation of GUIDs in the
SERNEC collections community
http://bioimages.vanderbilt.edu/guid
• Biodiversity Informatics 7:17-44
https://journals.ku.edu/index.php/jbi/article/view/3664