What are GUIDs and Why Do We Need Them ???

Steve Baskauf
Vanderbilt Dept. of Biological Sciences
http://bioimages.vanderbilt.edu/
What is a GUID?
A globally unique identifier (GUID) should be:
1. globally unique
2. actionable
3. persistent
1. How do you make an identifier
globally unique? (part 1)
• Make it locally unique within your institution
• A common strategy:
– identifier (catalog number) unique within a
collection, e.g. 66920
– namespace (collection code) unique within the
institution, e.g. ind-baskauf
• Unique local identifier: ind-baskauf/66920,
ind-baskauf:66920, ind-baskauf_66920, etc.
How do you make an identifier globally
unique? (part 2)
• Make your local identifier globally unique
• Use your institution code? TENN, BOON,
bioimages?
• No! How do you know that is globally unique?
• Consensus: use a domain (or subdomain)
name, e.g. www.biology.appstate.edu,
tenn.bio.utk.edu, or
bioimages.vanderbilt.edu
Some identifiers that are globally
unique
• bioimages.vanderbilt.edu_ind-baskauf_66920
• urn:lsid:bioimages.vanderbilt.edu:baskauf:66920
• http://bioimages.vanderbilt.edu/ind-baskauf/66920
• Do these qualify as GUIDs???
– globally unique
– actionable????
• What happens if you put them in a web
browser?
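The three identifier styles listed above can all be assembled from the same parts: a domain name, a namespace, and a catalog number. The following is only a sketch of that idea; the function names are hypothetical and are not part of any Bioimages software.

```python
# Hypothetical helpers: build each identifier style from its parts.

def underscore_id(domain, namespace, catalog_number):
    """Plain string form, e.g. domain_namespace_number."""
    return "_".join([domain, namespace, str(catalog_number)])


def lsid(authority, namespace, object_id):
    """LSID form: urn:lsid:<authority>:<namespace>:<object id>."""
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


def http_uri(domain, namespace, catalog_number):
    """HTTP URI form: http://<domain>/<namespace>/<number>."""
    return f"http://{domain}/{namespace}/{catalog_number}"


DOMAIN = "bioimages.vanderbilt.edu"

candidates = [
    underscore_id(DOMAIN, "ind-baskauf", 66920),
    lsid(DOMAIN, "baskauf", 66920),
    http_uri(DOMAIN, "ind-baskauf", 66920),
]
for candidate in candidates:
    print(candidate)
```

All three strings are globally unique because they embed the domain name, but only the last one is something a web browser knows how to request.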
2. How do you make an identifier
actionable?
• Something has to happen when the identifier
is put in a web browser.
• LSIDs
– need a special browser plugin that nobody has.
– need a special system for its resolvers to talk to
each other
• HTTP URIs
– work in any web browser
– DNS nameservers already talk to each other
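A rough way to test the "works in any web browser" criterion is to check whether an identifier parses as an HTTP(S) URI with a host name. This is a minimal sketch using Python's standard library; the function name is an assumption made for illustration, and passing the check only means a browser can attempt the request, not that the server will answer.

```python
from urllib.parse import urlsplit


def looks_browser_actionable(identifier):
    """Return True if the identifier is an http(s) URI with a host part."""
    parts = urlsplit(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for identifier in (
    "bioimages.vanderbilt.edu_ind-baskauf_66920",
    "urn:lsid:bioimages.vanderbilt.edu:baskauf:66920",
    "http://bioimages.vanderbilt.edu/ind-baskauf/66920",
):
    print(identifier, looks_browser_actionable(identifier))
```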
Can a material or conceptual object
have an HTTP URI?
• We know a web page can have a URI that the
web browser uses to find the HTML
document…
• But physical objects (specimens, living plants)
and conceptual entities (species) can also
have HTTP URIs!
CAN I HAVE A URI???
• Yes! Here it is:
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
How is my URI actionable???
If I put that HTTP URI in a web browser, does it
deliver me to the user, like a web page?
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
Darn, no transporter technology!
• What should I use for my HTTP URI?
[email protected]
https://medschool.mc.vanderbilt.edu/biosci/bio_fac.php?id3=13257
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf
http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me
• The web server doesn’t do anything with the fragment
identifier (#me), but it makes the URI different from the RDF
metadata file. URIs for objects must be different from the
URIs of other things that represent them.
• A URI is a Uniform Resource Identifier, not a URL (Uniform
Resource Locator). It identifies me, but doesn’t deliver me.
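The role of the fragment identifier can be seen by splitting the URI the way an HTTP client does before it sends a request. This sketch uses the standard-library urldefrag function; the helper name split_fragment is hypothetical.

```python
from urllib.parse import urldefrag

PERSON_URI = "http://people.vanderbilt.edu/~steve.baskauf/foaf.rdf#me"


def split_fragment(uri):
    """Separate the part sent to the server from the fragment kept by the client."""
    document_uri, fragment = urldefrag(uri)
    return document_uri, fragment


document_uri, fragment = split_fragment(PERSON_URI)
print("requested from server:", document_uri)
print("kept by the client:   ", fragment)
```

The server only ever sees the URI of the RDF file, yet the full string with #me remains a distinct identifier for the person the file describes.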
Back to the tree…
http://bioimages.vanderbilt.edu/ind-baskauf/66920.htm
= a URI and URL for a web page about the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920.rdf
= a URI and URL for an RDF metadata file about the tree
http://bioimages.vanderbilt.edu/baskauf/66921.jpg
= a URI and URL for an image of the tree
http://bioimages.vanderbilt.edu/ind-baskauf/66920
= a URI for the tree itself
How did the web server know what to
do with the HTTP URI?
• Content negotiation = rules about which
representation of a resource a web server
should send when the URI of a non-information
resource is requested.
• Apache web servers can do it if set up
properly.
• Web browsers ask for HTML content
• Computers (“semantic web user-agents”) ask
for RDF/XML content
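The decision a server makes during content negotiation can be sketched as a small function that reads the client's Accept header and picks a file. This is a simplified illustration, not how Apache implements it: the SUPPORTED table, the function names, and the choice of HTML as the default for */* are all assumptions, and real negotiation handles more cases (wildcards such as text/*, specificity rules, and so on).

```python
# Assumed mapping from media type to the file extension of each representation.
SUPPORTED = {
    "text/html": ".htm",
    "application/rdf+xml": ".rdf",
}

TREE = "http://bioimages.vanderbilt.edu/ind-baskauf/66920"


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(resource_uri, accept_header):
    """Pick the URI of the representation to send, or None if nothing fits."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return resource_uri + SUPPORTED[media_type]
        if media_type == "*/*":
            return resource_uri + ".htm"  # assumed default: the web page
    return None


print(choose_representation(TREE, "text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_representation(TREE, "application/rdf+xml"))
```

A browser-style header leads to the .htm page and an RDF client's header leads to the .rdf file, while the URI of the tree itself never changes.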
What the heck’s the Semantic Web?
• not the same thing as “Web 2.0”
• an idea pushed by Tim Berners-Lee (inventor
of the Web)
• a way for programs like web crawlers (e.g.
GoogleBot) to know rather than guess.
• Disco=an RDF browser
• http://www4.wiwiss.fu-berlin.de/rdf_browser/
• http://bioimages.vanderbilt.edu/ind-baskauf/66920
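What a semantic web client does differently from a browser is mostly a matter of the Accept header it sends. The sketch below builds such a request with Python's standard library without sending it; the function name build_rdf_request is hypothetical, and whether the server at this URI still answers with RDF/XML is not something this example checks.

```python
from urllib.request import Request

TREE_URI = "http://bioimages.vanderbilt.edu/ind-baskauf/66920"


def build_rdf_request(uri):
    """Prepare a GET request that asks for RDF/XML instead of HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


request = build_rdf_request(TREE_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# To actually send it: urllib.request.urlopen(request)  (requires network access)
```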
3. What is a persistent HTTP URI?
One of my favorite websites:
http://tenn.bio.utk.edu/vascular/vascular.html
Oops. It’s now:
http://tenn.bio.utk.edu/vascular/vascular.shtml
Unchanging local file names
http://bioimages.vanderbilt.edu/baskauf/66921.htm
vs.
http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304
What’s in the HTML of the first URI?
<script type="text/javascript">
window.location.replace("../metadata.htm?baskauf/66921/metadata/img/3456/2304");
</script>
The first URI is also a “cool” URI (easy to
remember).
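The relative path in that script resolves against the stable URI to give the longer one. A small sketch with the standard-library urljoin function shows the resolution a browser performs; the constant and function names are chosen here for illustration only.

```python
from urllib.parse import urljoin

STABLE_URI = "http://bioimages.vanderbilt.edu/baskauf/66921.htm"
REDIRECT_TARGET = "../metadata.htm?baskauf/66921/metadata/img/3456/2304"


def resolve_redirect(base_uri, target):
    """Resolve a relative redirect target against the page that contains it."""
    return urljoin(base_uri, target)


print(resolve_redirect(STABLE_URI, REDIRECT_TARGET))
```

Because only the target string inside the stable page needs to change, the query-string URI can be reorganized later without breaking links to the short one.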
Unchanging domain names
http://www.bioblitznashville.org/
vs.
http://bioimages.vanderbilt.edu/
If I die, get fired, or lose interest in Bioimages, the
HTTP URIs could still continue to be resolved for a
long time.
How long is “persistent”?
• Forever is a pretty long time.
• The Internet is only 40 years old and the Web
only 20.
• I say if you can foresee your institution and
domain name lasting 10 years, go for it!
• Alternative? tdwg.org subdomain (but GUID
review is 188 days old!)
Why do we need GUIDs?
• They provide a convenient way to cite
ANYTHING and allow a reader to obtain
further information with only a Web browser.
• They allow metadata about a resource to
unambiguously refer to other resources at
other institutions (e.g. duplicate specimens,
live plant images and specimens)
• They make it possible to have a system that
can update itself automatically.
STOP WAITING and go for it!
• There is nothing that would stop most of us
from starting to use HTTP URI GUIDs within a
month. Forget about LSIDs.
• If you are afraid of RDF, ignore it and worry
about it later. Rules were made to be broken.
• See http://bioimages.vanderbilt.edu/ for more
information and examples of everything covered
here, plus a link to the Apache page on
content negotiation.