Semantic Documents - School of Information

Semantic Web Technologies
• Web site: syllabus still developing
- http://www.ischool.utexas.edu/~i385t-sw
• Readings Discussion
• Discussion: What isn't the Semantic Web?
• Class work: Using feed reader applications and blog posting demonstrations
• Research Presentation Topics
Semantic Technologies Stack
Semantic Web elements
• XML
- Structured markup languages
• RDF
• DAML + OIL
• XHTML
- Uniform Resource Identifiers
• URLs, of course
• Structured, parsable addressing
- http://www.shadows.com/tags/semantic_web
- http://www.flickr.com/photos/tags/austin
- http://www.amazon.com/exec/obidos/externalsearch/103-39923787183068?keyword=ajax&tag=donturnbullweb&mode=books
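Because these addresses follow a predictable pattern, a program can read their meaning back out without any extra markup. The sketch below uses Python's standard `urllib.parse`. The rule that the tag is the path segment right after `tags` is an assumption based on the two tag URLs above. It is a convention of those sites, not a Web standard.

```python
from urllib.parse import urlparse


def tag_from_url(url):
    """Return the path segment that follows 'tags', or None if there is none.

    Assumes the site-specific pattern .../tags/<tag>; this is a convention of
    these sites, not a Web standard.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "tags" in segments:
        i = segments.index("tags")
        if i + 1 < len(segments):
            return segments[i + 1]
    return None


print(tag_from_url("http://www.shadows.com/tags/semantic_web"))  # semantic_web
print(tag_from_url("http://www.flickr.com/photos/tags/austin"))  # austin
print(tag_from_url("http://www.example.com/about"))  # None
```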
Structure is (still) the gateway
• Web Services
- The URI describes the functional parameters
- The system does the REST
- The client is a smart interpreter of the results
• Web services have a grammar
- Defined by standards
- Initiated by the URI
• The request
- Implemented by the system
• The supplied
• Logic, Classification & Ontologies all provide additional functionality & structure
• Never underestimate the power of plain text
- Machine readable w/o extra work
- Human understandable (for lightweight semantics)
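The client/system split on this slide can be sketched with Python's standard `urllib.parse`. The client encodes its functional parameters into the URI, and the system decodes them to decide what to do. The endpoint `http://api.example.com/search` and the parameter names `keyword` and `mode` (borrowed from the Amazon URL pattern) are hypothetical. The URI is also plain text, readable by both people and programs.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical REST-style endpoint, used only for illustration.
ENDPOINT = "http://api.example.com/search"


def build_request(**params):
    """Client side: encode the functional parameters into the URI."""
    return ENDPOINT + "?" + urlencode(sorted(params.items()))


def handle_request(uri):
    """System side: recover the path and parameters the URI carries."""
    parsed = urlparse(uri)
    return parsed.path, {k: v[0] for k, v in parse_qs(parsed.query).items()}


uri = build_request(keyword="semantic web", mode="books")
print(uri)  # http://api.example.com/search?keyword=semantic+web&mode=books
print(handle_request(uri))  # ('/search', {'keyword': 'semantic web', 'mode': 'books'})
```

Sorting the parameters is a choice made here so the same request always yields the same URI, which helps caching. REST itself does not require it.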
Documents are the Structure
• XML: markup language for encoding semantics
• Everyone understands XML
- Especially browsers & Web crawlers
- Or thinks they do, which still expands adoption
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
…
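Because the markup names each field, a few lines of Python's standard `xml.etree.ElementTree` can read the catalog. The sketch below re-types the two CD records shown above and adds a closing `</CATALOG>` tag so the fragment parses on its own. The rest of the catalog is left out.

```python
import xml.etree.ElementTree as ET

# The two records shown above, closed with </CATALOG> so the snippet parses.
CATALOG_XML = """<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
</CATALOG>"""


def read_catalog(xml_text):
    """Turn each <CD> element into a dict keyed by its child element names."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in cd} for cd in root.findall("CD")]


cds = read_catalog(CATALOG_XML)
for cd in cds:
    print(cd["TITLE"], "by", cd["ARTIST"], "-", cd["YEAR"])
# Empire Burlesque by Bob Dylan - 1985
# Hide your heart by Bonnie Tyler - 1988
```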
XML: Lingua Franca for SWT
• “XML may become the primary syntax for all enterprise data” (pp. 27-28)
- Application independent
- Standard syntax for metadata
- Standard structure for documents & data
- It’s already in use
• It isn’t about the CPU, it’s about being open
• Structured documents use logic for semantic descriptions
- And it’s not all about metadata
• If it’s not easily readable, you get a legend
- Schemas, DTDs, …
The XML Philosophy
• XML provides the syntax guidelines for markup
• Common structural elements are specific to each genre of use
• Markup is based on elements
- A container with start and end tags
- Elements can have sub elements
• Roots & trees
- Roots define the structure
- Trees are the hierarchy within
- Inheritance defines the relationships
• Like HTML, but stricter with the structure (XHTML)
- Valid XML (or XHTML) means it is usable, not that it is correct
• XML Schemas are the specific rules for validation
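The strictness is easy to see in practice: an XML parser rejects markup that an HTML browser would quietly repair. The sketch below only checks well-formedness, using Python's standard `xml.etree.ElementTree`. Validating against an XML Schema needs a separate validator, because the standard library does not include one.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>Hello<br/></p>"))   # True: every tag is closed
print(is_well_formed("<p>Hello<br></p>"))    # False: <br> is never closed
print(is_well_formed("<B><I>text</B></I>"))  # False: tags overlap instead of nesting
```

A document that passes this check is only well formed. It can still break a schema's rules, and even a valid document can say something wrong.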
XML Schemas
• A “definition language” to constrain semantic vocabulary & hierarchical structure
• Borrowed from database schemas, which define the data types, fields & tables in a DBMS
• Most are not complex
- But validation is key to making semantics useful
• Schemas by another name:
- Document Type Definition (DTD)
- RELAX NG
- Schematron (XPath)
XML Schema Specifics
• An XML Schema defines:
- elements that can appear in a document
- attributes that can appear in a document
- which elements are child elements
- the order of child elements
- the number of child elements
- whether an element is empty or can include text
- data types for elements & attributes
- default and fixed values for elements & attributes
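To show what those rules amount to, here is a hand-rolled check of the CD records from the catalog example. It covers which child elements appear, their order and count, and simple data types. This is not XML Schema: an .xsd file would declare the same constraints, and a validator would enforce them. The expected field list and the value patterns are assumptions read off the sample data.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rules, read off the sample <CD> records (not an official schema):
# exactly these children, in this order, each appearing once.
CD_CHILDREN = ["TITLE", "ARTIST", "COUNTRY", "COMPANY", "PRICE", "YEAR"]
# Rough, stricter stand-ins for decimal and four-digit year data types.
TYPE_PATTERNS = {"PRICE": r"\d+\.\d{2}", "YEAR": r"\d{4}"}


def check_cd(cd):
    """Return a list of rule violations for one <CD> element (empty if OK)."""
    errors = []
    tags = [child.tag for child in cd]
    if tags != CD_CHILDREN:
        errors.append(f"children {tags} do not match {CD_CHILDREN}")
    for child in cd:
        if len(child):
            errors.append(f"{child.tag} must hold text, not elements")
        pattern = TYPE_PATTERNS.get(child.tag)
        if pattern and not re.fullmatch(pattern, (child.text or "").strip()):
            errors.append(f"{child.tag} value {child.text!r} has the wrong type")
    return errors


good = ET.fromstring("<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>"
                     "<COUNTRY>USA</COUNTRY><COMPANY>Columbia</COMPANY>"
                     "<PRICE>10.90</PRICE><YEAR>1985</YEAR></CD>")
bad = ET.fromstring("<CD><ARTIST>Bob Dylan</ARTIST><TITLE>Empire Burlesque</TITLE>"
                    "<COUNTRY>USA</COUNTRY><COMPANY>Columbia</COMPANY>"
                    "<PRICE>cheap</PRICE><YEAR>1985</YEAR></CD>")
print(check_cd(good))      # []
print(len(check_cd(bad)))  # 2: wrong child order, and PRICE is not a decimal
```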
XML Namespaces
• Namespaces define the markup globals
- Building blocks: metadata & local types such as xsd:integer
- Calls from others
- <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.utexas.edu/markup">
• What you commonly see:
- <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
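A namespace-aware parser expands each prefix into its full URI. This is how two vocabularies can share a tag name without colliding. Python's standard `xml.etree.ElementTree` returns qualified names in `{namespace-URI}local-name` form. The sketch reuses the namespace URIs from this slide, with attribute values quoted as XML requires and each element closed as an empty element.

```python
import xml.etree.ElementTree as ET

schema = ET.fromstring(
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://www.utexas.edu/markup"/>'
)
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"/>'
)

# The xsd: prefix is replaced by the URI it was bound to.
print(schema.tag)                     # {http://www.w3.org/2001/XMLSchema}schema
print(schema.get("targetNamespace"))  # http://www.utexas.edu/markup

# A default namespace (no prefix) applies to the element itself.
print(page.tag)  # {http://www.w3.org/1999/xhtml}html

# The reserved xml: prefix always maps to the XML namespace.
print(page.get("{http://www.w3.org/XML/1998/namespace}lang"))  # en-US
print(page.get("lang"))  # en-US (plain attribute, not in any namespace)
```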
Schemas & Instances
Document Object Model
• Part of the machine-executable rules of the markup language & schema
• Controls behavior in Web browsers too
• DOM Level 3 supports Semantics
• We’ll see more about the DOM in later weeks
- Web 2.0, AJAX & REST rely on it heavily
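The DOM exposes a document as a tree of node objects that a program can query and change in place. Browsers expose the same model to JavaScript. Python's standard `xml.dom.minidom` is a small implementation of the W3C DOM interfaces. The sketch uses it to update one CD record and add an element to it.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<CATALOG><CD><TITLE>Empire Burlesque</TITLE><PRICE>10.90</PRICE></CD></CATALOG>"
)

# Query: DOM methods walk the node tree.
cd = doc.getElementsByTagName("CD")[0]
price = cd.getElementsByTagName("PRICE")[0]
print(price.firstChild.data)  # 10.90

# Modify: change a text node and append a new element.
price.firstChild.data = "9.99"
year = doc.createElement("YEAR")
year.appendChild(doc.createTextNode("1985"))
cd.appendChild(year)

print(doc.documentElement.toxml())
# <CATALOG><CD><TITLE>Empire Burlesque</TITLE><PRICE>9.99</PRICE><YEAR>1985</YEAR></CD></CATALOG>
```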
Resource Description Framework
• What’s not a Resource?
- That’s good & bad
• “RDF captures meta data about the ‘externals’ of a document, like the author, the creation date, and type” (p. 85)
- Non-text & discrete objects (images, music, bookmarks)
- A triple defining anything
• Subject
• Predicate
• Object
RDF Grammar
• Describing the author of a document
• http://www.utexas.edu/index.html has an author whose value is Don Turnbull
• The RDF terms for the various parts of the statement are:
- the subject is the URL http://www.utexas.edu/index.html
- the predicate is the word author
- the object is the phrase “Don Turnbull”
• Describing knowledge is subtle; defining metadata is not always easy.
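The author statement can also be written as data. The sketch below stores it as a (subject, predicate, object) tuple and prints it in N-Triples, a line-based RDF syntax. RDF predicates are URIs, so the plain word "author" is replaced here with the Dublin Core `creator` property (http://purl.org/dc/elements/1.1/creator). That substitution is an assumption, not part of the slide.

```python
# One RDF statement as a (subject, predicate, object) triple.
# Assumption: Dublin Core "creator" stands in for the plain word "author".
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triple = ("http://www.utexas.edu/index.html", DC_CREATOR, "Don Turnbull")


def to_ntriples(subject, predicate, obj):
    """Serialize a triple with URI subject/predicate and a plain string literal."""
    # Escape backslash first, then quotes and line breaks, as N-Triples requires.
    literal = (obj.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'<{subject}> <{predicate}> "{literal}" .'


print(to_ntriples(*triple))
# <http://www.utexas.edu/index.html> <http://purl.org/dc/elements/1.1/creator> "Don Turnbull" .
```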
RDF Barriers
• People don’t use reification (provenance metadata) well, or at all
- Inheritance is tricky & the logic must be parsed
• Containers are very flexible
- Bags allow any order
- Sequences can be more complex than alphabetical
- Alternates depend on the instance
• Syntax is varied
• Examples are “simple”, but still not completely utilized
- Dublin Core
- RSS
• Tools will help as will industry use
- Podcasts (Media RSS)
• More on this and RDF Schemas themselves later
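Under any syntax, an RDF container is a node typed as `rdf:Bag`, `rdf:Seq` or `rdf:Alt`. Its members hang off numbered membership properties `rdf:_1`, `rdf:_2`, and so on. The sketch below generates those triples. The blank-node label `_:playlist` and the track URIs are hypothetical. Only an `rdf:Seq` signals that the numbering is a meaningful order. For a Bag the numbers are just labels.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def container_triples(node, kind, members):
    """Describe an RDF container (kind: 'Bag', 'Seq' or 'Alt') as triples."""
    if kind not in ("Bag", "Seq", "Alt"):
        raise ValueError(f"unknown container type: {kind}")
    triples = [(node, RDF + "type", RDF + kind)]
    # Membership properties rdf:_1, rdf:_2, ... link the container to members.
    for i, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{i}", member))
    return triples


# Hypothetical playlist: the order of tracks matters, so use a Seq.
for t in container_triples("_:playlist", "Seq",
                           ["http://example.org/track/1", "http://example.org/track/2"]):
    print(t)
```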
XPath
• Control syntax for all manner of XML interaction & addressing
• Allows for finding, parsing & manipulating data in a document
- See XSLT
• Examples:
- / selects the document root (which is always the parent of the document element)
- child::para selects the para element children of the context node
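Python's standard `xml.etree.ElementTree` supports a limited subset of XPath in its `find`/`findall` methods. That subset covers addressing by path and by predicate, but not full XPath 1.0. Explicit axes such as `child::` are not accepted, so the slide's `child::para` is written as the plain step `para` here. A library such as lxml provides full XPath. The document is a cut-down copy of the catalog example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<CATALOG>"
    "<CD><TITLE>Empire Burlesque</TITLE><COUNTRY>USA</COUNTRY></CD>"
    "<CD><TITLE>Hide your heart</TITLE><COUNTRY>UK</COUNTRY></CD>"
    "</CATALOG>"
)

# Path steps: the TITLE children of every CD child of the context node (root).
print([t.text for t in root.findall("CD/TITLE")])
# ['Empire Burlesque', 'Hide your heart']

# Predicate: only CDs whose COUNTRY child has the text 'UK'.
print(root.find("CD[COUNTRY='UK']/TITLE").text)  # Hide your heart

# Descendant step: every COUNTRY element anywhere below the root.
print(len(root.findall(".//COUNTRY")))  # 2
```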
XQuery & XForms
• A structured query language for XML
- Allows for building virtual documents from parts of other documents
- Understands the rules of schemas, markup & metadata to perform application-level functions on data
- Tool support is growing, including DBMS vendors
- Works with XForms to provide RDBMS access to URI-addressable data
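Running XQuery itself requires an XQuery processor, so the sketch below only imitates the idea in plain Python: it builds a new "virtual" document from selected parts of another one. The comment shows roughly how an XQuery FLWOR expression would state the same selection declaratively. The root element name `ARTISTS` is made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<CATALOG>"
    "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST><YEAR>1985</YEAR></CD>"
    "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST><YEAR>1988</YEAR></CD>"
    "</CATALOG>"
)

# Roughly the XQuery:
#   <ARTISTS>{ for $cd in /CATALOG/CD where $cd/YEAR > 1986 return $cd/ARTIST }</ARTISTS>
view = ET.Element("ARTISTS")  # hypothetical name for the new document's root
for cd in source.findall("CD"):
    if int(cd.findtext("YEAR")) > 1986:
        # Copy the part so the source document is left unchanged.
        view.append(copy.deepcopy(cd.find("ARTIST")))

print(ET.tostring(view, encoding="unicode"))
# <ARTISTS><ARTIST>Bonnie Tyler</ARTIST></ARTISTS>
```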
More Semantic Standards
• XLink
- Conditional link syntax far beyond anchors & addressing
• XPointer
- Allows for building (& including) aggregated, distributed applications & interfaces
• XInclude
- Provides “make file” syntax for building master documents or constructing complex Semantic inheritance & interaction
• XML Base
- Syntax for setting the base URI used to resolve relative URIs
• Style Sheets
- XSL
- XSLT
- XSL-FO
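Python's standard `xml.etree.ElementInclude` module implements the core of XInclude, so the "master document" idea is easy to try. The sketch below assembles a document from two parts. The file names `ch1.xml` and `ch2.xml` are hypothetical, and a custom loader serves them from a dictionary instead of the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical parts that would normally live in separate files.
PARTS = {
    "ch1.xml": "<chapter>Documents are the structure</chapter>",
    "ch2.xml": "<chapter>RDF grammar</chapter>",
}

MASTER = (
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="ch1.xml"/>'
    '<xi:include href="ch2.xml"/>'
    "</book>"
)


def dict_loader(href, parse, encoding=None):
    """Return the included resource: an element for parse='xml', else text."""
    text = PARTS[href]
    return ET.fromstring(text) if parse == "xml" else text


book = ET.fromstring(MASTER)
ElementInclude.include(book, loader=dict_loader)  # replaces xi:include in place
print([chapter.text for chapter in book])
# ['Documents are the structure', 'RDF grammar']
```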
Feed Readers & blog posting
• How do you use Semantic Web technologies?
- Browsing
- Retrieval
- Sharing
• Readers
• Blogging is easy
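At its core, a feed reader fetches an XML document and lists its items. The sketch below parses an inline RSS 2.0 feed with `xml.etree.ElementTree`. The channel and item contents are made up. A real reader would also download the feed over HTTP, remember which items it has already shown, and handle RSS 1.0 (RDF-based) and Atom feeds.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed standing in for one fetched from a blog.
FEED = """<rss version="2.0">
  <channel>
    <title>Class blog</title>
    <item><title>Week 2 readings</title><link>http://example.org/week2</link></item>
    <item><title>Tagging demo</title><link>http://example.org/tags</link></item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]


for title, link in read_items(FEED):
    print(f"{title}: {link}")
# Week 2 readings: http://example.org/week2
# Tagging demo: http://example.org/tags
```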
What isn’t the Semantic Web?
• “bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users” (Berners-Lee, 2001)
• What do you think now?
• How promising can SWT be?
- As everyday systems
• Is it a new way to solve problems?
- Or a new set of capabilities & solutions?
Topic Selection
• Choose a topic (and corresponding week) to overview
• Topic presentations should include:
- Overview of the technology
- Examples of the technology in use
- How to build using the technology (examples)
- A list of citations and readings that you drew from and for extended reference
• Do not rely on Wikipedia & blogs as your only sources
• Academic journal & conference papers
• Books (development or conceptual design)
• How can these Semantic Web technologies help coordinate, discover and organize information and knowledge?
• Your own point of view about the practicality & promise of these tools & procedures
Current list of Topics
• RDF
• Metadata (e.g. Dublin Core, Media RSS)
• Ontology building (applications)
• REST, XMLHttpRequest & AJAX
• Greasemonkey
• JavaScript: Introduction
• JavaScript: Advanced
• Tag clouds
• GIS, Maps & Mapping Mashups
• XSLT
• WordNet
• Semantic Commerce
• Trust
Next Week
• Readings & Discussion
• Blogging & Tagging (ongoing)
• Finalize topics & presentation dates
• Suggestions for speakers