Knowledge Acquisition on the Semantic Web
Tools for the Semantic Web
Jim Hendler
http://www.mindswap.org
Sem Web: What it’s all about
Knowledge representation, as this technology is often
called, is currently in a state comparable to that of
hypertext before the advent of the web: it is clearly a
good idea, and some very nice demonstrations exist, but it
has not yet changed the world. It contains the seeds of
important applications, but to unleash its full power it
must be linked into a single global system.
-- Tim Berners-Lee, inventor of the WWW, 2001.
Kyoto U, Oct 2002
www.mindswap.org
Part 1: Review of the Semantic Web
On the Web -- links are critical!
[Diagram: in HTML, a Web page points to any Web resource with <a href="URI">, e.g. <a href="http://…">]
On the Semantic Web -- links are critical!
[Diagram: in RDF, a URI points to a URI through a link that is itself labeled with a URI]
RDF is like the web!
Sem Web models start from RDF…
DOC1
<mind:Person rdf:ID="Hendler">
  <mind:title jobs:Professor>
  <jobs:placeOfWork http://www.cs.umd.edu>
</mind:Person>
[Diagram: in DOC1, Hendler is linked by Mind:title to Jobs:Professor and by Jobs:placeOfWork to a Web page (http://www…); the terms come from the Mind: and Jobs: vocabularies]
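The DOC1 description above can be read as three subject/predicate/object statements. The following is a minimal sketch of that reading in plain Python, not the output of any RDF parser. The namespace URIs for mind:, jobs:, and DOC1 are hypothetical placeholders, since the slide shows only the prefixes.

```python
# Hypothetical namespace URIs: the slide gives only the prefixes mind: and jobs:.
MIND = "http://example.org/mind#"
JOBS = "http://example.org/jobs#"
DOC1 = "http://example.org/doc1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple; every term is a URI.
triples = {
    (DOC1 + "Hendler", RDF_TYPE, MIND + "Person"),
    (DOC1 + "Hendler", MIND + "title", JOBS + "Professor"),
    (DOC1 + "Hendler", JOBS + "placeOfWork", "http://www.cs.umd.edu"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def predicate_namespaces(graph):
    """Return the set of namespaces (text up to '#') used by the predicates."""
    return {p.split("#")[0] + "#" for _, p, _ in graph}
```

Calling `objects(triples, DOC1 + "Hendler", JOBS + "placeOfWork")` follows one link, and `predicate_namespaces(triples)` shows that even this small document already depends on more than one vocabulary.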
XML is NOT semantics
XML Schema is DOCUMENT checking
photo has multiple subject fields
photo has one physical location
etc.
WHICH SAYS NOTHING ABOUT TALKS, SUBJECTS, PEOPLE, EVENTS, etc.
<photo>
  <subject> http://www.w3.org/~timbl </subject>
  <name> Tim Berners-Lee </name>
  …
</photo>
The SEMANTICS is in the links (e.g. to ontologies)!
[Diagram labels: Event:title, Event:WebPage]
<daml:ObjectProperty rdf:ID="photograph">
  <rdfs:domain rdf:resource="#Picture"/>
  <rdfs:range rdf:resource="…#person"/>
</daml:ObjectProperty>
<> rdf:type photo:Photograph;
   Photo:File http://…/images#image1;
   Photo:topic :event1#event:speaker.
Event1 a Event:event;
   date "May 7-11";
   speaker http://…#timbl.html;
   Title "WWW 2002…".
TimBL rdf:type w3c-ont:person;
   name "Tim Berners-Lee"
…
<s:Class rdf:about="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Conference">
  <s:comment>
    describes a generic concept about events
  </s:comment>
  <s:subClassOf rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Event"/>
  <a:disjointFrom rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Workshop"/>
  <a:restrictedBy rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#genid18"/>
</s:Class>
<rdf:Description rdf:about="http://www.w3.org/2001/03/earl/0.95#Person">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/03/earl/0.95#Assertor"/>
</rdf:Description>
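One way to see why the semantics sits in the links: once a document uses the photograph property, the ontology's domain and range statements let a program conclude things the document never said. The sketch below applies the standard RDFS domain and range rules to short string names. The instance statement (image1 photograph timbl) is a hypothetical example, and the code is an illustration rather than a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Apply the RDFS domain and range rules once; return only the new type triples."""
    inferred = set()
    for prop, rel, cls in triples:
        if rel not in (RDFS_DOMAIN, RDFS_RANGE):
            continue
        for s, p, o in triples:
            if p == prop:
                # Domain types the subject of the property; range types its object.
                node = s if rel == RDFS_DOMAIN else o
                inferred.add((node, RDF_TYPE, cls))
    return inferred - set(triples)


graph = {
    # From the ontology fragment: photograph has domain Picture and range person.
    ("photograph", RDFS_DOMAIN, "Picture"),
    ("photograph", RDFS_RANGE, "person"),
    # Hypothetical instance data that merely uses the property.
    ("image1", "photograph", "timbl"),
}
```

Here `infer_types(graph)` yields that image1 is a Picture and timbl is a person, although neither type was stated in the instance data.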
Semantic Web Ontologies are “models”
[Diagram: several CV documents with differently labeled fields (such as name, work, and educ), related through a model]
• New SW languages add models to provide mappings and structure.
• XML necessary, not sufficient.
Semantics on the WEB
Web ontologies, like the WWW itself, are not “separable”
Thinking about the ontologies, without considering
The links to other ontologies
The instances that link to them
The crawling and collecting of ontological terminologies
Is like thinking about the Web without the links!!
[Diagram: the DOC1 graph (Hendler, Mind:title, Jobs:Professor, Jobs:placeOfWork, Web page http://www…) with the Mind: and Jobs: terms linking outward to other titles, other URIs, other professors, other pages, and other descriptions]
Part 2: OWL - The “Web Ontology Language”
OWL extends RDF…
RDF-schema
Class, subclass
Property, subproperty
+ Restrictions
Range, domain
Local, global
Existential
Cardinality
+ Combinators
Union, Intersection
Complement
Symmetric, transitive
+ Mapping
Equivalence
Inverse
<rdfs:Class rdf:ID="Meeting">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MeetingName"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#uri"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#uriReference"/>
      <daml:maxCardinality>1</daml:maxCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#location"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#Issues"/>
      <daml:toClass rdf:resource="#Issue"/>
      <daml:minCardinality>0</daml:minCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</rdfs:Class>
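The cardinality restrictions on Meeting can be used the way a form validator would use them. The sketch below transcribes the four restrictions into a Python table and checks one instance against them. This is a closed-world check for illustration only: a DAML+OIL or OWL reasoner treats a missing value as unknown rather than as an error. The function name and the sample instances are hypothetical.

```python
# Restrictions transcribed from the Meeting class: property -> (min, max).
# None as a maximum means "no upper bound".
MEETING_RESTRICTIONS = {
    "MeetingName": (1, 1),   # daml:cardinality 1
    "uri": (0, 1),           # daml:maxCardinality 1
    "location": (1, 1),      # daml:cardinality 1
    "Issues": (0, None),     # daml:minCardinality 0
}


def check_cardinality(instance, restrictions):
    """Return a list of violation messages for one instance (empty if none)."""
    problems = []
    for prop, (low, high) in restrictions.items():
        count = len(instance.get(prop, []))
        if count < low:
            problems.append(f"{prop}: expected at least {low}, found {count}")
        if high is not None and count > high:
            problems.append(f"{prop}: expected at most {high}, found {count}")
    return problems


# Hypothetical instances: property name -> list of values.
good_meeting = {"MeetingName": ["Weekly sync"], "location": ["Room 101"]}
bad_meeting = {"MeetingName": ["Weekly sync", "Standup"]}
```

`check_cardinality(good_meeting, MEETING_RESTRICTIONS)` returns an empty list, while `bad_meeting` is reported for having two names and no location.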
Into a usable “Modeling” language
In science, models provide interoperability across jargons
Mathematical models: equations of a system
Physical models: “sticks and balls” of the atom
Virtual models: the visualization of a complex data set
INFORMATION MODELS: taxonomies and thesauri
Ontologies extend thesaurus information models to provide
Semantic restrictions on property relations
Must have vs. May have vs. Doesn’t have
Has some vs. has N vs. has 1
Some vs. All property restrictions
Formal underpinnings
Logical entailments
Note: rules, logics, proofs are parts of ontologies, but not yet at a “consensus” level for standardization
Should build as add-ons to OWL to take advantage of “terminology features”
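The "Some vs. All" distinction listed above can be made concrete with two small predicates. This is a sketch over hand-written data, again closed-world and not a description-logic reasoner. The function names, the `types` table, and the sample values are all hypothetical.

```python
def has_some(values, cls, types):
    """Existential ("has some"): at least one value is an instance of cls."""
    return any(cls in types.get(v, set()) for v in values)


def has_only(values, cls, types):
    """Universal ("all"): every value is an instance of cls (true when there are no values)."""
    return all(cls in types.get(v, set()) for v in values)


# Hypothetical type assignments for the values of some property.
types = {
    "issue1": {"Issue"},
    "note1": {"Note"},
}
mixed_values = ["issue1", "note1"]
```

With `mixed_values`, the existential test for "Issue" succeeds and the universal one fails. An empty value list satisfies the universal test but not the existential one, which is why the two kinds of restriction are often used together.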
OWL is not
OWL is NOT…
… A knowledge representation language per se
Definitely not “the standard” for KR
… A “Description Logic” per se
It does support DL “idioms”
E.g. “Lymphoma” is restricted to be a subClassOf those things whose
“disease” property is “Cancer”
It will include a “subset” which is
Complete, decidable, in DL complexity case
But, it will allow uses that DLs do not
Maybe outside the “semantics” of the model theory
… The right thing to use in KR/KA research per se
But do use it to distribute your results
But do use it to test your theories
OWL is a WEB ontology language
OWL is
WEB-BASED
DISTRIBUTED
MACHINE-PROCESSIBLE
BASED ON DAML+OIL
By charter!
It may become a Web recommendation
Same “language status” as HTML, XML, XML Schema
And SMIL, P3P: Standard ≠ Use
A starting place for further evolution
Part 3: KA in the (OWL supported) Sem Web
The good news:
DAML+OIL is already the most used ontology language in history
Sept 30, 02: Crawler finds 5M+ DAML statements on 20,000+ web pages
Doesn’t include many instance KBs tied to ontologies
Doesn’t include many very large RDFS-based KBs that include some OWL
OWL is being supported by large corporation labs
Web tool developers: IBM, HP, Sun, Intel, Fujitsu
Content providers: Daimler-Chrysler, Nokia, Motorola, EDS, Agfa
OWL is starting to be used by thesaurus distributors
Cf. the National Cancer Institute metathesaurus, to be released in OWL
The bad news
On the web it is a statistical blip -- the web is HUGE (HUMONGOUS!!)
The big players are still on the sidelines
We could become the next XML or the next SMIL
Do we need KA?
Tom Mitchell made an interesting point
He says “users are lazy”: they won’t do mark-up
He says we should use NLP + machine learning (primarily)
He’s WRONG
Greatest impact likely to be non-textual, non-document content
[Diagram: timeline from 1990 through 2000 (IMAGES AND DOCUMENTS) to 2010 (DATA AND PROGRAMS)]
So who is going to mark it up?
There are not now, and never will be, enough knowledge
engineers to support the important, critical applications of
our technology
Government applications: NASA, US DoD …
Health Care applications: Open Health, Swiss hospitals …
Genomics/Bioinformatics: NCI metathesaurus, Gene Ontology…
...
Historians: Freedman’s project
Let alone the really important stuff out there
MY information
My photo archives, my home page, my daughter’s home page, my project
pages, my favorite hobby pages, etc. etc. etc.
Personal information created the Web!!!
Then a miracle occurs
THE WEB!!
[Diagram: Language (HTML), Tool (Mosaic), MANY USERS, Market, BETTER TOOLS (Netscape, IE, Altavista, etc.), Users]
Key: The Value Proposition
Tools must consider work vs. value
People will NOT use tools that require a lot of work and have little
(perceived) value
People WILL use tools that save them work and/or provide high
(perceived) value
“Perceived” value ≠ “real” value in many cases
Creating Web pages (ca. 1993) was “cool”
No study has yet shown a positive work value for the Web as a whole
But it has changed the way we live
Viral: My friend sees it, wants one. My competitor sees it, needs one.
TBL’s “secret” advice: Start small but viral and you can change many things (July, 02)
Value Proposition 1: Semantic Page Creation
The personal info killer application?
[Diagram: a form asks “Tell me about your: Important Person, Hobby, Job”; the Choice is sent as a Query to an Ont Library, which returns classes; the tool replies “I know about: Scuba shop, Scuba vacation 1, Scuba vacation 2, Scuba instructor”; the output is Marked Up Pages in XHTML+OWL]
• Many people don’t have home pages
Value: Hints for useful properties (using ontology classes)
Help create content (using ontology instances).
• Note: Useful libraries (lots of stuff) already exist (see daml.org)
Value Proposition 2: Semantic Web Portals
The MOSAIC of the Semantic Web?
[Diagram labels: <XSLT/>, KB]
<Oncogene rdf:ID="Oncogene, MYB">
  <code>C3682</code><id>3683</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <Gene_Has_Function rdf:ID="Gene Transcription"></Gene_Has_Function>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <In_Chromosomal_Location rdf:ID="6q22q23"/>
</Oncogene>
<Oncogene rdf:ID="Oncogene NMYC">
  <code>C17656</code><id>17657</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <In_Chromosomal_Location rdf:ID="2p24.1"/>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <Gene_Associated_With_Disease rdf:ID="Neuroblastoma "></Gene_Associated_With_Disease>
</Oncogene>
• Combine browsing, search, and authoring
Value: As I link to concepts, I find useful resources
Pages, Databases, programs, etc.
Value prop 3: Semantic Web Services
VP 3: And service composition
Buy the French version of a book from amazon.fr and have it sent to my mother
Semantic Web Knowledge Acquisition
Virtually no one will create ontologies from scratch
High-End ontology developers will be a tiny percentage
(10,000 High end Web Designers = 1/10,000 of users)
It is easier to read than to create ontologies
Expect “cut and paste” (HTML analogy)
Most used OWL editor to date is Emacs
Can Bootstrap from existing content
HTML screen scrapers, structured data, Excel spreadsheets, …
No training allowed
Motivated users will skim the docs on occasion
Most users want to use it now
“Everyone” has a browser - deploy tools through that
Common metaphors must be used: Form fill, menu, search
Note: No formal justification for any of these - but it worked before!
Adding power via Semantic Web
Tools can be domain independent
Your tool should be usable in lots of contexts!
Use the standards:
OWL and its successors crucial
Tools should assume multiple ontologies
“It’s the links, stupid”
Ontology search, collection, “integration” crucial
Check out the DAML crawler (http://www.daml.org/crawler)
Back-end technologies must be scalable
Can co-evolve with Semantic Web size
But remember, the Web is HUGE
Allow extensibility
Users MUST be able to add their own concepts
Semantic Web (and OWL) allow this
Advanced users will become ontology providers
It will be “cool” to have yours be the ontology of choice in a domain
Consistency CANNOT be maintained on the web
May be a useful heuristic
Insist on consistency and the Semantic Web fails!
GIVE IT AWAY!!!!!
There is, and will be, no market for any of this unless we
create it!
No one will make money selling their tools until we have
MANY more users
Make small, cheap, easy-to-download versions of your tools available
Give it away
The big winners on the web made it available for free:
Browsers: Mosaic, Netscape, IE
Plug-ins: Flash, RealPlayer, Quicktime
Tools: Adobe, Real Media
Part 4: Mindswap tools
Maryland Information and Network Dynamics Laboratory
Semantic Web Agents Project
http://www.mindswap.org/
Practicing what I preach
Open source Tools at http://www.mindswap.org
Described in proceedings
But out of date - open source moves fast
Based on the principles outlined in this talk
RIC: Ontologies make it EASIER to enter knowledge
Turn properties into forms, use restrictions to check form filling
Creates a KB of the results that can be used for search
Coming soon: create a nice web page (using SXMLT)
SMORE: Create content and markup as you go
Multiple ontology
ConvertToRDF: Dump spreadsheets to RDF using mapping ontology
RDFScreenScraper: turn semi-structured web pages into RDF
ParkaSW: Scalable, data-based KB back-end
Some built in inferencing
Pulled from the patent system to become open source!
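As an illustration of the ConvertToRDF idea (dumping spreadsheet rows to RDF through a mapping), here is a minimal sketch. It is not the Mindswap tool: a plain Python dict stands in for the mapping ontology, and the column names, property URIs, and base URI are all hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a mapping ontology: column header -> property URI.
MAPPING = {
    "name": "http://example.org/people#name",
    "workplace": "http://example.org/jobs#placeOfWork",
}
BASE = "http://example.org/data#"  # hypothetical base URI for row subjects


def rows_to_triples(csv_text, key_column, mapping, base=BASE):
    """Turn each spreadsheet row into (subject, property, value) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row[key_column]
        for column, prop in mapping.items():
            value = row.get(column, "")
            if value:  # skip empty cells instead of emitting empty statements
                triples.append((subject, prop, value))
    return triples


# A two-line "spreadsheet" exported as CSV.
SHEET = "id,name,workplace\nhendler,Jim Hendler,http://www.cs.umd.edu\n"
```

`rows_to_triples(SHEET, "id", MAPPING)` produces one statement per mapped, non-empty cell, with the id column supplying the subject URI. Columns missing from the mapping are simply ignored, so existing content can be bootstrapped a few properties at a time.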
Conclusions
The Semantic Web is real, and it is moving fast
Two years ago you hadn’t heard of it, now it’s on the cover of your proceedings
We’ll win if we remember the “rules of the web”
Berners-Lee Principle: Build small but viral
Hendler’s Rule: On the web there is no “THE”
Yours is ONE of the ways of doing it
Consensus is hard, but critical
We did it once and created DAML+OIL, the most-used AI language ever
Everyone’s application is needed
Value proposition: Make it fun, cool, and useful
and people will kill to do the markup
(The Web proves this)
Give it away: Create the markets and we’ll all win
YOUR work is important!
This time it could be for real!