Knowledge Acquisition on the Semantic Web


Tools for the Semantic Web
Jim Hendler
http://www.mindswap.org
Sem Web: What it’s all about
Knowledge representation, as this technology is often
called, is currently in a state comparable to that of
hypertext before the advent of the web: it is clearly a
good idea, and some very nice demonstrations exist, but it
has not yet changed the world. It contains the seeds of
important applications, but to unleash its full power it
must be linked into a single global system.
-- Tim Berners-Lee, inventor of the WWW, 2001.
Kyoto U, Oct 2002
Part 1: Review of the Semantic Web
On the Web -- links are critical!
[Diagram: a Web page links to any Web resource through <a href=URI>]
HTML
<a href="http://…">
On the Semantic Web -- links are critical!
[Diagram: a URI links to another URI, and the link itself is named by a URI]
RDF
RDF is like the web!
Sem Web models start from RDF…
DOC1
<mind:Person rdf:ID="Hendler">
  <mind:title jobs:Professor>
  <jobs:placeOfWork http://www.cs.umd.edu>
</mind:Person>
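The description above can be read as a set of subject-predicate-object statements. The following is a minimal Python sketch of that reading, not a real RDF library. It assumes the slide's abbreviated names stand for full URIs: the `doc1:`, `mind:`, and `jobs:` prefixes are kept as plain strings, and `triples_about` and `vocabularies` are hypothetical helpers introduced only for illustration.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# Prefixed names such as "mind:title" stand in for full URIs (an assumption:
# the slide does not give the namespace URIs).
TRIPLES = [
    ("doc1:Hendler", "rdf:type", "mind:Person"),
    ("doc1:Hendler", "mind:title", "jobs:Professor"),
    ("doc1:Hendler", "jobs:placeOfWork", "http://www.cs.umd.edu"),
]


def triples_about(subject, triples):
    """Return the (predicate, object) pairs whose subject matches."""
    return [(p, o) for s, p, o in triples if s == subject]


def vocabularies(triples):
    """Collect the prefixes used by predicates; each one is a link to a vocabulary."""
    return sorted({p.split(":", 1)[0] for _, p, _ in triples})


if __name__ == "__main__":
    for predicate, obj in triples_about("doc1:Hendler", TRIPLES):
        print(predicate, "->", obj)
    print(vocabularies(TRIPLES))
```

Even this tiny description draws on three vocabularies, which is the sense in which the model depends on links.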
[Graph: in DOC1, the node Hendler has an arc Mind:title to Jobs:Professor and an arc Jobs:placeOfWork to a Web page (http://www…); Mind: and Jobs: appear as separate vocabularies]
XML is NOT semantics
XML Schema is DOCUMENT checking
  photo has multiple subject fields
  photo has one physical location
  etc.
WHICH SAYS NOTHING ABOUT TALKS, SUBJECTS, PEOPLE, EVENTS, etc.
<photo>
  <subject> http://www.w3.org/~timbl </subject>
  <name> Tim Berners-Lee </name>
  …
</photo>
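One way to see the point is to look at what a generic XML parser actually recovers from the photo document. The sketch below uses Python's standard `xml.etree.ElementTree`; the elided "…" content is left out, and `child_fields` and `rename_tags` are hypothetical helpers. Renaming every tag to a meaningless token yields a document the parser handles in exactly the same way, which suggests the tag names carry no meaning for the machine.

```python
import xml.etree.ElementTree as ET

PHOTO_XML = """<photo>
  <subject>http://www.w3.org/~timbl</subject>
  <name>Tim Berners-Lee</name>
</photo>"""


def child_fields(xml_text):
    """Return (tag, text) for each child of the root: all a generic parser knows."""
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


def rename_tags(xml_text, mapping):
    """Rename tags by plain string replacement (good enough for this toy document)."""
    for old, new in mapping.items():
        xml_text = xml_text.replace("<" + old + ">", "<" + new + ">")
        xml_text = xml_text.replace("</" + old + ">", "</" + new + ">")
    return xml_text


original = child_fields(PHOTO_XML)
opaque = child_fields(
    rename_tags(PHOTO_XML, {"photo": "x0", "subject": "x1", "name": "x2"})
)

if __name__ == "__main__":
    print(original)
    print(opaque)
```

Both documents have the same structure and the same strings. Nothing in either tells a program that the subject is a person or that the name belongs to that person.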
The SEMANTICS is in the links (e.g. to ontologies)!
<daml:ObjectProperty rdf:ID="photograph">
  <rdfs:domain rdf:resource="#Picture"/>
  <rdfs:range rdf:resource= …#person"/>
</daml:ObjectProperty>
< > rdf:type photo:Photograph,
    Photo:File http://…/images#image1,
    Photo:topic :event1#event:speaker.
Event1 a Event:event;
    date "May 7-11",
    speaker http://…#timbl.html
    Title "WWW 2002…"
TimBL rdf:type w3c-ont:person;
    name "Tim Berners-Lee"
…
[Diagram labels: Event:title, Event:WebPage]
<s:Class rdf:about="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Conference">
  <s:comment>
    describes a generic concept about events
  </s:comment>
  <s:subClassOf rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Event"/>
  <a:disjointFrom rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#Workshop"/>
  <a:restrictedBy rdf:resource="http://www.semanticweb.org/ontologies/swrc-onto-2000-0910.daml#genid18"/>

<rdf:Description rdf:about="http://www.w3.org/2001/03/earl/0.95#Person">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2001/03/earl/0.95#Assertor"/>
</rdf:Description>
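The photograph property above declares a domain and a range, and that is where a program can start to draw conclusions. A minimal sketch of the standard RDFS domain/range rules follows: using a property implies the type of its subject and of its object. The class name `ex:Person` is a placeholder, since the slide elides the real range URI, and `infer_types` is a hypothetical helper rather than part of any library.

```python
# Property declarations: property -> (domain class, range class).
# "ex:Person" is a placeholder; the slide elides the real range URI.
PROPERTIES = {"photograph": ("Picture", "ex:Person")}


def infer_types(triples, properties):
    """Apply the RDFS domain/range rules to a list of triples.

    If p has domain D and range R, then (s, p, o) entails
    (s, rdf:type, D) and (o, rdf:type, R).
    """
    inferred = set()
    for s, p, o in triples:
        if p in properties:
            domain, range_ = properties[p]
            inferred.add((s, "rdf:type", domain))
            inferred.add((o, "rdf:type", range_))
    return inferred


facts = [("image1", "photograph", "timbl")]
new_facts = infer_types(facts, PROPERTIES)

if __name__ == "__main__":
    for fact in sorted(new_facts):
        print(fact)
```

The single stated fact yields two new ones: image1 is a Picture and timbl is a person. Neither was written in the document itself; both come from following the link to the ontology.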
Semantic Web Ontologies are “models”
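[Diagram: several CV documents with differently named fields (labels such as "work" and "educ"; the others are garbled in the source)]
• New SW languages add models to provide mappings and structure.
• XML is necessary, but not sufficient.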
Semantics on the WEB
• Web ontologies, like the WWW itself, are not “separable”
• Thinking about the ontologies, without considering
  • The links to other ontologies
  • The instances that link to them
  • The crawling and collecting of ontological terminologies
  Is like thinking about the Web without the links!!
[Graph: the DOC1 graph (Hendler, Mind:title, Jobs:Professor, Jobs:placeOfWork, Web page http://www…) linked out to other titles, other URIs, other professors, other pages, and other descriptions]
Part 2: OWL - The “Web Ontology Language”
OWL extends RDF…
• RDF-schema
  • Class, subclass
  • Property, subproperty
• + Restrictions
  • Range, domain
  • Local, global
  • Existential
  • Cardinality
• + Combinators
  • Union, Intersection
  • Complement
  • Symmetric, transitive
• + Mapping
  • Equivalence
  • Inverse
<rdfs:Class rdf:ID="Meeting">
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#MeetingName"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#uri"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#uriReference"/>
      <daml:maxCardinality>1</daml:maxCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#location"/>
      <daml:toClass rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
      <daml:cardinality>1</daml:cardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#Issues" />
      <daml:toClass rdf:resource="#Issue" />
      <daml:minCardinality>0</daml:minCardinality>
    </daml:Restriction>
  </rdfs:subClassOf>
</rdfs:Class>
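The cardinality restrictions on Meeting can be read as rules a data-entry tool might check. The sketch below is one such reading, under a loud assumption: it treats the restrictions as closed-world form validation, which is not how a description-logic reasoner interprets them, since a missing value is simply unknown on the open Web. The `daml:toClass` type constraints are not modeled, and `check_form` is a hypothetical helper.

```python
# Restrictions copied from the Meeting class above:
# property -> (minimum count, maximum count); None means unbounded.
MEETING_RESTRICTIONS = {
    "MeetingName": (1, 1),   # daml:cardinality 1
    "uri": (0, 1),           # daml:maxCardinality 1
    "location": (1, 1),      # daml:cardinality 1
    "Issues": (0, None),     # daml:minCardinality 0
}


def check_form(values, restrictions):
    """Return a list of readable problems; an empty list means the form passes.

    `values` maps each property name to the list of values the user entered.
    """
    problems = []
    for prop, (low, high) in restrictions.items():
        count = len(values.get(prop, []))
        if count < low:
            problems.append(f"{prop}: needs at least {low}, got {count}")
        if high is not None and count > high:
            problems.append(f"{prop}: allows at most {high}, got {count}")
    return problems


good_entry = {"MeetingName": ["Weekly sync"], "location": ["Room 1"]}
bad_entry = {"MeetingName": [], "location": ["Room 1", "Room 2"]}

if __name__ == "__main__":
    print(check_form(good_entry, MEETING_RESTRICTIONS))
    print(check_form(bad_entry, MEETING_RESTRICTIONS))
```

The example entries ("Weekly sync", "Room 1") are invented for illustration. The same table of restrictions could drive both the layout of a form and the checks applied when it is submitted.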
Into a usable “Modeling” language
• In science, models provide interoperability across jargons
  • Mathematical models: equations of a system
  • Physical models: “sticks and balls” of the atom
  • Virtual models: the visualization of a complex data set
  • INFORMATION MODELS: taxonomies and thesauri
• Ontologies extend thesaurus information models to provide
  • Semantic restrictions on property relations
    • Must have vs. May have vs. Doesn’t have
    • Has some vs. has N vs. has 1
    • Some vs. All property restrictions
  • Formal underpinnings
    • Logical entailments
• Note: rules, logics, proofs are parts of ontologies, but not yet at a “consensus” level for standardization
  • Should build as add-ons to OWL to take advantage of “terminology features”
OWL is not
• OWL is NOT…
  • … A knowledge representation language per se
    • Definitely not “The standard” for KR
  • … A “Description Logic” per se
    • It does support DL “idioms”
      E.g. “Lymphoma” is restricted to be a subClassOf those things whose “disease” property is “Cancer”
    • It will include a “subset” which is
      • Complete, decidable, in DL complexity class
    • But, it will allow uses that DLs do not
      • Maybe outside the “semantics” of the model theory
  • … The right thing to use in KR/KA research per se
    • But do use it to distribute your results
    • But do use it to test your theories
OWL is a WEB ontology language
• OWL is
  • WEB-BASED
  • DISTRIBUTED
  • MACHINE-PROCESSABLE
  • BASED ON DAML+OIL
    • By charter!
• It may become a Web recommendation
  • Same “language status” as HTML, XML, XML schema
    • And SMIL, P3P,
  • A starting place for further evolution
• Standard ≠ Use
Part 3: KA in the (OWL supported) Sem Web
• The good news:
  • DAML+OIL is already the most used ontology language in history
    • Sept 30, 02: Crawler finds 5M+ DAML statements on 20,000+ web pages
      • Doesn’t include many instance KBs tied to ontologies
      • Doesn’t include many very large RDFS-based KBs that include some OWL
  • OWL is being supported by large corporation labs
    • Web tool developers: IBM, HP, Sun, Intel, Fujitsu
    • Content providers: Daimler-Chrysler, Nokia, Motorola, EDS, Agfa
  • OWL is starting to be used by thesaurus distributors
    • Cf. National Cancer Institute metathesaurus to be released in OWL
• The bad news
  • On the web it is a statistical blip -- the web is HUGE (HUMONGOUS!!)
  • The big players are still on the sidelines
  • We could become the next XML or the next SMIL
Do we need KA?
• Tom Mitchell made an interesting point
  • He says “users are lazy”: they won’t do mark-up
  • He says we should use NLP + machine learning (primarily)
• He’s WRONG
  • Greatest impact likely to be non-textual, non-document content
[Chart: timeline with ticks at 1990, 2000, and 2010, showing IMAGES AND DOCUMENTS followed by DATA AND PROGRAMS]
So who is going to mark it up?
• There are not now, and never will be, enough knowledge engineers to support the important, critical applications of our technology
  • Government applications: NASA, US DoD …
  • Health Care applications: Open Health, Swiss hospitals …
  • Genomics/Bioinformatics: NCI metathesaurus, Gene Ontology…
  • ...
  • Historians: Freedman’s project
• Let alone the really important stuff out there
  • MY information
    • My photo archives, my home page, my daughter’s home page, my project pages, my favorite hobby pages, etc. etc. etc.
Personal information created the Web!!!
Then a miracle occurs
[Diagram: THE WEB!! Labels: Language (HTML), Tool (Mosaic), Users, MANY USERS, Market, BETTER TOOLS (Netscape, IE, Altavista, etc.)]
Key: The Value Proposition
• Tools must consider work v. value
  • People will NOT use tools that require a lot of work and have little (perceived) value
  • People WILL use tools that save them work and/or provide high (perceived) value
• “Perceived” value ≠ “real” value in many cases
  • Creating Web pages (ca. 1993) was “cool”
  • No study has yet shown a positive work value for the Web as a whole
    • But it has changed the way we live
  • Viral: My friend sees it, wants one. My competitor sees it, needs one
TBL’s “secret” advice: Start small but viral and you can change many things (July, 02)
Value Proposition 1: Semantic Page Creation
The personal info killer application?
[Diagram: a page-creation tool prompts “Tell me about your: Important Person, Hobby, Job”; a query to an ontology library (Ont Library) returns classes and a choice list (“I know about: Scuba shop, Scuba vacation 1, Scuba vacation 2, Scuba instructor”); the output is Marked Up Pages in XHTML+OWL]
• Many people don’t have home pages
  Value: Hints for useful properties (using ontology classes)
  Help create content (using ontology instances).
• Note: Useful libraries (lots of stuff) already exist (see daml.org)
Value Proposition 2: Semantic Web Portals
The MOSAIC of the Semantic Web?
[Diagram labels: <XSLT/>, KB]
<Oncogene rdf:ID="Oncogene, MYB">
  <code>C3682</code><id>3683</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <Gene_Has_Function rdf:ID="Gene Transcription"></Gene_Has_Function>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <In_Chromosomal_Location rdf:ID="6q22q23"/>
</Oncogene>
<Oncogene rdf:ID="Oncogene NMYC">
  <code>C17656</code><id>17657</id>
  <Found_In_Organism rdf:ID="Human"></Found_In_Organism>
  <In_Chromosomal_Location rdf:ID="2p24.1"/>
  <Gene_Has_Function rdf:ID="Transcriptional Regulation"></Gene_Has_Function>
  <Gene_Associated_With_Disease rdf:ID="Neuroblastoma "></Gene_Associated_With_Disease>
</Oncogene>
• Combine browsing, search, and authoring
  Value: As I link to concepts, I find useful resources
  Pages, Databases, programs, etc.
Value prop 3: Semantic Web Services
VP 3: And service composition
Buy the French version of a book from amazon.fr and have it sent to my mother
Semantic Web Knowledge Acquisition
• Virtually no one will create ontologies from scratch
  • High-End ontology developers will be a tiny percentage (10,000 High end Web Designers = 1/10,000 of users)
  • It is easier to read than to create ontologies
  • Expect “cut and paste” (HTML analogy)
    • Most used OWL editor to date is Emacs
  • Can Bootstrap from existing content
    • HTML screen scrapers, structured data, Excel spreadsheets, …
• No training allowed
  • Motivated users will skim the docs on occasion
  • Most users want to use it now
  • “Everyone” has a browser - deploy tools through that
    • Common metaphors must be used: Form fill, menu, search
Note: No formal justification for any of these - but it worked before!
Adding power via Semantic Web
• Tools can be domain independent
  • Your tool should be usable in lots of contexts!
• Use the standards:
  • OWL and its successors crucial
• Tools should assume multiple ontologies
  • “It’s the links, stupid”
  • Ontology search, collection, “integration” crucial
    • Check out the DAML crawler (http://www.daml.org/crawler)
• BackEnd technologies must be scalable
  • Can co-evolve with Semantic Web size
  • But remember, the Web is HUGE
Allow extensibility
• Users MUST be able to add their own concepts
  • Semantic Web (and OWL) allow this
• Advanced users will become ontology providers
  • It will be “cool” to have yours be the ontology of choice in a domain
• Consistency CANNOT be maintained on the web
  • May be a useful heuristic
  • Insist on consistency and the Semantic Web fails!
GIVE IT AWAY!!!!!
• There is, and will be, no market for any of this unless we create it!
  • No one will make money selling their tools until we have MANY more users
• Make small, cheap, easy to download versions of your tools available
  • Give it away
  • The big winners on the web made it available for free:
    • Browsers: Mosaic, Netscape, IE
    • Plug-ins: Flash, RealPlayer, Quicktime
    • Tools: Adobe, Real Media
Part 4: Mindswap tools
Maryland Information and Network Dynamics Laboratory
Semantic Web Agents Project
http://www.mindswap.org/
Practicing what I preach
• Open source Tools at http://www.mindswap.org
  • Described in proceedings
    • But out of date - open source moves fast
  • Based on the principles outlined in this talk
• RIC: Ontologies make it EASIER to enter knowledge
  • Turn properties into forms, use restrictions to check form filling
  • Creates a KB of the results that can be used for search
  • Coming soon: create a nice web page (using SXMLT)
• SMORE: Create content and markup as you go
  • Multiple ontology
• ConvertToRDF: Dump spreadsheets to RDF using mapping ontology
• RDFScreenScraper: turn semi-structured web pages into
• ParkaSW: Scalable, data-based KB back-end
  • Some built in inferencing
  • Pulled from the patent system to become open source!
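The idea behind dumping a spreadsheet to RDF through a mapping can be sketched in a few lines. This is not the ConvertToRDF implementation, whose mapping-ontology format is not shown in the talk. It is a hedged illustration in which a plain Python dictionary plays the role of the mapping; the column names, the `mind:name` property, and the `rows_to_triples` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical mapping from spreadsheet column headers to property names.
COLUMN_TO_PROPERTY = {
    "Name": "mind:name",
    "Title": "mind:title",
    "Workplace": "jobs:placeOfWork",
}


def rows_to_triples(csv_text, id_column, mapping):
    """Turn each spreadsheet row into triples, using id_column to name the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "#" + row[id_column].replace(" ", "_")
        for column, prop in mapping.items():
            value = row.get(column, "")
            if value:  # skip empty cells
                triples.append((subject, prop, value))
    return triples


SHEET = "Name,Title,Workplace\nHendler,Professor,http://www.cs.umd.edu\n"
sheet_triples = rows_to_triples(SHEET, "Name", COLUMN_TO_PROPERTY)

if __name__ == "__main__":
    for triple in sheet_triples:
        print(triple)
```

Keeping the mapping as data rather than code means the same converter can serve any spreadsheet once someone supplies a mapping for it, which fits the talk's argument for domain-independent tools.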
Conclusions
• The Semantic Web is real, and it is moving fast
  • Two years ago you hadn’t heard of it, now it’s on the cover of your proceedings
• We’ll win if we remember the “rules of the web”
  • Berners-Lee Principle: Build small but viral
  • Hendler’s Rule: On the web there is no “THE”
    • Yours is ONE of the ways of doing it
    • Consensus is hard, but critical
      We did it once and created DAML+OIL, the most-used AI language ever
    • Everyone’s application is needed
  • Value proposition: Make it fun, cool, and useful and people will kill to do the markup (The Web proves this)
  • Give it away: Create the markets and we’ll all win
• YOUR work is important!
This time it could be for real!