Transcript powerpoint

Artificial Intelligence and Lisp #6
1. Semantic Web
2. Description Languages
3. Ontologies
Today's theme:
What methods are there for organizing
large volumes of facts and knowledge
of the kind that is needed in many practical
artificial intelligence systems?
Quantities involved (examples)

Encyclopaedia Britannica: 500,000 articles,
40,000,000 words in 30 volumes

Nationalencyklopedin: 172,000 articles

Yongle encyclopaedia: ~2,000,000 articles

English-language Wikipedia: 3,052,283 articles
(2009-10-05) with > 1,000,000,000 words
DBpedia: ~420,000 nodes
0. Ontologies



In philosophy: what the world is like [in our
models of it]
In computer and information science, including
artificial intelligence: formal frameworks for
organizing large amounts of knowledge about
the world
These frameworks may be expressed as
formulas, as diagrams, and/or as data
structures in computers
1. Semantic Web




Term coined by Tim Berners-Lee
Vision: WWW as a universal and active medium for
information exchange using software agents
Commercial interpretation: a network of interacting
service providers
Advancing technology interpretation: the meaning of
information and services on the web is defined,
making it possible for the web to understand and
satisfy the requests of people and machines to use the
web content (as opposed to the hypertext web)
Obtaining the semantic information
for the semantic web




Pre-semantic-web: large, organized project for building
a universal knowledgebase (Cyc project)
Approach in the first stage of semantic web: semantic
annotation in conventional web pages
Current approaches: 1. build knowledgebases by
tapping and processing large, structurally designed
knowledge sources that are available on the conventional
web, e.g. DBpedia and WordNet;
and 2. download and reverse-engineer large
collections of specialized information on the web
Example from WordNet (transformed to Leonardo notation):
---------------------------------------------------------- ferocity.n0
[: type synset]
[: has-lexes {ferocity.n0 fierceness.n0 furiousness.n0 fury.n0 vehemence.n0
violence.n0 wildness.n2}]
[: explain "the property of being wild or turbulent; 'the storm's violence'"]
[: synset-offset "04978805"]
[: lex-filenum "07"]
[: wordnet-links {[: subclass-of {intensity.n0}]
[: has-derivations {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
fierce.s0 ferocious.s0}]
[: has-subclasses {savageness.n0}]}]
[: wordnet-origlinks {[has-hypernyms {intensity.n0}]
[derivation-from {angry.s0 violent.s3 violent.a0 vehement.s0 angry.s0
ferocious.s0 angry.s0 angered.s0 ferocious.s0 boisterous.s0 cutthroat.s0
fierce.s0 ferocious.s0}]
[has-hyponyms {savageness.n0}]}]
Example from DBpedia (transformed to Leonardo notation):
---------------------------------------------------------- Kepler.Johannes
[: type scientist]
[: fullname Kepler.Johannes]
[: source-entities {[: wiki w.Johannes_Kepler]
[: wordnet Kepler.n0]}]
[: given-names <"Johannes">]
[: family-name "Kepler"]
[: date-of-birth [GregCal 1571]]
[: date-of-death [GregCal 1630]]
[: explain-seq <"German astronomer who first stated laws of
planetary motion">]
[: in-classes {astronomer.n0}]
[: in-disciplines {Astronomy Astrology Mathematics
Natural_Philosophy}]
[: studied-at {w.University_of_Tübingen}]
[: worked-at {w.University_of_Linz}]
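An entity record like this one maps naturally onto the node-and-arc form used in semantic web work. The following is a minimal Python sketch, assuming (for illustration only) that a Leonardo entity is modeled as a dict from attribute names to values, that Leonardo sets {...} become Python sets (one arc per member), and that sequences <...> become tuples. The helper name to_triples is hypothetical, and only some of the Kepler attributes are included.

```python
# Hypothetical in-memory model of part of the Kepler.Johannes entity.
# Leonardo sets {...} -> Python sets; Leonardo sequences <...> -> tuples.
kepler = {
    "type": "scientist",
    "given-names": ("Johannes",),
    "family-name": "Kepler",
    "in-classes": {"astronomer.n0"},
    "in-disciplines": {"Astronomy", "Astrology", "Mathematics", "Natural_Philosophy"},
    "studied-at": {"w.University_of_Tübingen"},
    "worked-at": {"w.University_of_Linz"},
}


def to_triples(name, entity):
    """Flatten an entity into (subject, attribute, value) arcs.

    Each member of a set-valued attribute becomes its own arc;
    atoms and sequences are kept as single values.
    """
    triples = []
    for attr, value in entity.items():
        if isinstance(value, (set, frozenset)):
            for member in sorted(value):
                triples.append((name, attr, member))
        else:
            triples.append((name, attr, value))
    return triples


for arc in to_triples("Kepler.Johannes", kepler):
    print(arc)
```

Because a Python set cannot hold duplicates, a list with repeated members (such as angry.s0 in the WordNet has-derivations value above) would yield only one arc per distinct member under this modeling choice.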
CIA (U.S. Central Intelligence Agency) webpage:
The World Factbook provides information on the
history, people, government, economy, geography,
communications, transportation, military, and
transnational issues for 266 world entities. Our
Reference tab includes: maps of the major world
regions, as well as Flags of the World, a Physical Map
of the World, a Political Map of the World, and a
Standard Time Zones of the World map.

CIA Factbook, example

Sweden
Chiefs of State and Cabinet Members of Foreign Governments
Date of Information: 7/23/2009
King
CARL XVI GUSTAF
Prime Min.
Fredrik REINFELDT
Dep. Prime Min.
Maud OLOFSSON
Min. of Agriculture, Food, & Fisheries
Eskil ERLANDSSON
Min. of Culture
Lena Adelsohn LILJEROTH
Min. of Defense
Sten TOLGFORS
Min. for Education
Jan BJORKLUND
Min. for Employment
Sven Otto LITTORIN
Min. of Enterprise & Energy
Maud OLOFSSON
Min. of Environment
Anders CARLGREN
Min. of European Affairs
Cecilia MALMSTROM
Min. of Finance
Anders BORG
Min. of Foreign Affairs
Carl BILDT
Min. of Foreign Trade
Ewa BJORLING
Min. of Health & Elderly Care
Maria LARSSON
Min. for Higher Education & Research
Tobias KRANTZ
Min. of Infrastructure
Asa TORSTENSSON
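The Factbook page pairs each post with its holder, and one person can hold several posts (Maud OLOFSSON appears twice). A minimal sketch, assuming the page has been extracted as alternating role and name lines as above, pairs the lines and inverts them into a person-to-roles map. The helper names parse_pairs and roles_by_person are hypothetical.

```python
# A few lines taken from the Factbook listing: roles and names alternate.
lines = """King
CARL XVI GUSTAF
Prime Min.
Fredrik REINFELDT
Dep. Prime Min.
Maud OLOFSSON
Min. of Enterprise & Energy
Maud OLOFSSON
Min. of Finance
Anders BORG""".splitlines()


def parse_pairs(lines):
    """Pair each role line with the name line that follows it."""
    if len(lines) % 2:
        raise ValueError("expected alternating role/name lines")
    return list(zip(lines[0::2], lines[1::2]))


def roles_by_person(pairs):
    """Invert role -> name into name -> set of roles."""
    result = {}
    for role, name in pairs:
        result.setdefault(name, set()).add(role)
    return result


people = roles_by_person(parse_pairs(lines))
print(sorted(people["Maud OLOFSSON"]))
```

The inversion shows why the target representation needs a set-valued attribute: "holds post" is a relation, not a function from persons to posts.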
Example of (2), European University Association
(around 1000 items in their list of members):
AGH University of Science and Technology (AGH)
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie
Krakow | Poland | http://www.agh.edu.pl
Individual full member
Agricultural University of Athens (AUA)
Athinai | Greece | http://www.aua.gr/
Individual full member
Akdeniz University (Akdeniz Üniversitesi)
Akdeniz Üniversitesi
Antalya | Turkey | http://www.akdeniz.edu.tr/
Individual full member
Alexander Dubcek University, Trencin
Trencianska univerzita Alexandra Dubceka v Trencíne
Trencin | Slovakia | http://www.tnuni.sk/
Individual Associate Members
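Reverse-engineering a listing like this means recovering record boundaries from layout alone. A minimal sketch, assuming each record is one or more name lines, then a location line that separates city, country, and URL with a vertical bar, then a membership line. The function parse_members and the attribute names are hypothetical.

```python
raw = """Agricultural University of Athens (AUA)
Athinai | Greece | http://www.aua.gr/
Individual full member
Akdeniz University (Akdeniz Üniversitesi)
Akdeniz Üniversitesi
Antalya | Turkey | http://www.akdeniz.edu.tr/
Individual full member"""


def parse_members(text):
    """Group lines into member records, using the location line as the anchor."""
    records, names = [], []
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if " | " in line:
            city, country, url = [part.strip() for part in line.split(" | ")]
            membership = next(lines, "").strip()  # the line after the location
            records.append({
                "name": names[0],            # first name line: English name
                "other-names": names[1:],    # remaining lines: native names
                "city": city,
                "country": country,
                "url": url,
                "membership": membership,
            })
            names = []
        else:
            names.append(line)
    return records


for record in parse_members(raw):
    print(record["name"], "/", record["country"])
```

Heuristics like this are fragile: a name line that happens to contain " | " would be misread, which is one reason structurally designed sources are preferred when available.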
Copyright issues for knowledge acquisition
using sources on the www





Which information is covered by copyright (including
copyleft)?
Can there be other kinds of proprietary restrictions?
(Explicit or implicit contracts, EU database directive)
Do these restrictions only apply to redissemination of
information content, or also to download for use in your
own project?
What are the rules if the downloaded information is
integrated with other information and then redisseminated?
What are the rules if the downloaded information is merely
used as an instrument for the processing of other
information?
Semantic web today


A set of design principles
A number of working groups, in particular within the
WWW Consortium

A number of proposed enabling technologies:

Resource Description Framework (RDF)

Data Interchange Formats: RDF/XML, N-Triples

Notations, e.g. Web Ontology Language (OWL)

Software systems supporting these, e.g. Protégé

Published knowledgebases using the above
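N-Triples, one of the interchange formats listed above, writes one statement per line as subject, predicate, object, terminated by " .", with IRIs in angle brackets and string literals in double quotes. A minimal sketch of a serializer, assuming a hypothetical namespace http://example.org/kb/ for the entity names; only the rdf namespace URI is the real one.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/kb/"  # hypothetical namespace for our own entities


def iri(value):
    """Write an IRI reference in N-Triples form: <...>."""
    return f"<{value}>"


def literal(text):
    """Write a plain string literal, escaping backslashes, double quotes,
    newlines and carriage returns."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """One statement per line, terminated by ' .'."""
    return f"{subject} {predicate} {obj} ."


kepler = iri(EX + "Kepler.Johannes")
statements = [
    ntriple(kepler, iri(RDF + "type"), iri(EX + "scientist")),
    ntriple(kepler, iri(EX + "family-name"), literal("Kepler")),
    ntriple(kepler, iri(EX + "worked-at"), iri(EX + "University_of_Linz")),
]
print("\n".join(statements))
```

Each line is independent of the others, which makes N-Triples easy to split, concatenate, and stream, at the cost of verbosity compared with RDF/XML.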
Knowledge Representation in
semantic web work (so far)




Rely on the notational look-and-feel of XML
Strong emphasis on a network representation
consisting of nodes and arcs
Use of a subsumption relation in such networks,
relating a more general and a more specialized
concept
Some notations and systems also use logic
formulas for characterizing other kinds of
restrictions on admissible network structures and
other kinds of information about the domain.
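The subsumption relation in such a network is usually stored as direct links only, and the full relation is their reflexive-transitive closure. A minimal sketch, assuming direct links taken from examples elsewhere in this lecture (SUMO salt lakes, the OWL wine class); the table SUBCLASS_OF and the function subsumes are hypothetical names.

```python
# Direct "more specialized -> more general" links (child -> set of parents).
SUBCLASS_OF = {
    "SaltLake": {"SaltWaterArea", "LandlockedWater"},
    "LandlockedWater": {"BodyOfWater"},
    "SubtropicalDesertClimateZone": {"DesertClimateZone"},
    "Wine": {"PotableLiquid"},
}


def subsumes(general, specific, links=SUBCLASS_OF):
    """True if `general` equals `specific` or can be reached by following
    links upward from `specific` (reflexive, transitive closure)."""
    stack, seen = [specific], set()
    while stack:
        current = stack.pop()
        if current == general:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(links.get(current, ()))
    return False


print(subsumes("BodyOfWater", "SaltLake"))
```

The seen set keeps the search finite even if the network contains cycles, and a node with several parents (SaltLake here) is handled without special treatment.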
Web Ontology Language (OWL)
The OWL Web Ontology Language is designed for use
by applications that need to process the content of
information instead of just presenting information to
humans. OWL facilitates greater machine interpretability
of Web content than that supported by XML, RDF, and
RDF Schema (RDF-S) by providing additional vocabulary
along with a formal semantics. OWL has three
increasingly-expressive sublanguages: OWL Lite, OWL
DL, and OWL Full.
(From http://www.w3.org/TR/owl-features/)
Namespace declaration
<rdf:RDF
xmlns ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
xmlns:vin ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
xml:base ="http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"
xmlns:food="http://www.w3.org/TR/2004/REC-owl-guide-20040210/food#"
xmlns:owl ="http://www.w3.org/2002/07/owl#"
xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd ="http://www.w3.org/2001/XMLSchema#">
The first two declarations identify the namespace associated with this
ontology. The first makes it the default namespace, stating that unprefixed
qualified names refer to the current ontology. The second identifies the
namespace of the current ontology with the prefix vin:. The third identifies
the base URI for this document (see below). The fourth identifies the
namespace of the supporting food ontology with the prefix food:. The fifth
namespace declaration says that in this document, elements prefixed with
owl: should be understood as referring to things drawn from the namespace
called http://www.w3.org/2002/07/owl#.
This is a conventional OWL declaration, used to introduce
the OWL vocabulary.
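The effect of these declarations is that a prefixed name such as vin:Winery is shorthand for a full URI. A minimal sketch of that expansion, using the namespace URIs from the declaration above; the function expand is hypothetical. As a simplification it applies the default namespace to any unprefixed name, whereas in XML the default namespace applies to element names but not to unprefixed attribute names.

```python
NAMESPACES = {
    "": "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#",  # default
    "vin": "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#",
    "food": "http://www.w3.org/TR/2004/REC-owl-guide-20040210/food#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname, namespaces=NAMESPACES):
    """Expand 'prefix:local' (or an unprefixed 'local') into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep:  # no colon: use the default namespace
        prefix, local = "", qname
    try:
        return namespaces[prefix] + local
    except KeyError:
        raise ValueError(f"undeclared prefix: {prefix!r}") from None


print(expand("vin:Winery"))
print(expand("owl:Class"))
```

Because the default namespace and vin: are bound to the same URI, Winery and vin:Winery denote the same resource.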
Ontology headers
<owl:Ontology rdf:about="">
<rdfs:comment>An example OWL ontology</rdfs:comment>
<owl:priorVersion rdf:resource=
"http://www.w3.org/TR/2003/PR-owl-guide-20031215/wine"/>
<owl:imports rdf:resource=
"http://www.w3.org/TR/2004/REC-owl-guide-20040210/food"/>
<rdfs:label>Wine Ontology</rdfs:label>
...
Classes and Things
<owl:Class rdf:ID="Winery"/>
<owl:Class rdf:ID="Region"/>
<owl:Class rdf:ID="ConsumableThing"/>
<owl:Thing rdf:about="#CentralCoastRegion">
<rdf:type rdf:resource="#Region"/>
</owl:Thing>
Defining and using properties
<owl:ObjectProperty rdf:ID="madeFromGrape">
<rdfs:domain rdf:resource="#Wine"/>
<rdfs:range rdf:resource="#WineGrape"/>
</owl:ObjectProperty>
<owl:ObjectProperty rdf:ID="course">
<rdfs:domain rdf:resource="#Meal" />
<rdfs:range rdf:resource="#MealCourse" />
</owl:ObjectProperty>
<owl:Thing rdf:ID="LindemansBin65Chardonnay">
<madeFromGrape rdf:resource="#ChardonnayGrape" />
</owl:Thing>
-- LindemansBin65Chardonnay
[: type Thing]
[: madeFromGrape ChardonnayGrape]
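The same translation can be sketched mechanically with Python's standard xml.etree.ElementTree, which reports namespaced names in the form {uri}local. This is a minimal sketch under assumptions, not a full RDF/XML parser: it handles only the owl:ObjectProperty and owl:Thing patterns shown above, and the helper names local and extract are hypothetical. Beyond the two arcs of the Leonardo entity, it also adds the types implied by rdfs:domain and rdfs:range, which are not stated explicitly in the data.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#"

DOC = f"""<rdf:RDF xmlns="{BASE}" xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:ObjectProperty rdf:ID="madeFromGrape">
    <rdfs:domain rdf:resource="#Wine"/>
    <rdfs:range rdf:resource="#WineGrape"/>
  </owl:ObjectProperty>
  <owl:Thing rdf:ID="LindemansBin65Chardonnay">
    <madeFromGrape rdf:resource="#ChardonnayGrape"/>
  </owl:Thing>
</rdf:RDF>"""


def local(name):
    """Strip a '{uri}' prefix or a leading '#' to get a short local name."""
    return name.rsplit("}", 1)[-1].lstrip("#")


def extract(doc):
    root = ET.fromstring(doc)
    domains, ranges, triples = {}, {}, []
    for prop in root.findall(f"{{{OWL}}}ObjectProperty"):
        name = prop.get(f"{{{RDF}}}ID")
        for d in prop.findall(f"{{{RDFS}}}domain"):
            domains[name] = local(d.get(f"{{{RDF}}}resource"))
        for r in prop.findall(f"{{{RDFS}}}range"):
            ranges[name] = local(r.get(f"{{{RDF}}}resource"))
    for thing in root.findall(f"{{{OWL}}}Thing"):
        subject = thing.get(f"{{{RDF}}}ID")
        triples.append((subject, "type", "Thing"))
        for child in thing:
            prop = local(child.tag)
            obj = local(child.get(f"{{{RDF}}}resource"))
            triples.append((subject, prop, obj))
            # Domain/range entailment: subject is in the domain class,
            # object is in the range class.
            if prop in domains:
                triples.append((subject, "type", domains[prop]))
            if prop in ranges:
                triples.append((obj, "type", ranges[prop]))
    return triples


for arc in extract(DOC):
    print(arc)
```

Real projects would use a dedicated RDF library; the point here is only how element and attribute names resolve through the namespace declarations.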
Class subsumption; restrictions on properties
<owl:Class rdf:ID="Wine">
<rdfs:subClassOf rdf:resource="&food;PotableLiquid"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#madeFromGrape"/>
<owl:minCardinality rdf:datatype=
"&xsd;nonNegativeInteger">1</owl:minCardinality>
</owl:Restriction>
</rdfs:subClassOf>
...
</owl:Class>
The restriction subexpression represents an “anonymous” class. It imposes
the condition that each instance of the type Wine must have at least one link
labelled madeFromGrape.
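One way to use such a restriction is as a constraint check over instance data. The sketch below is a closed-world check (like a database integrity constraint), which is a different reading from OWL's own open-world semantics: under OWL, a wine with no recorded madeFromGrape link is not a contradiction, and a reasoner instead concludes that some unnamed grape exists. The instance MysteryWine, the tables KB and RESTRICTIONS, and the function name are hypothetical.

```python
# Hypothetical instance data: entity -> (set of types, {property: set of values})
KB = {
    "LindemansBin65Chardonnay": ({"Wine"}, {"madeFromGrape": {"ChardonnayGrape"}}),
    "MysteryWine": ({"Wine"}, {}),
}
# From the Wine class: at least 1 madeFromGrape link (owl:minCardinality 1).
RESTRICTIONS = {"Wine": [("madeFromGrape", 1)]}


def closed_world_violations(kb, restrictions):
    """List (entity, property, required, found) for every instance that has
    fewer *known* links than its class's minimum cardinality requires."""
    problems = []
    for entity, (types, props) in kb.items():
        for cls in sorted(types):
            for prop, minimum in restrictions.get(cls, []):
                found = len(props.get(prop, set()))
                if found < minimum:
                    problems.append((entity, prop, minimum, found))
    return problems


print(closed_world_violations(KB, RESTRICTIONS))
```

Which reading is appropriate depends on the application: the closed-world check suits data validation, while the open-world reading suits combining incomplete knowledge from many sources.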
Essential points about OWL





Represents binary relations between entities
Well-developed machinery for managing namespaces,
versions, and the like, which is considered
necessary for large knowledgebases with distributed
contents and distributed development activity
Relies on the XML syntactic tradition
Knowledge modules are organized as documents,
somewhat analogous to computer programs: entities
are “declared” before they are “used”
Not easily readable; graphic interfaces are required for
practical work with the notation
Essential points about ontology
representation languages and systems

Major issues: subsumption hierarchies, restrictions on
admissible structures, information that makes logical
inference possible

Other issues that are needed in practice:

Namespaces

Administration of modules: comments, version
information, author and IPR information, etc.

A supertype system, e.g. class vs thing in OWL

A conventional type system

Issue: how to relate subsumption, supertypes, and types?
2. Description Languages


Description languages are very widely used in
Semantic Web contexts; OWL (in particular OWL DL)
is based on a description language
The presentation here will use standard
description-language notation, but will assume that
the knowledgebase is expressed in Leonardo
notation
Description Languages: Basic concepts





An information state (e.g. the contents of an
entityfile) is called a description information
state iff it satisfies the following:
Two kinds of entities in it: classes, individuals.
Classes are also called concepts.
Classes have only one attribute, designates
Individuals can have several attributes, but their
values must always be sets of individuals
Attributes of individuals are called roles and
represent binary relations between individuals.
Class expressions and assertions





Example:
mammal ⊓ whale ⊑ land-animal
Note: the ⊓ and ⊑ symbols are the squarish
variants of ∩ and ⊆, as shown in the lecture notes
Evaluation: the value of a class symbol .c is
obtained as (get .c designates)
The values of set theory-like expressions are obtained
as in set theory

The predicate ⊑ is evaluated as subset-or-equal

Class expressions can also be used for other purposes.
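The evaluation rules can be sketched directly: a class symbol evaluates to its designates set, ⊓ and ⊔ to set intersection and union, and ⊑ to subset-or-equal. This minimal Python sketch assumes a hypothetical description information state (the individuals moby, rex, tom) and represents compound expressions as tuples ("and", e1, e2) and ("or", e1, e2); those encodings are illustrative choices.

```python
# Hypothetical description information state: class -> designated individuals.
DESIGNATES = {
    "mammal": {"moby", "rex", "tom"},
    "whale": {"moby"},
    "land-animal": {"rex", "tom"},
}


def value(expr):
    """Evaluate a class expression: a class name, or ('and'|'or', e1, e2)."""
    if isinstance(expr, str):
        return DESIGNATES[expr]             # corresponds to (get .c designates)
    op, left, right = expr
    if op == "and":
        return value(left) & value(right)   # ⊓ is set intersection
    if op == "or":
        return value(left) | value(right)   # ⊔ is set union
    raise ValueError(f"unknown operator: {op}")


def subsumed(expr1, expr2):
    """Evaluate the assertion expr1 ⊑ expr2 as subset-or-equal."""
    return value(expr1) <= value(expr2)


print(subsumed(("and", "mammal", "whale"), "land-animal"))
```

With this data the example assertion mammal ⊓ whale ⊑ land-animal evaluates to false, since moby is a whale but not a land animal; an assertion is simply a predicate on the information state.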
Additional kinds of class expressions

Let I be the domain of all individuals

⊥ evaluates to the empty set

∀r.C evaluates to the set of all .i ∈ I such
that (get .i r) is a subset of the value of C
∃r.C evaluates to the set of all .i ∈ I such
that some member of (get .i r) is a
member of the value of C
The variant of description language using these
constructs is characterized as ALC
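The two quantified constructs can be evaluated straight from their definitions. A minimal sketch over a hypothetical state (individuals ann, bob, cid, bea, dan; role children; class female), with get mimicking (get .i r) and returning the empty set when an individual has no value for the role:

```python
DESIGNATES = {"female": {"ann", "bea"}}
I = {"ann", "bob", "cid", "bea", "dan"}                      # all individuals
ROLES = {"ann": {"children": {"bea", "dan"}}, "bob": {"children": {"bea"}}}


def get(i, role):
    """(get .i r): the set of r-values of individual i (empty if none)."""
    return ROLES.get(i, {}).get(role, set())


def forall(role, cls):
    """∀r.C: individuals whose r-values are all members of C."""
    c = DESIGNATES[cls]
    return {i for i in I if get(i, role) <= c}


def exists(role, cls):
    """∃r.C: individuals with at least one r-value that is a member of C."""
    c = DESIGNATES[cls]
    return {i for i in I if get(i, role) & c}


print(sorted(forall("children", "female")))
print(sorted(exists("children", "female")))
```

Note that ∀children.female includes cid, bea, and dan, who have no children at all: the subset condition holds vacuously for an empty role value, while ∃ requires at least one witness.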
Example 1


member.my-golf-club ∏ children.female
can be assumed to represent the set of those
members of my golf club that have at least one
daughter
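The example can be run against a small hypothetical state. Since role fillers are restricted through classes, this sketch assumes my-golf-club is a class designating a single club individual, club1; all individuals are invented for illustration.

```python
DESIGNATES = {"my-golf-club": {"club1"}, "female": {"ann", "bea", "eva"}}
ROLES = {
    "ann": {"member": {"club1"}, "children": {"bea"}},   # member, has a daughter
    "bob": {"member": {"club1"}, "children": {"dan"}},   # member, only a son
    "cid": {"children": {"eva"}},                        # daughter, not a member
}
I = set(ROLES) | {"club1", "bea", "dan", "eva"}


def get(i, role):
    return ROLES.get(i, {}).get(role, set())


def exists(role, cls):
    """∃r.C evaluated as in the definition above."""
    return {i for i in I if get(i, role) & DESIGNATES[cls]}


# ∃member.my-golf-club ⊓ ∃children.female
query = exists("member", "my-golf-club") & exists("children", "female")
print(query)
```

Only ann satisfies both conjuncts; the intersection ⊓ discards bob (no daughter) and cid (not a member).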
Example 2






Hospital application using three classes: patients,
illnesses, clinical laboratories
Some of the illnesses are highly contagious diseases
Some of the laboratories are able to handle patients
having those
Patients are characterized using the attributes (roles)
has-illness and in-lab, and labs using handles-contag
The obvious handling rule can be expressed as:
patient ⊓ ∃has-illness.contag ⊑ ∀in-lab.handles-contag
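A rule of this form can be checked, and its violators listed, by evaluating both sides and taking the set difference. This minimal sketch reads the rule as patient ⊓ ∃has-illness.contag ⊑ ∀in-lab.handles-contag (the choice of quantifiers is an assumption) and uses hypothetical patients p1 to p3, illnesses ill-c (contagious) and ill-n, and labs labA and labB.

```python
DESIGNATES = {
    "patient": {"p1", "p2", "p3"},
    "contag": {"ill-c"},
    "handles-contag": {"labA"},
}
ROLES = {
    "p1": {"has-illness": {"ill-c"}, "in-lab": {"labA"}},
    "p2": {"has-illness": {"ill-c"}, "in-lab": {"labB"}},
    "p3": {"has-illness": {"ill-n"}, "in-lab": {"labB"}},
}
I = set(ROLES) | {"ill-c", "ill-n", "labA", "labB"}


def get(i, role):
    return ROLES.get(i, {}).get(role, set())


def exists(role, cls):
    return {i for i in I if get(i, role) & DESIGNATES[cls]}


def forall(role, cls):
    return {i for i in I if get(i, role) <= DESIGNATES[cls]}


lhs = DESIGNATES["patient"] & exists("has-illness", "contag")  # left of ⊑
rhs = forall("in-lab", "handles-contag")                       # right of ⊑
violators = lhs - rhs   # the rule holds exactly when this set is empty
print(sorted(violators))
```

Here p2 has a contagious illness but sits in labB, which is not in handles-contag, so the assertion evaluates to false and p2 is reported.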
More expressive description languages





Inverse relation operator, r⁻ (a minus sign in the exponent)
Qualified number restrictions (a generalization of
the ∀ and ∃ operators), e.g.
∃member.my-golf-club ⊓
≥2 children.female
for “those members of my golf club having at
least 2 daughters”
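Both extensions are easy to evaluate set-theoretically. A minimal sketch over a hypothetical state: at_least implements ≥n r.C by counting r-values in C (so ∃r.C is the special case n = 1), and inverse builds the table for r⁻, naming the inverse role by appending "-" as an illustrative convention.

```python
DESIGNATES = {"my-golf-club": {"club1"}, "female": {"bea", "eva", "fay"}}
ROLES = {
    "ann": {"member": {"club1"}, "children": {"bea", "eva"}},
    "bob": {"member": {"club1"}, "children": {"fay", "dan"}},
}
I = set(ROLES) | {"club1", "bea", "eva", "fay", "dan"}


def get(i, role, roles=ROLES):
    return roles.get(i, {}).get(role, set())


def at_least(n, role, cls, roles=ROLES):
    """≥n r.C: individuals with at least n r-values that are members of C."""
    return {i for i in I if len(get(i, role, roles) & DESIGNATES[cls]) >= n}


def inverse(role, roles=ROLES):
    """Role table for r⁻: j has i as an r⁻-value iff i has j as an r-value."""
    inv = {}
    for i, table in roles.items():
        for j in table.get(role, set()):
            inv.setdefault(j, {}).setdefault(role + "-", set()).add(i)
    return inv


# ∃member.my-golf-club ⊓ ≥2 children.female, writing ∃ as ≥1
query = at_least(1, "member", "my-golf-club") & at_least(2, "children", "female")
parents = inverse("children")
print(query, get("fay", "children-", parents))
```

With this data ann qualifies (two daughters), bob does not (one daughter and one son), and the inverse role recovers bob as fay's parent.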
Topics in Description Languages


Traditional major research issue: what is the
computational complexity (i.e. worst-case resource
needs, in particular computation time) for a given
variety of description language?
Additional major issue: how can the basic notions of
description languages be extended to additional types
of problems, for example representing actions and
representing spatio-temporal information? (In this
respect description languages compete in particular
with logic-based approaches.)
3. Ontologies


Practical definition: that which can be
expressed using an ontology language or
ontology representation system
Some contenders:

SUMO (Suggested Upper Merged Ontology)

CYC, OpenCyc, ResearchCyc

SUMO (Suggested Upper Merged
Ontology)

The Suggested Upper Merged Ontology (SUMO) and its
domain ontologies form the largest formal public ontology in
existence today. They are being used for research and
applications in search, linguistics and reasoning.
SUMO is the only formal ontology that has been mapped to all
of the WordNet lexicon.
SUMO is written in the SUO-KIF language.
SUMO is free and owned by the IEEE. The ontologies that
extend SUMO are available under GNU General Public
License.
Adam Pease is the Technical Editor of SUMO.
(From http://www.ontologyportal.org/ )
;; I. Geography Terms for the CIA World Fact Book
;; A. Location
;; B. Geographic coordinates
;; C. Map references
;; D. Area
;; E. Area - comparative
;; F. Land boundaries
;; G. Coastline
;; H. Maritime claims
;; I. Climate
;; J. Terrain
;; K. Elevation extremes
;; L. Natural resources
;; M. Land use
;; N. Irrigated land
;; O. Natural hazards
;; P. Environment - current issues
;; Q. Environment - international agreements
;; R. Geography - note
;; II. General Geography Terms and Background
;; A. Planet Geography & Astronomical Bodies
;; B. Directions and Distances
;; C. Land Forms
;; D. Water Areas
;;    1. Oceans & Seas
;;    2. Tides & Currents
;;    3. Water Subregions
;;    4. Fresh Water Areas
;; E. Coastal and Shoreline Areas
;; F. Air and Atmosphere
;; G. Weather & Climate
;; H. Vegetation and Biomes
;; I. Natural Disasters
;; J. Environmental Areas of Concern
(subclass SubtropicalDesertClimateZone DesertClimateZone)
(documentation SubtropicalDesertClimateZone
EnglishLanguage
"&%SubtropicalDesertClimateZone is a subclass of
&%DesertClimateZone that is characterized by an
average temperature greater than 18 degrees Celsius,
as well as very low rainfall. This is Koeppen system
'BWh'.")
(=>
  (and
    (instance ?AREA DesertClimateZone)
    (subclass ?MO Month)
    (averageTemperatureForPeriod ?AREA ?MO ?TEMP)
    (greaterThan ?TEMP (MeasureFn 18 CelsiusDegree)))
  (instance ?AREA SubtropicalDesertClimateZone))
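Applying a rule like this is a forward-chaining step: find variable bindings that satisfy every condition and add the conclusion. A minimal Python sketch, assuming facts are stored as tuples and temperatures are plain numbers in degrees Celsius (standing in for MeasureFn terms); the areas AreaX and AreaY and their temperatures are hypothetical. Since the free variables are implicitly universally quantified over the whole implication, the rule as written fires when any single month's average exceeds 18 degrees.

```python
facts = {
    ("subclass", "July", "Month"),
    ("subclass", "January", "Month"),
    ("instance", "AreaX", "DesertClimateZone"),
    ("averageTemperatureForPeriod", "AreaX", "July", 34.0),
    ("averageTemperatureForPeriod", "AreaX", "January", 12.0),
    ("instance", "AreaY", "DesertClimateZone"),
    ("averageTemperatureForPeriod", "AreaY", "July", 15.0),
}


def apply_rule(facts):
    """One forward-chaining pass of the SubtropicalDesertClimateZone rule."""
    deserts = {f[1] for f in facts
               if f[0] == "instance" and f[2] == "DesertClimateZone"}
    months = {f[1] for f in facts if f[0] == "subclass" and f[2] == "Month"}
    new = set()
    for f in facts:
        if f[0] == "averageTemperatureForPeriod":
            _, area, month, temp = f
            # All antecedent conditions must hold for the same ?AREA, ?MO, ?TEMP.
            if area in deserts and month in months and temp > 18:
                new.add(("instance", area, "SubtropicalDesertClimateZone"))
    return facts | new


result = apply_rule(facts)
print(sorted(f for f in result if f[2] == "SubtropicalDesertClimateZone"))
```

AreaX is classified because of its July value even though its January average is only 12 degrees, which shows how literally such an axiom is applied.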
(subclass LandlockedWater BodyOfWater)
(documentation LandlockedWater EnglishLanguage "&%LandlockedWater includes
water areas that are surrounded by land, including salt lakes, fresh water lakes,
ponds, reservoirs, and (more or less) wetlands.")
; need a way to say that the body of water is surrounded by land (e.g., perimeter)
(subclass SaltLake SaltWaterArea)
(subclass SaltLake LandlockedWater)
(documentation SaltLake EnglishLanguage
"&%SaltLake is the class of landlocked bodies of salt water, including those
referred to as 'Seas', e.g., the &%CaspianSea. But note that the
&%MediterraneanSea is a &%Sea.")
(instance CaspianSea SaltLake)
(names "Caspian Sea" CaspianSea)
(instance AralSea SaltLake)
(names "Aral Sea" AralSea)
(instance GreatSaltLake SaltLake)
(names "Great Salt Lake" GreatSaltLake)
(geographicSubregion GreatSaltLake Utah)
(instance DeadSea SaltLake)
(names "Dead Sea" DeadSea)
etc.
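Facts like these support simple retrieval once instance and subclass links are combined: CaspianSea is stated to be a SaltLake, and only by following subclass links does it also count as a BodyOfWater. A minimal sketch, assuming only flat one-line facts with unquoted atoms (so the names facts with quoted strings are left out); the helper names are hypothetical. SaltLake has two direct superclasses, a small case of multiple inheritance.

```python
KIF = """
(subclass LandlockedWater BodyOfWater)
(subclass SaltLake SaltWaterArea)
(subclass SaltLake LandlockedWater)
(instance CaspianSea SaltLake)
(instance AralSea SaltLake)
(instance GreatSaltLake SaltLake)
(instance DeadSea SaltLake)
"""


def parse_facts(text):
    """Parse flat '(pred arg1 arg2)' lines; nested forms and strings are not handled."""
    facts = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("(") and line.endswith(")"):
            facts.append(tuple(line[1:-1].split()))
    return facts


def superclasses(cls, facts):
    """All classes reachable from cls via subclass links, including cls itself."""
    parents = {}
    for f in facts:
        if f[0] == "subclass":
            parents.setdefault(f[1], set()).add(f[2])
    result, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(parents.get(c, ()))
    return result


def instances_of(cls, facts):
    """Individuals whose stated class is cls or one of its subclasses."""
    return {f[1] for f in facts
            if f[0] == "instance" and cls in superclasses(f[2], facts)}


facts = parse_facts(KIF)
print(sorted(instances_of("BodyOfWater", facts)))
```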
(instance GulfOfOman Gulf)
(instance GulfOfOman SaltWaterArea)
(names "Gulf of Oman" GulfOfOman)
(connected StraitOfHormuz GulfOfOman)
(connected GulfOfOman ArabianSea)
(meetsSpatially Iran GulfOfOman)
(meetsSpatially Oman GulfOfOman)
(instance GulfOfAden Gulf)
(instance GulfOfMexico Gulf)
(instance PersianGulf Gulf)
These are all the instances of Gulf in the file
This topic is interrupted in the middle. More about ontologies in the next lecture;
also more about large knowledgebases,
and some about multiple inheritance.