Transcript Slides

Principles of the Semantic Web
DB or KRDB?
Alon Halevy
Agenda
A spectrum of representation and query
formalisms:
– From relational databases to description logics and
beyond.
Questions:
–
–
–
–
What can we represent?
What can we query?
How much will it cost?
Can we actually use this stuff?
2
Why Do We Care?
Need to represent data and knowledge on the
semantic web.
Need to query.
Need to map between different representations
of data/knowledge.
People are confused, very religious about these
issues.
3
Perspectives from the Structure
Chasm
Authoring
Querying
Data sharing
Writing text
Creating a schema
keywords
Someone else’s
schema
Easy
Committees
& standards
Specifics
The relational model, a few query languages
– Representing real data
Horn rules, the DB view and the KR view.
XML: shattering the myth of no semantics.
Description Logics: a logic of descriptions.
One shameless plug for my past work.
5
Recalling First-Order Logic
A KB is a set of well-formed formulas
X (Person(X)  Mortal(X))
X (Student(X)  Smart(X))
Person(a) v Dog(a)
X (Student(x)  (x=Aristotle))
Interpretation: a mapping from terms to the
universe of discourse.
Model: any interpretation that satisfies the formulas.
Key idea: infer implicit facts from the explicit ones.
6
Example Inferences
The KB:
X (Person(X)  Mortal(X))
X (Student(X)  Smart(X))
Person(a) v Dog(a)
X (Student(x)  (x=Oren))
– Mortal(a)? don’t know.
– Smart(Oren)? Yes.
– Mortal(a) v Mortal(a)? Yes
–X (Person(x)  not Mortal(x))? No
7
Table name
Attribute names
The Relational Data Model
Product
PName
Price
Category
Manufacturer
Gizmo
$19.99
Gadgets
GizmoWorks
Powergizmo
$29.99
Gadgets
GizmoWorks
SingleTouch
$149.99
Photography
Canon
MultiTouch
$203.99
Household
Hitachi
Tuples or rows
8
No’s
No negation
No disjunction
Ambiguous support for incomplete information
The database represents a single model.
Hence, inference is just model checking.
9
Integrity Constraints
Very specific forms of logical formulae. Enforced
(maybe) by the database system:
– Functional dependencies: A B
– Foreign key constraints: every tuple in the Purchase
table must refer to a product in the Product table.
– Multi-valued dependencies.
– Tuple-generating dependencies, etc. etc.
10
Inference: Type 1 (Querying)
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all countries that manufacture some product in the
‘Gadgets’ category.
SELECT country
FROM
Product, Company
WHERE manufacturer=cname AND category=‘Gadgets’
Q(c) :- Product(w,y,’Gadgets’,x), Company(x,p,c)
11
Query Language Features
Fundamental question: what queries can I
express with my language?
Start with selection, projection, and join.
+ union and negation (= relational completeness)
To deal with real data:
– Grouping and aggregation
– Dealing with duplicates
– Outer joins
12
Inference Type 2: Query Containment
• Question: is the result of Q1 always a subset of
Q2?
Q1(A,B) :- cites(A,B), cites(B,A), sameTopic(A,B)
Q2(C,D) :- cites(C,C1), cites(D,D1)
• Inference on a very specific type of formula.
• Only finite models are considered.
13
Complexity Results Galore
For select-project-equi-join: NP-Complete
Add comparisons (e.g, <): Pi^p_2 complete.
Add negation:
– Level 0: still Pi^p_2 complete.
– Level 2: undecidable
– Level 1: Sagiv and Ullman know but don’t want to tell
Allow at most 2 occurrences of every predicate
name: polynomial.
A lot of papers.
14
Last Week Recap
The relational model: ground facts + UNA + CWA.
Integrity constraints: expressing more knowledge.
Inference Type 1: querying (= model checking).
– Note: polynomial time is not good enough.
Inference Type 2: query containment
– Type 2.5: answering queries using views.
– Both are an inference problem of a particular type of
formula in first-order logic over finite models.
15
Beyond Containment
Answering Queries Using Views
Given a query Q and a set of view definitions V1,…,Vn:
Is it possible to answer Q using only the V’s?
V1(A,B) :- cites(A,B), cites(B,A)
V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)
Query:
q(x,y) :- sameTopic(x,y), cites(x,y), cites(y,x)
Query rewriting:
q’(X,Y) :- V1(X,Y), V2(X,Y)
16
Didn’t We Say 590 Semantic Web?
Assume a virtual schema of the WWW, e.g.,
– Course(number, university, title, prof, quarter)
Every data source on the web contains the
answer to a view over the virtual schema:
UW database: SELECT number, title, prof
FROM Course
WHERE univ=‘UW’ AND quarter=‘2/02’
Stanford database: SELECT number, title, prof, quarter
FROM Course
WHERE univ=‘Stanford’
User query: find all professors who teach “database systems”
17
Horn Rules / Datalog
Easy for KR people. Hard for database people.
Add recursion to the query language:
– Path (x,y) :- edge(x,y)
– Path (x,y) :- Path(x,z), Path(z,y)
DB people consider least fixed point semantics.
{ edge(a,b), Path(a,b), Path(b,c), Path(a,c)}:
– Is a model for KR folks, not DB folks.
Recursion is not expressible in first-order logic.
18
More on Datalog
Many clever algorithms for evaluating datalog
queries:
– They have fancy names: e.g., magic sets
Query containment: undecidable, unless you
constrain the queries (ask Surajit)
Some ideas made it into SQL and relational
systems:
– Magic sets (useful even without recursion)
– Linear recursion in SQL-3
19
XML
<db>
<book>
<title>Complete Guide to DB2</title>
<author>Chamberlin</author>
</book>
<book>
<title>Transaction Processing</title>
<author>Bernstein</author>
<author>Newcomer</author>
</book>
<publisher>
<name>Morgan Kaufman</name>
<state>CA</state>
</publisher>
</db>
20
XML: Issues
Data model: edge-labeled graph (/tree):
– The tags can be viewed as binary relations
Features:
–
–
–
–
–
The schema is embedded in the data.
Can have a predefined schema (XML Schema)
Nesting can be arbitrary.
Can be irregular (e.g., different formats for elements)
Order of elements may be important.
21
Querying XML
XPath, XQuery, XSLT.
XQuery is based on XPath, XML-QL, SQL.
Query languages features:
– Path expressions
– The Return clause: creates the output XML
document.
– Standard query language bells and whistles.
– Not in XQuery: tag variables – bind variables to
schema elements.
Query containment: ask Dan and Gerome.
22
Knowledge Representation
McCarthy suggested some form of first-order
logic.
Semantic networks – popular on the east coast.
Evolved into:
– Frame-based systems
– Description logics (a.k.a. terminological logics).
23
Description Logics
A subset of first-order logic with a German
syntax. No variables.
Allows only:
– Unary relations (Concepts): Person, Happy
– Binary relations (Roles, attributes): childOf
A DL Knowledge base:
– Abox of ground facts: Person(sue), Happy(bob)
– Tbox of definitions.
24
Concept Descriptions
Built using a set of constructs:
C, D  A
T
^
CD
CUD
C
 R.C
 R.C
|
|
|
|
|
|
|
|
Primitives Konzept
Top Konzept
Bottom Konzept
Durchschnitt
Vereinigung
Komplement
Rollenquantifikation/Werterestriktion
Rollenquant./Existentielle Restriktion
25
Concept Descriptions
Built using a set of constructs:
C, D  A
T
^
CD
CUD
C
 R.C

 R.C

(> n R)
|
|
|
|
|
|
|
|
|
Primitive concepts
Top concept
Bottom concept
Intersection
Union
Complement
Universal restriction
Existential restriction
number restriction
26
TBox Assertions
Concept introduction:
– Person  Mammal
Concept definition:
– Parent = Person  (> 0 child)
– HappyParent = Parent  ( child.Smart)
Inclusion assertions:
– Parent  ( child. (= name Karina)) HappyParent
– Person  (> 5 child)  HappyParent ^
27
Reasoning in DLs.
Note: you can assert view facts – HappyParent(bob)
– You don’t know who the children are, but you know
they’re smart.
Classification: C(a)?
Consistency: is C necessarily an empty concept?
Subsumption: C1 C2?
Theoretically, everything boils down to subsumption.
Complexity depends on set of constructors allowed.
28
DLs vs. Horn rules
Horn rules can handle any variable pattern
– DL’s can handle only specific patterns.
DL’s can do subsumption with negation, number
restrictions, and various other features.
They can be combined but decidability is subtle
(see CARIN, [Levy and Rousset, 1996]).
29