Relational Models

Download Report

Transcript Relational Models

Relational Models
CSE 515 in One Slide
We will learn to:
 Put probability distributions on everything
 Learn them from data
 Do inference with them
Beyond Vectors
We want to put distributions on:
 Trees
 Graphs
 Objects of different types and their relations
 Class hierarchies
 Relational databases
 Knowledge bases
 Programs
 Etc.
Two Approaches

Piecemeal


Develop probabilistic models for each
All-in-one


All are easily expressed in first-order logic
Add probability to first-order logic
First-Order Logic







Symbols: Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Logical connectives: Conjunction, disjunction,
negation, implication, quantification, etc.
Literal: Atom (predicate) or its negation
Clause: Disjunction of literals
All formulas can be converted to conjunction
of clauses
Grounding: Replace all variables by constants
E.g.: Friends (Anna, Bob)
World: Assignment of truth values to all ground atoms
Example: Friends & Smokers
Example: Friends & Smokers
Smoking causes cancer.
Friends have similar smoking habits.
Example: Friends & Smokers
x Smokes( x )  Cancer ( x )
x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Relational Models
Representation
Logical
Language
Probabilistic
Language
Knowledge-based
model construction
Stochastic logic
programs
Probabilistic
relational models
Relational Markov
networks
Bayesian logic
Horn clauses
Bayes nets
Horn clauses
PCFGs
Frame systems
Bayes nets
SQL queries
Markov nets
First-order
Bayes nets
Markov logic
First-order logic Markov nets
Markov Logic



Most developed approach to date
Many other approaches can be viewed
as special cases
Main focus of this lecture
Markov Logic: Intuition



A logical KB is a set of hard constraints
on the set of possible worlds
Let’s make them soft constraints:
When a world violates a formula,
It becomes less probable, not impossible
Give each formula a weight
(Higher weight  Stronger constraint)
P(world)  exp  weights of formulas it satisfies

Markov Logic: Definition

A Markov Logic Network (MLN) is a set of
pairs (F, w) where



F is a formula in first-order logic
w is a real number
Together with a set of constants,
it defines a Markov network with


One node for each grounding of each predicate in
the MLN
One feature for each grounding of each formula F
in the MLN, with the corresponding weight w
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Two constants: Anna (A) and Bob (B)
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Two constants: Anna (A) and Bob (B)
Smokes(A)
Cancer(A)
Smokes(B)
Cancer(B)
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Two constants: Anna (A) and Bob (B)
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Two constants: Anna (A) and Bob (B)
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
Example: Friends & Smokers
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Two constants: Anna (A) and Bob (B)
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
Markov Logic Networks

MLN is template for ground Markov nets

Probability of a world x:
1


P( x)  exp   wi ni ( x) 
Z
 i

Weight of formula i



No. of true groundings of formula i in x
Typed variables and constants greatly reduce
size of ground Markov net
Functions, existential quantifiers, etc.
Infinite and continuous domains
Relation to Statistical Models



Markov logic has all the models we’ve seen
in this class as special cases
Markov logic allows objects to be
interdependent (non-i.i.d.)
Markov logic makes it easy to compose
models
Relation to First-Order Logic



Infinite weights  First-order logic
Satisfiable KB, positive weights 
Satisfying assignments = Modes of distribution
Markov logic allows contradictions between
formulas
Inference in Markov Logic




P(Formula|MLN,C) = ?
MCMC: Sample worlds, check formula holds
P(Formula1|Formula2,MLN,C) = ?
If Formula2 = Conjunction of ground atoms


First construct min subset of network necessary to
answer query (generalization of KBMC)
Then apply MCMC, belief propagation, etc.
Ground Network Construction
network ← Ø
queue ← query nodes
repeat
node ← front(queue)
remove node from queue
add node to network
if node not in evidence then
add neighbors(node) to queue
until queue = Ø
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
Friends(A,B)
Friends(A,A)
Smokes(A)
Smokes(B)
Cancer(A)
Friends(B,B)
Cancer(B)
Friends(B,A)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Example Grounding
1.5 x Smokes( x )  Cancer ( x )
1.1 x, y Friends ( x, y )  Smokes( x )  Smokes( y ) 
e 2.2 if Smokes(B)
Friends(A,B)
 0
e otherwise
Friends(A,A)
Smokes(A)
Smokes(B)
e1.5Cancer(A)
if Smokes(B)  Cancer ( B )
 0
Friends(B,A)
e otherwise
Friends(B,B)
Cancer(B)
P( Cancer(B) | Smokes(A), Friends(A,B), Friends(B,A))
Other Options


Lazy inference
Lifted inference
Learning




Data is a relational database
Closed world assumption → Complete data
Learning parameters (weights)
Learning structure (formulas)
Weight Learning

Parameter tying: Groundings of same clause

log Pw ( x)  ni ( x)  Ew ni ( x)
wi
No. of times clause i is true in data
Expected no. times clause i is true according to MLN


Generative learning: Pseudo-likelihood
Discriminative learning: Cond. likelihood,
use MC-SAT or MaxWalkSAT for inference
Structure Learning








Generalizes feature induction in Markov nets
Any inductive logic programming approach
can be used, but . . .
Goal is to induce any clauses, not just Horn
Evaluation function should be likelihood
Requires learning weights for each candidate
Turns out not to be bottleneck
Bottleneck is counting clause groundings
Solution: Subsampling
Structure Learning




Initial state: Unit clauses or hand-coded KB
Operators: Add/remove literal, flip sign
Evaluation function:
Pseudo-likelihood + Structure prior
Search: Beam, shortest-first, bottom-up, etc.
Alchemy
Open-source software including:
 Full first-order logic syntax
 Inference: MCMC, belief propagation, etc.
 Generative & discriminative weight learning
 Structure learning
 Programming language features
alchemy.cs.washington.edu
Example:
Information Extraction
Parag Singla and Pedro Domingos, “Memory-Efficient
Inference in Relational Domains” (AAAI-06).
Singla, P., & Domingos, P. (2006). Memory-efficent
inference in relatonal domains. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence
(pp. 500-505). Boston, MA: AAAI Press.
H. Poon & P. Domingos, Sound and Efficient Inference
with Probabilistic and Deterministic Dependencies”, in
Proc. AAAI-06, Boston, MA, 2006.
P. Hoifung (2006). Efficent inference. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence.
Segmentation
Author
Title
Venue
Parag Singla and Pedro Domingos, “Memory-Efficient
Inference in Relational Domains” (AAAI-06).
Singla, P., & Domingos, P. (2006). Memory-efficent
inference in relatonal domains. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence
(pp. 500-505). Boston, MA: AAAI Press.
H. Poon & P. Domingos, Sound and Efficient Inference
with Probabilistic and Deterministic Dependencies”, in
Proc. AAAI-06, Boston, MA, 2006.
P. Hoifung (2006). Efficent inference. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence.
Entity Resolution
Parag Singla and Pedro Domingos, “Memory-Efficient
Inference in Relational Domains” (AAAI-06).
Singla, P., & Domingos, P. (2006). Memory-efficent
inference in relatonal domains. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence
(pp. 500-505). Boston, MA: AAAI Press.
H. Poon & P. Domingos, Sound and Efficient Inference
with Probabilistic and Deterministic Dependencies”, in
Proc. AAAI-06, Boston, MA, 2006.
P. Hoifung (2006). Efficent inference. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence.
Entity Resolution
Parag Singla and Pedro Domingos, “Memory-Efficient
Inference in Relational Domains” (AAAI-06).
Singla, P., & Domingos, P. (2006). Memory-efficent
inference in relatonal domains. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence
(pp. 500-505). Boston, MA: AAAI Press.
H. Poon & P. Domingos, Sound and Efficient Inference
with Probabilistic and Deterministic Dependencies”, in
Proc. AAAI-06, Boston, MA, 2006.
P. Hoifung (2006). Efficent inference. In Proceedings of the
Twenty-First National Conference on Artificial Intelligence.
State of the Art

Segmentation


Entity resolution



HMM (or CRF) to assign each token to a field
Logistic regression to predict same field/citation
Transitive closure
Alchemy implementation: Seven formulas
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}
Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue, ...}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}
Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Optional
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}
Token(token, position, citation)
Evidence
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}
Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Query
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Formulas
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’)
^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”)
=> SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)
Results: Segmentation on Cora
1
Precision
0.8
0.6
0.4
Tokens
0.2
Tok. + Seq. + Period
Tokens + Sequence
Tok. + Seq. + P. + Comma
0
0
0.2
0.4
0.6
Recall
0.8
1
Results:
Matching Venues on Cora
1
Precision
0.8
0.6
Similarity
0.4
Sim. + Relations
Sim. + Transitivity
0.2
Sim. + Rel. + Trans.
0
0
0.2
0.4
0.6
Recall
0.8
1