Probabilistic Models of Relational Domains
Daphne Koller
Stanford University
© Daphne Koller, 2003
Overview
Relational logic
Probabilistic relational models
Inference in PRMs
Learning PRMs
Uncertainty about domain structure
Summary
Attribute-Based Models & Relational Models
Bayesian Networks: Problem
Bayesian nets use propositional representation
Real world has objects, related to each other
[Figure: ground BN fragments with nodes Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101 (= A), Grade_George_CS101 (= C), Grade_George_Geo101. These "instances" are not independent.]
The University BN
[Figure: ground BN with nodes Difficulty_CS101, Difficulty_Geo101, Intelligence_Jane, Intelligence_George, and children Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101.]
The Genetics BN
[Figure: family-tree BN over Harry, Betty, Homer, Marge, Selma, Bart, Lisa, and Maggie, with a genotype node G_x and a bloodtype node B_x per person. G = genotype, B = bloodtype.]
Simple Approach
Graphical model with shared parameters
… and shared local dependency structure
Want to encode this constraint:
For human knowledge engineer
For network learning algorithm
How do we specify
which nodes share parameters?
shared (but different) structure across nodes?
Simple Approach II
We can write a special-purpose program for
each domain:
genetic inheritance (family tree imposes constraints)
university (course registrations impose constraints)
Is there something more general?
Attribute-Based Worlds
Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101
Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101
Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101
Smart_Rick & easy_CS221 ⇒ GetA_Rick_CS221
World = assignment of values to attributes
/ truth values to propositional symbols
Object-Relational Worlds
∀x,y (Smart(x) & Easy(y) & Take(x,y) ⇒ Grade(A,x,y))
World = relational interpretation:
Objects in the domain
Properties of these objects
Relations (links) between objects
Relational Logic
General framework for representing:
objects & their properties
classes of objects with same model
relations between objects
Represent a model at the template level, and
apply it to an infinite set of domains
Given finite domain, each instantiation of the
model is propositional, but the template is not
Relational Schema
Specifies types of objects in domain, attributes of each
type of object & types of relations between objects
[Figure: relational schema. Classes: Professor, Student, Course, Registration. Attributes: Teaching-Ability, Intelligence, Difficulty, Grade, Satisfaction. Relations: Teach, Has, In.]
St. Nordaf University
[Figure: an example world. Prof. Smith (Teaching-ability High) and Prof. Jones (Teaching-ability Low) teach Geo101 (Difficulty Easy) and CS101 (Difficulty Hard); students George (Intelligence Weak) and Jane (Intelligence Smart) hold registrations with Grades A, B, C and Satisfaction values Like and Hate.]
Relational Logic: Summary
Vocabulary:
Classes of objects: Person, Course, Registration, …
Individual objects in a class: George, Jane, …
Attributes of these objects: George.Intelligence, Reg1.Grade
Relationships between these objects: Of(Reg1,George), Teaches(CS101,Smith)
A world specifies:
A set of objects, each in a class
The values of the attributes of all objects
The relations that hold between the objects
Binary Relations
Any relation can be converted into an object:
R(x1,x2,…,xk) ⇒ introduce a new "relation" object y, with
R1(x1,y), R2(x2,y), …, Rk(xk,y)
E.g., registrations are “relation objects”
Can restrict attention to binary relations R(x,y)
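To make the reification concrete, here is a minimal sketch (mine, not from the talk); the relation name Teaches and its tuples are hypothetical:

# Sketch: reifying a k-ary relation into binary relations.
teaches = [("Smith", "CS101", "Fall"), ("Jones", "Geo101", "Spring")]

# Each k-ary tuple becomes a fresh "relation object" y, linked to its
# arguments by k binary relations R1(x1,y), ..., Rk(xk,y).
R = {i: [] for i in range(3)}
for tid, args in enumerate(teaches):
    y = f"teach_{tid}"                 # new relation object
    for i, x in enumerate(args):
        R[i].append((x, y))            # Ri(xi, y)

print(R[0])  # [('Smith', 'teach_0'), ('Jones', 'teach_1')]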
Relations & Links
Binary relations can also be viewed as links:
Specify the set of objects related to x via R
R(x,y) y x.R1, x y.R2
E.g., Teaches(p,c)
p.Courses = {courses c : Teaches(p,c)}
c.Instructor = {professors p : Teaches(p,c)}
Probabilistic Relational Models:
Relational Bayesian Networks
Probabilistic Models
Uncertainty model:
space of "possible worlds";
probability distribution over this space.
In attribute-based models, world specifies
assignment of values to fixed set of random variables
In relational models, world specifies
Set of domain elements
Their properties
Relations between them
PRM Scope
Entire set of relational worlds is infinite and too broad
Assume circumscribed class of worlds Ω[σ],
consistent with some type of background knowledge σ
PRM is a template defining P over Ω[σ] for any such σ
Simplest class (attribute-based PRMs):
σ = relational skeleton:
finite set of entities E and relations between them
Ω[σ] = all assignments of values to all attributes of entities in E
PRM template defines distribution over Ω[σ] for any such σ
[K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]
Relational Bayesian Network
Universals: Probabilistic patterns hold for all objects in class
Locality: Represent direct probabilistic dependencies
Links define potential interactions
[Figure: PRM template over Professor.Teaching-Ability, Student.Intelligence, Course.Difficulty, Reg.Grade, Reg.Satisfaction, with the CPD P(Reg.Grade | Course.Difficulty, Student.Intelligence) shown as bars over grades A/B/C for each of easy/hard × low/high.]
[K. & Pfeffer; Poole; Ngo & Haddawy]
RBN: Semantics
σ: set of objects & relations between them
Ω[σ]: the set of all assignments of values to all attributes of all objects in σ
Distribution over Ω[σ] is defined by a ground Bayesian network:
variables: attributes of all objects
dependencies: determined by
relational links in σ
dependency structure of RBN model
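As a minimal sketch of this grounding step (my own illustration, with a toy dict-based skeleton and made-up object names):

skeleton = {
    "Student": ["George", "Jane"],
    "Course": ["CS101", "Geo101"],
    "Reg": [("George", "CS101"), ("George", "Geo101"), ("Jane", "CS101")],
}

# Ground BN: one variable per object attribute; edges follow the links in σ.
ground_vars, ground_edges = [], []
for cls, attr in [("Student", "Intelligence"), ("Course", "Difficulty")]:
    ground_vars += [(obj, attr) for obj in skeleton[cls]]
for stu, crs in skeleton["Reg"]:
    child = ((stu, crs), "Grade")
    ground_vars.append(child)
    ground_edges += [((stu, "Intelligence"), child), ((crs, "Difficulty"), child)]

print(len(ground_vars), "variables,", len(ground_edges), "edges")  # 7 variables, 6 edges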
RBN Structure
For each class X and attribute A, structure specifies parents for X.A:
X.B: for any object x in class X, x.B is parent of x.A
E.g., Course.Level → Course.Difficulty
X.ρ.B, where ρ is a link or chain of links: for any object x in class X, x.ρ.B is parent of x.A
E.g., Reg.In.Difficulty → Reg.Grade
RBN Semantics
[Figure: the ground BN induced for the example world: Teaching-ability of Profs. Jones and Smith, Difficulty of Geo101 and CS101, Intelligence of George and Jane, and the Grade and Satisfac of every registration, wired according to the template.]
The Web of Influence
[Figure: evidence about grades in CS101 and Geo101 flows through shared students and courses, shifting posteriors over easy/hard course difficulty and low/high student intelligence: a "web of influence".]
Aggregate Dependencies
[Figure: Jane Doe's Honors depends on the Grades of all her registrations (Reg #5077, #5054, #5639, …), combined by an avg aggregator; CPD P(Honors | avg Grade): A → y 0.8 / n 0.2, B → 0.4 / 0.6, C → 0.1 / 0.9.]
Problem: a student has a variable number of registrations; we need dependencies on sets!
[K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]
Aggregate Dependencies
[Figure: schema where aggregates mediate set-valued dependencies, e.g., Professor.Stress-Level via count, Course.Rating via avg of Reg.Satisfaction, Student.Honors via avg of Reg.Grade.]
Aggregates: sum, min, max, avg, mode, count
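A minimal sketch (mine, with hypothetical grade data) of how an aggregator collapses a multiset of parent values into a single CPD input:

from statistics import mean, mode

AGGREGATORS = {"sum": sum, "min": min, "max": max,
               "avg": mean, "mode": mode, "count": len}

# Hypothetical: Jane's grades across registrations, on a 4-point scale.
jane_grades = [4, 3, 4]

agg = AGGREGATORS["avg"](jane_grades)   # 3.67, a single input value
# P(Jane.Honors | avg(Grade)) is now an ordinary one-parent CPD lookup.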
Basic RBN: Summary
RBN specifies:
A probabilistic dependency structure S:
A set of parents X.ρ.B for each class attribute X.A
A set of local probability models θ:
Aggregator to use for each multi-valued dependency
Set of CPD parameters θ_X.A
Given relational skeleton structure σ, RBN induces
a probability distribution over worlds ω
Distribution defined via ground BN over attributes x.A:

$$P(\omega \mid \sigma, S, \theta) = \prod_{X.A} \prod_{x \in \sigma(X)} P\big(x.A \mid \mathrm{parents}_{S,\sigma}(x.A),\, \theta_{X.A}\big)$$

(outer product over attributes, inner product over objects)
Extension: Class Hierarchy
Subclasses inherit all attributes of parents, but
may have additional ones
For inherited attribute X.A, subclass can:
inherit parent’s probabilistic model
overwrite with local probabilistic model
Example:
Professor has subclasses assistant, associate, full
Inherit distribution over Stress-Level
Modify distribution over Salary
[K. & Pfeffer ’98]
Extension: Class Hierarchies
Hierarchies allow reuse in knowledge engineering
and in learning
Parameters and dependency models shared across
more objects
If class assignments are specified in σ, the class hierarchy
does not introduce complications
[K. & Pfeffer ’98]
Coherence & Acyclicity
[Figure: Professor (Teaching-Ability, Stress-Level) and Reg (Grade, Satisfaction) with a dependency loop, so that Smith.Stress-Level depends probabilistically on itself.]
For a given skeleton σ, the PRM asserts dependencies between attributes of objects: y.B → x.A
The PRM defines a coherent probability model over Ω[σ] if the ground dependency graph G_σ is acyclic
[Friedman, Getoor, K., Pfeffer '99]
Guaranteeing Acyclicity
How do we guarantee that a PRM is acyclic for every object skeleton σ?
PRM dependency structure S induces a template-level (class) dependency graph:
Attribute stratification: if X.ρ.B ∈ Parents(X.A) and the class of X.ρ is Y, add the edge Y.B → X.A
Class dependency graph acyclic ⇒ ground graph G_σ acyclic for all σ
Limitation of Stratification
[Figure: each Person's M-chromosome and P-chromosome depend on the corresponding attributes of Mother and Father; Blood-type depends on both. At the class level this yields a cyclic graph over Person.M-chrom and Person.P-chrom, with edges into Person.B-type.]
Limitation of Stratification
[Figure: the same genetics model.]
Prior knowledge: the Father-of relation is acyclic
Dependence of Person.A on Person.Father.B cannot induce cycles
Guaranteeing Acyclicity
With guaranteed acyclic (g.a.) relations, some cycles in the dependency graph are guaranteed to be safe.
We color the edges in the dependency graph:
yellow: within a single object (X.A → X.B)
green: via a g.a. relation (X.A → Y.B)
red: via other relations (X.A → Y.B)
A cycle is safe if it has a green edge and no red edge
[Figure: the colored class dependency graph over Person.M-chrom, Person.P-chrom, Person.B-type.]
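A minimal sketch (my own encoding of the rule, on a hand-built toy graph) of the safe-cycle test:

# Edge colors: 'yellow' (intra-object), 'green' (via guaranteed-acyclic
# relation), 'red' (other relations). Toy genetics class graph by hand.
edges = {
    ("M-chrom", "M-chrom"): "green",   # via the acyclic Mother/Father links
    ("P-chrom", "P-chrom"): "green",
    ("M-chrom", "B-type"): "yellow",
    ("P-chrom", "B-type"): "yellow",
}

def cycle_is_safe(cycle):
    colors = [edges[e] for e in cycle]
    return "green" in colors and "red" not in colors

# The self-loop through the guaranteed-acyclic parent relation is safe:
print(cycle_is_safe([("M-chrom", "M-chrom")]))   # True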
Object-Oriented Bayesian Nets*
OOBNs are RBNs with only one type of relation
One object can be a “part-of” another
Objects can only interact with component parts
Other types of relationships must be embedded into
the “part-of” framework
Defines “neat” hierarchy of related objects
Provides clearly defined object interface
between object x and its enclosing object y
* Simplified view
[K. & Pfeffer ’97]
OOBN Example
Drivers, cars, and the collision must all be viewed as parts of the situation object.
[Figure: situation object containing Driver1, Driver2, Car1 (Owner, Value, Speed, Damage), Car2 (Owner, Value, Speed, Damage), and Collision.Severity.]
Part-of hierarchy
Objects in the model are encapsulated within
each other, forming a tree.
[Figure: tree; the Accident situation contains Severity, Driver1 (Income), Car1 (Price, Speed, Engine.Power), Car2 (Price, Speed, Engine.Power), and Driver2 (Income).]
Lines of Communication
[Figure: the same part-of hierarchy; objects communicate only along the tree, e.g., the cars' Speeds feed the accident's Severity.]
Probabilistic Relational Models:
Relational Markov Networks
Why Undirected Models?
Symmetric, non-causal interactions
E.g., web: categories of linked pages are correlated
Cannot introduce directed edges because of cycles
Patterns involving multiple entities
E.g., web: "triangle" patterns
Directed edges not appropriate
"Solution": impose arbitrary direction
Not clear how to parameterize CPD for variables involved in multiple interactions
Very difficult within a class-based parameterization
[Taskar, Abbeel, K., 2001]
Markov Networks: Review
[Figure: Markov network over Alice, Bob, Chris, Dave, Eve, with a potential φ(A,B,C) tabulated over the assignments LLL … HHH.]

$$P(A,B,C,D,E) = \frac{1}{Z}\,\phi_1(A,B,C)\,\phi_2(C,D)\,\phi_3(D,E)\,\phi_4(E,A)$$
Markov Networks: Review
A Markov network is an undirected graph over
some set of variables V
Graph associated with a set of potentials φ_i
Each potential is a factor over a subset V_i
Variables in V_i must be a (sub)clique in the network

$$P(V) = \frac{1}{Z} \prod_i \phi_i(V_i)$$
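A minimal sketch (mine, with made-up potential values) of this definition on a tiny chain A - B - C:

import itertools
import numpy as np

# Binary variables; pairwise potentials that favor agreement.
phi_AB = np.array([[10.0, 1.0], [1.0, 10.0]])
phi_BC = np.array([[10.0, 1.0], [1.0, 10.0]])

def unnorm(a, b, c):
    return phi_AB[a, b] * phi_BC[b, c]   # product of clique potentials

# Partition function Z sums the product over all joint assignments.
Z = sum(unnorm(*v) for v in itertools.product([0, 1], repeat=3))
print(unnorm(0, 0, 0) / Z)               # P(A=0, B=0, C=0)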
Relational Markov Networks
Probabilistic patterns hold for groups of objects
Groups defined as sets of (typed) elements linked in
particular ways
[Figure: template potential φ(Reg.Grade, Reg2.Grade) over the grades of two registrations whose students share a study group, with entries for the pairs AA … CC on a 0–2 scale.]
RMN Language
Define clique templates
All tuples {reg R1, reg R2, group G}
s.t. In(G, R1), In(G, R2)
Compatibility potential φ(R1.Grade, R2.Grade)
Ground Markov network contains potential φ(r1.Grade, r2.Grade) for all appropriate r1, r2
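A minimal sketch (mine, with hypothetical registrations and groups) of instantiating this clique template:

from itertools import combinations

# In(g, r): registration r is in study group g (hypothetical data).
in_group = {
    "geo_group": ["reg_george_geo", "reg_jane_geo"],
    "cs_group": ["reg_george_cs", "reg_jane_cs", "reg_jill_cs"],
}

# One ground potential phi(r1.Grade, r2.Grade) per pair sharing a group,
# i.e., per tuple matching the clique template.
ground_cliques = [pair for regs in in_group.values()
                  for pair in combinations(regs, 2)]
print(ground_cliques)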
Ground MN (or Chain Graph)
[Figure: ground Markov network; Intelligence of George, Jane, and Jill; Difficulty of Geo101 and CS101; registration Grades tied by potentials within the Geo Study Group and the CS Study Group.]
PRM Inference
Inference: Simple Method
Define ground network as in semantics
Apply standard inference methods
Problem:
Very large models can be specified very easily
Resulting ground network often highly connected
Exact inference is typically intractable
In practice, often must resort to approximate
methods such as belief propagation
Exploiting Structure: Encapsulation
Objects interact only in limited ways
Can define object interface:
Outputs: Object attributes influencing other objects
Inputs: External attributes influencing object
Object is independent of everything given interface
Inference can be encapsulated within objects, with
“communication” limited to interfaces
[Figure: Student object (Upbringing, Intelligence, GPA, Honors, Personality, Job-prospects) with its interface separating it from the enclosing network.]
[K. & Pfeffer, 1997]
Exploiting Structure: Encapsulation
Marginalize object distribution onto interface
Perform inference over interfaces
Dependency graph over interfaces induced by inter-object dependencies, and hence by the relational structure
E.g., part-of hierarchy in some OOBNs
If interaction graph has low tree-width, can use exact inference
If relational structure more complex, can use BP
A form of Kikuchi BP, where cluster selection is guided by relational structure
Exploiting Structure: Reuse
Objects from same class have same model
For generic objects – no internal evidence – the marginalized interface is the same
Can reuse inference – a form of "lifting"
[Figure: George and Jane instantiate the same Student model (Upbringing, Intelligence, GPA, Honors, Personality, Job-prospects).]
[Pfeffer & K., 1998]
Exploiting Structure: Reuse
Generic objects often play same role in model
Multiple students that all take the same class
Reuse: compute interface once
Combinatorics: compute total contribution to
probability in closed form
P(k students like the class | teaching-ability = low) =
P(generic student likes the class | teaching-ability = low)^k
[Pfeffer & K., 1998]
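A minimal sketch (mine; the numbers are made up) of why this closed form pays off:

# k exchangeable generic students contribute identical factors, so the
# total contribution is a power: O(1) instead of k multiplications.
p_like_given_low = 0.2     # P(generic student likes | teaching-ability=low)
k = 30                     # generic students with no internal evidence

p_all_like = p_like_given_low ** k
print(p_all_like)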
Case study
Battlefield situation assessment for missile units
several locations
many units
each has detailed model
Example object classes:
Battalion
Battery
Vehicle
Location
Weather
Example relations:
At-Location
Has-Weather
Sub-battery/In-battalion
Sub-vehicle/In-battery
Effect of Exploiting Structure
[Chart: running time in seconds (0–6000) vs. #vehicles of each type in each battery (1–10), for the query P(Next-Mission); series: flat BN, no reuse, with reuse, with combinatorics. The flat BN has 2400 nodes and costs 20 minutes; exploiting structure brings the cost to 9 sec.]
[Pfeffer & K., 1998]
PRM Learning
Outline
Relational Bayesian networks
Likelihood function
ML parameter estimation
EM
Structure learning
Relational Markov networks
Parameter estimation
Applications:
Collective classification – web data
Relational clustering – biological data
PRM Learning: Input
Input: a full world
[Figure: the fully observed training world; Prof. Smith and Prof. Jones with Teaching-ability High; Geo101 (Difficulty Easy) and CS101 (Difficulty Hard); George (Intelligence Weak) with Grades B and C, Jane (Intelligence Smart) with Grade A, and the registrations' Satisfaction values.]
Likelihood Function
$$L(S, \theta : \omega) = P(\omega \mid \sigma, S, \theta) = \prod_{X.A} \prod_{x \in \sigma(X)} P\big(x.A \mid \mathrm{parents}_{S,\sigma}(x.A),\, \theta_{X.A}\big)$$

(outer product over attributes, inner product over objects)
Likelihood of a BN with shared parameters
Joint likelihood is a product of likelihood terms
One for each attribute X.A and its family
For each X.A, the likelihood function aggregates
counts from all occurrences x.A in world
[Friedman, Getoor, K., Pfeffer, 1999]
Likelihood Function: Multinomials
Log-likelihood:

$$\log P(\omega \mid \sigma, S, \theta) = \sum_{X.A} \sum_{u \in Val(Pa(X.A))} \sum_{a \in Val(X.A)} M(a, u) \log \theta_{a \mid u}$$

Sufficient statistics:

$$M(a, u) = \big|\{x \in \sigma(X) : x.A = a,\ \mathrm{parents}_{S,\sigma}(x.A) = u\}\big|$$
RBN Parameter Estimation
MLE parameters:

$$\hat{P}(Reg.Grade = A \mid Student.Intell = hi,\, Course.Diff = lo) = \frac{M(Reg.Grade = A,\, Student.Intell = hi,\, Course.Diff = lo)}{M(Reg.Grade = *,\, Student.Intell = hi,\, Course.Diff = lo)}$$

Bayesian estimation:
Prior for each attribute X.A
Posterior uses aggregated sufficient statistics
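A minimal sketch (mine, with invented counts) of this count-ratio estimate:

from collections import Counter

# M aggregates sufficient statistics over every Reg object in the world.
M = Counter({
    ("A", "hi", "lo"): 30, ("B", "hi", "lo"): 8, ("C", "hi", "lo"): 2,
    ("A", "lo", "hi"): 3,  ("B", "lo", "hi"): 12, ("C", "lo", "hi"): 25,
})

def mle(grade, intell, diff):
    total = sum(M[(g, intell, diff)] for g in "ABC")   # M(Grade=*, u)
    return M[(grade, intell, diff)] / total

print(mle("A", "hi", "lo"))   # 0.75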
Learning w. Missing Data
EM Algorithm applies essentially unchanged
E-step computes expected sufficient statistics,
aggregated over all objects in class
M-step uses ML (or MAP) parameter estimation
Key difference:
In general, the hidden variables are not independent
Computation of expected sufficient statistics requires
inference over entire network
[Same reasoning as for forward-backward algorithm
in temporal models]
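To make the E-step/M-step loop concrete, here is a minimal sketch (mine, with made-up data). It uses a toy skeleton in which students are independent, so the E-step factorizes; in general, as noted above, the E-step needs inference over the entire ground network:

from math import prod

students = {"s1": ["A", "A", "B"], "s2": ["C", "B", "C"], "s3": ["A", "B", "B"]}
GRADES, INTELLS = "ABC", ("lo", "hi")

p_int = {"lo": 0.5, "hi": 0.5}
p_grade = {"lo": {"A": 0.2, "B": 0.3, "C": 0.5},
           "hi": {"A": 0.5, "B": 0.3, "C": 0.2}}

for _ in range(50):
    # E-step: expected sufficient statistics, aggregated over all students.
    exp_int = dict.fromkeys(INTELLS, 1e-6)
    exp_grade = {i: dict.fromkeys(GRADES, 1e-6) for i in INTELLS}
    for gs in students.values():
        joint = {i: p_int[i] * prod(p_grade[i][g] for g in gs) for i in INTELLS}
        z = sum(joint.values())
        for i in INTELLS:
            post = joint[i] / z
            exp_int[i] += post
            for g in gs:
                exp_grade[i][g] += post
    # M-step: ML estimation from the expected counts.
    n = sum(exp_int.values())
    p_int = {i: exp_int[i] / n for i in INTELLS}
    p_grade = {i: {g: exp_grade[i][g] / sum(exp_grade[i].values())
                   for g in GRADES} for i in INTELLS}

print(p_int)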
Learning w. Missing Data: EM
[Dempster et al. '77]
[Figure: EM alternates between inferring the hidden attributes of Students and Courses and re-estimating P(Registration.Grade | Course.Difficulty, Student.Intelligence), shown as bars over A/B/C for easy/hard × low/high.]
Learning RBN Structure
Define set of legal RBN structures
Ones with legal class dependency graphs
Define scoring function: Bayesian score

$$\mathrm{Score}(S : \omega) = \log\big[P(\omega \mid S, \sigma)\, P(S)\big]$$

where the marginal likelihood integrates out the parameters:

$$P(\omega \mid S, \sigma) = \int L(S, \theta_S : \omega)\, P(\theta_S)\, d\theta_S$$

Product of family scores:
One for each X.A
Uses aggregated sufficient statistics
Search for high-scoring legal structure
[Friedman, Getoor, K., Pfeffer, 1999]
Learning RBN Structure
All operations done at class level
Dependency structure = parents for X.A
Acyclicity checked using class dependency graph
Score computed at class level
Individual objects only contribute to sufficient
statistics
Can be obtained efficiently using standard DB queries
Exploiting Locality: Phased Search
Phase 0: consider only dependencies within a class
[Figure: candidate parents drawn only from within each class: Course, Reg, Student.]
Exploiting Locality: Phased Search
Phase 1: consider dependencies one link away
[Figure: candidate parents extended to attributes one link away.]
Exploiting Locality: Phased Search
Phase k: consider dependencies k links away
[Figure: candidate parents extended to attributes k links away.]
Exploiting Locality: Phased Search
[Figure: a learned dependency structure over Professor (Popularity, Teaching-Ability), Course (Difficulty, Rating), Student (Intelligence, Honors), and Reg (Grade, Satisfaction).]
TB Patients in San Francisco
[Figure: three linked populations: strains, patients, contacts.]
[Getoor, Rhee, K., Small, 2001]
TB Patients in San Francisco
Input: relational database (1000 strains, 2300 patients, 20000 contacts)
Output: PRM for the domain
[Figure: learned PRM over Strain, Patient, and Contact, with attributes such as Suscept, Gender, POB, Race, HIV, Age, TB-type, XRay, Smear, Treatment, Outbreak, Infected, Subcase, CareLoc, CloseCont, Relation.]
Outline
Relational Bayesian networks
Likelihood function
ML parameter estimation
EM
Structure learning
Relational Markov networks
Parameter estimation
Applications:
Collective classification – web data
Relational clustering – biological data
Learning RMN Parameters
[Figure: the study-group template potential φ(Reg.Grade, Reg2.Grade) over grade pairs AA … CC.]
Parameterize potentials as a log-linear model:

$$\phi(R_1.Grade, R_2.Grade) = \exp\{w_{AA} f_{AA} + \cdots + w_{CC} f_{CC}\}$$

[Taskar, Wong, Abbeel, K., 2002]
Learning RMN Parameters
$$\ell(w : \omega) = \log P_w(\omega) = \sum_i w_i f_i(\omega) - \log Z(w)$$

(the f_i(ω) are feature counts; Z(w) is the partition function)
For example:
f_AA(ω) = # of tuples {reg r1, reg r2, group g} s.t. In(g, r1), In(g, r2), r1.Grade = A, r2.Grade = A
Learning RMN Parameters
Parameter estimation is not closed form
Convex problem ⇒ unique global maximum
Can use methods such as conjugate gradient

$$\frac{\partial \ell}{\partial w_{AA}} = \#(Grade{=}A,\, Grade{=}A) - E_w\big[\#(Grade{=}A,\, Grade{=}A)\big]$$

(actual count minus expected count)
Gradient process tries to find parameters s.t. expected counts = actual counts
Computing expected counts requires inference over ground Markov network
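A minimal sketch (mine; three registrations, a single feature, and brute-force expectations standing in for real inference over the ground network):

import itertools
from math import exp

GRADES = "ABC"
regs = ["r1", "r2", "r3"]
pairs = [("r1", "r2"), ("r2", "r3")]      # groundings of the clique template
observed = {"r1": "A", "r2": "A", "r3": "B"}

def f_AA(assign):                          # count of (A, A) grade pairs
    return sum(assign[a] == "A" == assign[b] for a, b in pairs)

w = 0.0
for _ in range(200):
    worlds = [dict(zip(regs, gs)) for gs in itertools.product(GRADES, repeat=3)]
    weights = [exp(w * f_AA(x)) for x in worlds]
    Z = sum(weights)
    expected = sum(f_AA(x) * wt for x, wt in zip(worlds, weights)) / Z
    w += 0.1 * (f_AA(observed) - expected)  # actual minus expected counts

print(round(w, 3))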
Learning RMNs
L = log P(Grades, Intelligence, Difficulty)
[Figure: ground network over Grades, Intelligence, and Difficulty, with the template potential φ(Reg1.Grade, Reg2.Grade) over pairs AA … CC.]

$$\frac{\partial L}{\partial w_{AA}} = \#(Grade{=}A,\, Grade{=}A) - \sum P(Grade{=}A,\, Grade{=}A \mid Diffic)$$
Outline
Relational Bayesian networks
Likelihood function
ML parameter estimation
EM
Structure learning
Relational Markov networks
Parameter estimation
Applications:
Collective classification – web data
Relational clustering – biological data
Collective Classification
[Figure: pipeline: training data (Course, Student, Reg) → model-structure learning → probabilistic relational model → inference on new data → conclusions.]
Example:
Train on one year of student intelligence, course difficulty, and grades
Given only grades in following year, predict all students’ intelligence
Discriminative Training
Goal: given values of observed variables ω.O = o, predict desired target values ω.T = t*
Do not necessarily want the model to fit the joint distribution P(ω.O = o, ω.T = t*)
To maximize classification accuracy, we can consider other optimization criteria
Maximize conditional log-likelihood:

$$\log P(\omega.T = t^* \mid \omega.O = o)$$

Maximize margin:

$$\log P(\omega.T = t^* \mid \omega.O = o) - \log \max_{t \neq t^*} P(\omega.T = t \mid \omega.O = o)$$

(the second term is the probability of the second-highest-probability label)
[Taskar, Abbeel, K., 2002; Guestrin, Taskar, K., 2003]
Web KB
[Figure: WebKB pages and links: Tom Mitchell (Professor), the WebKB Project, and Sean Slattery (Student), connected by Project-of, Member, and Advisor-of links.]
[Craven et al.]
Web Classification Experiments
WebKB dataset
Four CS department websites
Bag of words on each page
Links between pages
Anchor text for links
Experimental setup
Trained on three universities
Tested on fourth
Repeated for all four combinations
Standard Classification
[Figure: flat model; Page.Category with word variables Word1 … WordN (e.g., department, extract, information, computer, science, machine, learning).]
Categories: faculty, course, project, student, other
Discriminatively trained model = logistic regression
Standard Classification
[Figure: the flat model extended with anchor-text words (LinkWord1 … LinkWordN), e.g., "working with Tom Mitchell …"; bar chart of test error for the Logistic baseline.]
Power of Context
[Figure: a page's neighborhood gives context. Professor? Student? Post-doc?]
Collective Classification
[Figure: template; From-Page.Category and To-Page.Category, each with words Word1 … WordN, connected by a Link with a compatibility potential φ(From.Category, To.Category) over all category pairs (SS, SP, …, CC).]
Collective Classification
[Figure: the same template, with a bar chart comparing the test error of the Logistic and Links models.]
Classify all pages collectively, maximizing the joint label probability
More Complex Structure
[Figure: page sections and links; a Faculty page with words W1 … Wn and section categories, linked to Students and Courses pages.]
Collective Classification: Results
[Chart: test error for the Logistic, Links, Section, and Link+Section models.]
35.4% relative reduction in error relative to strong flat approach
[Taskar, Abbeel, K., 2002]
Relational Clustering
[Figure: pipeline: unlabeled relational data (Course, Student, Reg) → model-structure learning → probabilistic relational model → clustering of instances.]
Example:
Given only students' grades, cluster similar students
Movie Data
Internet Movie Database: http://www.imdb.com
Discovering Hidden Types
[Figure: model with hidden Type attributes: Actor.Type, Director.Type, and Movie.Type, the latter linked to Genres, Year, MPAA Rating, Rating, #Votes.]
[Taskar, Segal, K., 2001]
Discovering Hidden Types
Movies: {Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson}; {Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October}; …
Actors: {Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger}; {Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman}; …
Directors: {Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola}; {Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher}; …
Biology 101: Pathways
Pathways are sets of genes that act together to
achieve a common function
Biology 101: Expression
[Figure: DNA with Gene 1 and Gene 2, each with Control and Coding regions, transcribed to RNA and translated to Protein; Swi5 transcription factor.]
Cells express different subsets of their genes in different tissues and under different conditions
Finding Pathways: Attempt I
Use protein-protein interaction data
[Figure: interaction graph partitioned into pathways (Pathway I, Pathway II, …).]
Finding Pathways: Attempt I
Use protein-protein interaction data
Problems:
Data is very noisy
Structure is lost:
Large connected component
(3527/3589 genes)
in interaction graph
Finding Pathways: Attempt II
Use gene expression data
Thousands of arrays available under different conditions
[Figure: expression profiles clustered into Pathway I and Pathway II.]
Finding Pathways: Attempt II
Use gene expression data
Thousands of arrays available under different conditions
Problems:
Expression is only a "weak" indicator of interaction
Data is noisy
Interacting pathways are not separable
[Figure: overlapping expression clusters for Pathway I and Pathway II.]
Finding Pathways: Our Approach
Use both types of data to find pathways
Find “active” interactions using gene expression
Find pathway-related co-expression using interactions
[Figure: pathways I–IV recovered by combining both data sources.]
Probabilistic Model
Genes are partitioned into "pathways":
Every gene is assigned to one of k pathways
Random variable for each gene with domain {1,…,k}
Expression component:
Model likelihood is higher when genes in the same pathway have similar expression profiles
Interaction component:
Model likelihood is higher when genes in the same pathway interact
[Segal, Wang, K., 2003]
Expression Component
Naïve Bayes: the pathway assignment g.C of gene g is the parent of its expression levels g.E1, …, g.Em in m arrays.
[Figure: g.C → g.E1, g.E2, g.E3, …, g.Em.]
Protein Interaction Component
Interacting genes are more likely to be in the
same pathway
[Figure: genes whose protein products interact are linked; a compatibility potential φ(g1.C, g2.C) over pathway-assignment pairs (1,1), (1,2), …, (3,3) favors equal assignments.]
Joint Probabilistic Model
[Figure: joint model; pathway-assignment variables g1.C … g4.C connected by compatibility potentials for interacting pairs, with each gene's assignment the parent of its expression levels g.E1 … g.E3.]
Learning Task
Large Markov network with high connectivity
Use loopy belief propagation
[Figure: the ground network over g1.C … g4.C and the expression variables.]
E-step: compute pathway assignments
M-step: estimate Gaussian distribution parameters
Estimate compatibility potentials using cross-validation
Capturing Protein Complexes
Independent data set of interacting proteins
[Chart: number of complexes (0–400) vs. complex coverage (%, 0–100); 124 complexes covered at 50% for our method vs. 46 for standard expression clustering.]
RNAse Complex Pathway
[Figure: recovered pathway: YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4.]
Includes all 10 known pathway genes
Only 5 genes found by clustering
Interaction Clustering
RNAse complex found by interaction clustering as
part of cluster with 138 genes
Uncertainty about Domain Structure
or: PRMs are not just template BNs/MNs
Structural Uncertainty
Class uncertainty:
To which class does an object belong?
Relational uncertainty:
What is the relational (link) structure?
Identity uncertainty:
Which "names" refer to the same objects?
Also covers data association
Relational Uncertainty
Probability distribution over graph structures
Link existence model
E.g., hyperlinks between webpages
Each potential link is a separate event
Its existence is a random variable
Link reference model
E.g., instructors teaching a course
Distribution over # of endpoints for each outgoing link
Each link has distribution over link endpoint
e.g., instructor link for CS course likely to point to CS prof
Many other models possible
[Getoor, Friedman, K., Taskar, 2002]
Link Existence Model
Background knowledge x is an object skeleton
A set of entity objects
PRM defines distribution over worlds
Assignments of values to all attributes
Existence of links between objects
[Getoor, Friedman, K., Taskar, 2002]
Link Existence as a PRM
Define objects for any potential links
Potential link object for any pair of webpages w1, w2
Each potential link object has link existence
attribute, denoting whether the link exists or not
Link existence variables have probabilistic model
[Figure: Page-from (Type, Words) and Page-to (Type, Words), with a potential Link object whose Exists attribute depends on the two endpoint Types.]

From.Type   To.Type     Exists=False  Exists=True
Student     Student     0.999         0.001
Student     Professor   0.995         0.005
Professor   Student     0.998         0.002
Professor   Professor   0.999         0.001

[Getoor, Friedman, K., Taskar, 2002]
Why Are Link Models Useful?
Predict which links exist in the world
Which professor teaches each course
Which student will register for which course
Use known links to infer values of attributes
Given that a student registered for a hard class, is she more likely to be a graduate student?
Given that one page points to another, is it more likely to be a faculty page?
Predicting Relationships
[Figure: Tom Mitchell (Professor), the WebKB Project, and Sean Slattery (Student), with Member and Advisor-of relationships to predict.]
Predict and classify relationships between objects
Flat Model
[Figure: flat model; Rel.Type ∈ {NONE, advisor, instructor, TA, member, project-of} predicted from From-Page words (Word1 … WordN), To-Page words, and anchor-text words (LinkWord1 … LinkWordN).]
Collective Classification: Links
[Figure: the flat model plus Category variables for From-Page and To-Page, classified jointly with Rel.Type.]
[Taskar, Wong, Abbeel, K., 2002]
Link Model
[Figure: ground network over many pages and links.]
Triad Model
[Figure: triad template linking Professor, Student, and Course via Advisor, Instructor, and TA relations.]
Link Prediction: Results
[Chart: error for the Flat, Links, and Triad models.]
72.9% relative reduction in error relative to strong flat approach
Error measured over links predicted to be present
Link presence cutoff is at precision/recall break-even point (30% for all models)
[Taskar, Wong, Abbeel, K., 2002]
Identity Uncertainty Model
Background knowledge x is an object universe
A set of potential objects
PRM defines distribution over worlds
Assignments of values to object attributes
Partition of objects into equivalence classes
Objects in same class have same attribute values
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Citation Matching Model*
[Figure: schema; Author (Name); Paper (Title, PubType), Written-by Author; Citation (ObsTitle, Author-as-Cited, Text), Refers-to Paper, Appears-in Paper; link chain Appears-in.Refers-to.Written-by.]
Each citation object is associated with a paper object
Uncertainty over equivalence classes for papers
If P1 = P2, they have the same attributes (Title, PubType, Authors) & links
* Simplified
Identity Uncertainty
Depending on choice of equivalence classes:
Number of objects changes
Dependency structure changes
No “nice” corresponding ground BN
Algorithm:
Each partition hypothesis defines simple BN
Use MCMC over equivalence class partition
Exact inference over resulting BN defines acceptance
probability for Markov chain
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Identity Uncertainty Results
[Chart: accuracy (70–100%) of Phrase Match vs. PRM+MCMC on the Face, RL, Reasoning, and Constraint citation datasets, and on average.]
61.5% relative reduction in error relative to state of the art
Accuracy of citation recovery: % of actual citation clusters recovered perfectly
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Summary: PRMs …
Inherit the advantages of graphical models:
Coherent probabilistic semantics
Exploit structure of local interactions
Allow us to represent the world in terms of:
Objects
Classes of objects
Properties of objects
Relations
So What Do We Gain?
Convenient language for specifying complex models
“Web of influence”: subtle & intuitive reasoning
A mechanism for tying parameters and structure
within models
across models
Framework for learning from relational and
heterogeneous data
So What Do We Gain?
New way of thinking about models & problems
Incorporating heterogeneous data by connecting
related entities
New problems:
Collective classification
Relational clustering
Uncertainty about richer structures:
Link graph structure
Identity
But What Do We Really Gain?
Are PRMs just a convenient language for specifying attribute-based graphical models?
Simple PRMs ≈ relational logic with a fixed domain, and only induce a "propositional" BN
Can augment the language with additional expressivity:
Existential quantifiers & functions
Equality
Resulting language is inherently more expressive, allowing us to represent distributions over:
worlds where dependencies vary significantly [Getoor et al., Pasula et al.]
worlds with different numbers of objects [Pfeffer et al., Pasula et al.]
worlds with infinitely many objects [Pfeffer & K.]
Big questions: inference & learning
Pieter Abbeel
Nir Friedman
Lise Getoor
Carlos Guestrin
Brian Milch
Avi Pfeffer
Eran Segal
Ken Takusagawa
Ben Taskar
Haidong Wang
Ming-Fai Wong
http://robotics.stanford.edu/~koller/
http://dags.stanford.edu/
© Daphne Koller, 2003