Transcript: Daphne Koller

Probabilistic Models of Relational Domains
Daphne Koller, Stanford University
© Daphne Koller, 2003
Overview
- Relational logic
- Probabilistic relational models
- Inference in PRMs
- Learning PRMs
- Uncertainty about domain structure
- Summary
Attribute-Based Models & Relational Models
Bayesian Networks: Problem
- Bayesian nets use a propositional representation
- The real world has objects, related to each other
[Figure: ground BN with nodes Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101 (= A), Grade_George_CS101 (= C), Grade_George_Geo101; these "instances" are not independent]
The University BN
[Figure: ground BN with nodes Difficulty_CS101, Difficulty_Geo101, Intelligence_Jane, Intelligence_George, and child nodes Grade_Jane_CS101, Grade_George_CS101, Grade_George_Geo101]
The Genetics BN
[Figure: pedigree BN with genotype (G) and bloodtype (B) nodes for Harry, Betty, Homer, Marge, Selma, Bart, Lisa, and Maggie; each B node depends on the corresponding G node, and each child's G node depends on the parents' G nodes. G = genotype, B = bloodtype]
Simple Approach
- Graphical model with shared parameters
  - ... and shared local dependency structure
- Want to encode this constraint:
  - for the human knowledge engineer
  - for the network learning algorithm
- How do we specify:
  - which nodes share params?
  - shared (but different) structure across nodes?
Simple Approach II
- We can write a special-purpose program for each domain:
  - genetic inheritance (family tree imposes constraints)
  - university (course registrations impose constraints)
- Is there something more general?
Attribute-Based Worlds
Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101
Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101
Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101
Smart_Rick & easy_CS221 ⇒ GetA_Rick_CS221
- World = assignment of values to attributes / truth values to propositional symbols
Object-Relational Worlds
∀x,y (Smart(x) & Easy(y) & Take(x,y) ⇒ Grade(A,x,y))
- World = relational interpretation:
  - objects in the domain
  - properties of these objects
  - relations (links) between objects
Relational Logic
- General framework for representing:
  - objects & their properties
  - classes of objects with the same model
  - relations between objects
- Represent a model at the template level, and apply it to an infinite set of domains
- Given a finite domain, each instantiation of the model is propositional, but the template is not
Relational Schema
- Specifies the types of objects in the domain, the attributes of each type of object, and the types of relations between objects
[Figure: schema with classes Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), and Registration (Grade, Satisfaction), connected by the relations Teach, Has, and In]
St. Nordaf University
[Figure: an example world. Prof. Smith and Prof. Jones have Teaching-ability values (High, Low); Geo101 and CS101 have Difficulty values (Easy, Hard); George and Jane have Intelligence values (Weak, Smart). Registered/In-course links connect students to courses, and each registration carries a Grade (A, B, C) and a Satisfaction (Like, Hate)]
Relational Logic: Summary
- Vocabulary:
  - classes of objects: Person, Course, Registration, ...
  - individual objects in a class: George, Jane, ...
  - attributes of these objects: George.Intelligence, Reg1.Grade
  - relationships between these objects: Of(Reg1,George), Teaches(CS101,Smith)
- A world specifies:
  - a set of objects, each in a class
  - the values of the attributes of all objects
  - the relations that hold between the objects
Binary Relations
- Any relation can be converted into an object:
  - R(x1,x2,...,xk) ⇒ new "relation" object y, with R1(x1,y), R2(x2,y), ..., Rk(xk,y)
  - e.g., registrations are "relation objects"
- ⇒ Can restrict attention to binary relations R(x,y)
Relations & Links
- Binary relations can also be viewed as links:
  - specify the set of objects related to x via R
  - R(x,y) ⇔ y ∈ x.R1, x ∈ y.R2
- E.g., Teaches(p,c):
  - p.Courses = {courses c : Teaches(p,c)}
  - c.Instructor = {professors p : Teaches(p,c)}
Probabilistic Relational Models: Relational Bayesian Networks
Probabilistic Models
- Uncertainty model:
  - a space of "possible worlds"
  - a probability distribution over this space
- In attribute-based models, a world specifies an assignment of values to a fixed set of random variables
- In relational models, a world specifies:
  - the set of domain elements
  - their properties
  - the relations between them
PRM Scope
- The entire set of relational worlds is infinite and too broad
- Assume a circumscribed class of sets of worlds Ω_σ, consistent with some type of background knowledge σ
- A PRM Π is a template defining P over Ω_σ for any such σ
- Simplest class: attribute-based PRMs
  - σ = relational skeleton: a finite set of entities E and the relations between them
  - Ω_σ = all assignments of values to all attributes of the entities in E
  - the PRM template defines a distribution over Ω_σ for any such σ
[K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]
Relational Bayesian Network
- Universals: probabilistic patterns hold for all objects in a class
- Locality: represent direct probabilistic dependencies
  - links define potential interactions
[Figure: template with Professor (Teaching-Ability), Student (Intelligence), Course (Difficulty), and Reg (Grade, Satisfaction); the CPD for Reg.Grade (A/B/C) is shown for the parent configurations (easy,low), (easy,high), (hard,low), (hard,high)]
[K. & Pfeffer; Poole; Ngo & Haddawy]
RBN: Semantics
- σ: a set of objects and the relations between them
- Ω_σ: the set of all assignments of values to all attributes of all objects in σ
- P over Ω_σ is defined by a ground Bayesian network:
  - variables: the attributes of all objects
  - dependencies: determined by
    - the relational links in σ
    - the dependency structure of the RBN model Π
RBN Structure
- For each class X and attribute A, the structure specifies the parents of X.A:
  - X.B (e.g., Course.Level as a parent of Course.Difficulty): for any object x in class X, x.B is a parent of x.A
  - X.ρ.B, where ρ is a link or chain of links (e.g., Reg.In.Difficulty as a parent of Reg.Grade): for any object x in class X, x.ρ.B is a parent of x.A
RBN Semantics
[Figure: the ground BN induced for St. Nordaf: Teaching-ability nodes for Profs. Jones and Smith, Difficulty nodes for Geo101 and CS101, Intelligence nodes for George and Jane, and Grade and Satisfaction nodes for each registration, wired according to the template]
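The grounding step is mechanical enough to sketch in code. A minimal illustration, where all class, attribute, and link names are invented for the example (this is not an actual PRM implementation), of how template-level parents plus a skeleton yield the ground network's parent sets:

```python
# Ground-BN construction sketch: template parents + skeleton -> ground parents.
# All names below are illustrative, not from any real PRM system.

# Template: for each (class, attribute), parents are given as (link, attribute)
# slot chains of length one.
template_parents = {
    ("Reg", "Grade"): [("In", "Difficulty"), ("Of", "Intelligence")],
    ("Course", "Difficulty"): [],
    ("Student", "Intelligence"): [],
}

# Skeleton: objects with their classes, and the link values between them.
objects = {"r1": "Reg", "r2": "Reg", "cs101": "Course", "jane": "Student"}
links = {("r1", "In"): "cs101", ("r1", "Of"): "jane",
         ("r2", "In"): "cs101", ("r2", "Of"): "jane"}

def ground_bn(template_parents, objects, links):
    """Return {ground variable: [ground parent variables]}."""
    parents = {}
    for obj, cls in objects.items():
        for (c, attr), slots in template_parents.items():
            if c != cls:
                continue
            ps = []
            for link, p_attr in slots:
                target = links[(obj, link)]   # follow the (length-1) slot chain
                ps.append((target, p_attr))
            parents[(obj, attr)] = ps
    return parents

gbn = ground_bn(template_parents, objects, links)
print(gbn[("r1", "Grade")])  # [('cs101', 'Difficulty'), ('jane', 'Intelligence')]
```

Note how both registrations get the same template parents, instantiated on their own link targets: this is exactly the parameter and structure sharing the slides describe.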
The Web of Influence
[Figure: observing grades couples beliefs across the network; e.g., a C in CS101 shifts the posterior over CS101's difficulty (easy/hard) and the student's intelligence (low/high), which in turn shifts beliefs about other registrations, while an A pushes the posteriors the other way]
Aggregate Dependencies
[Figure: Student Jane Doe's Honors attribute depends on the Grades of all her registrations (Reg #5077, #5054, #5639); a CPD table maps the avg grade A/B/C to P(Honors = y/n) = 0.8/0.2, 0.4/0.6, 0.1/0.9]
- Problem: need dependencies on sets ⇒ aggregate the set (here, the avg of the grades)
[K. & Pfeffer '98; Friedman, Getoor, K., Pfeffer '99]
Aggregate Dependencies
[Figure: Professor.Stress-Level depends on the count of courses taught; Course.Rating depends on the avg of its registrations' Satisfactions; Student.Honors depends on the avg of the student's Grades]
- Allowed aggregates: sum, min, max, avg, mode, count
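As a concrete illustration of an aggregate dependency, here is a minimal sketch in which Honors depends on the avg of a variable-size set of grades; the grade-point values and CPD numbers are made up, loosely echoing the slide's table:

```python
# Aggregate-dependency sketch: Honors depends on the avg grade over a set of
# registrations. Grades are mapped to points, the numeric average is mapped
# back to the nearest letter bucket, and that bucket indexes the CPD.
# All numeric values are illustrative.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0}

# Hypothetical P(Honors = yes | avg-grade bucket), echoing the slide.
cpd_honors = {"A": 0.8, "B": 0.4, "C": 0.1}

def avg_grade(grades):
    """Aggregate a non-empty set of letter grades into one letter bucket."""
    pts = sum(GRADE_POINTS[g] for g in grades) / len(grades)
    return min(GRADE_POINTS, key=lambda g: abs(GRADE_POINTS[g] - pts))

grades = ["A", "B", "A"]           # one student's registrations
bucket = avg_grade(grades)         # avg = 3.67 -> nearest letter "A"
print(bucket, cpd_honors[bucket])  # A 0.8
```

The point of the aggregator is that the CPD stays fixed-size no matter how many registrations the student has.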
Basic RBN: Summary
- An RBN specifies:
  - a probabilistic dependency structure S: a set of parents X.ρ.B for each class attribute X.A
  - a set of local probability models θ:
    - an aggregator for each multi-valued dependency
    - a set of CPD parameters θ_{X.A}
- Given a relational skeleton structure σ, the RBN induces a probability distribution over worlds ω
- The distribution is defined via a ground BN over the attributes x.A:
P(ω | σ, S, θ) = ∏_{X.A} ∏_{x ∈ σ(X)} P(x.A | parents_{S,σ}(x.A), θ_{X.A})
(the outer product ranges over attributes, the inner one over objects)
Extension: Class Hierarchy
- Subclasses inherit all attributes of their parents, but may have additional ones
- For an inherited attribute X.A, a subclass can:
  - inherit the parent's probabilistic model
  - override it with a local probabilistic model
- Example:
  - Professor has subclasses Assistant, Associate, Full
  - inherit the distribution over Stress-Level
  - modify the distribution over Salary
[K. & Pfeffer '98]
Extension: Class Hierarchies
- Hierarchies allow reuse, in knowledge engineering and in learning:
  - parameters and dependency models are shared across more objects
- If class assignments are specified in σ, the class hierarchy does not introduce complications
[K. & Pfeffer '98]
Coherence & Acyclicity
- For a given skeleton σ, the PRM Π asserts dependencies between the attributes of objects: y.B → x.A
- Π defines a coherent probability model over Ω_σ iff the ground dependency graph G_{Π,σ} is acyclic
[Figure: Professor (Teaching-Ability, Stress-Level) and Reg (Grade, Satisfaction); with dependencies running in both directions between professor and registration attributes, Smith.Stress-level can come to depend probabilistically on itself]
[Friedman, Getoor, K., Pfeffer '99]
Guaranteeing Acyclicity
- How do we guarantee that a PRM Π is acyclic for every object skeleton σ?
- From the PRM dependency structure S, build a template-level (class) dependency graph:
  - attribute stratification: if X.ρ.B ∈ Parents(X.A) and the class of X.ρ is Y, add the edge Y.B → X.A
- If the class dependency graph is acyclic, then G_{Π,σ} is acyclic for all σ
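The stratification test reduces to an ordinary cycle check on the class dependency graph. A sketch, with illustrative attribute names:

```python
# Stratification-check sketch: the class dependency graph has an edge
# Y.B -> X.A for each parent X.rho.B whose slot rho lands in class Y.
# A standard white/gray/black DFS detects cycles.

def has_cycle(edges):
    """edges: {node: [parent nodes]}. True iff the directed graph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in edges)

# Safe structure: Course.Difficulty -> Reg.Grade -> Reg.Satisfaction
ok = {"Reg.Grade": ["Course.Difficulty"], "Reg.Satisfaction": ["Reg.Grade"]}
# Unsafe: Prof.Stress and Reg.Satisf depend on each other.
bad = {"Prof.Stress": ["Reg.Satisf"], "Reg.Satisf": ["Prof.Stress"]}
print(has_cycle(ok), has_cycle(bad))  # False True
```

Because the check runs on the class-level graph, its cost is independent of the skeleton's size, which is exactly why the talk does acyclicity checking at the template level.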
Limitation of Stratification
[Figure: Person class with M-chromosome, P-chromosome, and Blood-type attributes; a person's chromosomes depend on those of Person.Father and Person.Mother. In the class dependency graph, Person.M-chrom and Person.P-chrom then depend on themselves]
Limitation of Stratification
[Figure: the same genetics model, with the parent objects reached via the Father and Mother links]
- Prior knowledge: the Father-of relation is acyclic
- A dependence of Person.A on Person.Father.B therefore cannot induce cycles
Guaranteeing Acyclicity
- With guaranteed acyclic (g.a.) relations, some cycles in the dependency graph are guaranteed to be safe
- Color the edges in the dependency graph:
  - yellow: within a single object (X.B → X.A)
  - green: via a g.a. relation (Y.B → X.A)
  - red: via other relations (Y.B → X.A)
- A cycle is safe if it has a green edge and no red edge
[Figure: in the genetics example, the self-loops on Person.M-chrom and Person.P-chrom are green, so the cycles are safe]
Object-Oriented Bayesian Nets*
- OOBNs are RBNs with only one type of relation:
  - one object can be a "part-of" another
  - objects can only interact with their component parts
  - other types of relationships must be embedded into the "part-of" framework
- Defines a "neat" hierarchy of related objects
- Provides a clearly defined object interface between an object x and its enclosing object y
* simplified view
[K. & Pfeffer '97]
OOBN Example
- Drivers, cars, and the collision must all be viewed as parts of a situation object
[Figure: a situation containing Driver1 and Driver2 (each an Owner), Car1 and Car2 (each with Value, Speed, Damage), and a Collision with Severity]
Part-of Hierarchy
- Objects in the model are encapsulated within each other, forming a tree
[Figure: an Accident situation containing Driver1 (Income), Car1 (Price, Speed, Engine Power), Severity, Car2, and Driver2, with the cars' engines nested inside the cars]
Lines of Communication
[Figure: the same accident-situation hierarchy, with edges showing how attributes communicate across object boundaries, e.g., Driver1.Income with Car1.Price, and the cars' Speeds with the collision Severity]
Probabilistic Relational Models: Relational Markov Networks
Why Undirected Models?
- Symmetric, non-causal interactions:
  - e.g., web: the categories of linked pages are correlated
  - cannot introduce directed edges because of cycles
- Patterns involving multiple entities:
  - e.g., web: "triangle" patterns
  - directed edges are not appropriate
- "Solution": impose an arbitrary direction
  - not clear how to parameterize the CPD for variables involved in multiple interactions
  - very difficult within a class-based parameterization
[Taskar, Abbeel, K. 2001]
Markov Networks: Review
[Figure: five people (Alice, Bob, Chris, Dave, Eve) in a ring; the potential φ(A,B,C) is given as a table over the assignments LLL, LLH, ..., HHH]
P(A,B,C,D,E) = (1/Z) φ₁(A,B,C) φ₂(C,D) φ₃(D,E) φ₄(E,A)
Markov Networks: Review
- A Markov network is an undirected graph over some set of variables V
- The graph is associated with a set of potentials φᵢ:
  - each potential is a factor over a subset Vᵢ
  - the variables in Vᵢ must form a (sub)clique in the network
P(V) = (1/Z) ∏ᵢ φᵢ(Vᵢ)
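To make the normalization concrete, here is a brute-force sketch of the ring network's joint distribution; the potential values are invented, and only the factorization follows the slide:

```python
# Brute-force evaluation of the ring Markov network
# P(A,B,C,D,E) = (1/Z) phi1(A,B,C) phi2(C,D) phi3(D,E) phi4(E,A).
# The potential values below are made up for illustration.
import itertools

def phi1(a, b, c):
    return 2.0 if a == b == c else 1.0   # favors agreement among A, B, C

def phi2(x, y):
    return 3.0 if x == y else 1.0        # pairwise agreement potential

phi3 = phi4 = phi2

def unnorm(a, b, c, d, e):
    return phi1(a, b, c) * phi2(c, d) * phi3(d, e) * phi4(e, a)

vals = ["L", "H"]
Z = sum(unnorm(*w) for w in itertools.product(vals, repeat=5))

def prob(*w):
    return unnorm(*w) / Z

# All-agree assignments get more mass than disagreeing ones.
print(prob("L", "L", "L", "L", "L") > prob("L", "H", "L", "H", "L"))  # True
```

The exponential enumeration over all 2⁵ assignments is exactly what becomes infeasible in the ground networks discussed later, motivating the approximate inference methods.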
Relational Markov Networks
- Probabilistic patterns hold for groups of objects
- Groups are defined as sets of (typed) elements linked in particular ways
[Figure: two registrations Reg and Reg2 of Student and Student2 in the same study Group; a template potential over the grade pair (Reg.Grade, Reg2.Grade) assigns a value to each of AA, AB, ..., CC, on a scale of roughly 0 to 2]
RMN Language
- Define clique templates, e.g.:
  - all tuples {reg R1, reg R2, group G} s.t. In(G, R1), In(G, R2)
  - a compatibility potential φ(R1.Grade, R2.Grade)
- The ground Markov network contains a potential φ(r1.Grade, r2.Grade) for all appropriate r1, r2
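The grounding of a clique template can be sketched directly; the study-group data below is invented:

```python
# Clique-template grounding sketch: instantiate the template
# {reg R1, reg R2, group G : In(G, R1), In(G, R2)} over a toy skeleton,
# producing one potential application per matching tuple.
from itertools import combinations

# In relation: study group -> registrations it contains (illustrative data).
in_group = {"geo_sg": ["r1", "r2"], "cs_sg": ["r2", "r3", "r4"]}

def ground_cliques(in_group):
    """All unordered pairs of registrations sharing a study group."""
    cliques = []
    for g, regs in in_group.items():
        for r1, r2 in combinations(sorted(regs), 2):
            cliques.append((g, r1, r2))
    return cliques

print(ground_cliques(in_group))
# [('geo_sg', 'r1', 'r2'), ('cs_sg', 'r2', 'r3'), ('cs_sg', 'r2', 'r4'), ('cs_sg', 'r3', 'r4')]
```

Every tuple in the output would receive its own copy of the shared template potential φ in the ground Markov network; note that r2 participates in three cliques, which is how the "web of influence" arises.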
Ground MN (or Chain Graph)
[Figure: ground network over George, Jane, and Jill: Intelligence and Difficulty variables, Grade variables for each registration in Geo101 and CS101, and potentials linking the Grades of students in the same study group (Geo Study Group, CS Study Group)]
PRM Inference
Inference: Simple Method
- Define the ground network as in the semantics
- Apply standard inference methods
- Problem:
  - very large models can be specified very easily
  - the resulting ground network is often highly connected
  - exact inference is typically intractable
  - in practice, one must often resort to approximate methods such as belief propagation
Exploiting Structure: Encapsulation
- Objects interact only in limited ways
- Can define an object interface:
  - outputs: object attributes influencing other objects
  - inputs: external attributes influencing the object
  - the object is independent of everything else given its interface
- Inference can be encapsulated within objects, with "communication" limited to interfaces
[Figure: a Student object with attributes Upbringing, Intelligence, Personality, GPA, Honors, Job-prospects]
[K. & Pfeffer, 1997]
Exploiting Structure: Encapsulation
- Marginalize each object's distribution onto its interface
- A dependency graph over interfaces is induced by:
  - the inter-object dependencies
  - and hence by the relational structure
- Perform inference over the interfaces:
  - if the interaction graph has low tree-width (e.g., the part-of hierarchy in some OOBNs), can use exact inference
  - if the relational structure is more complex, can use BP: a form of Kikuchi BP, where cluster selection is guided by the relational structure
Exploiting Structure: Reuse
- Objects from the same class have the same model
- For generic objects (no internal evidence), the marginalized interface is the same
- Can reuse inference: a form of "lifting"
[Figure: George and Jane are both instances of the same Student model, with Upbringing, Intelligence, Personality, GPA, Honors, and Job-prospects]
[Pfeffer & K. 1998]
Exploiting Structure: Reuse
- Generic objects often play the same role in the model:
  - e.g., multiple students that all take the same class
- Reuse: compute the interface once
- Combinatorics: compute the total contribution to the probability in closed form:
  P(k students like the class | teaching-ability = low) =
    P(generic student likes the class | teaching-ability = low)^k
[Pfeffer & K. 1998]
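The closed-form combinatorics can be checked against brute-force enumeration; a sketch with an invented per-student probability:

```python
# Lifted-combinatorics sketch: for k exchangeable generic students, each
# liking the class with probability p, the probability that all k like it
# is p**k, computed once instead of once per ground object.
import itertools
import math

p = 0.3   # P(one generic student likes the class | teaching-ability = low)
k = 5     # number of exchangeable students (illustrative)

lifted = p ** k   # closed form

# Ground check: enumerate all 2**k joint outcomes and keep those where
# every student likes the class.
ground = sum(
    math.prod(p if likes else 1 - p for likes in outcome)
    for outcome in itertools.product([True, False], repeat=k)
    if all(outcome)
)
print(abs(lifted - ground) < 1e-12)  # True
```

The lifted computation is O(1) per query, while the ground enumeration is O(2^k), which is the gap the case study's timing figure illustrates.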
Case study
- Battlefield situation assessment for missile units:
  - several locations
  - many units
  - each has a detailed model
- Example object classes: Battalion, Battery, Vehicle, Location, Weather
- Example relations: At-Location, Has-Weather, Sub-battery/In-battalion, Sub-vehicle/In-battery
Effect of Exploiting Structure
[Figure: running time in seconds (0 to 6000) for the query P(Next-Mission), as the #vehicles of each type in each battery grows from 1 to 10, comparing flat BN, no reuse, with reuse, and with combinatorics; the BN has ≈ 2400 nodes, and the cost drops from ≈ 20 minutes to ≈ 9 sec]
[Pfeffer & K. 1998]
PRM Learning
Outline
- Relational Bayesian networks:
  - likelihood function
  - ML parameter estimation
  - EM
  - structure learning
- Relational Markov networks:
  - parameter estimation
- Applications:
  - collective classification (web data)
  - relational clustering (biological data)
PRM Learning: Input
- Input: a full world ω
[Figure: the St. Nordaf instantiation with every value observed: the professors' Teaching-abilities, the courses' Difficulties, the students' Intelligences, and the Grade and Satisfaction of every registration]
Likelihood Function
L(S, θ : ω) = P(ω | σ, S, θ) = ∏_{X.A} ∏_{x ∈ σ(X)} P(x.A | parents_{S,σ}(x.A), θ_{X.A})
(the outer product ranges over attributes, the inner one over objects)
- This is the likelihood of a BN with shared parameters
- The joint likelihood is a product of likelihood terms, one for each attribute X.A and its family
- For each X.A, the likelihood function aggregates counts from all occurrences x.A in the world ω
[Friedman, Getoor, K., Pfeffer, 1999]
Likelihood Function: Multinomials
Log-likelihood:
log P(ω | σ, S, θ) = Σ_{X.A} Σ_{u ∈ Val(Pa(X.A))} Σ_{a ∈ Val(X.A)} M[a,u] · log θ_{a|u}
Sufficient statistics:
M[a,u] = |{x ∈ σ(X) : x.A = a, parents_{S,σ}(x.A) = u}|
RBN Parameter Estimation
- MLE parameters:
P̂(Reg.Grade = A | Student.Intell = hi, Course.Diff = lo) =
  M[Reg.Grade = A, Student.Intell = hi, Course.Diff = lo] /
  M[Reg.Grade = *, Student.Intell = hi, Course.Diff = lo]
- Bayesian estimation:
  - a prior for each attribute X.A
  - the posterior uses the aggregated sufficient statistics
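A minimal sketch of the aggregated sufficient statistics and the resulting MLE for the shared CPD; the toy world below is invented:

```python
# Sufficient-statistics / MLE sketch: aggregate counts M[a, u] over all Reg
# objects in a single world, then normalize to estimate the shared CPD.
from collections import Counter

# One row per registration: (grade, student intelligence, course difficulty).
world = [("A", "hi", "lo"), ("A", "hi", "lo"), ("B", "hi", "lo"),
         ("C", "lo", "hi"), ("B", "lo", "hi")]

M_au = Counter((g, (i, d)) for g, i, d in world)   # M[a, u]
M_u = Counter(u for _, u in M_au.elements())       # M[*, u]

def mle(grade, intell, diff):
    """Estimate P(Reg.Grade = grade | Student.Intell, Course.Diff)."""
    u = (intell, diff)
    return M_au[(grade, u)] / M_u[u]

print(mle("A", "hi", "lo"))  # 2 of the 3 (hi, lo) registrations got an A -> 0.666...
```

All registrations in the world contribute to the same counters, which is exactly the parameter sharing that makes relational learning data-efficient.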
Learning w. Missing Data
- The EM algorithm applies essentially unchanged:
  - the E-step computes expected sufficient statistics, aggregated over all objects in a class
  - the M-step uses ML (or MAP) parameter estimation
- Key difference:
  - in general, the hidden variables are not independent
  - computing the expected sufficient statistics requires inference over the entire network
  - [the same reasoning as for the forward-backward algorithm in temporal models]
Learning w. Missing Data: EM
[Dempster et al. 77]
[Figure: EM alternates between inferring the hidden Student and Course attributes and re-estimating P(Registration.Grade | Course.Difficulty, Student.Intelligence), shown as bars of P(Grade = A/B/C) for the parent configurations (easy,low), (easy,high), (hard,low), (hard,high)]
Learning RBN Structure
- Define the set of legal RBN structures: those with legal class dependency graphs
- Define a scoring function ⇒ Bayesian score:
Score(S : ω) = log[ P(ω | S, σ) · P(S) ]   (marginal likelihood × prior)
P(ω | S, σ) = ∫ L(S, θ_S : ω) P(θ_S) dθ_S
- The score is a product of family scores:
  - one for each X.A
  - each uses the aggregated sufficient statistics
- Search for a high-scoring legal structure
[Friedman, Getoor, K., Pfeffer, 1999]
Learning RBN Structure
- All operations are done at the class level:
  - dependency structure = parents for X.A
  - acyclicity is checked using the class dependency graph
  - the score is computed at the class level
- Individual objects only contribute to the sufficient statistics
  - these can be obtained efficiently using standard DB queries
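For instance, the counts M[a,u] can come straight out of a GROUP BY over the relational tables; a sketch on a toy schema, where all table and column names are invented:

```python
# Sufficient statistics via a standard DB query (sketch): a GROUP BY over a
# join of toy Registration/Student/Course tables yields the counts needed by
# the class-level score. Schema and data are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY, intell TEXT);
CREATE TABLE course  (cid TEXT PRIMARY KEY, diff TEXT);
CREATE TABLE reg     (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO student VALUES ('jane','hi'), ('george','lo');
INSERT INTO course  VALUES ('cs101','hi'), ('geo101','lo');
INSERT INTO reg VALUES ('jane','geo101','A'), ('jane','cs101','A'),
                       ('george','geo101','B'), ('george','cs101','C');
""")

rows = con.execute("""
SELECT s.intell, c.diff, r.grade, COUNT(*) AS m
FROM reg r JOIN student s ON r.sid = s.sid JOIN course c ON r.cid = c.cid
GROUP BY s.intell, c.diff, r.grade
ORDER BY s.intell, c.diff, r.grade
""").fetchall()
for row in rows:
    print(row)
```

Each output row is one sufficient statistic M[grade, (intell, diff)], so the structure-search loop never has to touch individual ground objects directly.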
Exploiting Locality: Phased Search
- Phase 0: consider only dependencies within a class
[Figure: Course, Reg, and Student with candidate intra-class edges]
- Phase 1: consider dependencies one link away
[Figure: the same classes with candidate edges along single links]
- Phase k: consider dependencies k links away
[Figure: the same classes with candidate edges along length-k slot chains]
[Figure: a resulting structure over Professor (Popularity, Teaching-Ability), Course (Difficulty, Rating), Student (Intelligence, Honors), and Reg (Grade, Satisfaction)]
TB Patients in San Francisco
[Getoor, Rhee, K., Small, 2001]
- Input: a relational database with 1000 strains, 2300 patients, and 20000 contacts
- Output: a PRM for the domain
[Figure: the learned PRM over Strain (Suscept, Outbreak), Patient (Gender, POB, Race, HIV, Age, TB-type, XRay, Smear, Treatment, CareLoc, Infected, Subcase), and Contact (CloseCont, CareLoc, Relation)]
Outline
- Relational Bayesian networks:
  - likelihood function
  - ML parameter estimation
  - EM
  - structure learning
- Relational Markov networks:
  - parameter estimation
- Applications:
  - collective classification (web data)
  - relational clustering (biological data)
Learning RMN Parameters
- Parameterize the potentials as a log-linear model:
φ = exp{ w_AA · f_AA + ... + w_CC · f_CC }
[Figure: the study-group template potential over (Reg.Grade, Reg2.Grade), with one weight per grade pair AA, AB, ..., CC]
[Taskar, Wong, Abbeel, K., 2002]
Learning RMN Parameters
ℓ(w : ω) = log P_w(ω) = Σᵢ wᵢ · fᵢ(ω) - log Z
(the fᵢ(ω) are feature counts in ω; Z is the partition function)
For example:
f_AA(ω) = # of tuples {reg r1, reg r2, group g} s.t. In(g,r1), In(g,r2), r1.Grade = A, r2.Grade = A
Learning RMN Parameters
- Parameter estimation is not closed form
- The problem is convex ⇒ unique global maximum
- Can use methods such as conjugate gradient:
∂ℓ/∂w_AA = #(Grade=A, Grade=A) - E_w[#(Grade=A, Grade=A)]
(actual count minus expected count)
- The gradient process tries to find parameters s.t. expected counts = actual counts
- Computing the expected counts requires inference over the ground Markov network
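A brute-force sketch of the gradient for one template weight, on a tiny two-registration network; the weight and observed world are invented, and a real RMN would compute the expectation by inference rather than enumeration:

```python
# Log-linear gradient sketch: d(loglik)/dw_AA = actual count of the (A,A)
# feature minus its expected count under the current parameters.
# Tiny two-registration network; Z computed by brute force.
import itertools
import math

grades = ["A", "B", "C"]
w_AA = 0.5                          # current weight on the (A,A) feature

def f_AA(g1, g2):
    """Feature: both registrations in the group got an A."""
    return 1.0 if g1 == g2 == "A" else 0.0

def unnorm(g1, g2):
    return math.exp(w_AA * f_AA(g1, g2))

pairs = list(itertools.product(grades, repeat=2))
Z = sum(unnorm(g1, g2) for g1, g2 in pairs)
expected = sum(f_AA(g1, g2) * unnorm(g1, g2) / Z for g1, g2 in pairs)

observed = f_AA("A", "A")           # the observed world has grades (A, A)
grad = observed - expected
print(round(expected, 3), round(grad, 3))  # 0.171 0.829
```

At the optimum the gradient is zero, i.e., expected counts match actual counts, which is the moment-matching condition stated on the slide.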
Learning RMNs
L = log P(Grades, Intelligence, Difficulty)
∂L/∂w_AA = #(Grade=A, Grade=A) - E[#(Grade=A, Grade=A) | Diffic]
[Figure: the ground network over Grades, Difficulties, and Intelligences, with the template potential φ(Reg1.Grade, Reg2.Grade) over grade pairs AA, ..., CC]
Outline
- Relational Bayesian networks:
  - likelihood function
  - ML parameter estimation
  - EM
  - structure learning
- Relational Markov networks:
  - parameter estimation
- Applications:
  - collective classification (web data)
  - relational clustering (biological data)
Collective Classification
[Figure: pipeline: training data (Course, Student, Reg) → learning → probabilistic relational model (model structure) → applied to new data → inference → conclusions]
Example:
- Train on one year of student intelligence, course difficulty, and grades
- Given only the grades in the following year, predict all students' intelligence
Discriminative Training
- Goal: given values of the observed variables ω.O = o, predict desired target values ω.T = t*
- Do not necessarily want the model to fit the joint distribution P(ω.O = o, ω.T = t*)
- To maximize classification accuracy, we can consider other optimization criteria:
  - maximize the conditional log likelihood: log P(ω.T = t* | ω.O = o)
  - maximize the margin: log P(ω.T = t* | ω.O = o) - log[ max_{t ≠ t*} P(ω.T = t | ω.O = o) ]
    (the gap to the second-highest-probability label)
[Taskar, Abbeel, K., 2002; Guestrin, Taskar, K. 2003]
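Both criteria are easy to state concretely; a sketch over an invented conditional distribution for a single target variable:

```python
# Discriminative-criteria sketch: conditional log-likelihood of the true
# label vs. the log-margin over the runner-up, for one target variable.
# The conditional distribution below is invented.
import math

p_t_given_o = {"faculty": 0.6, "student": 0.3, "course": 0.1}
t_star = "faculty"   # true label t*

cond_ll = math.log(p_t_given_o[t_star])                        # log P(t* | o)
runner_up = max(p for t, p in p_t_given_o.items() if t != t_star)
margin = cond_ll - math.log(runner_up)                         # gap to 2nd best

print(round(cond_ll, 3), round(margin, 3))  # -0.511 0.693
```

In the relational setting t* is a joint assignment to all targets, so both quantities are computed by inference over the ground network rather than a table lookup.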
Web  KB
Tom Mitchell
Professor
Project-of
WebKB
Project
Member
Advisor-of
Sean Slattery
Student
[Craven et al.]
© Daphne Koller, 2003
Web Classification Experiments
- WebKB dataset:
  - four CS department websites
  - bag of words on each page
  - links between pages
  - anchor text for links
- Experimental setup:
  - trained on three universities
  - tested on the fourth
  - repeated for all four combinations
Standard Classification
- Categories: faculty, course, project, student, other
[Figure: a Page with a Category variable and word features Word1 ... WordN; indicative words include "department", "extract", "information", "computer", "science", "machine", "learning"]
- Discriminatively trained model = logistic regression
Standard Classification
- Add link-word features LinkWord1 ... LinkWordN, e.g. anchor text such as "working with Tom Mitchell ..."
[Figure: test-error bar for the Logistic baseline, around 0.18]
Power of Context
[Figure: a page of unknown category (Professor? Student? Post-doc?) whose linked neighbors provide contextual evidence]
Collective Classification
[Figure: From-Page and To-Page, each with a Category and words Word1 ... WordN; a Link object with words LinkWord1 ... LinkWordN; a compatibility potential Compatibility(From, To) over all category pairs (FF, FS, ..., CC)]
Collective Classification
- Classify all pages collectively, maximizing the joint label probability
[Figure: test-error bars comparing the Logistic baseline with the collective Links model]
More Complex Structure
[Figure: a Faculty page with words W1 ... Wn, connected to Students and Courses pages via section (S) and category (C) variables]
Collective Classification: Results
- 35.4% relative reduction in error relative to the strong flat approach
[Figure: test-error bars for the Logistic, Links, Section, and Link+Section models]
[Taskar, Abbeel, K., 2002]
Relational Clustering
[Figure: pipeline: unlabeled relational data (Course, Student, Reg) → learning → probabilistic relational model (model structure) → clustering of instances]
Example:
- Given only students' grades, cluster similar students
Movie Data
- Internet Movie Database: http://www.imdb.com
Discovering Hidden Types
[Figure: hidden Type variables for Actor, Director, and Movie; Movie also has observed attributes Genres, Year, MPAA Rating, Rating, #Votes]
[Taskar, Segal, K., 2001]
Discovering Hidden Types
Example discovered types:
- Movies: Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson; vs. Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October; ...
- Actors: Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger; vs. Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman; ...
- Directors: Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola; vs. Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher; ...
Biology 101: Pathways
- Pathways are sets of genes that act together to achieve a common function
Biology 101: Expression
[Figure: DNA → RNA → Protein for Gene 1 and Gene 2, each with Control and Coding regions; the Swi5 transcription factor binds a control region]
- Cells express different subsets of their genes in different tissues and under different conditions
Finding Pathways: Attempt I
- Use protein-protein interaction data
[Figure: an interaction graph partitioned into Pathways I, II, III, ...]
Finding Pathways: Attempt I
- Use protein-protein interaction data
- Problems:
  - the data is very noisy
  - structure is lost: one large connected component (3527/3589 genes) in the interaction graph
Finding Pathways: Attempt II
- Use gene expression data
  - thousands of arrays available under different conditions
[Figure: clustering expression profiles into Pathway I and Pathway II]
Finding Pathways: Attempt II
- Use gene expression data
  - thousands of arrays available under different conditions
- Problems:
  - expression is only a 'weak' indicator of interaction
  - the data is noisy
  - interacting pathways are not separable
Finding Pathways: Our Approach
- Use both types of data to find pathways:
  - find "active" interactions using gene expression
  - find pathway-related co-expression using interactions
[Figure: the combined data separates Pathways I-IV]
Probabilistic Model
- Genes are partitioned into "pathways":
  - every gene is assigned to one of k pathways
  - a random variable for each gene with domain {1,...,k}
- Expression component:
  - the model likelihood is higher when genes in the same pathway have similar expression profiles
- Interaction component:
  - the model likelihood is higher when genes in the same pathway interact
[Segal, Wang, K., 2003]
Expression Component
- Naive Bayes: the pathway assignment g.C of gene g is the parent of its expression levels g.E1, g.E2, ..., in the m arrays
[Figure: g.C → g.E1, g.E2, g.E3, ...]
Protein Interaction Component
- Interacting genes are more likely to be in the same pathway
[Figure: when the protein products of Gene 1 and Gene 2 interact, a compatibility potential φ(g1.C, g2.C) links their pathway assignments, with higher values on the diagonal g1.C = g2.C]
Joint Probabilistic Model
[Figure: ground network for genes 1-4: pathway-assignment variables g1.C ... g4.C connected by compatibility potentials along protein interactions, and each gi.C the parent of that gene's expression variables gi.E1 ... gi.E3]
Learning Task
- A large Markov network with high connectivity ⇒ use loopy belief propagation
- EM:
  - E-step: compute the pathway assignments
  - M-step: estimate the Gaussian distribution parameters
  - the compatibility potentials are estimated using cross-validation
[Figure: the same ground network over genes 1-4 as above]
Capturing Protein Complexes
- Evaluation against an independent data set of interacting proteins
[Figure: number of complexes (0-400) vs. complex coverage (10-100%), for our method vs. standard expression clustering]
- 124 complexes covered at 50% for our method
- 46 complexes covered at 50% for clustering
RNAse Complex Pathway
[Figure: recovered pathway over YHR081W, RRP40, RRP42, MTR3, RRP45, RRP4, RRP43, DIS3, TRM7, SKI6, RRP46, CSL4, with its interaction and co-expression structure]
- Includes all 10 known pathway genes
- Only 5 of the genes were found by clustering
Interaction Clustering
- Interaction clustering finds the RNAse complex only as part of a cluster with 138 genes
Uncertainty about Domain Structure
(or: PRMs are not just template BNs/MNs)
Structural Uncertainty
- Class uncertainty: to which class does an object belong?
- Relational uncertainty: what is the relational (link) structure?
- Identity uncertainty: which "names" refer to the same objects? (also covers data association)
Relational Uncertainty
- A probability distribution over graph structures
- Link existence model:
  - e.g., hyperlinks between webpages
  - each potential link is a separate event; its existence is a random variable
- Link reference model:
  - e.g., instructors teaching a course
  - a distribution over the # of endpoints for each outgoing link
  - each link has a distribution over its endpoint
    - e.g., the instructor link for a CS course is likely to point to a CS prof
- Many other models are possible
[Getoor, Friedman, K., Taskar, 2002]
Link Existence Model
- Background knowledge σ is an object skeleton: a set of entity objects
- The PRM defines a distribution over worlds ω:
  - the assignments of values to all attributes
  - the existence of links between objects
[Getoor, Friedman, K., Taskar, 2002]
Link Existence as a PRM
- Define objects for any potential links:
  - a potential link object for any pair of webpages w1, w2
  - each potential link object has a link Exists attribute, denoting whether the link exists or not
- The link existence variables have a probabilistic model, e.g. P(Exists | From.Type, To.Type):

  From.Type   To.Type     False   True
  Student     Student     0.999   0.001
  Student     Professor   0.995   0.005
  Professor   Student     0.998   0.002
  Professor   Professor   0.999   0.001

[Getoor, Friedman, K., Taskar, 2002]
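A sketch of how such a table scores a hypothesized link set: every ordered pair of distinct pages is a potential link object whose Exists variable is scored by the table. The probabilities here echo the slide's order of magnitude but should be treated as illustrative:

```python
# Link-existence scoring sketch: P(Exists | From.Type, To.Type) as a table,
# used to compute the log-probability of a candidate link set.
# Probability values are illustrative.
import math

p_exists = {
    ("Student", "Student"):     0.001,
    ("Student", "Professor"):   0.005,
    ("Professor", "Student"):   0.002,
    ("Professor", "Professor"): 0.001,
}

def log_prob_links(pages, links):
    """pages: {name: type}; links: set of (from, to) pairs that exist.
    Every ordered pair of distinct pages is a potential link object."""
    lp = 0.0
    for u in pages:
        for v in pages:
            if u == v:
                continue
            p = p_exists[(pages[u], pages[v])]
            lp += math.log(p if (u, v) in links else 1 - p)
    return lp

pages = {"home1": "Student", "home2": "Professor"}
print(round(log_prob_links(pages, {("home1", "home2")}), 3))  # about -5.3
```

Note that the absent links contribute too (via 1 - p), which is what makes the number of potential link objects quadratic in the number of pages.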
Why Are Link Models Useful?
- Predict which links exist in the world:
  - which professor teaches each course
  - which student will register for which course
- Use known links to infer the values of attributes:
  - given that a student registered for a hard class, is she more likely to be a graduate student?
  - given that one page points to another, is it more likely to be a faculty page?
Predicting Relationships
[Figure: Tom Mitchell (Professor) and Sean Slattery (Student) are both Members of the WebKB Project, with an Advisor-of link between them]
- Predict and classify relationships between objects
Flat Model
[Figure: predict the relationship type Rel.Type ∈ {NONE, advisor, instructor, TA, member, project-of} from the From-Page words, the To-Page words, and the link words LinkWord1 ... LinkWordN]
Collective Classification: Links
[Figure: like the flat model, but the From-Page and To-Page Category variables participate jointly with Rel.Type]
[Taskar, Wong, Abbeel, K., 2002]
Link Model
[Figure: a ground network fragment for the link model]
Triad Model
[Figure: triangle pattern over Professor, Student, and Course, with Advisor, TA, and Instructor relations]
Link Prediction: Results
- 72.9% relative reduction in error relative to the strong flat approach
- Error measured over links predicted to be present
- The link presence cutoff is at the precision/recall break-even point (30% for all models)
[Figure: error bars (0-30) for the Flat, Links, and Triad models]
[Taskar, Wong, Abbeel, K., 2002]
Identity Uncertainty Model
- Background knowledge σ is an object universe: a set of potential objects
- The PRM defines a distribution over worlds ω:
  - the assignments of values to object attributes
  - a partition of the objects into equivalence classes
  - objects in the same class have the same attribute values
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Citation Matching Model*
[Figure: Author (Name) linked by Written-by to Paper (Title, PubType); Citation (ObsTitle, Author-as-Cited, Text) linked by Refers-to and Appears-in to Papers; the link chain Appears-in.Refers-to.Written-by connects a citing paper to the cited authors]
- Each citation object is associated with a paper object
- Uncertainty over the equivalence classes for papers
- If P1 = P2, they have the same attributes (Title, PubType, Authors) & links
* simplified
Identity Uncertainty
- Depending on the choice of equivalence classes:
  - the number of objects changes
  - the dependency structure changes
  - ⇒ no "nice" corresponding ground BN
- Algorithm:
  - each partition hypothesis defines a simple BN
  - use MCMC over the equivalence class partitions
  - exact inference over the resulting BN defines the acceptance probability for the Markov chain
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
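A toy Metropolis sketch of the idea: the similarity-based score below stands in for exact inference in each hypothesis's BN, the proposal/acceptance details are simplified, and all data is invented:

```python
# Identity-uncertainty MCMC sketch: a Metropolis-style chain over partitions
# of citation objects into "same paper" equivalence classes. The score is a
# crude stand-in for the BN probability of each partition hypothesis.
import random

citations = ["c1", "c2", "c3"]
titles = {"c1": "prm learning", "c2": "prm lerning", "c3": "markov nets"}

def similarity(a, b):
    """Crude title similarity in [0, 1] (shared-character fraction)."""
    sa, sb = set(titles[a]), set(titles[b])
    return len(sa & sb) / len(sa | sb)

def score(partition):
    """Higher when co-clustered citations have similar titles."""
    s = 1.0
    for cluster in partition:
        for i, a in enumerate(cluster):
            for b in cluster[i + 1:]:
                s *= 0.2 + similarity(a, b)
    return s

def propose(partition):
    """Move one random citation to a random (possibly new) cluster."""
    parts = [list(c) for c in partition]
    src = random.choice([c for c in parts if c])
    cit = src.pop()
    parts = [c for c in parts if c] + [[]]
    random.choice(parts).append(cit)
    return [c for c in parts if c]

random.seed(0)
state = [[c] for c in citations]            # start with all-singletons
for _ in range(2000):
    cand = propose(state)
    if random.random() < min(1.0, score(cand) / score(state)):
        state = cand
best = sorted(sorted(c) for c in state)
print(best)
```

Because "prm learning" and "prm lerning" share most characters, partitions that merge c1 and c2 score higher, so the chain spends most of its time on the intended 2-cluster hypothesis.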
Identity Uncertainty Results
- Accuracy of citation recovery: % of actual citation clusters recovered perfectly
- 61.5% relative reduction in error relative to the state of the art
[Figure: accuracy (70-100%) of Phrase Match vs. PRM+MCMC on the Face, RL, Reasoning, and Constraint datasets, and on average]
[Pasula, Marthi, Milch, Russell, Shpitser, 2002]
Summary: PRMs …
- Inherit the advantages of graphical models:
  - coherent probabilistic semantics
  - exploit the structure of local interactions
- Allow us to represent the world in terms of:
  - objects
  - classes of objects
  - properties of objects
  - relations
So What Do We Gain?
- A convenient language for specifying complex models
- "Web of influence": subtle & intuitive reasoning
- A mechanism for tying parameters and structure:
  - within models
  - across models
- A framework for learning from relational and heterogeneous data
So What Do We Gain?
- A new way of thinking about models & problems:
  - incorporating heterogeneous data by connecting related entities
- New problems:
  - collective classification
  - relational clustering
  - uncertainty about richer structures: link graph structure, identity
But What Do We Really Gain?
- Simple PRMs ≈ relational logic with a fixed domain and universal quantification only
  - they induce a "propositional" BN
  - are PRMs just a convenient language for specifying attribute-based graphical models?
- Can augment the language with additional expressivity:
  - existential quantifiers & functions
  - equality
- The resulting language is inherently more expressive, allowing us to represent distributions over:
  - worlds where dependencies vary significantly [Getoor et al., Pasula et al.]
  - worlds with different numbers of objects [Pfeffer et al., Pasula et al.]
  - worlds with infinitely many objects [Pfeffer & K.]
- Big questions: inference & learning






Collaborators:
Pieter Abbeel, Nir Friedman, Lise Getoor, Carlos Guestrin, Brian Milch, Avi Pfeffer, Eran Segal, Ken Takusagawa, Ben Taskar, Haidong Wang, Ming-Fai Wong

http://robotics.stanford.edu/~koller/
http://dags.stanford.edu/