Transcript Slide 1
Practical Probabilistic Relational
Learning
Sriraam Natarajan
Take-Away Message
Learn from rich, highly
structured data!
Traditional Learning
Data is i.i.d.
[Figure: a flat data table, one row per example with binary attributes (features) B, E, A, M, J, is fed into a learning algorithm that outputs the classic alarm Bayesian network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) together with its conditional probability tables.]
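In this traditional setting, learning each conditional probability table reduces to normalized counting over the flat table. A minimal sketch of that step in Python; the toy rows and the helper function are illustrative, not taken from the slide:

    from collections import Counter

    # Toy i.i.d. data: one fixed-length row per example (B, E, A, M, J are 0/1).
    rows = [
        {"B": 1, "E": 0, "A": 1, "M": 1, "J": 1},
        {"B": 0, "E": 0, "A": 0, "M": 0, "J": 0},
        {"B": 0, "E": 1, "A": 1, "M": 0, "J": 1},
        {"B": 0, "E": 0, "A": 0, "M": 1, "J": 0},
    ]

    def cpt(rows, child, parents):
        """Estimate P(child=1 | parents) by counting rows per parent configuration."""
        positives, totals = Counter(), Counter()
        for r in rows:
            config = tuple(r[p] for p in parents)
            totals[config] += 1
            positives[config] += r[child]
        return {config: positives[config] / totals[config] for config in totals}

    # P(Alarm = 1 | Burglary, Earthquake) estimated from the toy table.
    print(cpt(rows, "A", ["B", "E"]))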
Real-World Problem:
Predicting Adverse Drug Reactions
[Figure: a single patient's record is spread over several linked tables.]
Patient Table (PatientID, Gender, Birthdate): P1, M, 3/22/63
Visit Table (PatientID, Date, Physician, Symptoms, Diagnosis): P1, 1/1/01, Smith, palpitations, hypoglycemic; P1, 2/1/03, Jones, fever and aches, influenza
Lab Tests (PatientID, Date, Lab Test, Result): P1, 1/1/01, blood glucose, 42; P1, 1/9/01, blood glucose, 45
Prescriptions (PatientID, Date Prescribed, Date Filled, Physician, Medication, Dose, Duration): P1, 5/17/98, 5/18/98, Jones, prilosec, 10mg, 3 months
SNP Table (PatientID, SNP1, SNP2, …, SNP500K): P1, AA, AB, …, BB; P2, AB, BB, …, AA
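What changes relative to the flat table above is that one patient now owns a variable number of visit, lab, and prescription rows linked by PatientID, so there is no fixed-length feature vector to hand to a propositional learner. A minimal sketch of that nested structure in Python; the values follow the example rows, but the dictionary layout itself is only illustrative:

    # One patient's record spans several linked tables, each with its own
    # variable number of rows: relational data rather than i.i.d. feature vectors.
    patient_P1 = {
        "PatientID": "P1", "Gender": "M", "Birthdate": "3/22/63",
        "visits": [
            {"Date": "1/1/01", "Physician": "Smith",
             "Symptoms": "palpitations", "Diagnosis": "hypoglycemic"},
            {"Date": "2/1/03", "Physician": "Jones",
             "Symptoms": "fever, aches", "Diagnosis": "influenza"},
        ],
        "lab_tests": [
            {"Date": "1/1/01", "Test": "blood glucose", "Result": 42},
            {"Date": "1/9/01", "Test": "blood glucose", "Result": 45},
        ],
        "prescriptions": [
            {"Prescribed": "5/17/98", "Filled": "5/18/98", "Physician": "Jones",
             "Medication": "prilosec", "Dose": "10mg", "Duration": "3 months"},
        ],
        "snps": {"SNP1": "AA", "SNP2": "AB", "SNP500K": "BB"},
    }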
Logic + Probability = Probabilistic Logic
aka Statistical Relational Learning Models
[Figure: Logic plus added probabilities, or Probabilities plus added relations, both lead to Statistical Relational Learning (SRL).]
• Several previous SRL workshops in the past decade
• This year – StaRAI @ AAAI 2013
Classical Machine Learning vs. Statistical Relational Learning
[Figure: the space of approaches along three axes: propositional vs. first-order logic, deterministic vs. stochastic, and no learning vs. learning. Propositional rule learning, inductive logic programming, probabilistic logic, probability theory, and classical machine learning each occupy a corner; statistical relational learning sits at the first-order, stochastic, learning corner.]
Costs and Benefits of the SRL soup
Benefits
Rich pool of different languages
Very likely that there is a language that fits your task at hand well
A lot of research remains to be done ;-)
Costs
“Learning” SRL models is much harder
Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?
Why is this problem hard?
It is a non-convex problem
Parameters must be re-searched at every step of model induction
First-order logic allows for different levels of generalization
Inference must be repeated at every step of parameter learning
Inference is #P-complete
How can we scale this?
Relational Probability Trees
To predict heartAttack(X)
Each conditional probability distribution can be learned as a tree
Leaves are probabilities
The final model is the set of the RRTs
[Blockeel & De Raedt ’98]
[Figure: an example relational probability tree for heartAttack(X). Internal nodes test first-order conditions such as male(X); chol(X,Y,L), Y>40, L>200; diag(X,Hypertension,Z), Z>55; and bmi(X,W,55), W>30. Leaves carry probabilities such as 0.8, 0.77, 0.3, and 0.05.]
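A minimal sketch of how evaluating such a tree could look, assuming the patient's facts arrive as a plain dictionary; the fact encoding and the exact branching are one plausible reading of the figure, not the authors' implementation:

    def heart_attack_prob(facts):
        """Walk one plausible reading of the example tree for heartAttack(X)."""
        if not facts.get("male", False):
            return 0.05
        # chol(X, Y, L), Y > 40, L > 200: any cholesterol reading over 200 after age 40?
        if any(age > 40 and level > 200 for age, level in facts.get("chol", [])):
            return 0.80
        # diag(X, Hypertension, Z), Z > 55: hypertension diagnosed after age 55?
        if any(age > 55 for age in facts.get("hypertension_diag", [])):
            return 0.77
        # remaining leaf (a bmi(X, W, 55), W > 30 test would refine this further)
        return 0.30

    # A male patient with a cholesterol reading of 210 at age 45.
    print(heart_attack_prob({"male": True, "chol": [(45, 210)]}))  # -> 0.8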
Gradient (Tree) Boosting
[Friedman Annals of Statistics 29(5):1189-1232, 2001]
Models = weighted combination of a large number of small trees (models)
Intuition: generate an additive model by sequentially fitting small trees to pseudo-residuals from a regression at each iteration…
[Figure: starting from an initial model, predictions are compared against the data under the loss function to give residuals; a small tree is induced on those residuals and added to the model; iterating yields the final model as a sum of trees.]
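A minimal sketch of that loop for squared-error regression, using scikit-learn's DecisionTreeRegressor as the small tree learner; the library choice, learning rate, and tree depth are assumptions for illustration, and relational functional-gradient boosting replaces these propositional trees with relational regression trees:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boost(X, y, n_trees=50, lr=0.1, max_depth=2):
        """Additive model: each small tree is fit to the current pseudo-residuals."""
        base = float(np.mean(y))             # initial model: a constant prediction
        pred = np.full(len(y), base)
        trees = []
        for _ in range(n_trees):
            residuals = y - pred             # pseudo-residuals for squared loss
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
            trees.append(tree)
            pred = pred + lr * tree.predict(X)   # add the new tree's contribution
        return base, trees

    def predict(base, trees, X, lr=0.1):
        """Final model = initial constant + weighted sum of the boosted trees."""
        return base + lr * sum(tree.predict(X) for tree in trees)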
Boosting Results – MLJ 11
Tasks: predicting the advisor for a student, movie recommendation, citation analysis, machine reading

Algo     | Likelihood | AUC-ROC | AUC-PR | Time
Boosting | 0.810      | 0.961   | 0.930  | 9s
MLN      | 0.730      | 0.535   | 0.621  | 93 hrs
Other Applications
Similar results in several other problems
Imitation Learning – learning how to act from demonstrations (Natarajan et al IJCAI ’11): Robocup, a grid-world domain, a traffic-signal domain, and blocksworld
Prediction of CAC levels – predicting cardiovascular risks in young adults (Natarajan et al – IAAI 13)
Prediction of heart attacks (Weiss et al – IAAI 12, AI Magazine 12)
Prediction of onset of Alzheimer’s (Natarajan et al ICMLA ’12, Natarajan et al IJMLC 2013)
Parallel Lifted Learning
[Figure: three ingredients combined: Statistical Relational (symmetries, compact models, lifted inference, …), Stochastic ML (scales well, stochastic gradients, online learning, …), and parallelism.]
Symmetry based inference
[Figure: the ground network over the constants Anna and Bob, where the two play symmetric roles.]
Root clause: P(Anna) ∨ !P(Bob)
Neighboring clauses: P(Anna) => !HI(Bob); P(Anna) => HI(Anna); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Tree (set of clauses): P(Anna) ∨ !P(Bob); P(Bob) => HI(Bob); P(Bob) => !HI(Anna)
Variabilized tree: P(X) ∨ !P(Y); P(Y) => HI(Y); P(Y) => !HI(X)
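The payoff of the variabilized tree is that symmetric ground clauses collapse onto one first-order template, so messages (and later gradients) are computed once per template rather than once per grounding. A small sketch of that grouping step in Python, assuming the clauses are given as strings over the constants Anna and Bob; this is purely illustrative, not the actual lifted inference code:

    import re
    from collections import Counter

    ground_clauses = [
        "P(Anna) => HI(Anna)", "P(Bob) => HI(Bob)",
        "P(Anna) => !HI(Bob)", "P(Bob) => !HI(Anna)",
    ]

    def variabilize(clause, constants=("Anna", "Bob")):
        """Replace each distinct constant with a logical variable (X, Y, ...)."""
        mapping = {}
        def to_var(match):
            const = match.group(0)
            if const not in mapping:
                mapping[const] = "XYZ"[len(mapping)]
            return mapping[const]
        return re.sub("|".join(constants), to_var, clause)

    # Symmetric groundings fall into the same lifted template.
    print(Counter(variabilize(c) for c in ground_clauses))
    # Counter({'P(X) => HI(X)': 2, 'P(X) => !HI(Y)': 2})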
Lifted Training
Generate initial tree pieces and variabilize their arguments.
Randomly draw mini-batches.
Generate tree pieces from the corresponding patterns.
Compute the gradient using lifted BP.
Update the parameter vector and the corresponding equations.
Update the covariance matrix C or some low-rank variant.
Challenges
Message schedules
Iterative Map-Reduce?
How do we take this idea to learning the models?
How can we more efficiently parallelize symmetry identification?
What are the compelling problems? Vision, NLP, …
Conclusion
The world is inherently relational and uncertain
SRL has developed into an exciting field in the past decade
Several previous SRL workshops
Boosting relational models has shown promising initial results
Applied to several different problems
First scalable relational learning algorithm
How can we parallelize/scale this algorithm?
Can it benefit from an inference algorithm like Belief Propagation that can be parallelized easily?