Section 0: Introduction

Tutorial Website https://oschulte.github.io/srl-tutorialslides/
Learning Bayesian Networks for
Complex Relational Data
Tutorial Introduction
Presenters: Oliver Schulte, Ted Kirkpatrick
School of Computing Science
Simon Fraser University
Vancouver-Burnaby, Canada
Overview
The Short Story
 Many organizations keep their data in a relational
database.
 We describe methods for learning a Bayesian
network for data in a relational database.
 The result is a joint statistical analysis of multiple interrelated
tables.
Questions Considered
1. Semantics: How do you interpret a relational/first-order Bayesian network?
2. How can you use it?
3. What are the statistical challenges for learning?
4. What are the computational challenges for learning?
Motivation
Database Management Systems
 Maintain data in linked tables.
 Structured Query Language (SQL) allows fast data
retrieval.
 E.g., find all movie ratings > 4 where the user is a woman.
 Multi-billion dollar industry: over $15 billion in 2006.
 IBM, Microsoft, Oracle, SAP, PeopleSoft.
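The example retrieval above (movie ratings above 4 from users who are women) can be sketched as a SQL join. This is a minimal illustration using Python's built-in sqlite3 module; the table names Users and Ratings, their columns, and the toy rows are assumptions made for this sketch, not the schema used in the tutorial.

```python
import sqlite3

# Hypothetical two-table schema (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE Ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER);
INSERT INTO Users VALUES (1, 'W'), (2, 'M'), (3, 'W');
INSERT INTO Ratings VALUES (1, 10, 5), (1, 11, 3), (2, 10, 5), (3, 12, 5);
""")

# Join the linked tables, then filter on attributes from both of them.
rows = conn.execute("""
    SELECT r.user_id, r.movie_id, r.rating
    FROM Ratings AS r
    JOIN Users AS u ON u.user_id = r.user_id
    WHERE r.rating > 4 AND u.gender = 'W'
    ORDER BY r.user_id, r.movie_id
""").fetchall()

print(rows)
```

The query returns exact matches only; the probabilistic queries discussed later ask how likely an attribute value is, rather than which rows satisfy a condition.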
Beyond storing and retrieving data
 Much new interest in analyzing databases.
 Data Mining.
 Data Warehousing.
 Business Intelligence.
 Predictive Analytics.
Unifying Logic and Probability
• Fundamental question in AI: how to combine logic,
probability, and learning?
 Statistical-Relational Learning
• Domingos (University of Washington, CS): “Logic handles complexity,
probability represents uncertainty.”
• Recent survey paper by Stuart Russell
Query Examples
Sample Queries
 Inference in a Bayesian network computes answers to
probabilistic queries.
 A Bayesian network for relational data can answer
relational probabilistic queries.
 We give some examples of relational and nonrelational
queries.
Single-Table Queries (Not relational)
Query: P(Drama(Movie) = T | RunTime(Movie) = Long)
English paraphrase: The probability that a movie is a drama, given that it is long.
Query: P(Country(Actor) = U.S. | gender(Actor) = W)
English paraphrase: The probability that an actor is from the US, given that her gender is woman.
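One simple way to read a single-table query such as P(Drama(Movie) = T | RunTime(Movie) = Long) is as a conditional frequency over the rows of one table. The sketch below computes that frequency by counting; the helper conditional_frequency, the column names, and the toy rows are assumptions made for illustration. A learned Bayesian network would answer the same query by inference over its parameters rather than by scanning the data.

```python
def conditional_frequency(rows, target, evidence):
    """Fraction of rows matching `evidence` that also match `target`.

    Returns None when no row matches the evidence (frequency undefined).
    """
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        return None
    hits = [r for r in matching if all(r[k] == v for k, v in target.items())]
    return len(hits) / len(matching)


# Toy Movie table (hypothetical data).
movies = [
    {"title": "m1", "drama": True, "runtime": "long"},
    {"title": "m2", "drama": True, "runtime": "long"},
    {"title": "m3", "drama": False, "runtime": "long"},
    {"title": "m4", "drama": True, "runtime": "short"},
]

# P(Drama(Movie) = T | RunTime(Movie) = Long), estimated by counting.
p_drama_given_long = conditional_frequency(
    movies, target={"drama": True}, evidence={"runtime": "long"}
)
print(p_drama_given_long)
```

In this toy table, two of the three long movies are dramas, so the estimate is 2/3.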
Cross-Table Queries (Movies)
Positive relationship
Query: P(Drama(Movie) = T | RunTime(Movie) = Long, ActsIn(Movie, “brad pitt”), ActsIn(Movie, “julie delp”), Country(“julie delp”) = France)
English paraphrase: The probability that a movie is a drama, given that it is long, that Brad Pitt and Julie Delp have appeared in it, and that Julie Delp is from France.
Negative relationship
Query: P(Drama(Movie) = T | RunTime(Movie) = Long, ActsIn(Movie, “brad pitt”), not ActsIn(Movie, “juliette binoche”), Country(“juliette binoche”) = France)
English paraphrase: The probability that a movie is a drama, given that it is long, that Brad Pitt has appeared in it, and that Juliette Binoche, who is from France, has not appeared in it.
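A cross-table query conditions on links between tables, including the absence of a link. As a rough sketch of the negative-relationship query for Drama(Movie), the code below counts over a join using EXISTS and NOT EXISTS in SQLite. The schema (Movie, ActsIn), the function drama_frequency, and the toy rows are assumptions for illustration, and the frequency reading used here is only one simple interpretation; the Country condition on the actor is omitted to keep the sketch short.

```python
import sqlite3

# Hypothetical schema: one entity table and one relationship table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (movie_id INTEGER PRIMARY KEY, drama INTEGER, runtime TEXT);
CREATE TABLE ActsIn (movie_id INTEGER, actor TEXT);
INSERT INTO Movie VALUES
  (1, 1, 'long'), (2, 0, 'long'), (3, 1, 'long'), (4, 1, 'short');
INSERT INTO ActsIn VALUES
  (1, 'brad pitt'), (2, 'brad pitt'), (3, 'brad pitt'),
  (3, 'juliette binoche'), (4, 'brad pitt');
""")


def drama_frequency(conn, with_actor, without_actor):
    """Among long movies with `with_actor` and without `without_actor`,
    return (fraction that are dramas, number of such movies)."""
    return conn.execute(
        """
        SELECT AVG(m.drama), COUNT(*)
        FROM Movie AS m
        WHERE m.runtime = 'long'
          AND EXISTS (SELECT 1 FROM ActsIn AS a
                      WHERE a.movie_id = m.movie_id AND a.actor = ?)
          AND NOT EXISTS (SELECT 1 FROM ActsIn AS a
                          WHERE a.movie_id = m.movie_id AND a.actor = ?)
        """,
        (with_actor, without_actor),
    ).fetchone()


freq, count = drama_frequency(conn, "brad pitt", "juliette binoche")
print(freq, count)
```

Here two long movies feature Brad Pitt without Juliette Binoche, and one of them is a drama, so the frequency is 0.5. Note that the negative condition requires checking every ActsIn row for a movie, which hints at why absent links are more expensive to count than present ones.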
Cross-Table Queries (Actors)
Positive relationship
Query: P(Country(Actor) = U.S. | gender(Actor) = W, ActsIn(“hate”, Actor), RunTime(“hate”) = short)
English paraphrase: The probability that an actor is from the US, given that she is a woman and that she appeared in the short movie “hate”.
Negative relationship
Query: P(Country(Actor) = U.S. | gender(Actor) = W, not ActsIn(“hate”, Actor), RunTime(“hate”) = short)
English paraphrase: The probability that an actor is from the US, given that she is a woman and that she did not appear in the short movie “hate”.
Motivating Applications
The ability to answer relational probabilistic queries has
supported a number of successful applications. For
example:
 Relational Query Optimization
 Information Extraction (DeepDive)
 Ontology Matching
 Entity Resolution
 Link-based classification
 Anomaly detection/exception mining
Tutorial Approach
 Our tutorial is a survey of issues, not of systems.
 We give references to different systems.
 We discuss a range of issues, but only for a single model class
(Bayesian networks).
 Most concepts generalize to log-linear models for relational data.
 We focus on the new challenges of learning Bayesian networks
with relational data, compared to traditional i.i.d. data.
 We illustrate challenges and solutions with a running
example.
Kimmig, A.; Mihalkova, L. & Getoor, L. (2014), 'Lifted graphical models: a survey', Machine Learning, 1-45.
Sutton, C. & McCallum, A. (2007), 'An Introduction to Conditional Random Fields for Relational Learning', in 'Introduction to Statistical Relational Learning', MIT Press, pp. 93-127.
Tutorial Plan
 Relational Data
 First-Order Bayesian networks
 Parameter Learning for First-Order BNs
 30 min break
 Structure Learning for First-Order BNs
 Link-Based Classification using a First-Order BN
 Relational Anomaly Detection using a First-Order BN