Using Relational Structure for Learning and

Download Report

Transcript Using Relational Structure for Learning and

Using Relational Structure for
Learning and Modeling in
Biomedical and Social
Domains
Mark Goadrich
Computer Science and
Mathematics
Centenary College of Louisiana
Natural Science Colloquium
November 6th, 2007
Overview
• First-Order Logic and Machine Learning
– The world is full of Objects
– Model these Objects to understand the world
• Inductive Logic Programming
– Objects and Relations/Properties
• Agent-Based Modeling
– Objects and Interactions/Behaviors
Bongard Problems
• 6 positive examples of a concept on left
• 6 negative examples on right
• How to learn this concept using a computer?
First-Order Logic using
PROLOG
• Objects
– e3, t1, t2, c1
• Types
Positive Example 3
–
–
–
–
example(e3)
triangle(t1)
triangle(t2)
circle(c1)
• Relations
–
–
–
–
–
–
–
has_shape(e3, t1)
has_shape(e3, t2)
has_shape(e3, c1)
inside(t2, c1)
left(t2, t1)
size(c1, 2.5)
above(t2, t1)
…
Repeat this process for each example in dataset
Inductive Logic Programming
(ILP)
• Search the space of possible rules “positive(E) :…”
• Judge rule quality by positive - negative coverage
positive(E)
positive(E):- has_shape(E, A)
positive(E):- has_shape(E, A), triangle(A)
positive(E) :- has_shape(E, A), has_shape(E, B),
triangle(A), circle(B), inside(A, B).
Research Issues in ILP
•
•
•
•
•
Enormous space to search for rules
Enormous number of examples
Incorporation of continuous features
Learning of probabilistic rules
Evaluation of rule quality
• Survey of ILP domains and future
interests
Mutagenesis
•
Designing effective and selective
drugs
•
Represent chemicals as atoms and
bonds between them
atm(127, 127_1, c, 22, 0.191 )
bond(127, 127_1, 127_6, 7 )
•
Learned mutagenic rule:
mutagenic(A) :- atm(A, B, c, 27, C),
bond(A, D, E, 1), bond(A, B, E, 7).
Breast Cancer
Detection
•
Large dataset of abnormalities
found in mammograms
•
Not enough radiologists
•
Relational features
– More than one abnormality per
mammogram
– More than one mammogram per
person over time
malignant(A) :- not asymmetric(A),
in_same_mammorgram(A, A2),
spiculated_margin(A2),
not distorted(A2)
Robot
Scientist
•
Represent Metabolic
Pathways as a
Regulatory Network
Graph
•
Knock out genes, and
then systematically
deduce the unknown
function
•
Try to learn the network
from time-series
microarray data
Social
Networks
•
People are connected by
friendships into networks
•
Each person has likes/dislikes,
possibly influenced by their
network
•
Can we learn your interests
based on who you know and
what they like? Targeted
advertisements?
Netflix Prize
•
What movies should Netflix
recommend you watch next?
•
Large relational dataset
–
–
–
–
–
–
•
Movies
Titles
Ratings
Friends
Friend’s ratings
Genre
$1 million if you achieve 10%
improvement over their algorithm
Cinematch
Zendo
•
Board game about inductive
logic
•
Master creates a rule which
some 3-D pyramid structures fit
and others do not
•
Players build structures and try
to guess the Master rule
•
Easier to design computer
Master to decide if a structure
fits the rule
•
Harder to design computer
Player that must efficiently guess
the rule
Crab Claws
•
What physical characteristics
distinguish between two
species?
•
Within the same species, what
changes due to growth, diet and
their relation to predation?
•
Find the “shock graph” of
each image
•
Use ILP to learn differences
based on these graphs
Agent-Based Modeling
• Objects have interactions with each other
– Flocks of Birds, Schools of Fish
• Separation
• Alignment
• Cohesion
• Objects interact with their environment
– Ant Foraging, Pheromones, Traffic Laws
• Agent-Based Modeling (ABM)
– Create discrete-time computational simulation
– Align models with known behavior
– Vary parameters to test new hypotheses
Social Science
Cellular
Process
Conclusions
• First-Order Logic combines with ILP and
ABM to create a powerful representation of
the world
• Research Opportunities
–
–
–
–
–
–
Social Networks
Zendo Player
Claws and Shock Graphs
Cellular Simulation
Social Simulation
[Insert your favorite dataset here]