View Learning: An extension to SRL
An application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan
Background
Breast cancer is the most common cancer
Mammography is the only proven screening test
At this time approximately 61% of women have
had a mammogram in the last 2 years
Translates into 20 million mammograms per
year
The Problem
Radiologists interpret mammograms
Variability among radiologists, due to
differences in training and
experience
Experts achieve higher cancer
detection rates and fewer benign biopsies
Shortage of experts
Common Mammography findings
Microcalcifications
Masses
Architectural distortion
Other important features
Microcalcifications: shape, distribution, stability
Masses: shape, margin, density, size, stability
Associated findings
Breast density
Other variables influence risk
Demographic risk factors
Family History
Hormone therapy
Age
Standardization of Practice
- Passage of the Mammography Quality Standards Act (MQSA) in 1992
- Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
- Standardized lexicon: BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors
BI-RADS
Mass
- Shape: round, oval, lobular, irregular
- Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
- Density: high, equal, low, fat containing
Calcifications
- Typically Benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
- Intermediate: amorphous
- Higher Probability Malignancy: pleomorphic, fine/linear/branching
- Distribution: clustered, linear, segmental, regional, diffuse/scattered
Special Cases: tubular density, lymph node, asymmetric breast tissue, focal asymmetric density
Associated Findings: skin thickening, trabecular thickening, skin lesion, skin retraction, nipple retraction, axillary adenopathy, architectural distortion
Mammography Database
Radiologist interpretation of mammogram
Patient may have multiple mammograms
A mammogram may have multiple
abnormalities
Expert-defined Bayes net for determining
whether an abnormality is malignant
Original Expert Structure
Mammography Database

Patient | Abnormality | Date | Mass Shape | Mass Size | Loc | Be/Mal
P1      | 1           | 5/02 | Spic       | 0.03      | RU4 | B
P1      | 2           | 5/04 | Var        | 0.04      | RU4 | M
P1      | 3           | 5/04 | Spic       | 0.04      | LL3 | B
…       | …           | …    | …          | …         | …   | …
Types of Learning
Hierarchy of ‘types’ of learning that we
can perform on the Mammography
database
Level 1: Parameters
(Bayes net: Be/Mal with arcs to Shape and Size)
Given: Features (node labels, or fields in database), Data, Bayes net structure
Learn: Probabilities. Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), Pr(Size|Be/Mal)
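Parameter learning at this level is just maximum-likelihood counting over the table. A minimal Python sketch, using a hypothetical four-row stand-in for the database (names and values are illustrative only):

```python
from collections import Counter

# Hypothetical mini-table: (Be/Mal, Mass Shape, Size bin) per abnormality.
rows = [
    ("B", "Spic", "small"),
    ("M", "Var",  "large"),
    ("B", "Spic", "large"),
    ("B", "Oval", "small"),
]

def prior(rows):
    """Pr(Be/Mal) by maximum-likelihood counting."""
    counts = Counter(r[0] for r in rows)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def cpt_shape(rows):
    """Pr(Shape | Be/Mal): count (class, shape) pairs, normalize by class."""
    pair = Counter((r[0], r[1]) for r in rows)
    cls = Counter(r[0] for r in rows)
    return {(c, s): n / cls[c] for (c, s), n in pair.items()}
```

Pr(Size|Be/Mal) follows the same pattern; in practice one would also smooth these counts (e.g. a Laplace correction) rather than use raw frequencies.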
Level 2: Structure
(Bayes net: Be/Mal with arcs to Shape and Size, plus an arc from Shape to Size)
Given: Features, Data
Learn: Bayes net structure and probabilities. Note: with this structure, we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal).
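The cost of the richer structure can be seen directly: under maximum-likelihood estimates, giving Size the extra parent Shape never lowers the training log-likelihood, which is why structure search must penalize complexity or use held-out data. A sketch over a hypothetical four-row table:

```python
import math
from collections import Counter

# Hypothetical rows: (Be/Mal, Shape, Size) stand-ins for the database.
rows = [("B", "Spic", "s"), ("M", "Var", "l"),
        ("B", "Spic", "l"), ("B", "Oval", "s")]

def loglik_size(rows, parent_idx):
    """Training log-likelihood contribution of the Size node under
    maximum-likelihood CPT estimates; parent_idx selects the parent
    columns (0 = Be/Mal, 1 = Shape)."""
    cfg = lambda r: tuple(r[i] for i in parent_idx)
    joint = Counter((cfg(r), r[2]) for r in rows)
    parent = Counter(cfg(r) for r in rows)
    return sum(math.log(joint[(cfg(r), r[2])] / parent[cfg(r)]) for r in rows)

ll_level1 = loglik_size(rows, (0,))     # needs Pr(Size|Be/Mal)
ll_level2 = loglik_size(rows, (0, 1))   # needs Pr(Size|Shape,Be/Mal)
```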
Level 3: Aggregates
(Bayes net: adds an aggregate node "Avg size this date" alongside Be/Mal, Shape, and Size)
Given: Features, Data, Background knowledge – aggregation functions such as average, mode, max, etc.
Learn: Useful aggregate features, a Bayes net structure that uses these features, and probabilities. New features may use other rows/tables.
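An aggregate such as "Avg size this date" is a group-by over other rows of the same table. A minimal sketch with hypothetical rows (a learner at this level searches over many aggregation functions and grouping keys):

```python
from collections import defaultdict

# Hypothetical abnormality rows: (patient, date, mass size).
rows = [("P1", "5/02", 0.03), ("P1", "5/04", 0.04), ("P1", "5/04", 0.04)]

def avg_size_this_date(rows):
    """Aggregate feature: average mass size over all abnormalities
    recorded for the same patient on the same date."""
    groups = defaultdict(list)
    for patient, date, size in rows:
        groups[(patient, date)].append(size)
    return {key: sum(sizes) / len(sizes) for key, sizes in groups.items()}

feat = avg_size_this_date(rows)
# each row can then be extended with feat[(patient, date)] as a new field
```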
Level 4: View Learning
(Bayes net: adds view-defined nodes "Shape change in abnormality at this location" and "Increase in average size of abnormalities", along with the aggregate "Avg size this date", alongside Be/Mal, Shape, and Size)
Given: Features, Data, Background knowledge – aggregation functions and intensionally-defined relations such as "increase" or "same location"
Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities.
Structure Learning Algorithms
Three different algorithms
Naïve Bayes
Tree Augmented Naïve Bayes (TAN)
Sparse Candidate Algorithm
Naïve Bayes Net
Simple, computationally efficient
(Figure: arcs from Class Value to each of Attr 1 … Attr N)
Example TAN Net
Also computationally efficient
[Friedman,Geiger & Goldszmidt ‘97]
(Figure: arcs from Class Value to each attribute, plus a tree of arcs among Attr 1 … Attr N)
TAN
Arc from class variable to each attribute
Less restrictive than naïve Bayes: each attribute permitted at most one extra parent
Polynomial time bound on constructing the network: O((# attributes)² × |training set|)
Guaranteed to maximize LL(B_T | D)
TAN Algorithm
Construct a complete graph over all the attributes (excluding the class variable)
- Each edge weight is the conditional mutual information between its two vertices, given the class
Find a maximum-weight spanning tree over the graph
Pick a root in the tree and direct all edges away from it
Add the edges of the directed tree to the network
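The steps above can be sketched end-to-end: estimate conditional mutual information from counts, grow a maximum-weight spanning tree (Prim's algorithm here), and direct edges away from a chosen root. This is a toy sketch, not the authors' implementation; the data and attribute count are illustrative:

```python
import math
from collections import Counter

def cond_mutual_info(data, i, j):
    """I(X_i; X_j | C) estimated from counts; class is the last column."""
    n = len(data)
    pc  = Counter(r[-1] for r in data)
    pic = Counter((r[i], r[-1]) for r in data)
    pjc = Counter((r[j], r[-1]) for r in data)
    pij = Counter((r[i], r[j], r[-1]) for r in data)
    return sum((nij / n) * math.log((nij * pc[xc])
                                    / (pic[(xi, xc)] * pjc[(xj, xc)]))
               for (xi, xj, xc), nij in pij.items())

def tan_tree(data, n_attrs):
    """Maximum-weight spanning tree over the attributes (Prim's algorithm),
    rooted at attribute 0; returns directed (parent, child) edges."""
    w = {(i, j): cond_mutual_info(data, i, j)
         for i in range(n_attrs) for j in range(i + 1, n_attrs)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_attrs:
        p, c = max(((i, j) for i in in_tree
                    for j in range(n_attrs) if j not in in_tree),
                   key=lambda e: w[(min(e), max(e))])
        edges.append((p, c))
        in_tree.add(c)
    return edges

# Toy data: attributes 0 and 1 always agree; attribute 2 is noisier.
data = [(0, 0, 0, "B"), (1, 1, 0, "B"), (0, 0, 1, "M"),
        (1, 1, 1, "M"), (0, 0, 0, "M"), (1, 1, 1, "B")]
edges = tan_tree(data, 3)
```

On this toy data attributes 0 and 1 carry the most shared information given the class, so the tree links them; a full TAN learner would then add the class variable as a parent of every attribute and estimate the CPTs.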
General Bayes Net
(Figure: unrestricted Bayes net over Class Value and Attr 1 … Attr N)
Sparse Candidate
Friedman et al ‘97
No restrictions on directionality of arcs for
class attribute
Limits possible parents for each node to a
small “candidate” set
Sparse Candidate Algorithm
Greedy hill climbing search with restarts
Initial structure is empty graph
Score graph using BDe metric (Cooper & Herskovits
’92, Heckerman ’96)
Select candidate sets using an information
metric
Re-estimate candidate sets after each
restart
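The candidate-set restriction can be illustrated with plain pairwise mutual information as the information metric; the real algorithm is more refined (and re-estimates candidates between iterations), so treat this as a hypothetical sketch:

```python
import math
from collections import Counter

def mutual_info(data, i, j):
    """Pairwise mutual information I(X_i; X_j) from empirical counts."""
    n = len(data)
    pi = Counter(r[i] for r in data)
    pj = Counter(r[j] for r in data)
    pij = Counter((r[i], r[j]) for r in data)
    return sum((nij / n) * math.log(nij * n / (pi[xi] * pj[xj]))
               for (xi, xj), nij in pij.items())

def candidate_sets(data, n_vars, k):
    """Restrict each variable's possible parents to its k most
    informative companions -- the 'candidate' sets."""
    cands = {}
    for i in range(n_vars):
        scored = sorted((mutual_info(data, i, j), j)
                        for j in range(n_vars) if j != i)
        cands[i] = [j for _, j in scored[-k:]]
    return cands

# Toy data: columns 0 and 1 always agree, column 2 is noisier.
data = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
cands = candidate_sets(data, 3, 1)
```

The hill-climbing search then only considers adding a parent for a node from that node's candidate set, which is what makes each search step cheap.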
Sparse Candidate Algorithm
We looked at several initial structures
Expert structure
Naïve Bayes
TAN
Scored networks on tuning-set accuracy
Our Initial Approach for Level 4
Use ILP to learn rules predictive of
“malignant”
Treat the rules as intensional definitions of
new fields
The new view consists of the original table
extended with the new fields
Using Views
malignant(A) :- massesStability(A, increasing),
    prior_mammogram(A, B, _),
    H0_BreastCA(B, hxDCorLC).
Sample Rule
malignant(A) :- BIRADS_category(A, b5),
    MassPAO(A, present),
    MassesDensity(A, high),
    HO_BreastCA(A, hxDCorLC),
    in_same_mammogram(A, B),
    Calc_Pleomorphic(B, notPresent),
    Calc_Punctate(B, notPresent).
Methodology
10 fold cross validation
Split at the patient level
Roughly 40 malignant cases and 6000
benign cases in each fold
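Splitting at the patient level keeps every abnormality of one patient in the same fold, so no patient contributes to both train and test. A minimal sketch (round-robin assignment over hypothetical patient ids):

```python
def patient_folds(rows, n_folds=10):
    """Assign whole patients to folds; rows are tuples whose first
    element is the patient id."""
    patients = sorted({r[0] for r in rows})
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    folds = [[] for _ in range(n_folds)]
    for r in rows:
        folds[fold_of[r[0]]].append(r)
    return folds

# Hypothetical rows: (patient, abnormality id)
rows = [("P1", 1), ("P1", 2), ("P2", 1), ("P3", 1)]
folds = patient_folds(rows, n_folds=2)
```

A real split would also need to balance the roughly 40 malignant cases across folds; round-robin assignment here is just for illustration.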
Methodology
Without the ILP rules:
- 6 folds for training set
- 3 folds for tuning set
With ILP:
- 4 folds to learn ILP rules
- 3 folds for training set
- 2 folds for tuning set
TAN/Naïve Bayes don't require a tuning set
Evaluation
Precision and recall curves
Why not ROC curves?
With many negatives, ROC curves look overly
optimistic
A large change in the number of false positives yields
only a small change in the ROC curve
Pooled results over all 10 folds
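The skew argument is easy to check numerically: with roughly 6000 negatives per fold, a tenfold jump in false positives moves the ROC x-axis by under one percentage point while precision collapses. The counts below are illustrative, not the paper's results:

```python
def fpr(fp, n_neg):
    """ROC x-axis: false positives over all negatives."""
    return fp / n_neg

def precision(tp, fp):
    """PR y-axis: fraction of positive predictions that are correct."""
    return tp / (tp + fp)

n_neg, tp = 6000, 30            # ~6000 benign rows, 30 malignancies found
delta_fpr = fpr(50, n_neg) - fpr(5, n_neg)         # 45/6000 = 0.0075
delta_prec = precision(tp, 5) - precision(tp, 50)  # 30/35 - 30/80, huge drop
```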
ROC: Level 2 (TAN) vs. Level 1
Precision-Recall Curves
Related Work: ILP for Feature
Construction
Pompe & Kononenko, ILP’95
Srinivasan & King, ILP’97
Perlich & Provost, KDD’03
Neville, Jensen, Friedland and Hay,
KDD’03
Ways to Improve Performance
Learn rules to predict “benign” as well as
“malignant.”
Use Gleaner (Goadrich, Oliphant & Shavlik,
ILP’04) to get better spread of Precision
vs. Recall in the learned rules.
Incorporate aggregation into the ILP runs
themselves.
Richer View Learning Approaches
Learn rules predictive of other fields.
Use WARMR or other first-order clustering
approaches.
Integrate Structure Learning and View
Learning…score a rule by how much it
helps the current model when added
Integrated View/Structure Learning
sc(X) :- id(X,P), id(Y,P), loc(X,L),
    loc(Y,L), date(Y,D1), date(X,D2),
    before(D1,D2), shape(X,Sh1),
    shape(Y,Sh2), Sh1 \= Sh2.
(Figure: Bayes net with candidate view nodes "Increase in average size of abnormalities" and "Avg size this date" alongside Be/Mal, Shape, and Size)
Integrated View/Structure Learning
sc(X) :- id(X,P), id(Y,P), loc(X,L),
    loc(Y,L), date(Y,D1), date(X,D2),
    before(D1,D2), shape(X,Sh1),
    shape(Y,Sh2), Sh1 \= Sh2, size(X,S1),
    size(Y,S2), S1 > S2.
(Figure: the same Bayes net, with the refined rule as the candidate view node)
Richer View Learning (Cont.)
Learning new tables
Just rules for non-unary predicates
Train on pairs of malignancies for the same
mammogram or patient
Train on pairs (triples, etc.) of fields, where
pairs of values that appear in rows for
malignant abnormalities are positive
examples, while those that appear only in
rows for benign are negative examples
Conclusions
Graphical models over databases were originally
limited to the schema provided
Humans find it useful to define new views of a
database (new fields or tables intensionally
defined from existing data)
View learning appears to have promise for
increasing the capabilities of graphical models
over relational databases, perhaps other SRL
approaches
WILD Group
Jesse Davis
Beth Burnside
Inês Dutra
Vítor Santos Costa
Raghu Ramakrishnan
Jude Shavlik
David Page
Others:
Hector Corrada-Bravo
Irene Ong
Mark Goadrich
Louis Oliphant
Bee-Chung Chen