From Association Analysis to Causal Discovery

Download Report

Transcript From Association Analysis to Causal Discovery

From Association Analysis to
Causal Discovery
Prof Jiuyong Li
University of South Australia
Association analysis
• Diapers -> Beer
• Bread & Butter -> Milk
Positive correlation of birth rate to
stork population
• increasing the stork population would increase
the birth rate?
Further evidence for Causality ≠ Associations
Simpson paradox
Recovered
Not recovered Sum
Recover rate
Drug
20
20
40
50%
No Drug
16
24
40
40%
36
44
80
Female
Recovered
Not recovered Sum
Recover rate
Drug
2
8
10
20%
No Drug
9
21
30
30%
11
29
40
Male
Recovered
Not recovered Sum
Recover rate
Drug
18
12
30
60%
No Drug
7
3
10
70%
25
15
40
Association and Causal Relationship
• Two variables X and Y.
• Prob(Y | X) ≠ P(Y), X is associated with Y (association
rules)
• Prob(Y | do X) ≠ Prob(Y | X)
• How does Y vary when X changes?
• The key, How to estimate Prob(Y | do X)?
• In association analysis, the relationship of X and Y
is analysed in isolation.
• However, the relationship between X and Y is
affected by other variables.
5
Causal discovery 1
• Randomised
controlled trials
– Gold standard
method
– Expensive
– Infeasible
• Association =
causation
Causal discovery 2
• Bayesian network based
causal inference
– Do-calculus (Pearl 2000)
– IDA (Maathuis et al.
2009)
– To infer causal effects in
a Bayesian network.
– However
– Constructing a Bayesian
network is NP hard
– Low scalability to large
number of variables
Leaning causal structures
• PC algorithm (Spirtes,
Glymour and Scheines)
– Not (A ╨ B | Z), there is an
edge between A and B.
– The search space
exponentially increases with
the number of variables.
CCC
B
A
ABC, ABC, CAB
• Constraint based search
– CCC (G. F. Cooper, 1997)
– CCU (C. Silverstein et. al.
2000)
– Efficiently removing noncausal relationships.
C
CCU
B
A
C
ABC
Association rules
• Many efficient
algorithms
• Hundreds of
thousands to millions
of rules.
– Many are spurious.
• Interpretability
– Association rules do
not indicate causal
effects.
Causal rules
• Discover causal relationships using partial association
and simulated cohort study.
• Do not rely on Bayesian network structure learning.
The discovery of causal rules also have strong
theoretical support.
• Discover both single cause and combined causes.
• Can be discovered efficiently.
• Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun, and R. Wang,
Discovery of causal rules using partial association.
ICDM, 2012
• J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining
causal association rules. In Proceedings of ICDM
Workshop on Causal Discovery (CD), 2013.
Problem
Discover causal rules from large databases of binary variables
A
B
C
D
E
F
Y
#repeats
1
1
1
1
1
1
1
14
1
0
1
1
1
1
1
8
1
1
0
1
0
1
1
15
0
1
1
1
1
1
1
8
0
1
0
0
0
0
0
5
0
0
0
0
1
0
1
6
1
0
0
0
0
1
0
4
1
0
1
1
1
0
0
3
0
1
0
1
1
0
0
3
0
1
0
0
1
0
0
5
AY
CY
BF  Y
DE  Y
Partial association test
K
K
I
I
J
J
I
K
J
PA(I, J, K) ³ ca2 Nonzero partial association
PA(I, J, K ) =
(| å
k
n11k n00k - n10k n01k 1 2
|- )
n××k
2
n1×k n.1k n0×k n×0k
å n2 (n -1)
××k
××k
k
M. W. Birch, 1964.
Partial association test – an example
A
B
C
D
E
F
Y
G
#repeat
1
1
1
1
1
1
1
0
14
1
0
1
1
1
1
1
0
8
1
1
0
1
0
1
1
0
15
0
1
1
1
1
1
1
0
8
0
1
0
0
0
0
0
0
5
0
0
0
0
1
0
1
0
6
1
0
0
0
0
1
0
0
4
1
0
1
1
1
0
0
0
3
1
1
1
1
0
1
1
1
3
0
1
0
0
1
0
0
0
5
n11k n00k  n10k n01k 14  3  0  8

 1.68
nk
25
4. Partial association test.
PA( X , Y , K ) 
(| 
k
n11k n00k  n10k n01k 1 2
| )
nk
2
n1k n.1k n0k n0 k
k n2 (n 1)
k  k
PA( BF , Y , ACDE )
n1k n.1k n0k n0 k 14  22 11 3

 0.6776
n2k (nk  1)
252 (25  1)
Fast partial association test
PA(I, J, K ) =
(| å
k
n11k n00 k - n10 k n01k
1
| - )2
n××k
2
n n.1k n0×k n×0 k
å n1×k2 (n
××k
××k -1)
k
• K denotes all possible variable combinations,
the number is very large.
• Counting the frequencies of the combinations
is also time consuming.
• Our solution:
– Sort data and count frequencies of the
equivalence classes.
– Only use the combinations existing in the data set.
Pruning strategies
Definition (Redundant causal rules):
Assume that X⊂ W, if X → Y is a causal rule,
rule W → Y is redundant as it does not provide
new information.
Definition (Condition for testing causal rules):
We only test a combined causal rule XV → Y if
X and Y have a zero association and V and Y
have a zero association (cannot pass the quisquare test in step 3).
A
B
D
E
1
1
1
0
1
1
0
1
1
1
1
1
0
1
0
0
0
0
0
0
0
1
0
0
0
xG
F
Algorithm
Y
#repeats
1
14
1
8
1
15
0
1
8
0
0
0
5
1
0
0
1
6
0
1
0
0
4
0
0
3
1
zero
0
1
1
1 0
association
1
1
1
1 1
1
0
3
0
1
0
0
5
1
C
1
1
1 1 0
positive
1
1
1 1 0
association
0
1
0 1 0
0
0
1
0
Y=1
Y=0
Total
11
12
1.
X=0
n21
n22
n2.
Total
n.1
n.2
n
2
PA
(
X
,
Y
,
K
)


X=1
n
n
n


2
X ,Y

i  2, j  2

i 1, j 1
(nij  E (nij ))
E (nij )
2
1. Prune the variable set (support)
2. Create the contingency table
for each variable X
 X2 , Y
3. Calculate the
• If  X2 , Y  2 go to next step
2
2
• If  X , Y   move X to a set N
4. Partial association test.
• If PA(X, Y, K) is nonzero
then XY is a causal rule.
5. Repeat 1-4 for each variable
which is the combination of
variables in set N
Experimental evaluations
• We use the Arrhythmia data set in UCI machine learning
repository.
– We need to classify the presence and absence of cardiac arrhythmia.
The data set contains 452 records and each record obtains 279 data
attributes and one class attribute
• Our results are quite consistent with the results from CCC
method.
• Some rules in CCC are removed by our method as they cannot
pass the partial association test.
• Our method can discover the combined rules. CCC and CCU
methods are not set to discover these rules.
Comparison with CCC and CCU
Experimental evaluations
Figure 1: Extraction Time Comparison (20K Records)
Figure 1: Extraction Time Comparison (100K Records)
Summary 1
• Simpson paradox
– Associations might be inconsistent in subsets
• Partial association test
– Test the persistency of associations in all possible
partitions.
– Statistically sound.
– Efficiency in sparse data.
• What else?
Cohort study 1
Defined population
Expose
Have
a disease
Not have
a disease
Not expose
Have a
disease
• Prospective: follow up.
• Retrospective: look back. Historic study.
Not have
a disease
Cohort study 2
• Cohorts: share common characteristics but
exposed or not exposed.
• Determine how the exposure causes an
outcome.
• Measure: odds ratio = (a/b) / (c/d)
Diseased Healthy
Exposed
a
b
Not exposed
c
d
Limitations of cohort study
• Need to know a hypothesis beforehand
• Domain experts determine the control
variables.
• Collect data and test the hypothesis.
• Not for data exploration.
• We need
– Given a data set without any hypotheses.
– An automatic method to find and validate
hypotheses.
– For data exploration.
Control variables
Outcome
Cause
Other factors
• If we do not control covariates (especially those
correlated to the outcome), we could not
determine the true cause.
• Too many control variables result too few
matched cases in data.
– How many people with the same race, gender, blood
type, hair colour, eye colour, education level, ….
• Irrelevant variables should not be controlled.
– Eye colour may not relevant to the study.
Matches
• Exact matching
– Exact matches on all covariates. Infeasible.
• Limited exact matching
– Exact matches on a few key covariates.
• Nearest neighbour matching
– Find the closest neighbours
• Propensity score matching
– Based on the predicted effect of a treatment of
covariates.
Method1
Discover causal association rules from large databases of
binary variables
AY
A
B
C
D
E
F
Y
1
1
1
1
1
1
1
A
B
C
D
E
F
Y
1
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
1
0
1
1
1
0
1
0
1
1
1
0
1
1
1
1
1
1
1
1
0
1
0
1
0
0
1
0
0
0
0
0
1
0
1
0
1
0
0
0
0
0
0
1
0
1
1
0
0
0
0
1
0
0
1
1
1
1
1
0
1
0
1
1
1
0
0
0
0
1
0
1
1
0
0
1
0
1
1
0
0
0
1
0
1
0
1
1
0
1
0
0
1
0
0
0
0
1
0
1
0
1
Fair dataset
Methods
• A: Exposure variable
Fair dataset
A B C D E F Y
1
1
1
1
1
1
1
1
0
1
0
1
1
1
1
1
0
1
0
1
0
1
0
1
0
1
0 0
• {B,C,D,E,F}: controlled variable set.
• Rows with the same color for the
controlled variable set are called
matched record pairs.
A=0
0
1
1
1
1
1
0
0
0
1
0
1
1
0
0
1
0
1
0
1
1
0
0
1
0
1
0
1
A=1
Y=1
Y=0
Y=1
n11
n12
Y=0
n21
n22
n12
OddsRatioD f (A ® Y ) =
n21
• An association rule A  Y is a causal association rule if:
OddsRatioD f ( A  Y )  1
Algorithm
A
B
C
D
E
xF
1
1
1
1
1
1
…
…
1
1
x
G
Y
0
1
A
B
C
D
E
Y
1
1
1
1
1
1
…
…
0
1
…
…
0
1
0
1
0
1
…
1
1
1
0
…
1. Remove irrelevant variables (support, local support, association)
For each association rule (e. g. A  Y)
2. Find the exclusive variables of the exposure variable
(support, association), i.e. G, F.
The controlled variable set = {B, C, D, E}.
3. Find the fair dataset. Search for all matched record pairs
4. Calculate the odds-ratio to identify if the testing rule is causal
5. Repeat 2-4 for each variable which is the combination of
28
variables. Only consider combination
of non-causal factors.
Experimental evaluations
Experimental evaluations
CAR
Figure 1: Extraction Time Comparison (20K Records)
CCC
CCU
Experimental evaluations
Causality – Judea Pearl
X1
X2
5.2
…
Xn-1
Xn
7.5
6.5
5.2
5.6
7.2
6.6
5.3
…
…
…
…
5.4
7.1
7.1
5.7
5.7
6.9
6.9
5.8
…
+1
+0.8
Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
32
Methods
• IDA
– Maathuis, H. M.,
Colombo, D., Kalisch,
M., and Buhlmann, P.
(2010). Predicting
causal effects in largescale systems from
observational data.
Nature Methods, 7(4),
247–249.
33
Conclusions
• Association analysis has been widely used in data mining,
but associations do not indicate causal relationships.
• Association rule mining can be adapted for causal
relationship discovery by combining some statistical
methods.
– Partial association test
– Cohort study
• They are efficient alternatives for causal Bayesian
network based methods.
• They are capable of finding combined causal factors.
Discussions
• Causality and classification
– Estimate prob (Y| do X) instead of prob (Y|X).
• Feature section versus controlled variable
selection.
• Evaluation of causes.
– Not classification accuracy
– Bayesian networks??
Research Collaborators
•
•
•
•
•
Jixue Liu
Lin Liu
Thuc Le
Jin Zhou
Bin-yu Sun
Thank you for listening
Questions please ??