NL Zhang - Department of Computer Science and Engineering


CCF Academic Salon on the Application and Development of Bayesian Networks in China
Hong Kong University of Science and Technology
The State of BN Theoretical Research and Applications
2012-05-22
Page 2
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making, Book
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Page 3
Bayesian Networks
Page 4
Variable Elimination


Papers:

N. L. Zhang and D. Poole (1994). A simple approach to Bayesian network computations. In Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22.

N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.
Idea
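In rough Python (a minimal sketch of the idea, not the original implementation): represent each conditional probability table as a factor; to answer a query, repeatedly pick a variable, multiply the factors that mention it, and sum it out.

```python
import itertools
from functools import reduce

# A factor is (vars, table): `vars` is a tuple of variable names; `table` maps
# a tuple of values (binary here, for brevity) to a non-negative number.

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    (fv, ft), (gv, gt) = f, g
    uv = tuple(dict.fromkeys(fv + gv))            # ordered union of variables
    table = {}
    for vals in itertools.product([0, 1], repeat=len(uv)):
        a = dict(zip(uv, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return uv, table

def sum_out(var, f):
    """Marginalize `var` out of factor f."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table

def eliminate(factors, order):
    """Variable elimination: for each variable in `order`, multiply the
    factors that mention it and sum it out; finally multiply what remains."""
    for var in order:
        touching = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(var, reduce(multiply, touching)))
    return reduce(multiply, factors)

# Tiny chain A -> B: eliminating A from {P(A), P(B|A)} yields P(B).
pA = (("A",), {(0,): 0.6, (1,): 0.4})
pB_A = (("B", "A"), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8})
print(eliminate([pA, pB_A], ["A"]))   # P(B=0) = 0.62, P(B=1) = 0.38
```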
Page 5
Variable Elimination
Page 6
Variable Elimination
Page 7
Variable Elimination


The first BN inference algorithm presented in Russell & Norvig.

Russell & Norvig wrote on page 529:

"The algorithm we describe is closest to that developed by Zhang and Poole (1994, 1996)"

Koller and Friedman wrote:

"… the variable elimination algorithm, as presented here, first described by Zhang and Poole (1994), …"

The K&F book cites 7 of our papers
Page 8
Local Structure
Page 9
Local Structures: Causal Independence

Papers:

N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.

N. L. Zhang and D. Poole (1994). Intercausal independence and heterogeneous factorization. In Proc. of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.
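Noisy-OR is the standard concrete example of causal independence. A small sketch (illustrative only; the papers treat general causal independence and how to exploit it during inference) shows how n inhibition parameters plus a leak determine the full 2^n-row CPT:

```python
import itertools

def noisy_or(parents, inhibitors, leak=0.0):
    """P(E=1 | parents) under noisy-OR: each active cause i independently
    fails to produce E with probability inhibitors[i]; `leak` covers causes
    outside the model."""
    p_fail = 1.0 - leak
    for on, q in zip(parents, inhibitors):
        if on:
            p_fail *= q
    return 1.0 - p_fail

# Full CPT for 3 binary causes from just 3 inhibition parameters and a leak:
inhib = [0.3, 0.2, 0.1]
for row in itertools.product([0, 1], repeat=3):
    print(row, round(noisy_or(row, inhib, leak=0.05), 4))
```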
Page 10
Local Structures: Causal Independence
Page 11
Local Structure: Context Specific Independence

Papers:

N. L. Zhang and D. Poole (1999). On the role of context-specific independence in probabilistic inference. IJCAI-99, 1288-1293.

D. Poole and N. L. Zhang (2003). Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18: 263-313.
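For illustration (a sketch, not the exact rule- or tree-based representation of the papers): with context-specific independence, a handful of rules can replace the full CPT.

```python
# Context-specific independence (CSI): P(Y=1 | A, B, C) depends on B and C
# only in the context A=1, so four rules replace the eight rows of a full CPT.
rules = {
    ("A=0",): 0.9,                     # in context A=0, B and C are irrelevant
    ("A=1", "B=0"): 0.3,
    ("A=1", "B=1", "C=0"): 0.6,
    ("A=1", "B=1", "C=1"): 0.2,
}

def p_y1(a, b, c):
    """Look up P(Y=1 | a, b, c) using the most specific matching rule."""
    ctx = {f"A={a}", f"B={b}", f"C={c}"}
    for cond in sorted(rules, key=len, reverse=True):
        if set(cond) <= ctx:
            return rules[cond]

print(p_y1(0, 1, 1))   # 0.9 -- the rule for A=0 fires regardless of B and C
```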
Page 12
Other Works

Parameter Learning


N. L. Zhang (1996). Irrelevance and parameter learning in Bayesian networks. Artificial Intelligence, 88: 359-373.

Decision Making

N. L. Zhang (1998). Probabilistic inference in influence diagrams. Computational Intelligence, 14(4): 475-497.

N. L. Zhang, R. Qi and D. Poole (1994). A computational theory of decision networks. International Journal of Approximate Reasoning, 11(2): 83-158. (Based on the PhD thesis.)
Page 13
Other Works
Page 14
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Page 15
Latent Tree Models: Overview

Concept first mentioned by Pearl (1988)

We were the first to conduct systematic research on LTMs.

N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237.

N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5: 697-723.

Early followers:

Aalborg University of Denmark, Norwegian University of Science and Technology

Recent papers from:

MIT, CMU, USC, Georgia Tech, Edinburgh
Page 16
Latent Tree Models

A recent survey by French researchers:
Latent Tree Models (LTM)


Bayesian networks with

Rooted tree structure

Discrete random variables

Leaves observed (manifest variables)

Internal nodes latent (latent variables)

Also known as hierarchical latent class (HLC) models.

Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
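Spelled out for the fragment above (a standard identity; assuming Y1 is the root, Y2 its child, and X1, X2 are children of Y2), an LTM defines the manifest distribution by summing the tree-factored joint over the latent variables:

```latex
P(x_1, x_2, \ldots) \;=\; \sum_{y_1}\sum_{y_2}\cdots\;
P(y_1)\,P(y_2 \mid y_1)\,P(x_1 \mid y_2)\,P(x_2 \mid y_2)\cdots
```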
Page 18
Example

Manifest variables


Math Grade, Science Grade, Literature Grade, History Grade
Latent variables

Analytic Skill, Literal Skill, Intelligence
Theory: Root Walking and Model Equivalence

M1: root walks to X2;
M2: root walks to X3

Root walking leads to equivalent models on manifest variables

Implications:

Cannot determine edge orientation from data

Can only learn unrooted models
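The simplest case shows why root walking preserves the manifest distribution (a one-step derivation; the general proof is in the JMLR 2004 paper). When the root walks from Y to its neighbor X2, the edge between them is reversed and reparameterized by Bayes' rule, which leaves the joint, and hence the manifest distribution, unchanged:

```latex
P(x_2) = \sum_{y} P(y)\,P(x_2 \mid y), \qquad
P(y \mid x_2) = \frac{P(y)\,P(x_2 \mid y)}{P(x_2)}
\;\;\Longrightarrow\;\;
P(x_2)\,P(y \mid x_2) \;=\; P(y)\,P(x_2 \mid y).
```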
Regularity

Regular latent tree models: For any latent node Z with neighbors X1, X2, …, Xk,

|Z| ≤ (|X1| × |X2| × … × |Xk|) / max_i |Xi|

(with strict inequality when Z has only two neighbors).

Can focus on regular models only:

Irregular models can be made regular

Regularized models are better than the irregular originals

The set of all such models is finite.
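A quick check of the bound (a sketch; the precise strictness clause for latent nodes with exactly two neighbors is spelled out in the JMLR 2004 paper):

```python
from math import prod

def is_regular(card_z, neighbor_cards):
    """Regularity bound for one latent node Z:
    |Z| <= prod_i |X_i| / max_i |X_i|, strict when Z has only two neighbors."""
    bound = prod(neighbor_cards) / max(neighbor_cards)
    return card_z < bound if len(neighbor_cards) == 2 else card_z <= bound

print(is_regular(3, [2, 2, 2]))   # True:  3 <= 2*2*2/2 = 4
print(is_regular(5, [2, 2, 2]))   # False: |Z| can be reduced without loss
```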
Page 21
Effective Dimension

Standard dimension:



Number of free parameters
Effective dimension

X1, X2, …, Xn: observed variables

For each value of the parameters, P(X1, X2, …, Xn) is a point in a high-dimensional space

The points span a manifold as the parameter value varies.

Effective dimension: the dimension of that manifold.

Parsimonious model:

Standard dimension = effective dimension

Open question: How to test parsimony?
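For concreteness (a standard parameter count, not from the slides): for a latent class model, i.e., one latent variable Y with r states over manifest variables X1, …, Xn, the standard dimension and the trivial upper bounds on the effective dimension are:

```latex
d \;=\; (r-1) \;+\; r\sum_{i=1}^{n}\bigl(|X_i|-1\bigr),
\qquad
d_e \;\le\; \min\!\Bigl(d,\; \prod_{i=1}^{n}|X_i| - 1\Bigr).
```

For example, with r = 2 and three binary manifest variables, d = 1 + 2·3 = 7, which already equals the ambient bound 2^3 - 1 = 7.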
Page 22
Effective Dimension

Paper:


N. L. Zhang and Tomas Kocka (2004). Effective dimensions of hierarchical
latent class models. Journal of Artificial Intelligence Research, 21: 1-17.
Open question: the effective dimension of LTMs with a single latent variable
Learning Latent Tree Models
Determine

Number of latent variables

Cardinality of each latent variable

Model Structure

Conditional probability distributions
Search-Based Learning: Model Selection

Bayesian score: posterior probability P(m|D)

P(m|D) = P(m) ∫ P(D|m, θ) dθ / P(D)

BIC score: large-sample approximation
BIC(m|D) = log P(D|m, θ*) - d log N / 2

BICe score: BIC with the effective dimension de in place of d
BICe(m|D) = log P(D|m, θ*) - de log N / 2

Effective dimensions are difficult to compute, so BICe is not practical.
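The score in code (a sketch; the log-likelihoods and dimensions below are made-up numbers):

```python
from math import log

def bic(loglik, d, n):
    """BIC(m|D) = log P(D|m, theta*) - d * log(N) / 2; d is the standard
    dimension. BICe would use the effective dimension d_e instead."""
    return loglik - d * log(n) / 2.0

# Two hypothetical candidate models on N = 1200 samples: the second fits
# better but pays a larger complexity penalty, so the first wins.
print(bic(-15400.0, d=85, n=1200))    # about -15701
print(bic(-15350.0, d=140, n=1200))   # about -15846
```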
Search Algorithms


Papers:

T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.

N. L. Zhang and T. Kocka (2004). Efficient learning of hierarchical latent class models. ICTAI-2004.
Double hill climbing (DHC), 2002: 7 manifest variables

Single hill climbing (SHC), 2004: 12 manifest variables

Heuristic SHC (HSHC), 2004: 50 manifest variables

EAST, 2011: 100+ manifest variables

Recent fast algorithms for specific applications.
Illustration of the search process
Page 27
Algorithm by Others


Variable clustering method

S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6): 1087-1097.

Raphaël Mourad, Christine Sinoquet, Philippe Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12:16. doi:10.1186/1471-2105-12-16.

Fast, but model quality may be poor

Adaptation of evolution-tree algorithms

Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12: 1771-1812.

Fast, has a consistency proof, but applies to special LTMs only
Page 28
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Page 29
Density Estimation

Characteristics of LTMs

Are computationally very simple to work with.

Can represent complex relationships among manifest variables.

Useful tool for density estimation.
Page 30
Density Estimation

New approximate inference algorithm for Bayesian networks (Wang,
Zhang and Chen, AAAI 08, Exceptional Paper)
[Diagram: a dense Bayesian network is sampled; the LTAB algorithm learns a sparse latent tree model from the sample, which then supports fast approximate inference.]
Page 31
Multidimensional Clustering

Paper:


T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional
clustering of categorical data. Artificial Intelligence, 176(1), 2246-2269.
Cluster Analysis

Grouping of objects into clusters so that objects in the same cluster
are similar in some sense
Page 32
How to Cluster Those?
Page 33
How to Cluster Those?
Style of picture
Page 34
How to Cluster Those?
Type of object in picture
Page 35
How to Cluster Those?
Multidimensional clustering / Multi-Clustering

How to partition data in multiple ways?

Latent tree models
Latent Tree Models & Multidimensional Clustering

Model relationship between

Observed / Manifest variables
 Math Grade, Science Grade, Literature Grade, History Grade

Latent variables
 Analytic Skill, Literal Skill, Intelligence

Each latent variable gives a partition

Intelligence: Low, medium, high

Analytic skill: Low, medium, high
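In code terms (a sketch; `posterior(record, var)` is a hypothetical helper standing in for an LTM inference engine that returns P(var | record)): each latent variable induces its own partition of the same data.

```python
def partitions(records, latent_vars, posterior):
    """For each latent variable, assign every record to its most probable
    state: one partition of the data per latent variable."""
    return {
        v: [max(range(len(posterior(r, v))), key=lambda s: posterior(r, v)[s])
            for r in records]
        for v in latent_vars
    }

# partitions(students, ["Intelligence", "AnalyticSkill"], posterior) yields
# two different clusterings of the same students, one per facet.
```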
ICAC Data
// 31 variables, 1200 samples
C_City:            s0 s1 s2 s3      // very common, quite common, uncommon, ...
C_Gov:             s0 s1 s2 s3
C_Bus:             s0 s1 s2 s3
Tolerance_C_Gov:   s0 s1 s2 s3      // totally intolerable, intolerable, tolerable, ...
Tolerance_C_Bus:   s0 s1 s2 s3
WillingReport_C:   s0 s1 s2         // yes, no, depends
LeaveContactInfo:  s0 s1            // yes, no
I_EncourageReport: s0 s1 s2 s3 s4   // very sufficient, sufficient, average, ...
I_Effectiveness:   s0 s1 s2 s3 s4   // very effective, effective, average, ineffective, very ineffective
I_Deterrence:      s0 s1 s2 s3 s4   // very sufficient, sufficient, average, ...
…..
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
….
Latent Structure Discovery
Y2: Demographic info; Y3: Tolerance toward corruption
Y4: ICAC performance; Y7: ICAC accountability
Y5: Change in level of corruption; Y6: Level of corruption
Multidimensional Clustering
Y2=s0: Low income youngsters; Y2=s1: Women with no/low income
Y2=s2: people with good education and good income;
Y2=s3: people with poor education and average income
Multidimensional Clustering
Y3=s0: people who find corruption totally intolerable; 57%
Y3=s1: people who find corruption intolerable; 27%
Y3=s2: people who find corruption tolerable; 15%
Interesting finding:

Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus
Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus
Y3=s0: Same attitude toward C-Gov and C-Bus

People who are tough on corruption are equally tough toward C-Gov and C-Bus.
People who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.

Multidimensional Clustering

Interesting finding: relationship between background and tolerance toward corruption

Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable
Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable
The other two classes are in between.
Page 42
Marketing Data
Page 43
Latent Tree Analysis of Text Data

The WebKB Data Set


1041 web pages collected from 4 CS departments in 1997
336 words
Page 44
Latent Tree Model for WebKB Data by BI Algorithm
89 latent variables
Latent Tree Models for WebKB Data
Page 46
Page 47
Page 48
LTM for Topic Detection


Topic

A latent state

A collection of documents

A document can belong to multiple topics (with 100% membership in each)
Page 49
LTM vs LDA for Topic Detection

LTM

Topic:
 A latent state
 A collection of documents

A document can belong to multiple topics (with 100% membership in each)

LDA

Topic:
 Distribution over the entire vocabulary.
 The probabilities of the words add up to one.

Document:
 Distribution over topics.
 If a document contains more of one topic, then it contains less of the other topics.
Page 50
Latent Tree Analysis Summary

Finds meaningful facets of data

Identifies natural clusters along each facet

Gives a clear picture of what is in the data
Page 51
LTM for Spectral Clustering

Original Data Set

Eigenvectors of Laplacian Matrix

Rounding: Eigenvectors to final partition
Page 52
LTM for Spectral Clustering


Rounding:

Determine the number of clusters

Determine the final partition

No good method available

LTM Method: use latent tree analysis on the eigenvectors to do the rounding, determining both the number of clusters and the final partition.
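The pipeline in a minimal numpy sketch (standard spectral embedding; the LTM rounding itself is not reproduced here):

```python
import numpy as np

def spectral_embedding(W, k):
    """Embed objects as rows of the bottom-k eigenvectors of the normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2} of affinity matrix W."""
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - Dis @ W @ Dis
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, :k]                   # one k-dimensional point per object

# Two triangles, weakly linked: the embedding separates them cleanly.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], float)
Y = spectral_embedding(W, 2)
# Rounding turns the rows of Y into cluster labels; the LTM method learns a
# latent tree over (discretized) eigenvector entries to pick both the number
# of clusters and the partition.
```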
Page 53
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Page 54
LTM and TCM

Papers

N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42: 229-245. (This work took 8 years.)

N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5): 583-587. (Featured at TCM Wiki.)

N. L. Zhang (张连文), S. H. Yuan, T. F. Wang, Y. Zhao, et al. (2011). Latent structure analysis and syndrome differentiation of Western-medicine diseases (I): Basic principles. World Science and Technology / Modernization of Traditional Chinese Medicine, 13(3): 498-502.

N. L. Zhang (张连文), Z. X. Xu, Y. Q. Wang, T. F. Liu, et al. (2012). Latent structure analysis and syndrome differentiation of Western-medicine diseases (II): Comprehensive clustering. World Science and Technology / Modernization of Traditional Chinese Medicine, 14(2).
Page 55
LTM and TCM: Objectives

Statistical validation of TCM postulates
[Review of a recent paper]
I am very interested in what these authors are trying to do. They are dealing
with an important epistemological problem.
To go from the many symptoms and signs that patients present, to construct a
consistent and other-observer identifiable constellation, is a core task of the medical
practitioner. A kind of feedback occurs between what a practitioner is taught/finds listed
in books, and what that practitioner encounters in the clinic. The better the constellation
is understood, the more accurate the clustering of symptoms, the more consistent is
the identification of syndromes among practitioners and through time. While these
constellations have been worked into widely-accepted ‘disease constructs’ for
biomedicine for some time which are widely accepted as ‘real,’ this is not quite as true
for TCM constellations. This latent variable study is interesting not only in itself, but also
as providing evidence that what TCM ‘says’ is so, shows up during analysis as
demonstrably so.
Page 56
LTM and TCM: Objectives


TCM postulates explain the occurrence of symptoms:

When KIDNEY YANG is in deficiency, it cannot warm the body and the patient feels
cold, resulting in intolerance to cold, cold limbs, …

Manifest variables (directly observed): feeling cold, cold limbs

Latent variable (not directly observed): Kidney Yang deficiency

Latent Structure: Relationships between latent variables and manifest variables
Statistical validation of TCM postulates
Page 57
Latent Tree Analysis of Symptom Data


Similar to the WebKB data:

Web page containing words ↔ Patient having symptoms
What will be the result of latent tree analysis?

Different facets of data revealed

Natural clusters along each facet identified

Each facet involves a few symptoms and may correspond to a syndrome

Providing validation of TCM postulates

Providing evidence for syndrome differentiation
Page 58
Latent Tree Model for Kidney Data
Latent structure matches relevant TCM postulate

Providing validation for the TCM postulate
Page 59
Latent Tree Model for Kidney Data

Work reported in


N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in
traditional Chinese medicine. Artificial Intelligence in Medicine. 42: 229-245.
Email from:

Bridie Andrews: Bentley University, Boston

Dominique Haughton: ditto, Fellow of the American Statistical Association

Lisa Conboy: Harvard Medical School
“We are very interested in your paper on “Latent tree models and
diagnosis in traditional Chinese medicine”, and are planning to repeat
your method using some data we have here on about 270 cases of
“irritable bowel syndrome” and their differing TCM diagnoses.”
Page 60
Results on Many Data Sets from 973 Project
Page 61
Providing Evidence of Syndrome Differentiation
Page 62
Providing Evidence of Syndrome Differentiation

How to produce evidence for TCM syndrome diagnosis using latent
structure analysis?
Page 63
Providing Evidence of Syndrome Differentiation

Imagine sub-typing a Western-medicine (WM) disease D from the TCM perspective

Expected conclusion: several syndromes among D patients

This also provides a basis for distinguishing syndrome-Z patients from other D patients
Page 64
Page 65
Example: Model for Depression Data
Page 66
Example: Model for Depression Data

Evidence provided by Y8 for syndrome classification:

Two classes: with qi stagnation in the chest and diaphragm (胸膈气机不畅), and without

Sizes of the classes: 48%, 52%

Symptoms important for distinguishing between the two classes (in descending order of importance): suffocating sensation (憋气), shortness of breath (气短), chest tightness (胸闷), and sighing (太息). Other symptoms play little role.
Page 67
Latent Tree Analysis of Prescription Data

Data

Guang'anmen Hospital

1287 formulae prescribed for patients with Disharmony between Liver and
Spleen
Page 68
Latent Tree Model
Page 69
Some Partitions Obtained
Page 70
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Page 71
Pouch Latent Tree Models (PLTMs)

Probabilistic graphical model with continuous observed variables (X's) and discrete latent variables (Y's).

Tree-structured Bayesian network, except that several observed variables can appear in the same node, called a pouch.

P(X1, X2|Y2), P(X3|Y2), …, P(X7, X8, X9|Y4): Gaussian distributions. In general, a pouch W with latent parent Y follows P(W | Y=y) = N(μ_y, Σ_y).

P(Y2|Y1), …, P(Y4|Y1), P(Y1): multinomial distributions, as in discrete Bayesian networks.
Page 72
Pouch Latent Tree Models (PLTM)
One possible PLTM for the transcript data


PLTM generalizes the Gaussian mixture model (GMM): a GMM is a PLTM with a single pouch and a single latent variable.
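A sketch of the pouch density (standard multivariate Gaussian; the mixture below illustrates why a single-pouch, single-latent-variable PLTM is exactly a GMM):

```python
import numpy as np

def pouch_logpdf(x, mu, cov):
    """log N(x; mu, cov): density of a pouch given its latent parent's state."""
    k = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

# Marginal of a single-pouch PLTM = Gaussian mixture:
# P(x) = sum_y P(Y=y) * N(x; mu_y, Sigma_y)
weights = [0.4, 0.6]
mus = [np.zeros(2), np.ones(2) * 3]
covs = [np.eye(2), np.eye(2) * 2]
x = np.array([1.0, 1.0])
px = sum(w * np.exp(pouch_logpdf(x, m, c))
         for w, m, c in zip(weights, mus, covs))
print(px)
```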
Page 73
The UCI Image Data

Each instance represents a 3x3 pixel region of an image, with 18 attributes; instances are labeled.

Class labels were first removed and the remaining data analyzed using PLTM.

Pouches capture natural facets well:
 From left to right: Line-Density, Edge, Color, Coordinates

Latent variables represent clusterings
Page 74
The UCI Image Data

Feature curve: normalized MI between a latent variable and the attributes

Y2 represents a partition along the line-density facet

Y1 represents a partition along the color facet

Interesting finding: Y1 is strongly correlated with centroid.row

Y1 matches the true class partition well.

Y3: partition based on edge and color
Y4: partition based on centroid.col
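One way to compute the feature curve's normalized MI (the slides do not say which normalization is used; dividing by H(Y) is one common convention):

```python
import numpy as np

def normalized_mi(joint):
    """NMI(Y; X) = I(Y; X) / H(Y), from a joint probability table P(Y, X)."""
    py = joint.sum(axis=1, keepdims=True)
    px = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    mi = float((joint[nz] * np.log(joint[nz] / (py @ px)[nz])).sum())
    hy = float(-(py[py > 0] * np.log(py[py > 0])).sum())
    return mi / hy

# Perfectly dependent binary Y, X -> NMI = 1; independent -> NMI = 0.
print(normalized_mi(np.array([[0.5, 0.0], [0.0, 0.5]])))     # 1.0
print(normalized_mi(np.array([[0.25, 0.25], [0.25, 0.25]]))) # 0.0
```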

Page 75
Overview


Early Work (1992-2002)

Inference: Variable Elimination

Inference: Local Structures

Others: Learning, Decision Making
Latent Tree Models (2000 - )

Theory and Algorithms

Applications
 Multidimensional Clustering, Density Estimation, Latent Structure
 Survey Data, Documents, Business Data

Traditional Chinese Medicine (TCM)

Extensions
Thank you! (谢谢!)