Mutual Information
Brian Dils
I590 – ALife/AI
02.28.05
What is Mutual Information?
• Essential to probability theory and information theory
• MI quantifies the mutual dependence of two variables, i.e., how far they are from being independent
What is Mutual Information?
• MI measures the amount of information in variable x that is shared by variable y
• MI quantifies the distance (the Kullback–Leibler divergence) between the joint distribution of x and y and the product of their marginal distributions
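In symbols (the standard definition, stated here for reference rather than taken from the slides):

```latex
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       = D_{\mathrm{KL}}\bigl(p(x,y)\,\|\,p(x)\,p(y)\bigr)
```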
When is MI important?
• Suppose we know y. If x contains no shared information with y, then the variables are totally independent
– Mutual information: 0
– Knowing y reduces no uncertainty about x: the conditional entropy of x given y equals the full entropy of x
– So x is not important here, since it is not informative about y
When is MI important?
• Again we know y, but this time all the information conveyed in x is also conveyed in y
– Mutual information is maximal: I(x;y) = H(x)
– Given y, nothing about x is surprising, so the conditional entropy of x is 0
– x is not important because we could simply study y
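A minimal sketch (my own illustration, not from the slides) contrasting these two extreme cases for a pair of binary variables, using only the standard library:

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

random.seed(0)
ys = [random.randint(0, 1) for _ in range(10_000)]

independent = [(random.randint(0, 1), y) for y in ys]  # x ignores y
identical   = [(y, y) for y in ys]                     # x copies y

print(mutual_information(independent))  # ~0 bits: x says nothing about y
print(mutual_information(identical))    # ~1 bit: x is fully determined by y
```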
When is MI important?
MI is important (and powerful) when
two variables are not independent and
are not identical in the information
they convey
Why Apply MI?
• If mutual information is maximized (dependence increased), conditional entropy is minimized
• Reducing conditional entropy makes the behavior of random variables more predictable, because their values depend more strongly on one another
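This follows from the standard identity relating MI to conditional entropy: for a fixed marginal entropy H(X), pushing I(X;Y) up pushes H(X|Y) down.

```latex
I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
```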
MI Applications
• Discriminative training procedures for hidden Markov models have been proposed based on the maximum mutual information (MMI) criterion
– Hidden parameters are predicted from known observations
– Applicable to speech recognition, character recognition, and natural language processing
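One common statement of the MMI objective (notation assumed here, not from the slides): for training utterances O_r with correct transcriptions w_r and word-sequence models M_w, MMI training maximizes the posterior probability of the correct transcription rather than its likelihood alone.

```latex
F_{\mathrm{MMI}}(\lambda) = \sum_{r} \log
  \frac{p_\lambda(O_r \mid M_{w_r})\, P(w_r)}
       {\sum_{w} p_\lambda(O_r \mid M_w)\, P(w)}
```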
MI Applications
• Mutual information is often used as a significance function for the computation of collocations in corpus linguistics
– Collocations are essential to coherent speech
– Easy for humans, hard for artificial systems
– MI has been shown to improve such word associations in artificial systems
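A minimal sketch (my own, on a toy corpus; it uses pointwise mutual information, not any particular published scoring scheme) of MI-scored collocations:

```python
import math
from collections import Counter

def pmi_bigrams(tokens):
    """Score each adjacent word pair by PMI = log2 p(w1,w2) / (p(w1) p(w2))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), c in bigrams.items():
        p_joint = c / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = math.log2(p_joint / p_indep)
    return scores

tokens = "new york is a city and new york is busy".split()
# Pairs that co-occur more often than chance (e.g. "new york") score highest.
for pair, score in sorted(pmi_bigrams(tokens).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```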
MI Applications
• Mutual information is used in medical imaging for image registration
– Given a reference image (for example, a brain scan) and a second image that needs to be put into the same coordinate system as the reference, the second image is deformed until the mutual information between it and the reference image is maximized
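A minimal sketch (an assumed setup, not a production registration pipeline) of the scoring step: estimate MI from the joint intensity histogram of the two images. A registration loop would repeatedly transform the moving image and keep the transform that maximizes this score.

```python
import numpy as np

def image_mutual_information(ref, moving, bins=32):
    """MI (in bits) between the intensity distributions of two images."""
    joint, _, _ = np.histogram2d(ref.ravel(), moving.ravel(), bins=bins)
    pxy = joint / joint.sum()               # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)     # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = pxy > 0                            # avoid log(0) on empty bins
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
scan = rng.random((64, 64))
print(image_mutual_information(scan, scan))                  # high: aligned
print(image_mutual_information(scan, rng.random((64, 64))))  # near 0: unrelated
```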
MI Applications
• Mutual information has been used as a criterion for feature selection and feature transformations in machine learning and agent-based learning
– Using MI criteria, it was found that the more input variables are available, the lower the conditional entropy becomes
– MI-based criteria can effectively select features AND roughly estimate optimal feature subsets, two classic problems in feature selection
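A minimal sketch (assumes scikit-learn; this is MI-based feature ranking, not the exact procedure of the papers cited below): rank features by their estimated mutual information with the class label and keep the top-scoring ones.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)

# Estimate I(feature; label) for each feature column.
scores = mutual_info_classif(X, y, random_state=0)

# Features sharing the most information with the label come first.
ranked = np.argsort(scores)[::-1]
print("MI scores:", np.round(scores, 3))
print("features ranked by MI:", ranked)
```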
References
• Huang, D., & Chow, T. W. S. (2003). Searching optimal feature subset using mutual information. Proceedings of the 2003 European Symposium on Artificial Neural Networks (pp. 161-166). Bruges, Belgium.
• Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537-550.
• Bonnlander, B., & Weigend, A. S. (1994). Selecting input variables using mutual information and nonparametric density estimation. Proceedings of the 1994 International Symposium on Artificial Neural Networks (pp. 42-50). Tainan, Taiwan.
• Wikipedia entries on “Mutual Information”, “Probability Theory”, and “Information Theory”