
Computational Intelligence:
Methods and Applications
Lecture 5
EDA and linear transformations.
Włodzisław Duch
Dept. of Informatics, UMK
Google: W Duch
Chernoff faces
Humans have specialized brain
areas for face recognition.
For d < 20, each feature can be
represented by changing some
element of the face.
Interesting applets:
http://www.cs.uchicago.edu/~wiseman/chernoff/
http://www.cs.unm.edu/~dlchao/flake/chernoff/ (Chernoff park)
http://kspark.kaist.ac.kr/Human%20Engineering.files/Chernoff/Chernoff%20Faces.htm
Fish view
Other shapes may also be used to visualize data, for example fish.
Ring visualization (SunBurst)
Shows a tree-like hierarchical representation in the form of rings.
Other EDA techniques
NIST Engineering Statistics Handbook has a chapter on
exploratory data analysis (EDA).
http://www.itl.nist.gov/div898/handbook/index.htm
Unfortunately many visualization programs are written for X-Windows
only, or are written in Fortran, S, or R.
Sonification: data converted to sounds!
Example: sound of EEG data.
Java Voice
Think about potential applications! More: http://sonification.de/
http://en.wikipedia.org/wiki/Sonification
CI approach to visualization
Scatterograms: project all data on two features.
Find more interesting directions to create projections.
Linear projections:
• Principal Component Analysis,
• Discriminant Component Analysis,
• Projection Pursuit – define "interesting" projections.
Non-linear methods – more advanced, some will appear later.
Statistical methods: multidimensional scaling.
Neural methods: competitive learning, Self-Organizing Maps.
Kernel methods, principal curves and surfaces.
Information-theoretic methods.
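A minimal sketch of one such linear projection (PCA) with plain NumPy; the random correlated data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical example data: 200 points in 5 dimensions with correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data, then find principal directions from the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :2]              # two leading (highest-variance) directions

# Project onto the two most "interesting" directions: ready for a scatterogram.
Y = Xc @ W                               # shape (200, 2)
print(Y[:3])
```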
Distances in feature spaces
Data vectors, d dimensions: $\mathbf{X}^T = (X_1, \ldots, X_d)$, $\mathbf{Y}^T = (Y_1, \ldots, Y_d)$.
A distance, or metric function, is a 2-argument function that satisfies:
$$d(\mathbf{X}, \mathbf{Y}) \ge 0; \quad d(\mathbf{X}, \mathbf{Y}) = 0 \Leftrightarrow \mathbf{X} = \mathbf{Y}; \quad d(\mathbf{X}, \mathbf{Y}) = d(\mathbf{Y}, \mathbf{X})$$
$$d(\mathbf{X}, \mathbf{Y}) \le d(\mathbf{X}, \mathbf{Z}) + d(\mathbf{Z}, \mathbf{Y})$$
Distance functions measure (dis)similarity.
Popular distance functions:
Euclidean distance (L2 norm):
$$\|\mathbf{X} - \mathbf{Y}\|_2 = \left( \sum_{i=1}^{d} (X_i - Y_i)^2 \right)^{1/2}$$
Manhattan (city-block) distance (L1 norm):
$$\|\mathbf{X} - \mathbf{Y}\|_1 = \sum_{i=1}^{d} |X_i - Y_i|$$
Two metric functions
Equidistant points in 2D:
Euclidean case: the set of points at a fixed distance from a center is a circle (or sphere) – the metric is isotropic.
Manhattan case: it is a square – the metric is non-isotropic.
(figure: two panels with axes $X_1$, $X_2$ showing both cases)
Identical distance to two points X, Y, i.e. points P with $\|\mathbf{X} - \mathbf{P}\|_L = \|\mathbf{Y} - \mathbf{P}\|_L$: imagine that in 10 D!
(figure: two points X, Y with the region between them shaded)
All points in the shaded area have the same
Manhattan distance to X and Y!
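This is easy to verify numerically; a tiny sketch with made-up points (in L1 the set of points equidistant from X and Y contains whole segments and regions, not just isolated points):

```python
import numpy as np

def d1(p, q):
    """Manhattan (L1) distance between two points."""
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))

X, Y = np.array([0.0, 0.0]), np.array([2.0, 2.0])

# Many distinct points are equally far (in L1) from both X and Y:
for P in ([1.0, 1.0], [0.5, 1.5], [2.0, 0.0], [-1.0, 3.0]):
    print(P, d1(P, X), d1(P, Y))   # the two distances are equal on every line
```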
Linear transformations
2D vectors X in a unit circle with mean (1,1); Y = A·X, where A is a 2×2 matrix:
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$$
The shape and the mean of the data distribution are changed.
Scaling (diagonal $a_{ii}$ elements); rotation (off-diagonal elements); mirror reflection.
Distances between vectors are not invariant: $\|\mathbf{Y}^{(1)} - \mathbf{Y}^{(2)}\| \ne \|\mathbf{X}^{(1)} - \mathbf{X}^{(2)}\|$.
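A short sketch of this effect; the matrix A below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Points on a unit circle centered at (1, 1), as in the slide.
t = rng.uniform(0.0, 2.0 * np.pi, size=100)
X = np.stack([1.0 + np.cos(t), 1.0 + np.sin(t)])   # shape (2, 100)

A = np.array([[2.0, 0.5],    # scaling plus an off-diagonal (rotation/shear) term
              [0.0, 1.0]])
Y = A @ X                    # linear transformation of every point

# Distances are not preserved: compare two points before and after.
print(np.linalg.norm(X[:, 0] - X[:, 1]),
      np.linalg.norm(Y[:, 0] - Y[:, 1]))
```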
Invariant distances
Euclidean distance is not invariant under linear transformations Y = A·X;
the scaling of units has a strong influence on distances.
How should scalings/rotations be selected for the simplest description of the data?
1
Y Y
 2
2



  X   X   A A  X   X  
1
 Y Y
 2
1
2
T
T
Y1  Y  2
T
1
2
Orthonormal matrices, with $\mathbf{A}^T\mathbf{A} = \mathbf{I}$, induce rigid rotations.
Achieving full invariance therefore requires standardization of the data
(for scaling invariance) and the use of the covariance matrix.
The Mahalanobis metric replaces $\mathbf{A}^T\mathbf{A}$ by the inverse of the covariance matrix.
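A minimal sketch of a Mahalanobis distance computation with NumPy (the data is made up; a ready-made version also exists as scipy.spatial.distance.mahalanobis):

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up correlated sample: 500 points in 3 dimensions.
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.0, 1.0, 0.5],
                                          [0.0, 0.0, 0.2]])

C_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse of the covariance matrix

def mahalanobis(x, y, C_inv):
    """Mahalanobis distance: sqrt((x-y)^T C^-1 (x-y))."""
    diff = x - y
    return np.sqrt(diff @ C_inv @ diff)

print(mahalanobis(X[0], X[1], C_inv))
```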
Data standardization
For each component of the data vectors $\mathbf{X}^{(j)T} = (X_1^{(j)}, \ldots, X_d^{(j)})$, $j = 1 \ldots n$,
calculate the mean and std (n – number of vectors, d – their dimension):
$$\bar{X}_i = \frac{1}{n} \sum_{j=1}^{n} X_i^{(j)}; \qquad \bar{\mathbf{X}} = \frac{1}{n} \sum_{j=1}^{n} \mathbf{X}^{(j)}$$
        X^(1)     X^(2)     ...   X^(n)
X_1     X_1^(1)   X_1^(2)   ...   X_1^(n)
X_2     X_2^(1)   X_2^(2)   ...   X_2^(n)
...
X_d     X_d^(1)   X_d^(2)   ...   X_d^(n)

Vector of mean feature values: averages over the rows.
Standard deviation
Calculate standard deviation:
$$\bar{X}_i = \frac{1}{n} \sum_{j=1}^{n} X_i^{(j)}$$
Vector of mean feature values.
$$\sigma_i^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( X_i^{(j)} - \bar{X}_i \right)^2$$
Variance = square of the standard deviation (std): the sum of all squared
deviations from the mean value.
Why n−1, not n? If the true mean were known, it should be n; but when the mean
is estimated from the data, the formula with n−1 converges to the true variance!
Transform X => Z, standardized data vectors:
$$Z_i^{(j)} = \left( X_i^{(j)} - \bar{X}_i \right) / \sigma_i$$
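A minimal standardization sketch in NumPy (made-up data; ddof=1 selects the n−1 estimator discussed above):

```python
import numpy as np

rng = np.random.default_rng(3)
# Made-up data: one feature with small values, one with large values.
X = rng.normal(loc=[5.0, 100.0], scale=[0.1, 30.0], size=(1000, 2))

mean = X.mean(axis=0)           # vector of mean feature values
std = X.std(axis=0, ddof=1)     # ddof=1 uses the n-1 (unbiased) formula

Z = (X - mean) / std            # standardized data vectors

# Check: standardized features have zero mean and unit variance.
print(Z.mean(axis=0))           # ~ [0, 0]
print(Z.std(axis=0, ddof=1))    # ~ [1, 1]
```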
Standardized data
Std data: zero mean and unit variance.
$$\bar{Z}_i = \frac{1}{n} \sum_{j=1}^{n} Z_i^{(j)} = \frac{1}{n} \sum_{j=1}^{n} \left( X_i^{(j)} - \bar{X}_i \right) / \sigma_i = 0$$
$$\sigma_{Z,i}^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( Z_i^{(j)} - \bar{Z}_i \right)^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( X_i^{(j)} - \bar{X}_i \right)^2 / \sigma_i^2 = \sigma_i^2 / \sigma_i^2 = 1$$
Standardize the data after making any data transformation.
Effect: the data becomes invariant to scaling only; for diagonal transformations,
distances after standardization are invariant and are based on identical units.
Note: this does not mean that all data models will perform better!
How to make data invariant to any linear transformations?
Std example
Before std
The mean and std of each feature are shown
using a colored bar; minimum
and maximum values may
extend outside it.
Some features (e.g. yellow)
have large values; some
(e.g. gray) have small
values; this may depend on the
units used to measure them.
After std
Standardized data all have
mean 0 and σ = 1, so the
contributions of different
features to similarity or
distance calculations are
comparable.