CI: Methods and applications - Research Group of Vision and Image


Computational Intelligence:
Methods and Applications
Lecture 1
Organization and overview
Source: Włodzisław Duch; Dept. of Informatics, UMK; Google: W Duch
What is it all about?
• Many engineering and scientific problems may be solved using numerical algorithms: the theory is known, the equations are formulated, and either analytical or numerical solutions are required.
  Any examples?
  Such problems require high-performance computing, but no intelligence: just press the key and wait for an answer.
• Other problems may be easy to formulate, but all algorithms solving them may be NP-hard, requiring an almost infinite amount of computation to solve complex cases.
  Any examples?
• Yet other problems have no algorithms at all!
  Any examples?
• Problems for which effective algorithms cannot be formulated require intelligence to solve.
What this course covers
This is a list of topics covered:
• Computational Intelligence overview, sources of inspiration, types of adaptive (learning) systems, types of applications. (~2 h)
• Visualization and exploratory data analysis: few variables, direct visualization, Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Self-Organizing Maps (SOM), parallel coordinates and other visualization algorithms. (~9 h)
• Theory: overview of statistical approaches to learning, bias-variance decomposition, expectation maximization algorithm, model selection, evaluation of results, ROC curves. (~5 h)
What this course covers (cont)
• Introduction to the Yale/WEKA and GhostMiner software packages, presentation of algorithms available in these packages. (~5 h)
• Statistical algorithms: discriminant analysis - linear (LDA), Fisher (FDA), regularized (RDA), probabilistic data modeling, kernel methods. (~5 h)
• Density estimation, expectation maximization, RBF and SFN networks, and rule induction. (~4 h)
• Similarity-based methods, generation of prototypes, similarity functions, separability criteria. (~2 h)
• Improving CI models: boosting, stacking, ensemble learning, meta-learning, using information theory and other approaches for selection of relevant features. (~6 h)
Personal information
Name: Agus Zainal Arifin
Head of the:
Laboratory of Vision and Image Processing,
Department of Informatics,
Faculty of Information Technology
Institut Teknologi Sepuluh Nopember (ITS) Surabaya
Detailed information, CV, papers and lecture notes, project descriptions, photos from many conferences, etc., are on the WWW:
Course: http://viplab.if.its.ac.id/ci
Personal: http://agusza.if.its.ac.id/
This course page is linked to VIP Lab.
Email: given on my webpage.
Recommended books
Three of the best books covering the foundations and various aspects of CI (with a strong statistical bias):
• R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification (2nd Edition). J. Wiley 2000.
• Amit Konar, Computational Intelligence: Principles, Techniques and Applications. Springer 2005.
  A new book covering many advanced CI subjects.
• T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning. Springer 2001.
Other useful books
• A. Webb, Statistical Pattern Recognition. Wiley, 2nd ed. 2002.
• V. Kecman, Learning and Soft Computing. MIT Press 2001.
  A good intro book on neural, SVM and fuzzy subjects, with detailed explanations and many problems.
• D. Hand, H. Mannila, P. Smyth, Principles of Data Mining. MIT Press 2001.
• I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann 1999.
  A WEKA intro, but the algorithm descriptions are very sketchy.
• J. P. Marques de Sa, Pattern Recognition: Concepts, Methods, and Applications. Springer 2001.
  A small book, but a useful overview.
• Tom Soukup and Ian Davidson, Visual Data Mining: Techniques and Tools for Data Visualization and Mining, 2002.
• C. Bishop, Pattern Recognition and Machine Learning. CUP 2006.
Schedule per Week (3 sks)
1. Lecture 1, 2, 3
2. Lecture 4, 5, 6
3. Lecture 7, 8, 9
4. Evaluation 1
5. Lecture 10, 11, 12
6. Lecture 13, 14, 15
7. Lecture 16, 17, 18
8. Evaluation 2
9. Lecture 19, 20, 21
10. Lecture 22, 23, 24
11. Lecture 25, 26, 27
12. Evaluation 3
13. Lecture 28, 29, 30
14. Lecture 31, 32, 33
15. Lecture 34, 35, 36
16. Evaluation 4
17. Lecture 37, 38, Summary
18. Final Project
Evaluation
• Work: personal or group
• Type: quiz (limited open), presentation, discussion, paper summarization and/or arrangement, final project
Computational Intelligence:
Methods and Applications
Lecture 2
Problems requiring CI
Source: Włodzisław Duch; Dept. of Informatics, UMK;
Google: W Duch
Some non-algorithmizable problems
• Understanding the meaning of sentences (queries), and all problems related to natural language analysis.
• Perception: recognition of signals, phoneme recognition, olfactory signals – the first step in robotics.
• Visual perception: face recognition, object recognition and many computer vision problems.
• Hand-written character recognition for PDAs or security.
• Control and planning problems in robotics, and control of nonlinear complex systems with many degrees of freedom.
• Medical diagnostics, interpretation of medical images and biomedical signals (EEG, ECG ...), therapy planning.
• Playing complex games, like Go or strategic war games.
• Solving untypical problems, or problems requiring creativity.
CI definition
• Many bad definitions have been proposed, listing computational methods used in CI; my working technical definition of CI is:
  Computational Intelligence (CI) is a branch of science dealing with problems that cannot be solved using effective computational algorithms (this does not mean that they cannot be solved using computations!).
• The MIT Encyclopedia of Cognitive Sciences includes it among the 6 main branches of cognitive science, aiming at understanding (natural science perspective) and creating (engineering perspective) animal-like (and human-like) cognitive systems.
• The cognitive sciences (CS) include philosophy, psychology, neurosciences, linguistics, CI, and sociobiology (evolutionary perspectives on culture).
• In this course only the engineering perspective is used.
Relations between CI, AI and CS
Cognitive Sciences include both low- and high-level processes.
Artificial Intelligence (AI) is the part of CI that:
• is based on symbolic representation of knowledge;
• creates expert systems that help to reason;
• has "knowledge engineering" as its most important branch.
• AI: focused on higher cognitive processes, such as language, logic, reasoning, thinking, problem solving, sequential actions.
• CI: also includes basic sensory signal processing, low-level cognition, perception and control, sensorimotor behavior.
CI methods may help to discover knowledge hidden in data.
Only a few hybrid CI-AI systems exist; cognitive robotics needs them!
CI and AI conferences and journals have little overlap.
Pattern recognition problems
Classification, categorization, recognition of:
• images of all kinds, medical images;
• phonetic structure of speech signals;
• insects, birds, animals by their voices;
• biomedical signals, such as USG, fMRI;
• sonar, radar and other artificial signals;
• hand-sign speech and signals;
• machine and human intentions and behavioral patterns;
• chemical structures, QSAR (Quantitative Structure-Activity Relationships), rational drug design;
• biological structures, protein types, folding;
• silhouettes of cars and other vehicles, planes, ships;
• astronomical objects, recognition of types of stars and galaxies ...
Data mining problems
Data mining, or KDD, Knowledge Discovery in Data, aims at:
• understanding data, not just pattern classification;
• finding short explanations of data structure and association rules;
• discovering logical rules (classical and fuzzy) in the data;
• creating summaries of databases;
• Ex: discovering new rules in analytical chemistry, such as NMR spectroscopy rules;
• searching for knowledge in genomic and biological databases;
• automatically creating understandable models (theories) of data;
• creating causal models of processes, pathway databases in biological cells, or hormone interactions in the human body;
• using the knowledge discovered to reason in expert systems.
Selection of information
Attention allows us to select relevant perceptual signals; without attention we could not function.
• finding relevant features in medical, biological, chemical, astronomical, technical, business and other data;
• finding the right person for a job – what is important?
• filtering junk mail by automatically discovering rules (~9 G$ losses in the US + 2.5 G$ in Europe in 2002, according to CNN);
• aggregation of features in a linear or non-linear way;
• reducing the dimensionality of the problem, locally or globally;
• creating new, more relevant features and concepts that are useful in reasoning;
• separation of signals into interesting components, cleaning of signals, removing noise and artifacts, separating music from noise and pictures/video from dirt and scratches ...
Information Retrieval
IR and IE (info extraction): indexing documents, library keywords.
• semantic information retrieval;
• searching in large databases and on the Web;
• search engines discovering preferences of users;
• automatic annotation of Web documents, or going from HTML => XML => DAML to create the semantic Web;
• document clustering, categorization and visualization of relations between documents and keywords;
• constructing concept spaces from keywords;
• latent semantic analysis and other methods of dimensionality reduction of document search terms;
• NLP (Natural Language Processing) parsing + other techniques;
• automatic summarization of texts, dialog and chatterbot systems.
Decision support
Intelligent decision support:
• support for business decisions, like pricing, upgrading, personalization of services;
• increasing response rates of advertising campaigns;
• support for medical diagnostics, evidence-based medicine;
• creating causal networks to explain inferences;
• models of customer and shopping behavior or service user behavior.
Andreas Weigend (www.weigend.com):
"Amazon.com might be the world's largest laboratory to study human behavior and decision making."
Control and planning
CI is needed in:
• quality control in industrial processes;
• tuning of complex equipment, automatic camera focus;
• driving a car or airplane without human intervention;
• controlling a chemical factory;
• controlling mob behavior;
• controlling a robot with many degrees of freedom;
• planning robot actions, surviving in a hostile environment;
• developing toy robots such as AIBO dogs;
• planning large-scale constructions, such as highways, skyscrapers, power plants;
• optimization of organization structure and functions ...
Detection of regularities
If we do not know what we are looking for:
• spontaneous, unsupervised learning;
• self-organization of feature detectors, automatic creation of elementary features;
• discovery and analysis of "interesting" clusters and structures in the data;
• detection of (ir)regularities in signals, for example in heart ECG;
• discovery of genes and other patterns in DNA strings;
• discovery of important active sites in proteins;
• use of the regularities discovered for construction of new features and new concepts, in systematic reasoning.
Other problems
Many other problems require CI techniques
for their solution:
• strategic games, learning from errors and successes;
• multi-criterion optimization problems (with no single best solution);
• determining missing features;
• discovering errors and outliers in the data;
• learning to solve simple sub-problems and generalizing to complex problems;
• combining data from many sources into one model, like consumer preferences and TV watching behavior;
• prediction of time series, stock behavior, Sun activity, weather, human decisions and intentions;
• understanding of the human mind, results of psychological experiments, ways of reasoning, categorization, planning, learning, recognizing emotions.
CI inspirations
How to solve such problems? Humans and many animals are quite good at solving them.
Although many specialized methods exist, some adaptive (learning) algorithms with wide applications have been discovered. Biological inspirations help to formulate initial models.
• Neurobiological inspirations: how do brains solve such problems?
• Artificial Neural Networks (ANNs) are a large field, with many types of networks, from function mapping to hierarchical, self-organizing, modular dynamical systems, providing solutions to many problems.
• Some ANN models are close to statistics and pattern recognition.
• Some ANN models are inspired by specific brain structures, cf. CMAC (Cerebellar Model Arithmetic Computer), SDM (Sparse Distributed Memory), SOM (Self-Organizing Maps).
• Computational Cognitive Neuroscience aims at models that are faithful to biology; results are compared with experiments.
Psychological inspirations
• How do minds solve such problems?
• Although the mind is a function of the brain, here, instead of single neurons, the functions of larger brain structures (neural columns, modules, maps) are considered and used as inspiration.
• Connectionist models: networks of connected nodes representing concepts, processing information in a parallel, distributed way; general graphical models and Bayesian causal influence networks.
• Learning through chunking, or hierarchical grouping of smaller units into more meaningful, complex structures replaced by a single symbol: from lines and curves to letters, words, phrases and sentences to meaning; contrast that with neural learning (perceptrons).
Bio-medical inspirations
• How has biology solved the problem of species survival? Evolution leads to survival of the fittest, removing the weak.
• This may be used for complex optimization problems: find parameters P that minimize some cost function E(P).
• It may help to learn optimal parameters of a complex adaptive system (such as an ANN), but is evolution really the most efficient approach?
• Swarm algorithms: find an optimal solution by creating many initial parameter sets {Pi} and moving them in parameter space, using the cost function E(Pi) to determine interactions between "particles" {Pi}. (See the sketch after this list.)
• Ant algorithms: interactions of many ants and pheromone traces help to find optimal sets of parameters. Useful if the cost function changes during the process (moving target problems).
• Immunological systems: inspirations from medicine.
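To make the swarm idea above concrete, here is a minimal particle-swarm-style sketch in Python. It is an illustration, not code from the lecture: the update rule, the coefficients (w, c1, c2), the search range and the quadratic cost function are all assumed for the example.

```python
import numpy as np

def particle_swarm_minimize(cost, dim, n_particles=30, n_iter=200,
                            w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle-swarm-style minimizer (coefficients are illustrative)."""
    rng = np.random.default_rng(seed)
    P = rng.uniform(-5, 5, (n_particles, dim))      # particle positions {Pi}
    V = np.zeros_like(P)                            # particle velocities
    best_P = P.copy()                               # each particle's best position so far
    best_E = np.array([cost(p) for p in P])         # and its best cost E(Pi)
    g = best_P[best_E.argmin()].copy()              # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random(P.shape), rng.random(P.shape)
        # particles interact only through the shared global best g
        V = w * V + c1 * r1 * (best_P - P) + c2 * r2 * (g - P)
        P = P + V
        E = np.array([cost(p) for p in P])
        improved = E < best_E
        best_P[improved], best_E[improved] = P[improved], E[improved]
        g = best_P[best_E.argmin()].copy()
    return g, best_E.min()

# Example: minimize a shifted quadratic cost E(P) = ||P - (1, 2)||^2
g, e = particle_swarm_minimize(lambda p: np.sum((p - np.array([1.0, 2.0]))**2), dim=2)
print(g, e)   # close to [1, 2], cost near 0
```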
Inspirations from logic
• Modern logic deals with uncertainty in the data.
Crisp logic: yes or no only, {0,1}, good for symbolic problems.
Q: Failed or passed? Teenager or not?
Fuzzy logic: degrees of truth in [0,1], modeled by a "membership function".
Q: Old or young? So-so? Perhaps 0.6 old and 0.4 young? (See the sketch after this slide.)
Fuzzy = continuous generalization of multi-valued logic.
Rough logic is based on rough sets:
some objects certainly belong to a set, some certainly do not, and other objects may belong, to a degree determined by the data.
Dempster-Shafer theory of evidence represents our knowledge of the
truth of sets of possibilities. Uncertainty theory is a large field ...
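A minimal sketch of the fuzzy membership idea from the previous slide: a piecewise-linear membership function for "old". The age thresholds (30 and 70) are invented for illustration; at age 54 it reproduces the "0.6 old and 0.4 young" example.

```python
import numpy as np

def mu_old(age, start=30.0, full=70.0):
    """Piecewise-linear membership in the fuzzy set 'old' (thresholds are illustrative)."""
    return float(np.clip((age - start) / (full - start), 0.0, 1.0))

def mu_young(age):
    """Complement: degree of membership in 'young'."""
    return 1.0 - mu_old(age)

age = 54
print(mu_old(age), mu_young(age))   # 0.6 old and 0.4 young, as in the slide
```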
Math and statistics
• Probabilistic methods are the basis for inference and decision
support.
Object (sample) X = {Xi} is a vector of features.
• Naive Bayesian classifiers treat all features as independent and are used for classification, estimating the "a posteriori" probabilities p(C|X) of assigning an object (sample) X to some category C (a minimal sketch follows this slide).
• Bayesian networks include mutual influences of different features Xi
using Bayes formula: p(C,X) = p(C|X) p(X) = p(X|C) p(C)
• Multivariate statistics has devised many classification and
regression (approximation) methods.
• Clustering and visualization methods also belong to statistics.
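To illustrate the naive Bayesian classifier mentioned above, here is a minimal sketch on made-up binary features and labels. It treats features as independent given the class; the Laplace smoothing is an added assumption for the example, not part of the lecture.

```python
import numpy as np

# Toy data: two binary features per sample, class labels 0/1 (values are made up)
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

def naive_bayes_posterior(x, X, y, n_classes=2):
    """p(C|x) ~ p(C) * prod_i p(x_i|C), features treated as independent given C."""
    posts = []
    for c in range(n_classes):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                                  # p(C)
        # Laplace-smoothed per-feature likelihoods p(x_i|C)
        lik = np.prod([(np.sum(Xc[:, i] == x[i]) + 1) / (len(Xc) + 2)
                       for i in range(X.shape[1])])
        posts.append(prior * lik)
    posts = np.array(posts)
    return posts / posts.sum()                                    # normalize by p(x)

print(naive_bayes_posterior(np.array([1, 1]), X, y))  # posterior over the two classes
```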
Machine learning and pattern recognition
• Pattern Recognition (PR) is an engineering field that has developed many classification and approximation methods.
• The most popular PR methods are based on the k-NN (k Nearest Neighbors) rule or on decision trees, which recursively partition data into smaller subsets. (A minimal k-NN sketch follows this list.)
• Machine learning (ML) is a part of AI, using inductive methods to find good symbolic rules that classify or approximate data.
• ML methods usually deal with symbolic data only, covering samples from different classes to find the best sets of rules.
• AI also uses memory-based methods similar to k-NN, calling them memory-based reasoning or case-based reasoning, estimating similarity to memorized cases.
• We will be able to cover only a few of these methods ...
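A minimal sketch of the k-NN rule mentioned above: classify a new point by a majority vote among its k nearest training samples. The toy data, the choice of k=3 and the Euclidean distance are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)          # distances to all training points
    nearest = np.argsort(d)[:k]                      # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2D data with two classes (values are made up)
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [0.1, 0.3], [1.2, 0.8]])
y_train = np.array([0, 0, 1, 1, 0, 1])
print(knn_predict(np.array([0.95, 0.9]), X_train, y_train, k=3))   # -> 1
```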
[Diagram: Computational Intelligence = Data + Knowledge, at the intersection of Soft Computing (fuzzy logic, neural networks, evolutionary algorithms), Pattern Recognition, AI and expert systems, machine learning, multivariate statistics, probabilistic models, and visualization.]
Computational Intelligence:
Methods and Applications
Lecture 3
Histograms and probabilities.
Source: Włodzisław Duch; Dept. of Informatics, UMK;
Google: W Duch
Features
AI uses complex knowledge representation methods.
Pattern recognition is based mostly on simple feature spaces.
• Set of objects: physical entities (images, patients, clients, molecules, cars, signal samples, software pieces), or states of physical entities (board states, patient states, etc.).
• Features: measurements or evaluations of some object properties.
  Ex: are pixel intensities good features? No - they are not invariant to translation/scaling/rotation. Better: type of connections, type of lines, number of lines ...
• Selecting good features, and transforming the raw measurements that are collected, is very important.
Feature space representation
• Representation: mapping objects/states into vectors, {Oi} => X(Oi), with Xj(Oi) being the j-th attribute of object Oi.
• "Attribute" and "feature" are used as synonyms, although strictly speaking "age" is an attribute and "young" is its feature value.
• Types of features:
  Categorical: symbolic or discrete – may be nominal (unordered), like "sweet, salty, sour", or ordinal (can be ordered), like colors or small < medium < large (drink).
  Continuous: numerical values.
[Figure: a point x(O) in a three-dimensional feature space with axes x1, x2, x3.]
Vector X = (x1, x2, x3, ..., xd) is a d-dimensional point in the feature space.
Fishy example.
Chapter 1.2, Pattern Classification (2nd ed)
by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000
Singapore Fisheries automates the process of sorting salmon and sea bass coming on a conveyor belt. An optical sensor and processing software evaluate a number of features: length, lightness, width, number of fins.
Step 1: look at the histograms.
• Select the number of bins, e.g. n=20 (discretize the data);
• calculate the bin size D = (xmax - xmin)/n;
• calculate N(C, ri) = number of samples in class C ∈ {salmon, bass} in each bin ri = [xmin+(i-1)D, xmin+iD], i=1...n;
• normalize: P(C, ri) = N(C, ri)/N, where N is the number of all samples.
This gives the joint probability P(C, ri) = P(ri|C) P(C).
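A minimal sketch of these binning steps in Python; the fish lengths, class sizes and the salmon/bass means are made up for illustration, since the original data are not given here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up fish lengths in cm: salmon shorter on average than sea bass (purely illustrative)
lengths = np.concatenate([rng.normal(60, 8, 40), rng.normal(75, 10, 60)])
labels = np.array(['salmon'] * 40 + ['bass'] * 60)

n = 20
xmin, xmax = lengths.min(), lengths.max()
D = (xmax - xmin) / n                                              # bin size D = (xmax - xmin)/n
bin_idx = np.minimum(((lengths - xmin) / D).astype(int), n - 1)    # bin r_i for each sample

N = len(lengths)
classes = ['salmon', 'bass']
N_Cr = np.array([np.bincount(bin_idx[labels == c], minlength=n) for c in classes])  # N(C, r_i)
P_Cr = N_Cr / N                                   # joint probability P(C, r_i) = N(C, r_i)/N
P_C = P_Cr.sum(axis=1)                            # class priors P(C): row sums, here ~[0.4, 0.6]
P_r_given_C = P_Cr / P_C[:, None]                 # P(r_i | C) = P(C, r_i) / P(C)
print(P_C)
print(P_r_given_C.sum(axis=1))                    # each conditional distribution sums to 1
```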
Fishy histograms.
Example of histograms for two features, length (left) and skin lightness
(right). Optimal thresholds are marked.
P(ri|C) is an approximation to the class probability distribution P(x|C).
How to calculate it?
Discretization: replacing continuous values by discrete bins, or integration
by summation, very useful when dealing with real data.
Alternative: fit some simple functions to these histograms, for example a sum of several Gaussians, and optimize their widths and centers.
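A minimal sketch of this alternative, fitting a small mixture of Gaussians to each class instead of using raw bins. It assumes scikit-learn is available; the data are the same made-up lengths as in the earlier sketch, and the number of components (2) is a guess, not a value from the lecture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Made-up fish lengths in cm, as in the histogram sketch above (purely illustrative)
lengths = np.concatenate([rng.normal(60, 8, 40), rng.normal(75, 10, 60)])
labels = np.array(['salmon'] * 40 + ['bass'] * 60)

# Fit a small Gaussian mixture to each class's lengths
models = {c: GaussianMixture(n_components=2, random_state=0)
               .fit(lengths[labels == c].reshape(-1, 1))
          for c in ['salmon', 'bass']}

# Smooth class-conditional densities p(x|C) on a grid, instead of raw bin counts
grid = np.linspace(lengths.min(), lengths.max(), 200).reshape(-1, 1)
density = {c: np.exp(m.score_samples(grid)) for c, m in models.items()}
print({c: round(float(d.max()), 3) for c, d in density.items()})   # peak density per class
```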
Fishy example.
Exploratory data analysis (EDA): visualize relations in the data.
How are histograms created?
• Select the number of bins, e.g. n=20 (discretize the data into 20 bins);
• calculate the bin size D = (xmax - xmin)/n;
• calculate N(C, ri) = number of samples from class C in each bin ri = [xmin+(i-1)D, xmin+iD], i=1...n.
This may be converted to a joint probability P(C, ri) that a fish of type C will be found with length in bin ri:
• P(C, ri) = N(C, ri)/N, where N is the number of all samples.
Histograms show the joint probability P(C, ri) rescaled by N.
Basic probability concepts.
Normalized co-occurrence (contingency) table: P(C, ri) = N(C, ri)/N.
P(C, ri) is a matrix with columns = bins ri and rows = classes:

            | P(C1,r1)  P(C1,r2)  P(C1,r3) |
            | P(C2,r1)  P(C2,r2)  P(C2,r3) |
  P(C, r) = | P(C3,r1)  P(C3,r2)  P(C3,r3) |
            | P(C4,r1)  P(C4,r2)  P(C4,r3) |
            | P(C5,r1)  P(C5,r2)  P(C5,r3) |

P(C, ri) is the joint probability distribution: the probability of finding class C and x ∈ ri.
P(C), the a priori class probability (before making any measurements or learning that x is in some bin), is obtained by summing a row:
  Σi P(C, x ∈ ri) = P(C)
P(x ∈ ri), the probability that an object from any class is found in bin ri, is obtained by summing a column:
  Σj P(Cj, x ∈ ri) = P(x ∈ ri)
PDF and conditional probability
What if x is a continuous variable and there are no natural bins? Then P(C, x) is a probability density function (pdf), and for small dx:
  P(C, [x, x+dx]) = P(C, x) dx
Suppose now that the class C is known; what is the probability of finding x ∈ ri, or, for continuous features, of finding it in the interval [x, x+dx]?
P(x ∈ ri | C) denotes the conditional probability distribution, knowing C.
The sum over all bins gives:
  Σi P(x ∈ ri | C) = 1
and for the joint probability:
  Σi P(C, x ∈ ri) = P(C)
therefore the formula is:
  P(x ∈ ri | C) = P(C, x ∈ ri) / P(C)
PC(x) = P(x|C), the class probability distribution, is simply a rescaled joint probability: divide a single row of the P(C, x) matrix by P(C).
Probability formulas
Most probability formulas are simple summation rules!
Matrix of joint probability distributions: elements P(C, x) for discrete x, or P(C, x ∈ ri) after discretization.
Just count how many times each combination N(C, x) is found:
  P(C, x) = N(C, x)/N
A row of P(C, x) sums to the class prior:
  P(C) = Σi=1..n P(C, xi)
therefore P(x|C) = P(C, x)/P(C) sums to one:
  Σi=1..n P(xi | C) = 1
For continuous x, P(C, x) is a probability density distribution, integrating to P(C):
  P(C) = ∫ P(C, x) dx
Bayes formula
The Bayes formula allows for calculation of the conditional probability distribution P(x|C) and of the posterior probability distribution P(C|x).
These probabilities are renormalized elements of the joint probability P(C, ri):
  P(xi | C) = P(C, xi) / P(C)
  P(C | xi) = P(C, xi) / P(xi)
They sum to one: given the class C, the object must fall in one of the bins xi, and given x, it must belong to one of the classes C:
  Σi P(xi | C) = 1;   ΣC P(C | xi) = 1
Everything is normalized:
  ΣC P(C) = 1;   Σi P(xi) = 1;   ∫ P(x) dx = 1
Therefore the Bayes formula is quite obvious:
  P(C | xi) P(xi) = P(xi | C) P(C)
Example
Two types of Iris flowers: Iris Setosa and Iris Virginica.
Measuring petal lengths in cm in two intervals, r1 = [0,3] cm and r2 = [3,6] cm, for 100 samples we may get, for example, the following distribution of counts:

  N(C, r) = | 36   4 |      N(C1) = 40,  N(C2) = 60
            |  8  52 |      N(r1) = 44,  N(r2) = 56

Therefore the probabilities of finding the different types of Iris flowers are:

  P(C, r) = | 0.36  0.04 |   P(C1) = 0.4,  P(r1) = 0.44
            | 0.08  0.52 |   P(C2) = 0.6,  P(r2) = 0.56

Calculate the conditional probabilities and verify all formulas given on the previous pages!
Do more examples.
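A minimal sketch that reproduces the count matrix above and verifies the formulas from the previous pages (priors and P(r) as row/column sums, both conditionals, and the Bayes identity):

```python
import numpy as np

# Counts N(C, r) from the slide: rows = {Iris Setosa, Iris Virginica}, columns = {r1, r2}
N_Cr = np.array([[36, 4],
                 [8, 52]])
N = N_Cr.sum()                                   # 100 samples

P_Cr = N_Cr / N                                  # joint P(C, r)
P_C = P_Cr.sum(axis=1)                           # priors P(C) -> [0.4, 0.6]
P_r = P_Cr.sum(axis=0)                           # P(r)        -> [0.44, 0.56]

P_r_given_C = P_Cr / P_C[:, None]                # P(r|C) = P(C, r)/P(C): rows sum to 1
P_C_given_r = P_Cr / P_r[None, :]                # P(C|r) = P(C, r)/P(r): columns sum to 1

# Bayes formula: P(C|r) P(r) = P(r|C) P(C) (both sides equal the joint P(C, r))
assert np.allclose(P_C_given_r * P_r, P_r_given_C * P_C[:, None])
print(P_r_given_C)   # [[0.9, 0.1], [0.133, 0.867]]
print(P_C_given_r)   # [[0.818, 0.071], [0.182, 0.929]]
```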
Going to 2D histograms
2D histograms in 3D: still useful, although sometimes hard to analyze.
For a single class it is still easy, but for >2 classes rather difficult.
Many visualization software packages create such 3D plots.
The joint probability P(C,x,y) is shown here, for each class on a separate drawing; for small N it may be quite different from the real distribution.
Examples of histograms
Histograms in 2D: still useful, although sometimes hard to analyze.
Made by SigmaPlot, Origin, or statistical packages like SPSS.
Illustration of discretization (bin width) influence
http://www.stat.sc.edu/~west/javahtml/Histogram.html
Various chart applets:
http://www.quadbase.com/espresschart/help/examples/
http://www.stat.berkeley.edu/~stark/SticiGui/index.htm
Image histograms – popular in electronic cameras.
How to improve the information content in histograms?
How to analyze high-dimensional data?
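To see the bin-width (discretization) effect mentioned above without the applets, here is a minimal matplotlib sketch on made-up bimodal data; the bin counts (5, 20, 200) are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Made-up bimodal data: two overlapping Gaussian groups
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 2, 300)])

# Same data, three different bin counts: too coarse, reasonable, too fine
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n_bins in zip(axes, [5, 20, 200]):
    ax.hist(x, bins=n_bins)
    ax.set_title(f'n = {n_bins} bins')
plt.show()
```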
2D histograms in bioinformatics
Very popular: two nominal variables (genes, samples) vs. a continuous variable (activity) normalized to [-1, +1].
Example: gene expression data in blood B-cells of 16 types. Use color instead of height:
Intensity = -1 => bright green => inhibition
Intensity =  0 => black => normal
Intensity = +1 => bright red => high expression
Plotted: gene activity(gene name, cell type).
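A minimal sketch of such a color-coded 2D histogram (heat map), assuming matplotlib is available. The activity matrix is random made-up data; only the green-black-red mapping follows the description above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Made-up gene activity matrix in [-1, +1]: rows = genes, columns = cell types
rng = np.random.default_rng(0)
activity = np.clip(rng.normal(0, 0.5, size=(30, 16)), -1, 1)

# bright green (-1, inhibition) -> black (0, normal) -> bright red (+1, high expression)
cmap = LinearSegmentedColormap.from_list('green_black_red', ['lime', 'black', 'red'])

plt.imshow(activity, cmap=cmap, vmin=-1, vmax=1, aspect='auto')
plt.xlabel('cell type')
plt.ylabel('gene')
plt.colorbar(label='activity')
plt.show()
```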