Transcript algorithm

Data Mining 2
(ex Análisis Inteligente de Datos y Data Mining)
Lluís A. Belanche
www.lsi.upc.edu/...
/~belanche/docencia/aiddm/aiddm.html
/~avellido/teaching/data_mining.htm
Contents of the course (hopefully)
1. Introduction & methodologies
2. Exploratory DM through visualization
3. Pattern recognition: introduction
4. Pattern recognition: the Gaussian case
5. Feature extraction
6. Feature selection & weighing
7. Error estimation
8. Linear methods are nice!
9. Probability in Data Mining
10. Latency, generativity, manifolds and all that
11. Application of GTM: from medicine to ecology
12. DM Case studies
Sorry guys! … no fuzzy systems …
Error
estimation
Feature extraction, selection and
weighing have many uses
Linear classifiers are nice! (I)
Linear classifiers are nice! (II)
F Transformation
F (x) = [ F1(x), F2(x), … Fm(x) ]
with x = [ x1, x2, …, xn ]
Useful for “ascending” (m>n) or
“descending” (m>n)
with 0 < m,n < oo (integers) … an example?
Linear classifiers are nice! (III)
F Nets
F (x) = [ F1(x), F2(x), … Fm(x) ]
with x = [ x1, x2, …, xn ]
x

F(x)
Utility
• This is a very powerful setting
• Let us suppose:
r>s  increase in dimension
increase in expressive power, ease
the task for almost any learning machine
r<s  decrease in dimension
visualization, compactation, noise
reduction, removal of useless information
Contradictory  !?
On intelligence …
• What is Intelligence?
• What is the function of Intelligence?
 to ensure survival in nature
• What are the ingredients of intelligence?
– Perceive in a changing world
– Reason under partial truth
– Plan & prioritize under uncertainty
– Coordinate different simultaneous tasks
– Learn under noisy experiences
Parking a Car (difficult or easy?)
“Generally, a car can be parked rather easily
because the final position of the car is not
specified exactly. It it were specified to
within, say, a fraction of a millimeter and a
few seconds of arc, it would take hours of
maneuvering and precise measurements of
distance and angular position to solve the
problem.”
 High precision carries a high cost.
The primordial soup
Belief
Networks
Fuzzy Logic
Neural
Networks
Soft Computing
Chaos & Fractals
Rough Sets
Evolutionary
Algorithms
What could MACHINE LEARNING
possibly be?
In the beginning, there was a set of examples …
• To exploit imprecision, uncertainty, robustness,
data dependencies, learning and/or optimization
ability, to achieve a working solution to a
problem which is hard to solve.
• To find an exact (approximate) solution to an
imprecisely (precisely) formulated problem.
So what is the aim?
The challenge is to put these capabilities
into use by devising methods of
computation which lead to an acceptable
solution at the lowest possible cost.
This should be the guiding principle
Different methods = different roles
Fuzzy Logic : the algorithms for dealing with
imprecision and uncertainty
Neural Networks : the machinery for learning
and function approximation with noise
Evolutionary Algorithms : the algorithms for
adaptive search and optimization
uncertainty
arising
from
the
Rough
RS
granularity
in
the
domain
of
Sets
discourse
Examples of soft computing
• TSP: 105 cities,
– accuracy within 0.75%, 7 months
– accuracy within 1%, 2 days
• Compare
– “absoulute best for sure” with “very good with
very high probability”
Are you one of the top guns?
• Consider …
– Search space of size s
– Draw N random samples
– What is the probability p that at least one of
them is in the top t ?
• Answer: p = 1 – (1-t/s)N
• Example: s= 1012, N=100.000, t=1.000
 1 in 10.000 !
On Algorithms
• what is worth?
Specialized algorithms: best performance for special problems
Generic algorithms: good performance over a wide range of
problems
Generic Algorithms
Specialized Algo.
Problems
P
Words are important !
•
•
•
•
•
•
What is a theory ?
What is an algorithm ?
What is an implementation ?
What is a model ?
What does “non-linear” mean ?
What does “non-parametric” mean ?
The problem of induction
• Classical problem in
Philosophy
• Example: 1,2,3,4,5,?
• A more through
example: JT
What are the conditions for
successful learning?
• Training data (sufficiently) representative
• Principle of similarity
• Target function within capacity of the
learner
• Non-dull learning algorithm
• Enough computational resources
• A correct (or close to) learning bias
And the Oscar goes to …
The real problem is not whether machines
think, but whether men do.
B.F. Skinner,
Contingencies of Reinforcement