Data Mining & Analysis
Machine Learning
Márk Horváth
Morgan Stanley
FID
Institutional Securities
Content
• AI Paradigm
• Data Mining
• Weka
• Application Areas
• Introduce many fields and the whole paradigm
– No time for details
AI Paradigm
• “The area of computer science which deals with problems that we were not able to cope with before.”
– Computer science is a branch of mathematics, btw.
• “Algorithms solving problems mainly through
interaction with the problem. The programmer
does not have to understand the solution to the
problem itself, but only the details of the learning
algorithm.”
AI Paradigm
• Why AI?
– new, fast-expanding science, applicable to most other sciences
• it also deals with explaining evidence
– interdisciplinary
• math
• computer science
• applied math
• philosophy of science
• biology (many naturally inspired algorithms, thinking machine)
• Why Machine Learning / Data Mining?
– it can be applied to any data (financial, medical, demographic, …)
AI Paradigm
• 1965 John McCarthy => 42 years
• Hilbert, theorem proving machine
• Occam (14th century)
• Many distinct fields
• Many algorithms at each field
• => 1 hour is nothing…
• Empirical and theoretical science
• Intuition needed to use and hybridize
• Few proofs
• Area too big to grasp everything in detail, but concepts are important
– => BIG PICTURE, no formulas!
AI Taxonomy
[Taxonomy diagram]
AI
– Logic / Expert Systems
– Machine Learning / Data Mining
  • Optimization, Control, Clustering (PCA, ICA), Model / AGI, …
  • Function Approximation
    – Kernel Based / Nearest Neighbor
    – Decision Tree / Covering
    – Linear Regression / Gradient Methods (max likelihood)
    – Naive Bayes
    – 0R, 1R
Data Mining vs. Statistics
• Statistics
– ~ hypothesis testing
• DM
– search through hypotheses
• Empirical side
– Many methods work even though they are proven not to converge
– Some methods that should work do not (due to limited computational power, slow convergence)
Relation, Attribute, Class
(Ω, A, P)
X = MYCT × MMIN × MMAX × CACH × CHMIN × CHMAX (Attribute, Feature)
Y = class (Class, Target)
Ω = X × Y
ρ(Y | X) = ?
@relation 'cpu'
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
@attribute class real % performance
@data
125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
29,8000,16000,32,8,16,132
26,8000,32000,64,8,32,290
23,16000,32000,64,16,32,381
…
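
A minimal sketch of loading this relation through the Weka Java API (assumes Weka 3 on the classpath and the data saved as cpu.arff):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadCpu {
    public static void main(String[] args) throws Exception {
        // Load the ARFF relation shown above (file name assumed)
        Instances data = new DataSource("cpu.arff").getDataSet();
        // The last attribute ("class", i.e. performance) is the target Y
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println(data.numInstances() + " instances, "
                + data.numAttributes() + " attributes");
    }
}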
General View of Data Mining
• Language
• Build model / search over the Language
Simple Cases
• 0R
• 1R (nominal class)
• Max likelihood
• Linear Regression
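
A minimal sketch of two of these baselines on the cpu relation (ZeroR is Weka's 0R and predicts the class mean for a numeric target; file and class names as above are assumed):

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.rules.ZeroR;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Baselines {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        // 0R: always predict the class mean; linear regression: least-squares fit
        for (Classifier c : new Classifier[]{ new ZeroR(), new LinearRegression() }) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));   // 10-fold CV
            System.out.println(c.getClass().getSimpleName()
                    + " RMSE = " + eval.rootMeanSquaredError());
        }
    }
}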
Data Mining Taxonomy
• Regression vs. Classification
(exchangeable)
• Deterministic vs. Stochastic
(~exchangeable: Chebyshev)
• Batch driven vs. Updateable
(~exchangeable, but with cost)
• Symbolic vs. Subsymbolic
Methodology
• Clean data
• Try many methods
• Optimize good methods
• Hybridize good methods, make meta algorithms
Evaluation Measures
• Mean Absolute Error / Root Mean Squared Error
• Correlation Coefficient
• Information gain
• Custom (e.g. weighted)
• Significance analysis (Bernoulli process)
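
A minimal sketch of the first two measures in plain Java (method names are illustrative; the sample values echo the cpu targets above):

public class ErrorMeasures {
    // Mean Absolute Error over paired actual (y) and predicted (yHat) values
    static double mae(double[] y, double[] yHat) {
        double sum = 0;
        for (int i = 0; i < y.length; i++) sum += Math.abs(y[i] - yHat[i]);
        return sum / y.length;
    }

    // Root Mean Squared Error: penalizes large residuals more strongly
    static double rmse(double[] y, double[] yHat) {
        double sum = 0;
        for (int i = 0; i < y.length; i++) {
            double d = y[i] - yHat[i];
            sum += d * d;
        }
        return Math.sqrt(sum / y.length);
    }

    public static void main(String[] args) {
        double[] y    = {199, 253, 132};
        double[] yHat = {180, 260, 140};
        System.out.println("MAE=" + mae(y, yHat) + " RMSE=" + rmse(y, yHat));
    }
}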
Overfitting, Learning Noise
• Philosophical question
– When do we accept or deny a model?
– No chance to prove, only to reject
• Train / (Validation) / Test
• Cross-validation, leave one out
• Minimum Description Length principle
– Occam
– Kolmogorov complexity
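
A minimal train/test hold-out sketch in Weka (the 80/20 split, the seed, and the REPTree learner are arbitrary illustrative choices):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HoldOut {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));                       // shuffle before splitting
        int trainSize = (int) Math.round(data.numInstances() * 0.8);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);
        REPTree tree = new REPTree();
        tree.buildClassifier(train);                         // fit on the training part only
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);                      // measure on unseen instances
        System.out.println(eval.toSummaryString());
    }
}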
Nearest Neighbor / Kernel
• Instance based
• Statistical (k neighbors)
• Distance: Euclidean, Manhattan / Evolved
• Missing Attribute: maximal distance
• KD-tree (log(n)), ball tree, metric tree
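
A minimal k-nearest-neighbour sketch with Weka's IBk, switching the default linear scan to a KD-tree search (k = 5 is an arbitrary choice):

import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.KDTree;

public class Knn {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        IBk knn = new IBk(5);                                 // k = 5 neighbours
        knn.setNearestNeighbourSearchAlgorithm(new KDTree()); // ~log(n) lookups
        knn.buildClassifier(data);                            // instance based: mostly stores the data
        System.out.println(knn.classifyInstance(data.instance(0)));
    }
}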
Decision Trees / Covering
• Divide and Conquer
• Split by the best feature
• User Classifier / REP Tree
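
A minimal sketch of the REP Tree learner named above; the printout shows which features the divide-and-conquer splits chose:

import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TreeDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        REPTree tree = new REPTree();     // reduced-error-pruning decision tree
        tree.buildClassifier(data);       // recursively splits on the best feature
        System.out.println(tree);         // text dump of the learned tree
    }
}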
Naive Bayes
• Independent Attributes
• P(Y | X) = P(X | Y) * P(Y) / P(X) =
= Π P(Xi | Y) * P(Y) / P(X)
• Discrete Class
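
A minimal Weka sketch; since Naive Bayes here assumes a discrete class, the nominal weather toy dataset that ships with Weka is used (file path assumed):

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("weather.nominal.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        NaiveBayes nb = new NaiveBayes(); // estimates the prior P(Y) and each P(Xi | Y)
        nb.buildClassifier(data);
        System.out.println(nb);           // prints the per-attribute conditional tables
    }
}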
Artificial Neural Networks
• Structure (Weka)
– Theoretical limitations (Minsky, AI winter)
• Recurrent networks for time series
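
A minimal feedforward-network sketch with Weka's MultilayerPerceptron (layer size, learning rate, and epoch count are arbitrary illustrative settings):

import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("4");     // one hidden layer of 4 units
        mlp.setLearningRate(0.1);     // backpropagation step size
        mlp.setTrainingTime(500);     // training epochs
        mlp.buildClassifier(data);
        System.out.println(mlp.classifyInstance(data.instance(0)));
    }
}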
Feedforward Learning Rules
• Learning rules
– Perceptron / Winnow (very simple rules for special cases)
– Various gradient descent methods
• Slower than perceptron
• Faster than differentiating the whole expression symbolically
• Local search
– Evolution
• Global search
• A bit slower, but easy to hybridize with local search
• Can evolve:
– Weights
– Structure
– Transfer functions
– Recurrent networks
Perceptron / Winnow
• Perceptron
– Add the misclassified instance to the weight
– Converges if the space is separable
• Winnow
– Binary
– Increase or decrease the non-zero attribute weights
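
A minimal sketch of the perceptron rule above in plain Java (labels in {-1, +1}; data and names are illustrative):

public class Perceptron {
    // One pass over the data: add each misclassified instance,
    // signed by its label, to the weight vector.
    static void epoch(double[][] x, int[] y, double[] w) {
        for (int i = 0; i < x.length; i++) {
            double activation = 0;
            for (int j = 0; j < w.length; j++) activation += w[j] * x[i][j];
            if (y[i] * activation <= 0) {                        // misclassified
                for (int j = 0; j < w.length; j++) w[j] += y[i] * x[i][j];
            }
        }
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 1}, {2, 1, 1}, {-1, -2, 1}};  // last column acts as bias input
        int[] y = {1, 1, -1};
        double[] w = new double[3];
        for (int e = 0; e < 10; e++) epoch(x, y, w);
        System.out.println(java.util.Arrays.toString(w));
    }
}

On separable data such epochs stop updating after finitely many mistakes, which is the convergence guarantee mentioned above.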
Feature extraction
• Discretization
• PCA/ICA
• Various state space transitions
• Evolving features
• Clustering
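
A minimal sketch of two of these as Weka filters (unsupervised discretization and a PCA rotation; file name assumed as before):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.PrincipalComponents;

public class FeatureDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        Discretize disc = new Discretize();                   // bin numeric attributes
        disc.setInputFormat(data);
        Instances binned = Filter.useFilter(data, disc);
        PrincipalComponents pca = new PrincipalComponents();  // rotate onto principal axes
        pca.setInputFormat(data);
        Instances rotated = Filter.useFilter(data, pca);
        System.out.println(binned.numAttributes() + " vs. " + rotated.numAttributes());
    }
}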
Meta / Hybrid Methods
• LEGO ;)
• Vote (many ways)
• Use a meta algorithm to predict based on base methods
• Embed
– Apply regression in the leaves of decision trees
– Embed decision tree, or training samples in ANN
• Unify
– Choose a general purpose language
– Use conventional training methods to build models
– Hybridize training methods, evolve
• Easy to write articles, countless new ideas
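
A minimal voting sketch in Weka, combining a tree and a linear model (for a numeric class the default combination rule averages the base predictions):

import weka.classifiers.Classifier;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EnsembleDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("cpu.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        Vote vote = new Vote();                               // meta learner over base models
        vote.setClassifiers(new Classifier[]{ new REPTree(), new LinearRegression() });
        vote.buildClassifier(data);
        System.out.println(vote.classifyInstance(data.instance(0)));
    }
}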
Practical Uses
• New paradigm
• Countless applications
• In virtually all sciences
– finance, psychology, sociology, biology,
medicine, chemistry, …
– actually discovering and explaining evidence
is science itself
• Business
– predictive enterprise
Applications in AI
• Optimal Control (model building)
• Used in other AI methods
– Speech recognition
– OCR
– Speech synthesis
– Vision, recognition
– AGI (logic, DM, evolution, clustering,
reinforcement learning, …)
TDK, Article
• Any topic you’ve found interesting…