target function
Download
Report
Transcript target function
Data Mining & Machine Learning
Introduction
Intelligent Systems Lab.
Soongsil University
Thanks to Raymond J. Mooney in the University of Texas at Austin,
Isabelle Guyon
1
Artificial Intelligence (AI): Research Areas
Research
Artificial
Intelligence
Application
Paradigm
Learning Algorithms
Inference Mechanisms
Knowledge Representation
Intelligent System Architecture
Intelligent Agents
Information Retrieval
Electronic Commerce
Data Mining
Bioinformatics
Natural Language Proc.
Expert Systems
Rationalism (Logical)
Empiricism (Statistical)
Connectionism (Neural)
Evolutionary (Genetic)
Biological (Molecular)
2
Artificial Intelligence (AI): Paradigms
Symbolic AI
Rule-Based Systems
Connectionist AI
Neural Networks
Evolutionary AI
Genetic Algorithms
Molecular AI:
DNA Computing
3
What is Machine Learning?
Learning
algorithm
Trained
machine
TRAINING
DATA
Answer
Query
4
Definition of learning
• Definition: A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E
Experience, E
Task, T
Task
Program
Learning
Program
Learned
Program
Performance
Performance, P
5
What is Learning?
• Herbert Simon: “Learning is any
process by which a system improves
performance from experience.”
6
Machine Learning
• Supervised Learning
–
–
–
–
Estimate an unknown mapping from known input- output pairs
f w (x) y f (x)
Learn fw from training set D={(x,y)} s.t.
Classification: y is discrete
Regression: y is continuous
• Unsupervised Learning
– Only input values are provided
– Learn fw from D={(x)} s.t. f w (x) x
– Clustering
7
Why Machine Learning?
• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power is available
• Knowledge engineering bottleneck. Develop systems
that are too difficult/expensive to construct manually
because they require specific detailed skills or
knowledge tuned to a specific task
• Budding industry
8
Niches using machine learning
• Data mining from large databases.
– Market basket analysis (e.g. diapers and beer)
– Medical records → medical knowledge
• Software applications we can’t program by hand
– Autonomous driving
– Speech recognition
• Self customizing programs to individual users.
– Spam mail filter
– Personalized tutoring
– Newsreader that learns user interests
9
Trends leading to Data Flood
• More data is generated:
– Bank, telecom, other
business
transactions ...
– Scientific data:
astronomy, biology, etc
– Web, text, and ecommerce
10
Data Growth
In 2 years, the size of the largest database TRIPLED!
11
Machine Learning / Data Mining
Application areas
• Science
– astronomy, bioinformatics, drug discovery, …
• Business
– CRM (Customer Relationship management), fraud
detection, e-commerce, manufacturing,
sports/entertainment, telecom, targeted marketing,
health care, …
• Web:
– search engines, advertising, web and text mining, …
• Government
– surveillance, crime detection, profiling tax cheaters, …
12
Data Mining for Customer Modeling
• Customer Tasks:
– attrition prediction
– targeted marketing:
• cross-sell, customer acquisition
– credit-risk
– fraud detection
• Industries
– banking, telecom, retail sales, …
13
Customer Attrition: Case Study
• Situation: Attrition rate at for mobile phone
customers is around 25-30% a year !
• With this in mind, what is our task?
– Assume we have customer information for the
past N months.
14
Customer Attrition: Case Study
Task:
• Predict who is likely to attrite next month.
• Estimate customer value and what is the
cost-effective offer to be made to this
customer.
15
Customer Attrition Results
• Verizon Wireless built a customer data warehouse
• Identified potential attriters
• Developed multiple, regional models
• Targeted customers with high propensity to accept the
offer
• Reduced attrition rate from over 2%/month to under
1.5%/month (huge impact, with >30 M subscribers)
16
Assessing Credit Risk: Case Study
• Situation: Person applies for a loan
• Task: Should a bank approve the loan?
• Note: People who have the best credit don’t need the loans,
and people with worst credit are not likely to repay. Bank’s
best customers are in the middle
17
Credit Risk - Results
• Banks develop credit models using variety of
machine learning methods.
• Mortgage and credit card proliferation are the
results of being able to successfully predict if a
person is likely to default on a loan
• Widely deployed in many countries
18
Successful e-commerce – Case Study
• Task: Recommend other books (products) this
person is likely to buy
• Amazon does clustering based on books bought:
– customers who bought “Advances in Knowledge
Discovery and Data Mining”, also bought “Data
Mining: Practical Machine Learning Tools and
Techniques with Java Implementations”
• Recommendation program is quite successful
19
Security and Fraud Detection - Case Study
• Credit Card Fraud Detection
• Detection of Money laundering
– FAIS (US Treasury)
• Securities Fraud
– NASDAQ KDD system
• Phone fraud
– AT&T, Bell Atlantic, British Telecom/MCI
• Bio-terrorism detection at Salt Lake
Olympics 2002
20
Example Problem:
Handwritten Digit Recognition
• Handcrafted rules will result in
large no. of rules and
Exceptions
Wide variability of same numeral
• Better to have a machine that
learns from a large training set
21
Chess Game
In 1997, Deep Blue(IBM) beat Garry Kasparov(러).
– Let IBM’s stock increase
by $18 billion at that year
22
Some Successful Applications of
Machine Learning
• Learning to drive an
autonomous vehicle
– Train computer-controlled vehicles
to steer correctly
– Drive at 70 mph for 90 miles on public
highways
– Associate steering commands with
image sequence
– 1200 computer-generated images as
training examples
• Half-hour training
An additional information from previous image
indicating the darkness or lightness of the road
23
Some Successful Applications of
Machine Learning
•
Learning to recognize spoken words
– Speech recognition/synthesis
– Natural language understanding/generation
– Machine translation
24
Example 1: visual object
categorization
• A classification problem: predict category y based on image x.
• Little chance to “hand-craft” a solution, without learning.
• Applications: robotics, HCI, web search (a real image Google..)
25
Face Recognition - 1
Given multiple angles/
views of a person, learn
to identify them.
Learn to distinguish
male from female faces.
26
Face Recognition - 2
Learn to recongnize emotions,
gestures
Li, Ye, Kambhametta, 2003
27
Robot
• Sony AIBO robot
– Available on June 1, 1999
– Weight: 1.6 KG
– Adaptive learning and growth capabilities
– Simulate emotion such as happiness and anger
28
Robot
• Honda ASIMO (Advanced Step in Innovate MObility)
– Born on 31 October, 2001
– Height: 120 CM, Weight: 52 KG
http://blog.makezine.com/archive/2009/08/asimo_avoids_moving_obstacles.html?CMP=OTC-0D6B48984890
29
Biomedical / Biometrics
• Medicine:
– Screening
– Diagnosis and prognosis
– Drug discovery
• Security:
– Face recognition
– Signature / fingerprint
– DNA fingerprinting
30
Computer / Internet
• Computer interfaces:
– Troubleshooting wizards
– Handwriting and speech
– Brain waves
• Internet
–
–
–
–
Spam filtering
Text categorization
Text translation
Recommendation
7
31
Refreshment !!
• 1. 몇마리의 동물이 있을까요?
2. 어떤 동물이 있는지 ?
32
Refreshment !!
• 1. 몇마리의 동물이 있을까요?
2. 어떤 동물이 있는지 ?
상어,
코끼리,
자라,
뱀,
물개,
말,
개,
고양이,
쥐,
물고기,
촉새
캥거루,
독수리
Total : 11마리
33
Classification
• Assign object/event to one of a given finite set
of categories.
–
–
–
–
–
–
–
–
–
–
–
–
Medical diagnosis
Credit card applications or transactions
Fraud detection in e-commerce
Worm detection in network packets
Spam filtering in email
Recommended articles in a newspaper
Recommended books, movies, music, or jokes
Financial investments
DNA sequences
Spoken words
Handwritten letters
Astronomical images
34
Problem Solving / Planning / Control
• Performing actions in an environment in
order to achieve a goal.
– Solving calculus problems
– Playing checkers, chess, or backgammon
– Driving a car or a jeep
– Flying a plane, helicopter, or rocket
– Controlling an elevator
– Controlling a character in a video game
– Controlling a mobile robot
35
Applications
training
examples5
10
Ecology
Market
Analysis
104
102
10
Text
Categorization
System diagnosis
103
Machine
Vision
OCR
HWR
Bioinformatics
inputs
10
102
103
104
105
36
Disciplines Related with Machine Learning
• Artificial intelligence
– 기호 표현 학습, 탐색문제, 문제해결, 기존지식의 활용
• Bayesian methods
– 가설 확률계산의 기초, naïve Bayes classifier,
unobserved 변수 값 추정
• Computational complexity theory
– 계산 효율, 학습 데이터의 크기, 오류의 수 등의 측정에
필요한 이론적 기반
• Control theory
– 이미 정의된 목적을 최적화하는 제어과정과 다음 상태
예측을 학습
37
Disciplines Related with Machine Learning (2)
• Information theory
– Entropy와 Information Content를 측정, Minimum Description
Length, Optimal Code 와 Optimal Training의 관계
• Philosophy
– Occam’s Razor, 일반화의 타당성 분석
• Psychology and neurobiology
– Neural network models
• Statistics
– 가설의 정확도 추정시 발생하는 에러의 특성화, 신뢰구간,
통계적 검증
38
Definition of learning
• Definition: A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E
Experience, E
Task, T
Task
Program
Learning
Program
Learned
Program
Performance
Performance, P
39
Example: checkers
Task T: Playing checkers.
Performance measure P: % of games won.
Training experience E: Practice games by playing against
itself.
40
Example: Recognizing handwritten letters
Task T: Recognizing and classifying handwritten words
within images.
Performance measure P: % words correctly classified.
Training experience E: A database of handwritten words
with given classifications.
41
Example: Robot driving
Task T: Driving on public four-lane highway using vision
sensors.
Performance measure P: Average distance traveled before
an error (as judged by a human overseer).
Training experience E: A sequence of images and steering
commands recorded while observing a human driver.
42
Designing a learning system
Task T: Playing checkers.
Performance measure P: % of games won.
Training experience E: Practice games by playing against itself.
– What does this mean?
– and what can we learn from it?
43
Measuring Performance
•
•
•
•
Classification Accuracy
Solution correctness
Solution quality (length, efficiency)
Speed of performance
44
Designing a Learning System
1. Choose the training experience/environment
2. Choose exactly what is to be learned, i.e. the target
function.
3. Choose how to represent the target function.
4. Choose a learning algorithm to infer the target function
from the experience.
Train examples…
Environment/
Experience
Test examples…
Learner
Knowledge
Performance
Element
45
Designing a Learning System
1. Choosing the Training Experience
• Key Attributes
– Direct/indirect feedback 을 제공하는가 ?
• Direct feedback: checkers states and correct move
• Indirect feedback: move sequences and final outcomes of
various games
– Credit assignment problem
– Degree of controlling the sequence of training example
• Learner의 자율성, 학습 정보를 얻을 때 teacher의 도움을
받는 정도,
– Distribution of examples
• Train examples의 분포와 Test examples의 분포의 문제
• 시스템의 성능을 평가하는 테스트의 예제 분포를 잘
반영해야 함
• 특수한 경우의 분포가 반영 (The Checkers World
Champion이 격은 특수한 경우의 분포가 반영 될까 ?)
46
Training Experience
• Direct experience: Given sample input and output
pairs for a useful target function.
– Checker boards labeled with the correct move,
• e.g. extracted from record of expert play
• Indirect experience: Given feedback which is not
direct I/O pairs for a useful target function.
– Potentially arbitrary sequences of game moves and their
final game results.
– Credit/Blame Assignment Problem: How to assign credit/
blame to individual moves given only indirect feedback?
47
Source of Training Data
• Provided random examples outside of the learner’s
control. (학습자-알고리즘 판단 밖의 사례를 무작위 추출)
– Negative examples available or only positive?
• Good training examples selected by a “benevolent
teacher.” (학습데이터를 Teacher 가 선택)
• Learner can construct an arbitrary example and
query an oracle for its label. (질의 응답으로 시스템 자체에서
예제를 구축)
– Learner can design and run experiments directly in the
environment without any human guidance.
(사람의 도움없이 시스템이 새로운 문제틀을 제시함)
– Learner can query an oracle about class of an unlabeled
example in the environment. (불안전한 예제의 답을 질문, 즉 (x, ___)
입력은 있으나 결과값이 없는 경우)
48
Training vs. Test Distribution
• Generally assume that the training and test
examples are independently drawn from the
same overall distribution of data.
– IID: Independently and identically distributed
• If examples are not independent, requires
collective classification.
(e.g. communication network, financial transaction network,
social network에 대한 관계 모델구축 시)
• If test distribution is different, requires
transfer learning. that is, achieving cumulative
learning
49
50
Transfer learning
51
참고: Transfer learning
• Transfer learning is what happens when someone
finds it much easier to learn to play chess having
already learned to play checkers;
or to recognize tables having already learned to
recognize chairs;
or to learn Spanish having already learned Italian.
• Achieving significant levels of transfer learning across
tasks -- that is, achieving cumulative learning -- is
perhaps the central problem facing machine learning.
52
Designing a Learning System
1. Choose the training experience
2. Choose exactly what is to be learned, i.e. the target function.
3. Choose how to represent the target function.
4. Choose a learning algorithm to infer the target function from the
experience.
Learner
Environment/
Experience
Knowledge
Performance
Element
53
Designing a Learning System
2. Choosing a Target Function
• 어떤 지식(함수)을 학습시키고, 평가 시스템에서 어떻게 이용 할 것인가 ?
(학습의 대상 정의)
• 가정: “장기게임에서, 현재의 장기판에서 타당한 움직임들을 생성하는 함수가
있고, 최고의 움직임을 선택한다”.
– Could learn a function:
1. ChooseMove : B →M(최고의 움직임)
Or
2. Evaluation function, V : B → R
* 각 보드의 위치에 따라 얼마나 유리한가에 대한 점수를 부여함.
* V는 각각의 움직임에 대한 결과의 점수에 따라 선택하는데 이용가능 하여
* 결국은 최고 높은 점수를 얻을 수 있는 움직임을 선택하게 한다.
54
Designing a Learning System
2. Choosing the Target Function
• A function that chooses the best move M for any B
– ChooseMove : B →M
– Difficult to learn
• It is useful to reduce the problem of improving
performance P at task T, to the problem of learning some
particular target function.
• An evaluation function that assigns a numerical score to any B
– V:B→R
55
The start of the learning work
Instead of learning ChooseMove we establish a value function:
target function, V : B → R
that maps any legal board state in B to some real value in R.
어떤 Position에서도 그 Position이 초래하게 되는 Score 를 최대화
시키는 움직임을 선택하도록 한다.
1. if b is a final board state that is won, then V (b) = 100.
2. if b is a final board state that is lost, then V (b) = −100.
3. if b is a final board state that is drawn, then V (b) = 0.
4. if b is not a final board state, then V (b) =
56
The start of the learning work
Instead of learning ChooseMove we establish a value function:
target function, V : B → R
that maps any legal board state in B to some real value in R.
.
어떤 Position에서도 그 Position이 초래하게 되는 Score 를 최대화 시키는
움직임을 선택하도록 한다.
1. if b is a final board state that is won, then V (b) = 100.
2. if b is a final board state that is lost, then V (b) = −100.
3. if b is a final board state that is drawn, then V (b) = 0.
4. if b is not a final board state, then V (b) = V (b’),
b’ : 성취될 수 있는 최고의 마지막 상태 (the best final board state)
(단 상대방도 최적으로 수를 둔다고 가정)
Unfortunately, this did not take us any further!
57
Approximating V(b)
• Computing V(b) is intractable since it involves
searching the complete exponential game tree.
• Therefore, this definition is said to be nonoperational.
• An operational definition can be computed in
reasonable (polynomial) time.
• Need to learn an operational approximation to the
ideal evaluation function.
58
Designing a Learning System
1. Choose the training experience
2. Choose exactly what is to be learned, i.e. the target function.
3. Choose how to represent the target function.
4. Choose a learning algorithm to infer the target function from the
experience.
Learner
Environment/
Experience
Knowledge
Performance
Element
59
3. Choosing a Representation for the
Target Function
• Describing the function
–
–
–
–
Decision Tables
Decision Rules
Polynomial functions
Neural nets
• Trade-off in choice
– Expressive power(함수의 표현), attribute수
– Size of training data (예제의 크기)
•
제약조건인 경우 많을 수록 함수의 해의 결과는 더 좋아 진다. 즉 더욱
좋은 근사값을 얻는 함수를 구할 수 있다.
•
그러나 범용의 모델이나 정확한 함수를 얻기 위해서 더 많은 예제,
사이즈가 큰 예제가 필요하게 된다.
60
Approximate representation
w1 - w6: weights
61
Linear Function for Representing V(b)
• Use a linear approximation of the evaluation function.
^
V(b) = w0 + w1x1 + w2x2 + w3x3 +w4x4 + w5x5 + w6x6
w0 : an additive constant
62
Designing a Learning System
1. Choose the training experience
2. Choose exactly what is to be learned, i.e. the target function.
3. Choose how to represent the target function.
4. Choose a learning algorithm to infer the target function from the
experience.
Learner
Environment/
Experience
Knowledge
Performance
Element
63
4. Choosing a Function Approximation
Algorithm
A training example is represented as an ordered pair
<b, Vtrain(b) >
b: board state
Vtrain(b) : training value for b
Instance: “black has won the game
<<x1=3, x2=0, x3=1, x4=0, x5=0, x6=0>, +100>
(x2 = 0) indicates that white has no remaining pieces.
Estimating training values for intermediate board states
^
^
Vtrain(bi) ← V ( bi+1 ) = (Successor(bi))
V
^
V: current approximation to V, (즉 the learned function, hypothesis)
Successor(bi): the next board state, 즉 bi+1 state
* 현재의 bi 보드상태에 대한 training value은 ← 다음단계의 장기판 상태(bi+1)의 가설의 함수값을 사용
64
Vtrain(b)
부연설명: Temporal Difference Learning
• Estimate training values for intermediate (nonterminal) board positions by the estimated value of
their successor in an actual game trace.
Vtrain(b) V (successor( b))
•
where successor(b) is the next board position
where it is the program’s move in actual play.
• Values towards the end of the game are initially more
accurate and continued training slowly “backs up”
accurate values to earlier board positions.
65
DESIGNING A LEARNING SYSTEM
Estimating Training Values
66
How to learn?
67
How to learn?
68
How to change the weights?
69
How to change the weights?
70
Obtaining Training Values
• Direct supervision may be available for the target
function.
• With indirect feedback, training values can be
estimated using temporal difference learning (used
in reinforcement learning where supervision is
delayed reward).
71
Learning Algorithm
• Uses training values for the target function to induce a
hypothesis definition that fits these examples and
hopefully generalizes to unseen examples.
가설 정의를 위한 학습예제 사용 미 학습 예제에도 적용가능
• In statistics, learning to approximate a continuous
function is called regression.
• Attempts to minimize some measure of error (loss
function) such as mean squared error:
72
The LMS(Least Mean Square) weight update rule
•
Due to mathematical reasoning, the following update rule is very sensible.
73
LMS(Least Mean Square) Discussion
• Intuitively, LMS executes the following rules:
– 예제를 적용한 결과(the output for an example )가 정확하다면, 변화를 주지
않는다.
– 예제를 적용한 결과 V (b) 가 너무 높게 나오면, 해당 features의 값에 비례하여
weight값을 낮춘다. 그러면 전반적인 예제의 결과도 줄어들게 된다.
–
예제를 적용한 결과 V (b) 가
너무 낮게 나오면, 해당 features의 값에
비례하여 weight값을 높인다. 그러면 전반적인 예제의 결과도 늘어나게
된다.
• Under the proper weak assumptions, LMS can be proven to
eventually converge to a set of weights that minimizes the mean
squared error.
74
Lessons Learned about Learning
• Learning은 ?
선택된 target function을 근사화(approximation) 하기
위해 direct or indirect experience을 사용한다.
• Function approximation 이란 ?:
a space of hypotheses에서 training data들에 가장
적합한 가설(hypotheses)을 찾아가는 Search에 해당 됨
• Different learning methods assume different hypothesis
spaces (representation languages) and/or employ
different search techniques.
75
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
–
–
–
–
–
Naïve Bayes
Bayesian networks
Hidden-Markov Models (HMMs)
Probabilistic Context Free Grammars (PCFGs)
Markov networks
76
Various Search Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– Probabilistic Context Free Grammars (PCFGs) Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
77
Evaluation of Learning Systems
• Experimental
– Conduct controlled cross-validation experiments to compare
various methods on a variety of benchmark datasets.
– Gather data on their performance, e.g. test accuracy,
training-time, testing-time.
– Analyze differences for statistical significance.
• Theoretical
– Analyze algorithms mathematically and prove theorems
about their:
• Computational complexity
• Ability to fit training data
• Sample complexity (number of training examples needed to
learn an accurate function)
78
Core parts of the machine learning
^
(Initial game board)
( V)
(Game history)
Many machine learning systems can be usefully characterized in terms of
these four generic modules.
79
Four Components of a Learning System (2)
• Generalizer
–
–
–
–
Input: training example
Output: hypothesis (estimate of the target function)
Generalizes from the specific training examples
Hypothesizes a general function
• Experiment generator
– Input - current hypothesis
– Output - a new problem
– Picks new practice problem maximizing the learning
rate
80
Four Components of a Learning System(1)
• Performance system
- Solve the given performance task
- Use the learned target function
- New problem → trace of its solution
• Critic
- Output a set of training examples of the target
function
81
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
–
–
–
–
Neural networks: Perceptron
Pattern recognition
Learning in the limit theory
Minsky and Papert prove limitations of Perceptron
• 1970s:
–
–
–
–
–
–
–
Symbolic concept induction
Winston’s arch learner
Expert systems and the knowledge acquisition bottleneck
Quinlan’s ID3
Michalski’s AQ and soybean diagnosis
Scientific discovery with BACON
Mathematical discovery with AM
82
History of Machine Learning (cont.)
• 1980s:
–
–
–
–
–
–
–
–
–
Advanced decision tree and rule learning
Explanation-based Learning (EBL)
Learning and planning and problem solving
Utility problem
Analogy
Cognitive architectures
Resurgence of neural networks (connectionism, backpropagation)
Valiant’s PAC Learning Theory
Focus on experimental methodology
• 1990s
–
–
–
–
–
–
–
Data mining
Adaptive software agents and web applications
Text learning
Reinforcement learning (RL)
Inductive Logic Programming (ILP)
Ensembles: Bagging, Boosting, and Stacking
Bayes Net learning
83
History of Machine Learning (cont.)
• 2000s
–
–
–
–
–
–
–
–
Support vector machines
Kernel methods
Graphical models
Statistical relational learning
Transfer learning
Sequence labeling
Collective classification and structured outputs
Computer Systems Applications
•
•
•
•
Compilers
Debugging
Graphics
Security (intrusion, virus, and worm detection)
– Email management
– Personalized assistants that learn
– Learning in robotics and vision
84
Remind
• “Learning as search in a space of possible
hypotheses”
• Learning methods are characterized by their
search strategies and by the underlying structure
of the search spaces.
85
Issues in Machine Learning
• 특정 학습예제에 대한 general target function의 학습
가능한 알고리즘 개발
•
훈련용 data는 얼마나 되어야 하는가 ?
• 학습에 대해 지식을 가지고 있다면 인간의 간여가 도움이
되는가 ?
• 각 훈련 단계별로 학습문제의 복잡도(Complexity)를
개선하는 문제
• Function approximation을 위해 학습의 노력을 줄이는 것
• Target function의 표현을 자동으로 개선 혹은 변경하는 것
86