Transcript Ch1-PPT

Statistical Learning & Inference
Lecturer: Liqing Zhang
Dept. Computer Science & Engineering,
Shanghai Jiao Tong University
Books and References
– T. Hastie, R. Tibshirani & J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001
– V. Cherkassky & F. Mulier, Learning From Data, Wiley, 1998
– V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer, 2000
– M. Vidyasagar, Learning and Generalization: With Applications to Neural Networks, 2nd ed., Springer, 2003
– G. Casella & R. Berger, Statistical Inference, Thomson, 2002
– T. Cover & J. Thomas, Elements of Information Theory, Wiley
Overview of the Course
– Introduction
– Overview of Supervised Learning
– Linear Methods for Regression and Classification
– Basis Expansions and Regularization
– Kernel Methods
– Model Selection and Inference
– Support Vector Machines
– Bayesian Inference
– Unsupervised Learning
Why Statistical Learning?
• "We are drowning in information, but starved for knowledge." ---- R. Roger
• "The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions." ---- I. Hacking
• Question: why are today's computers so inefficient at processing intelligent information?
– Images, video, audio
– Cognition, communication
– Language, speech, text
– Biology, genes, proteins
Cloud Computing
Cloud Computing Service Layers
Application focused:
– Services: complete business services such as PayPal, OpenID, OAuth, Google Maps, Alexa
– Application: cloud-based software that eliminates the need for local installation, such as Google Apps, Microsoft Online
– Development: software development platforms used to build custom cloud-based applications (PaaS & SaaS), such as SalesForce
Infrastructure focused:
– Platform: cloud-based platforms, typically provided using virtualization, such as Amazon EC2, Sun Grid
– Storage: data storage or cloud-based NAS, such as CTERA, iDisk, CloudNAS
– Hosting: physical data centers, such as those run by IBM, HP, NaviSite, etc.
[Figure: community telemedicine system for remote ECG diagnosis and monitoring]
Individual users need more functions:
• disease monitoring for cardiopulmonary conditions
• rehabilitation training
• fitness guidance, etc.
Workflow: ECGs are acquired and given a preliminary diagnosis at the community hospital; problem ECGs are sent to the remote diagnosis and monitoring center, which performs automatic and assisted diagnosis, shares data with remote doctors, and runs a consultation system; doctors return manual diagnoses and treatment advice, and the diagnostic results are fed back to the community hospital. This mobilizes the idle resources of community hospitals, and hospital doctors become a new class of remote users.
Community hospitals also need more functions:
• ECG, respiration, blood pressure
• chronic disease rehabilitation training
• fitness guidance, etc.
ML: SARS Risk Prediction
[Figure: pre-hospital attributes (RBC count, albumin, blood pO2, white count, chest X-ray, age, gender, blood pressure) and in-hospital attributes feed into a model that predicts SARS risk.]
ML: Auto Vehicle Navigation
[Figure: learned mapping from camera input to steering direction.]
Protein Folding
The Scale of Biomedical Data
General Procedure in SL
[Figure: the ML procedure cycle]
Problem Definition → Data Acquisition → Feature Analysis → Model Training → Predictions
EX. Pattern Classification
– Objective: to recognize horses in images
– Procedure: Feature extraction => Classifier => Cross-Validation
[Figure: a classifier separates input images into Horse and Non-Horse.]
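As a concrete sketch of the Feature extraction => Classifier => Cross-Validation pipeline, here is a minimal Python example (assuming scikit-learn is available; the features are synthetic stand-ins for image features, not the horse data from the slide):

```python
# Minimal feature -> classifier -> cross-validation pipeline.
# scikit-learn assumed available; synthetic features stand in for images.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 samples, 10 extracted features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels: horse (1) vs. non-horse (0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```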
Function Estimation Model
The Function Estimation Model of learning from examples:
– A Generator (G) generates observations x (typically in R^n), independently drawn from some fixed distribution F(x)
– A Supervisor (S) labels each input x with an output value y according to some fixed conditional distribution F(y|x)
– A Learning Machine (LM) "learns" from an i.i.d. k-sample of (x, y)-pairs output by G and S, by choosing the function that best approximates S from a parameterised function class f(x, α), where α ∈ Λ, the parameter set
Function Estimation Model
[Figure: G emits x; S labels x with y; LM observes the (x, y)-pairs and outputs ŷ.]
Key concepts: F(x, y), an i.i.d. k-sample on F, functions f(x, α), and the equivalent representation of each f by its index α
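A toy simulation of the G/S/LM setup may make the roles concrete; the distributions and the one-parameter class f(x, α) = αx below are illustrative assumptions, not part of the original model:

```python
# Toy function estimation: G draws x ~ F(x), S draws y ~ F(y|x),
# LM picks alpha in the class f(x, alpha) = alpha * x by least squares.
import numpy as np

rng = np.random.default_rng(1)
k = 100
x = rng.uniform(-1.0, 1.0, size=k)          # Generator G
y = 2.0 * x + rng.normal(0.0, 0.1, size=k)  # Supervisor S (true alpha = 2)

alpha_hat = np.dot(x, y) / np.dot(x, x)     # Learning Machine LM
print("estimated alpha:", alpha_hat)        # close to 2.0
```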
The Problem of Risk Minimization
The loss functional (L, Q)
– the error of a given function on a given example:
$L : (x, y, f) \mapsto L(y, f(x, \alpha))$
$Q : (z, \alpha) \mapsto L(z_y, f(z_x, \alpha))$
The risk functional (R)
– the expected loss of a given function on an example drawn from F(x, y)
– the (usual concept of) generalisation error of a given function:
$R(\alpha) = \int Q(z, \alpha)\, dF(z)$
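Since F(z) is unknown in practice, R(α) can only be estimated; the sketch below approximates it by Monte Carlo under an assumed distribution (all choices here are illustrative):

```python
# Monte Carlo approximation of R(alpha) = E[(y - alpha*x)^2] under an
# assumed F(x, y); in real problems F is unknown and R cannot be computed.
import numpy as np

def risk_estimate(alpha, n=100_000, seed=2):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)          # x ~ F(x)
    y = 2.0 * x + rng.normal(0.0, 0.1, size=n)  # y ~ F(y|x)
    return np.mean((y - alpha * x) ** 2)        # average loss Q(z, alpha)

print(risk_estimate(2.0))  # near the noise variance 0.01
print(risk_estimate(0.0))  # roughly 4/3 + 0.01: a much larger risk
```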
The Problem of Risk Minimization
Three Main Learning Problems
– Pattern Recognition:
$y \in \{0, 1\}$ and $L(y, f(x, \alpha)) = \mathbf{1}[\, y \neq f(x, \alpha) \,]$
– Regression Estimation:
$y \in \mathbb{R}$ and $L(y, f(x, \alpha)) = (y - f(x, \alpha))^2$
– Density Estimation:
$L(p(x, \alpha)) = -\log p(x, \alpha)$
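The three losses transcribe directly into code (a sketch; the function names are mine):

```python
# Direct transcription of the three canonical loss functions.
import numpy as np

def loss_01(y, f_x):       # pattern recognition: indicator / 0-1 loss
    return float(y != f_x)

def loss_squared(y, f_x):  # regression estimation: squared error
    return (y - f_x) ** 2

def loss_log(p_x):         # density estimation: negative log-likelihood
    return -np.log(p_x)

print(loss_01(1, 0), loss_squared(1.5, 1.0), loss_log(0.25))
```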
General Formulation
The Goal of Learning
– Given an i.i.d. k-sample z1, …, zk drawn from a fixed distribution F(z)
– For a function class's loss functionals Q(z, α), with α ∈ Λ
– We wish to minimise the risk, finding the function α*:
$\alpha^* = \arg\min_{\alpha \in \Lambda} R(\alpha)$
General Formulation
The Empirical Risk Minimization (ERM) Inductive Principle
– Define the empirical risk (sample/training error):
$R_{emp}(\alpha) = \frac{1}{k} \sum_{i=1}^{k} Q(z_i, \alpha)$
– Define the empirical risk minimiser:
$\alpha_k = \arg\min_{\alpha \in \Lambda} R_{emp}(\alpha)$
– ERM approximates Q(z, α*) with Q(z, α_k), the R_emp minimiser; that is, ERM approximates α* with α_k
– Least-squares and maximum-likelihood are realisations of ERM
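For squared loss over a linear class, ERM reduces to ordinary least squares; a short sketch on synthetic data (the linear class and the data-generating choices are assumptions for illustration):

```python
# ERM with squared loss over the linear class f(x, alpha) = x @ alpha
# is ordinary least squares; synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(3)
k, d = 200, 3
X = rng.normal(size=(k, d))                      # sample z_i = (x_i, y_i)
alpha_true = np.array([1.0, -2.0, 0.5])
y = X @ alpha_true + rng.normal(0.0, 0.1, size=k)

# alpha_k = argmin_alpha (1/k) * sum_i (y_i - x_i @ alpha)^2
alpha_k, *_ = np.linalg.lstsq(X, y, rcond=None)
R_emp = np.mean((y - X @ alpha_k) ** 2)          # empirical risk at alpha_k
print(alpha_k, R_emp)
```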
4 Issues of Learning Theory
1. Theory of consistency of learning processes
• What are the (necessary and sufficient) conditions for consistency (convergence of R_emp to R) of a learning process based on the ERM principle?
2. Non-asymptotic theory of the rate of convergence of learning processes
• How fast is the rate of convergence of a learning process?
3. Generalization ability of learning processes
• How can one control the rate of convergence (the generalization ability) of a learning process?
4. Constructing learning algorithms (e.g. the SVM)
• How can one construct algorithms that can control the generalization ability?
Change in Scientific Methodology
TRADITIONAL:
– Formulate hypothesis
– Design experiment
– Collect data
– Analyze results
– Review hypothesis
– Repeat/Publish
NEW:
– Design large experiments
– Collect large data
– Put data in large database
– Formulate hypothesis
– Evaluate hypothesis on database
– Run limited experiments
– Review hypothesis
– Repeat/Publish
Learning & Adaptation
• Any method that incorporates information from training samples in the design of a classifier employs learning.
• Due to the complexity of classification problems, we cannot guess the best classification decision ahead of time; we need to learn it.
• Creating classifiers therefore involves positing some general form of model, or form of the classifier, and using examples to learn the complete classifier.
Supervised learning
In supervised learning, a teacher provides a category label for each pattern in a training set. These labels are then used to train a classifier, which can thereafter solve similar classification problems by itself.
– Such as face recognition, text classification, ……
Unsupervised learning
In unsupervised learning, or clustering, there is no explicit teacher and no labeled training data. The system forms natural clusters of the input patterns and classifies them according to the clusters they belong to.
– Data clustering, data quantization, dimensionality reduction, ……
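A minimal clustering sketch (scikit-learn's KMeans assumed available; the two Gaussian blobs are synthetic):

```python
# Unsupervised learning sketch: k-means finds two natural clusters
# in synthetic 2-D data, with no labels provided.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(3.0, 3.0), scale=0.5, size=(50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # roughly 50 points per cluster
```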
Reinforcement learning
In reinforcement learning, a teacher only tells the classifier whether it is right or wrong when it suggests a category for a pattern; the teacher does not say what the correct category is.
– Agents, robots, ……
Classification
• The task of the classifier component is to use the feature vector provided by the feature extractor to assign the object to a category.
• Classification is the main topic of this course.
• The abstraction provided by the feature-vector representation of the input data enables the development of a largely domain-independent theory of classification.
• Essentially, the classifier divides the feature space into regions corresponding to the different categories.
Classification
• The degree of difficulty of the classification problem depends on the variability of the feature values for objects in the same category, relative to the variation of feature values between categories.
• Variability may be natural or due to noise.
• Variability can be described through statistics, leading to statistical pattern recognition.
Classification
• Question: how can we design a classifier that copes with the variability in feature values? What is the best possible performance?
[Figure: object representation in feature space, with axes X1 (perimeter) and X2 (area). A discriminant S(x) = 0 separates class A (S(x) ≥ 0) from class B (S(x) < 0); noise and biological variation cause class spread, and class overlap causes classification error.]
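A discriminant of the form S(x) = w · x + b, as in the figure, can be sketched directly (the weights here are hand-picked, purely for illustration):

```python
# Linear discriminant S(x) = w . x + b over the (perimeter, area) space;
# S(x) >= 0 -> class A, S(x) < 0 -> class B. Weights are illustrative.
import numpy as np

w = np.array([0.8, -1.0])  # hypothetical weights for (X1 perimeter, X2 area)
b = 0.5

def classify(x):
    s = np.dot(w, x) + b
    return "A" if s >= 0 else "B"

print(classify(np.array([2.0, 1.0])))  # S = 1.1  -> class A
print(classify(np.array([1.0, 2.5])))  # S = -1.2 -> class B
```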
Examples
• User interfaces: modelling subjectivity and affect, intelligent agents, transduction (input from camera, microphone, or fish sensor)
• Recovering visual models: face recognition, model-based video, avatars
• Dynamical systems: speech recognition, visual tracking, gesture recognition, virtual instruments
• Probabilistic modeling: image compression, low-bandwidth teleconferencing, texture synthesis
• ……
Course Web
– http://bcmi.sjtu.edu.cn/statLearnig/
– Teaching Assistant: Liu Ye
– Email: [email protected]
Assignment
To write a report on the topic you are working on, including:
– Problem definition
– Model and method
– Key issues to be solved
– Outcome