Fuzzy Set Approaches


Classification and Prediction: Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, via a fuzzy membership graph)
- Attribute values are converted to fuzzy values, e.g., income is mapped into the discrete categories {low, medium, high} with a fuzzy value calculated for each
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories
- Typically, the truth values for each predicted category are summed, as in the sketch below
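A minimal sketch of this pipeline, assuming made-up piecewise-linear membership functions for the income categories and a hypothetical rule table (none of these breakpoints or rules come from the slides):

```python
# Fuzzify a crisp attribute, then let each applicable rule vote.
# Breakpoints and rules below are illustrative assumptions only.

def income_memberships(income):
    """Map a crisp income into fuzzy degrees for {low, medium, high}."""
    low = max(0.0, min(1.0, (50_000 - income) / 30_000))
    high = max(0.0, min(1.0, (income - 60_000) / 40_000))
    medium = max(0.0, 1.0 - low - high)
    return {"low": low, "medium": medium, "high": high}

# Each fuzzy rule votes for a category with its truth value; votes are summed.
rules = [("low", "reject"), ("medium", "approve"), ("high", "approve")]

degrees = income_memberships(55_000)
votes = {}
for income_term, category in rules:
    votes[category] = votes.get(category, 0.0) + degrees[income_term]

print(max(votes, key=votes.get))  # category with the largest summed truth value
```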
Fuzzy Sets

Sets with fuzzy boundaries, e.g., A = the set of tall people.

[Figure: a crisp set A jumps from membership 0 to 1 at 5'10'', while the fuzzy set A has a membership function that rises gradually with height, e.g., about .5 at 5'10'' and .9 at 6'2''. Axes: height vs. degree of membership.]
Membership Functions (MFs)

Characteristics of MFs:
- Subjective measures
- Not probability functions

[Figure: three subjective MFs for "tall" over height; at 5'10'' the degree of membership is about .8 for "tall" in Asia, .5 for "tall" in the US, and .1 for "tall" in the NBA.]
Fuzzy Sets

Formal definition: a fuzzy set A in X is expressed as a set of ordered pairs

    A = {(x, μ_A(x)) | x ∈ X}

where A is the fuzzy set, μ_A is the membership function (MF), and X is the universe (universe of discourse). A fuzzy set is totally characterized by its membership function.
Fuzzy Sets with Discrete Universes

Fuzzy set A = "sensible number of children"
X = {0, 1, 2, 3, 4, 5, 6} (discrete universe)
A = {(0, .1), (1, .3), (2, .7), (3, 1), (4, .6), (5, .2), (6, .1)}
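One way to hold such a discrete fuzzy set in code is a plain mapping from elements to membership degrees; a small sketch:

```python
# The slide's discrete fuzzy set A = "sensible number of children",
# stored as {element: membership degree} over X = {0, ..., 6}.
A = {0: 0.1, 1: 0.3, 2: 0.7, 3: 1.0, 4: 0.6, 5: 0.2, 6: 0.1}

# Membership queries fall out directly; elements outside X have degree 0.
print(A.get(3, 0.0))  # 1.0 -- three children is fully "sensible" here
print(A.get(9, 0.0))  # 0.0
```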
Fuzzy Sets with Continuous Universes

Fuzzy set B = "about 50 years old"
X = set of positive real numbers (continuous universe)
B = {(x, μ_B(x)) | x ∈ X}, with

    μ_B(x) = 1 / (1 + ((x − 50) / 10)²)
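The MF above transcribes directly into code:

```python
# mu_B(x) = 1 / (1 + ((x - 50) / 10)^2), membership in "about 50 years old".

def mu_B(x):
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 2)

print(mu_B(50))  # 1.0  (exactly 50 is fully "about 50")
print(mu_B(60))  # 0.5  (ten years away gives half membership)
```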
Fuzzy Partition

Fuzzy partitions formed by the linguistic values "young", "middle aged", and "old":

[Figure: three overlapping MFs partitioning the age axis; MATLAB demo lingmf.m]
Set-Theoretic Operations

Subset:
    A ⊆ B ⟺ μ_A(x) ≤ μ_B(x)
Complement:
    Ā = X − A, μ_Ā(x) = 1 − μ_A(x)
Union:
    C = A ∪ B, μ_C(x) = max(μ_A(x), μ_B(x)) = μ_A(x) ∨ μ_B(x)
Intersection:
    C = A ∩ B, μ_C(x) = min(μ_A(x), μ_B(x)) = μ_A(x) ∧ μ_B(x)
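A sketch of these four operations over discrete fuzzy sets stored as element-to-degree mappings (the example sets are made up):

```python
# Standard min/max fuzzy set operations, following the definitions above.

def complement(A):
    return {x: 1.0 - mu for x, mu in A.items()}

def union(A, B):
    xs = sorted(set(A) | set(B))
    return {x: max(A.get(x, 0.0), B.get(x, 0.0)) for x in xs}

def intersection(A, B):
    xs = sorted(set(A) | set(B))
    return {x: min(A.get(x, 0.0), B.get(x, 0.0)) for x in xs}

def is_subset(A, B):
    # A is a subset of B iff mu_A(x) <= mu_B(x) for every x.
    return all(mu <= B.get(x, 0.0) for x, mu in A.items())

A = {1: 0.3, 2: 0.7, 3: 1.0}
B = {1: 0.5, 2: 0.9, 3: 1.0, 4: 0.2}
print(union(A, B))         # {1: 0.5, 2: 0.9, 3: 1.0, 4: 0.2}
print(intersection(A, B))  # {1: 0.3, 2: 0.7, 3: 1.0, 4: 0.0}
print(is_subset(A, B))     # True
```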
[Figure: the set-theoretic operations illustrated graphically; MATLAB demos subset.m and fuzsetop.m]
MF Formulation

[Figure: MF formulation examples; MATLAB demo disp_mf.m]
Fuzzy If-Then Rules

General format: If x is A, then y is B

Examples:
- If pressure is high, then volume is small.
- If the road is slippery, then driving is dangerous.
- If a tomato is red, then it is ripe.
- If the speed is high, then apply the brake a little.
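As a small illustration of how such a rule fires, the truth value of its antecedent caps how strongly the consequent is asserted; the MF for "pressure is high" below is an assumed, made-up example:

```python
# Sketch of firing "If pressure is high, then volume is small."
# The membership function and pressure scale are illustrative assumptions.

def pressure_is_high(p):
    # rises linearly from 0 at pressure 50 to 1 at pressure 100 (made-up scale)
    return max(0.0, min(1.0, (p - 50.0) / 50.0))

# The rule's firing strength is the truth value of the antecedent; it is
# the degree to which "volume is small" gets asserted for this input.
firing = pressure_is_high(85.0)
print(firing)  # 0.7
```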
Classification and Prediction: Support Vector Machine
Support Vector Machine

The goal is to find the optimal separating hyperplane, i.e., the one that maximizes the margin between the classes.
Support Vector Machine

Training an SVM is equivalent to solving a quadratic programming problem.

Test phase:

    f(x) = Σ_{i=1}^{N_s} α_i y_i K(s_i, x) + b

where s_i are the support vectors, y_i is the class of s_i, K(·) is the kernel function, and α_i, b are parameters learned during training.
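A direct transcription of this test-phase formula, with a Gaussian (RBF) kernel chosen for illustration and made-up support vectors and parameters standing in for the output of the quadratic-programming training step:

```python
import numpy as np

# f(x) = sum_i alpha_i * y_i * K(s_i, x) + b

def rbf_kernel(u, v, gamma=0.5):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def decision(x, support_vectors, alphas, labels, b):
    return sum(a * y * rbf_kernel(s, x)
               for s, a, y in zip(support_vectors, alphas, labels)) + b

S = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]  # support vectors s_i
alphas, labels, b = [1.0, 1.0], [-1, +1], 0.0     # alpha_i, y_i, b (made up)
x = np.array([1.8, 1.9])
print("class:", 1 if decision(x, S, alphas, labels, b) > 0 else -1)  # +1
```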
Support Vector Machine

Kernel Function:
- K(x, y) = Φ(x) · Φ(y)
- x, y are vectors in the input space
- Φ(x), Φ(y) are vectors in the feature space
- dim(feature space) >> dim(input space)
- There is no need to compute Φ(x) explicitly
- Tree-kernel example: Tr(x, y) = sub(x) · sub(y), where sub(x) is a vector representing all the sub-trees of x
- See www.csie.ntu.edu.tw/~cjlin
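The "no need to compute Φ(x) explicitly" point can be checked numerically: for the homogeneous quadratic kernel K(x, y) = (x · y)² in two dimensions, an explicit feature map exists, and both routes give the same value:

```python
import numpy as np

# For 2-D inputs, (x . y)^2 equals phi(x) . phi(y) with
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).

def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(x, y) ** 2)       # kernel in input space: 121.0
print(np.dot(phi(x), phi(y)))  # explicit feature-space dot product: 121.0
```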
Classification and Prediction: Prediction
What Is Prediction?

Prediction is similar to classification:
- First, construct a model
- Second, use the model to predict unknown values

The major method for prediction is regression:
- Linear and multiple regression
- Non-linear regression

Prediction differs from classification:
- Classification predicts categorical class labels
- Prediction models continuous-valued functions
Regression Analysis and Log-Linear Models in Prediction

Linear regression: Y = α + βX
- The two parameters α and β specify the line and are estimated from the data at hand
- They are fitted by applying the least-squares criterion to the known values Y1, Y2, ..., X1, X2, ...

Multiple regression: Y = b0 + b1X1 + b2X2
- Many nonlinear functions can be transformed into the above form

Log-linear models:
- The multi-way table of joint probabilities is approximated by a product of lower-order tables
- Probability: p(a, b, c, d) = αab βac χad δbcd
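A sketch of the least-squares criterion for the single-variable case, using the closed-form estimates β = cov(X, Y)/var(X) and α = mean(Y) − β·mean(X) on made-up data:

```python
import numpy as np

# Least-squares fit of Y = alpha + beta * X.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly Y = 2X

beta = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha = Y.mean() - beta * X.mean()
print(alpha, beta)  # close to 0 and 2
```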
Locally Weighted Regression

Construct an explicit approximation to f over a local region surrounding the query instance x_q.

Locally weighted linear regression:
- The target function f is approximated near x_q using the linear function

      f̂(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x)

- Minimize the squared error over the k nearest neighbors of x_q, with a distance-decreasing weight K:

      E(x_q) = (1/2) Σ_{x ∈ k nearest neighbors of x_q} (f(x) − f̂(x))² K(d(x_q, x))

- The gradient descent training rule:

      Δw_j = η Σ_{x ∈ k nearest neighbors of x_q} K(d(x_q, x)) (f(x) − f̂(x)) a_j(x)

In most cases, the target function is approximated by a constant, linear, or quadratic function.
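A minimal sketch of the local-constant variant mentioned above: the weighted squared error is minimized by the kernel-weighted mean of the neighbors' targets (the slides' linear version would instead fit the weights by the gradient rule). Data, k, and bandwidth here are made up:

```python
import numpy as np

def lwr_constant(xq, X, Y, k=3, tau=1.0):
    d = np.abs(X - xq)
    idx = np.argsort(d)[:k]                    # k nearest neighbors of xq
    w = np.exp(-(d[idx] ** 2) / (2 * tau**2))  # distance-decreasing weights
    return np.sum(w * Y[idx]) / np.sum(w)      # weighted mean = local constant

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = X ** 2
print(lwr_constant(2.5, X, Y))  # a weighted blend of the nearby targets 4, 9, 1
```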
Classification and Prediction: Classification Accuracy
Classification Accuracy: Estimating Error Rates

Partition: training-and-testing
- Use two independent data sets, e.g., a training set (2/3) and a test set (1/3)
- Used for data sets with a large number of samples

Cross-validation
- Divide the data set into k subsamples
- Use k−1 subsamples as training data and one subsample as test data (k-fold cross-validation)
- For data sets of moderate size

Bootstrapping (leave-one-out)
- For small data sets
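A bare-bones k-fold cross-validation loop; `train` and `evaluate` are placeholders for whatever classifier's fit and scoring routines are being estimated:

```python
import numpy as np

def k_fold_error(X, y, k, train, evaluate, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]                       # one subsample held out as test data
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[train_idx], y[train_idx])
        errors.append(evaluate(model, X[test], y[test]))
    return float(np.mean(errors))             # error rate averaged over the k folds
```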
Boosting and Bagging

Boosting increases classification accuracy:
- Applicable to decision trees or Bayesian classifiers
- Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor
- Boosting requires only linear time and constant space
Boosting Technique (II) — Algorithm

- Assign every example an equal weight 1/N
- For t = 1, 2, ..., T do:
  - Obtain a hypothesis (classifier) h(t) under the weights w(t)
  - Calculate the error of h(t) and re-weight the examples based on the error
  - Normalize w(t+1) to sum to 1
- Output a weighted sum of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set
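The slides leave the re-weighting rule abstract; an AdaBoost-flavored instance of this loop, with one-node decision stumps as the weak hypotheses and ±1 labels, might look like this:

```python
import numpy as np

def stump(X, y, w):
    """Weak learner: pick the feature/threshold/sign with minimal weighted error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # equal initial weights 1/N
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = stump(X, y, w)
        err = max(err, 1e-10)                  # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)  # accuracy-based vote weight
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)         # boost the misclassified examples
        w /= w.sum()                           # normalize w(t+1) to sum to 1
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    votes = sum(a * s * np.where(X[:, j] <= t, 1, -1)
                for a, j, t, s in ensemble)
    return np.sign(votes)                      # weighted sum of all hypotheses
```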
Is Accuracy Enough to Judge?

- Sensitivity: t_pos/pos (true positives over all positive samples)
- Specificity: t_neg/neg (true negatives over all negative samples)
- Precision: t_pos/(t_pos + f_pos) (true positives over all samples labeled positive)
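Computing the three measures from confusion-matrix counts (the counts are made up for illustration):

```python
# t_pos/t_neg: correctly classified positives/negatives;
# f_pos/f_neg: negatives labeled positive / positives labeled negative.
t_pos, f_neg, t_neg, f_pos = 80, 20, 90, 10
pos, neg = t_pos + f_neg, t_neg + f_pos

print("sensitivity:", t_pos / pos)             # 0.8
print("specificity:", t_neg / neg)             # 0.9
print("precision:  ", t_pos / (t_pos + f_pos)) # ~0.889
```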
Classification and Prediction: Summary

Unit outline: decision tree, Bayesian classification, ANN, KNN, GA, fuzzy, SVM, prediction, some issues, summary.
Summary

- Classification is an extensively studied problem (mainly in statistics, machine learning, and neural networks)
- Classification is probably one of the most widely used data mining techniques, with many extensions
- Scalability is still an important issue for database applications; combining classification with database techniques should therefore be a promising topic
- Research directions: classification of non-relational data, e.g., text, spatial, and multimedia data