Secure reversible visible image watermarking with authentication
Integrating induction and deduction for
noisy data mining
Presenter: 陳重光
Department of Computer Science, National Tsing Hua University, No. 101 Kuang Fu Road, Hsinchu 300, Taiwan
Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu, Taiwan
a Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
b School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
Outline(1/2)
1. Introduction
2. Noise modeling with associative corruption rules
  – A systematic noise handling framework
  – Problem statement
    • Input data
    • Notations and definitions
  – Method
    • Algorithm ACF
    • Algorithm ACB
Outline(2/2)
3. Experiments
  – Experiment settings
  – Experimental results
4. Conclusions
1. Introduction(1/3)
1. The main purpose of data mining is to discover knowledge from large amounts of data.
2. Data mining covers many topics, of which the three most fundamental are
  – Classification
  – Cluster analysis
  – Association analysis
3. Other classification methods
  – Linear regression models
  – Classification rules
  – Neural networks
1. Introduction(2/3)
4. Data mining techniques have been applied to many fundamental research domains
  – Biology
  – Medicine
  – Ecology
5. Two essential driving forces push data mining research forward
  – Large amounts of data
  – Powerful hardware support
1. Introduction(3/3)
6. The main purpose of this paper is to study the integration of induction and deduction for noisy data mining.
Noise modeling with associative corruption rules
a) In large-scale data mining applications, erroneous entries in the data are almost unavoidable.
b) Such noise degrades the dataset's truthfulness, which directly affects data quality.
c) The robustness of data mining results relies crucially on the quality of the underlying data.
A systematic noise handling framework
• Noise from different sources can be traced by analyzing the erroneous data items, unless it is totally random.
• Gaussian noise follows a normal distribution with a certain mean and variance, so it can be regarded as a kind of systematic noise.
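To illustrate why Gaussian noise can be treated as systematic, the following is a minimal sketch, assuming paired clean and noisy values are available. The noise parameters MU and SIGMA, the toy data, and the fixed seed are all hypothetical choices made for the illustration; they do not come from the paper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical clean numeric attribute values.
clean = [float(i % 10) for i in range(5000)]

# Assumed noise parameters (illustrative only).
MU, SIGMA = 0.5, 0.2

# Corrupt each value with Gaussian noise N(MU, SIGMA^2).
noisy = [x + random.gauss(MU, SIGMA) for x in clean]

# Analyze the erroneous items: the per-item errors reveal the noise parameters.
errors = [n - c for n, c in zip(noisy, clean)]
est_mu = statistics.mean(errors)
est_sigma = statistics.stdev(errors)

print(f"estimated mean: {est_mu:.3f}, estimated std dev: {est_sigma:.3f}")
```

Because the errors follow one fixed distribution, their mean and standard deviation can be estimated from the erroneous items, which is the sense in which such noise is traceable.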
1. To understand the nature of the noise.
2. To eliminate the noise from the source data so as to benefit the subsequent data mining process.
Problem statement
a) To derive the associative rules that corrupt the
original clean dataset Dc1.
b) To eliminate the noise from the noisy data Dobs and
construct a robust learner for supervised learning.
Input data
a) A subset of instances suspected of being noisy is identified based on a certain criterion.
b) Error correction rules are proposed and applied to this subset of the data.
Notations and definitions
• Before defining the concepts used in this study, we introduce the following notation:
  – A: the set of feature attributes of Dobs; A = {A1, A2, ..., AN};
  – C: the class attribute of Dobs;
  – V: the value space of the corresponding attributes; V = {V1, V2, ..., VN, VC}, where Vi corresponds to Ai, Ai ∈ (A ∪ {C});
  – H: a 2-tuple structure (Ai, vi), where
    • Ai ∈ (A ∪ {C}); Ai is called the Head of H;
    • vi ∈ Vi; vi is the value of the Head, and Vi corresponds to Ai;
  – T: a 2-tuple structure <p, v>, where
    • p = <Ai, vi> is a structure H; Ai is called the Head of T;
    • v ∈ Vi; v is the modified value of the Head, and Vi corresponds to Ai;
    • vi ≠ v.
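The structures H and T defined above can be sketched as small immutable records. This is only an illustration: the class layout, the field names, and the example attribute values are assumptions, not part of the original notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class H:
    """A 2-tuple (Ai, vi): an attribute (the Head) and one of its values."""
    head: str   # attribute name Ai, drawn from A ∪ {C}
    value: str  # value vi, drawn from Vi


@dataclass(frozen=True)
class T:
    """A 2-tuple <p, v>: a predicate p = (Ai, vi) and the modified value v."""
    p: H
    v: str  # modified value of the Head; must differ from p.value

    def __post_init__(self):
        # Enforce the constraint vi != v from the definition of T.
        if self.p.value == self.v:
            raise ValueError("modified value v must differ from vi")

    @property
    def head(self) -> str:
        """The Head of T is the Head of its predicate p."""
        return self.p.head


# Hypothetical example: attribute A2 changes from "small" to "large".
h = H("A1", "red")
t = T(H("A2", "small"), "large")
print(t.head)  # A2
```

Making both records frozen lets them be used as set members or dictionary keys, which is convenient when predicates are collected into rule conditions.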
Method
[Figure: Problem description. 1) Noise formation: clean data Dc1 is corrupted into noisy data Dobs1. 2) Noise correction: noisy data Dobs2 is corrected into Dcor2.]
• In this study, we propose a deductive learning procedure to derive these corruption rules.
• The procedure consists of two steps.
  – First, we propose an algorithm called ACF (Associative Corruption Forward) to learn the noise formation mechanism from Dc1 to Dobs1.
  – Second, we propose an algorithm called ACB (Associative Corruption Backward) to correct Dobs2.
Algorithm ACF
• Algorithm ACF is used to infer the set of AC rules
R1 that corrupts Dc1.
• It employs the method of classification rule induction.
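The idea of learning corruption rules from a clean dataset and its noisy counterpart can be illustrated with a deliberately simplified sketch. This is not the ACF algorithm itself: it assumes row-aligned copies of Dc1 and Dobs1, considers only single-condition rules, and selects conditions with a confidence threshold (min_conf, a hypothetical parameter) rather than a full classification rule learner.

```python
from collections import Counter


def infer_ac_rules(d_clean, d_obs, min_conf=0.9):
    """Return {(attr, old, new): (cond_attr, cond_val)} for single-condition rules."""
    changed = Counter()   # (cond, attr, old, new) -> number of corrupted instances
    eligible = Counter()  # (cond, attr, old) -> number of instances that could be corrupted
    for clean, obs in zip(d_clean, d_obs):
        for attr, old in clean.items():
            # Candidate conditions: every other (attribute, value) pair of the clean instance.
            for cond in ((a, v) for a, v in clean.items() if a != attr):
                eligible[(cond, attr, old)] += 1
                if obs[attr] != old:
                    changed[(cond, attr, old, obs[attr])] += 1

    best = {}
    for (cond, attr, old, new), n in changed.items():
        conf = n / eligible[(cond, attr, old)]
        key = (attr, old, new)
        # Keep the most confident condition, breaking ties by support.
        if conf >= min_conf and (key not in best or (conf, n) > best[key][1:]):
            best[key] = (cond, conf, n)
    return {key: value[0] for key, value in best.items()}


# Hypothetical data: the hidden rule is "if A1 = a, then A2 changes from x to y".
d_clean = [
    {"A1": "a", "A2": "x", "C": "pos"},
    {"A1": "a", "A2": "x", "C": "neg"},
    {"A1": "b", "A2": "x", "C": "pos"},
    {"A1": "b", "A2": "x", "C": "neg"},
    {"A1": "a", "A2": "z", "C": "pos"},
]
d_obs = [dict(row) for row in d_clean]
d_obs[0]["A2"] = "y"
d_obs[1]["A2"] = "y"

print(infer_ac_rules(d_clean, d_obs))
```

On this toy data the only condition that explains every change of A2 from x to y is A1 = a; the class value explains only half of them and is filtered out by the threshold.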
Algorithm ACB
• Algorithm ACB (Associative Corruption Backward)
is used for noise correction.
• The correction is not a strict one-to-one mapping.
• ACB builds a Naive Bayes learner based on Dc1 for each noise-corrupted attribute.
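Because a corrupted value may also occur legitimately, reversing a rule needs a decision per instance. The sketch below shows one way a Naive Bayes learner trained on Dc1 could make that decision. It is an assumption-laden illustration, not the ACB algorithm: the categorical Naive Bayes with Laplace smoothing, the single-condition rule format, and all data values are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing for one target attribute."""

    def __init__(self, rows, target):
        self.features = [a for a in rows[0] if a != target]
        self.n = len(rows)
        self.prior = Counter(r[target] for r in rows)
        self.counts = defaultdict(Counter)  # (feature, target value) -> feature value counts
        self.values = defaultdict(set)      # feature -> set of observed values
        for r in rows:
            for f in self.features:
                self.counts[(f, r[target])][r[f]] += 1
                self.values[f].add(r[f])

    def predict(self, row, candidates):
        def log_posterior(t):
            lp = math.log((self.prior[t] + 1) / (self.n + len(self.prior)))
            for f in self.features:
                seen = self.counts[(f, t)][row[f]]
                lp += math.log((seen + 1) / (self.prior[t] + len(self.values[f])))
            return lp
        return max(candidates, key=log_posterior)


def correct(d_obs2, d_clean1, attr, old, new, cond):
    """Revisit instances that match the rule's condition and carry the corrupted value."""
    nb = NaiveBayes(d_clean1, attr)
    cond_attr, cond_val = cond
    corrected = []
    for row in d_obs2:
        row = dict(row)
        if row[cond_attr] == cond_val and row[attr] == new:
            # The value may be corrupted (originally old) or genuine (originally new).
            row[attr] = nb.predict(row, [old, new])
        corrected.append(row)
    return corrected


# Hypothetical clean data in which A2 is correlated with the class C.
d_clean1 = [
    {"A1": "a", "A2": "x", "C": "pos"},
    {"A1": "b", "A2": "x", "C": "pos"},
    {"A1": "a", "A2": "y", "C": "neg"},
    {"A1": "b", "A2": "y", "C": "neg"},
]
d_obs2 = [
    {"A1": "a", "A2": "y", "C": "pos"},  # corrupted: was x
    {"A1": "a", "A2": "y", "C": "neg"},  # genuine y
    {"A1": "b", "A2": "y", "C": "neg"},  # condition A1 = a does not hold
]
d_cor2 = correct(d_obs2, d_clean1, "A2", "x", "y", ("A1", "a"))
print([row["A2"] for row in d_cor2])
```

Only the first instance is restored to x: the second matches the rule's condition but its class makes y the more probable original value, which is why the backward step cannot be a fixed one-to-one substitution.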
Experiments
• The objectives of our experiments are twofold.
  – First, we examine whether algorithm ACF can accurately derive the AC rules.
  – Second, we verify whether our noise correction procedure ACB can produce a higher-quality dataset Dcor2 in terms of supervised learning.
Experiment settings
• We evaluate the system's performance on datasets collected from the UCI database repository and compare against reference [22].
• To evaluate the performance of the proposed method, we first separate Dclean into two parts:
  – A dataset Dbase.
  – A corresponding testing set Dtest.
Experimental results
• The set of AC rules R that corrupts the original clean dataset may contain more than one AC rule. However, the following restrictions apply to R:
  – Every rule in R is an AC rule;
  – The right-hand sides of any two rules in R differ from each other;
  – If P => Q ∈ R, where Q = <p, v>, then predicate p does not appear on either the left-hand or the right-hand side of any other rule in R.
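The restrictions on R can be checked mechanically. The following is a minimal sketch, assuming each rule is stored as a pair (lhs, (p, v)) in which lhs is a set of (attribute, value) predicates and p is itself an (attribute, value) pair. This representation is an assumption, and the first restriction is checked only through the condition vi ≠ v, since that is the only part of the AC rule definition stated here.

```python
def check_rule_set(rules):
    """Return a list of violated restrictions; an empty list means R is acceptable."""
    problems = []
    rhs_seen = set()
    for i, (lhs, (p, v)) in enumerate(rules):
        # Restriction 1 (partial check): the modified value must differ from vi.
        if p[1] == v:
            problems.append(f"rule {i}: not an AC rule (v equals vi)")
        # Restriction 2: right-hand sides must be pairwise different.
        if (p, v) in rhs_seen:
            problems.append(f"rule {i}: duplicate right-hand side")
        rhs_seen.add((p, v))
        # Restriction 3: p must not occur on either side of any other rule.
        for j, (other_lhs, (other_p, _)) in enumerate(rules):
            if i != j and (p in other_lhs or p == other_p):
                problems.append(f"rule {i}: predicate {p} reused in rule {j}")
    return problems


# Hypothetical rule sets.
valid = [
    (frozenset({("A1", "a")}), (("A2", "x"), "y")),
    (frozenset({("A3", "m")}), (("A4", "p"), "q")),
]
invalid = [
    (frozenset({("A1", "a")}), (("A2", "x"), "y")),
    (frozenset({("A2", "x")}), (("A3", "m"), "n")),  # condition uses a corrupted predicate
]

print(check_rule_set(valid))
print(check_rule_set(invalid))
```

The second rule set is rejected because the predicate (A2, x) is both modified by one rule and used as a condition by another, so the outcome would depend on the order in which the rules fire.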
Basic information on the datasets for the
experiment.
Comparative results of five models, m0 through m4. m0 is a benchmark learner built on the noise-free data Dclean.
Conclusions(1/2)
• We present a systematic noise handling framework in which deductive reasoning on noise information and inductive learning from the input data are integrated.
• We propose a method to handle the noise caused by Associative Corruption (AC) rules for supervised learning.
Conclusions(2/2)
• To correct Dobs2, we design a two-step method comprising algorithms ACF and ACB.
• In our experiments, we show that our method can infer the noise formation mechanism accurately and perform noise correction appropriately, thereby enhancing the quality of the original dataset.