Fuzzification of statistical indicators of municipalities and cities


SOFT COMPUTING
TECHNIQUES FOR
STATISTICAL DATABASES
Miroslav Hudec
INFOSTAT – Bratislava
MSIS 2009
Introduction
• Soft computing (by fuzzy logic)
• Database query (SQL - fuzzy)
• case study
• Data classification (usual - fuzzy)
• case study
• Conclusion
Soft computing
The essential property of soft computing (SC) is to
“soften” hard computing (HC) techniques in order to cope
with imprecision, ambiguity and uncertainty.
HC uses two-valued logic (e.g. an element either satisfies
the criterion or does not).
Fuzzy logic, as a part of SC, uses many-valued logic (e.g.
an element can partly satisfy the criterion).
Computing with words is inspired by the human
capability to perform a wide variety of tasks without
exact measurements and computations. (Flexible
database queries: interesting for statistical IS?)
Database queries (SQL)
two-valued logic
select *
from Table
where attribute_p > P
and attribute_r < R;
[Figure: the selected region in the (attribute_p, attribute_r) plane, bounded by attribute_p > P and attribute_r < R]
SQL and fuzzy queries
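two-valued logic: SQL conditions >=, <=, =
many-valued logic: fuzzy conditions Big, Small, About
[Figure: membership functions of the fuzzy sets Big, µ(B), rising from 0 at Ld to 1 at Lp; Small, µ(S), falling from 1 at Lp to 0 at Lg; and About, µ(A), a trapezoid with parameters Ld, Lp, Lq, Lg]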
WHERE (a i  Lix )
i 1
and

 or
logical operators
and, or:
1 and 1 =1
0 and 1 =0
one function for
and and or
operator
 ai  Lid ,
a i is Big

ai  Lix  ai  Lig ,
a i is Small
 a  L and a  L , a is About
id
i
ig
i
 i
0,7 and 0,358=?
minimum : min(  i (a i )) , i  1,..., n (0.358)
product :  ( i (a i )), i  1,..., n
(0.2506)
for {0,1} logic minimum and product
become ordinary and operator
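The Big, Small and About fuzzy sets and the two and functions can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the linear shape between the parameters and all function names are assumptions made for illustration; the slides only name the parameters Ld, Lp, Lq and Lg.

```python
import math


def mu_big(x, ld, lp):
    """Degree to which x is Big: 0 at or below ld, 1 at or above lp."""
    if x <= ld:
        return 0.0
    if x >= lp:
        return 1.0
    return (x - ld) / (lp - ld)  # assumed linear rise


def mu_small(x, lp, lg):
    """Degree to which x is Small: 1 at or below lp, 0 at or above lg."""
    if x <= lp:
        return 1.0
    if x >= lg:
        return 0.0
    return (lg - x) / (lg - lp)  # assumed linear fall


def mu_about(x, ld, lp, lq, lg):
    """Trapezoid: rises on [ld, lp], equals 1 on [lp, lq], falls on [lq, lg]."""
    return min(mu_big(x, ld, lp), mu_small(x, lq, lg))


def and_min(degrees):
    """The minimum function for the and operator."""
    return min(degrees)


def and_product(degrees):
    """The product function for the and operator."""
    return math.prod(degrees)


degrees = [0.7, 0.358]
print(and_min(degrees))                      # 0.358
print(round(and_product(degrees), 4))        # 0.2506
print(and_min([1, 0]), and_product([1, 0]))  # 0 0
```

On the values 0 and 1 both functions agree with the ordinary and operator, which is what the last line checks.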
Case study
select district, roads, area
from T
where roads is Big and area is Small
The length of roads indicator is represented by the “Big value”
fuzzy set with parameters Ld = 200 km and Lp = 300 km.
The area of district attribute is described by the “Small value”
fuzzy set with parameters Lp = 450 km² and Lg = 650 km².
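How such a query could assign a degree to each district is sketched below. The district names and values are hypothetical and are not the data of the case study; the linear membership shape and the use of minimum for the and operator are assumptions.

```python
def mu_big(x, ld, lp):
    """Big value: 0 at or below ld, 1 at or above lp, linear in between."""
    return min(1.0, max(0.0, (x - ld) / (lp - ld)))


def mu_small(x, lp, lg):
    """Small value: 1 at or below lp, 0 at or above lg, linear in between."""
    return min(1.0, max(0.0, (lg - x) / (lg - lp)))


# Hypothetical districts: (name, roads in km, area in km^2).
districts = [
    ("D1", 320.0, 400.0),
    ("D2", 250.0, 500.0),
    ("D3", 280.0, 640.0),
    ("D4", 150.0, 300.0),
]


def evaluate(rows):
    """Return (district, degree) pairs with degree > 0, best match first."""
    result = []
    for name, roads, area in rows:
        degree = min(mu_big(roads, 200, 300), mu_small(area, 450, 650))
        if degree > 0:
            result.append((name, round(degree, 3)))
    return sorted(result, key=lambda item: item[1], reverse=True)


for name, degree in evaluate(districts):
    print(name, degree)
```

With these hypothetical values, a crisp condition such as roads >= 300 and area <= 450 would return only D1, while the fuzzy query also lists D2 and D3 with lower degrees.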
Solution
If SQL were used, this additional valuable information
would remain hidden.
Discussion
To obtain a very soft gradation, an infinite number of SQL
queries would have to be used. With fuzzy queries, one query
is sufficient.
The advantages of this approach for users are as follows:
• the connection to the database (connection string) and the data
access (SQL command) do not have to be modified;
• users do not need to learn a new query language;
• the interface supports (quasi) natural language;
• the obtained data are presented in a similar way as from SQL,
but with additional valuable information;
• users see data “behind the corner” (colored areas in the table)
and can take into account data that may be of interest.
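The first point can be illustrated with a small sketch: the fuzzy conditions are turned into an ordinary SQL WHERE clause (roads > Ld, area < Lg), and the membership degrees are computed on the client afterwards. This is only an illustration under assumptions: the in-memory SQLite database, the table contents and the function names are hypothetical, and minimum is used for the and operator.

```python
import sqlite3


def mu_big(x, ld, lp):
    """Big value: 0 at or below ld, 1 at or above lp, linear in between."""
    return min(1.0, max(0.0, (x - ld) / (lp - ld)))


def mu_small(x, lp, lg):
    """Small value: 1 at or below lp, 0 at or above lg, linear in between."""
    return min(1.0, max(0.0, (lg - x) / (lg - lp)))


# Hypothetical data in an ordinary relational table.
conn = sqlite3.connect(":memory:")
conn.execute("create table T (district text, roads real, area real)")
conn.executemany(
    "insert into T values (?, ?, ?)",
    [("D1", 320, 400), ("D2", 250, 500), ("D3", 280, 640), ("D4", 150, 300)],
)

# Plain SQL: the lower bound of Big and the upper bound of Small
# select every record with a degree greater than zero.
rows = conn.execute(
    "select district, roads, area from T where roads > ? and area < ?",
    (200, 650),
).fetchall()

# The degrees are computed on the client side; the database is unchanged.
ranked = sorted(
    ((d, min(mu_big(r, 200, 300), mu_small(a, 450, 650))) for d, r, a in rows),
    key=lambda item: item[1],
    reverse=True,
)
for district, degree in ranked:
    print(district, round(degree, 3))
```

The database only ever sees a standard SQL command, so the connection string and the access layer stay as they are.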
Data classification
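two-valued logic
[Figure: crisp classification of objects T1-T4 into classes C1-C4; horizontal axis Roads [km] with marks 0, 30, 60; vertical axis Snow [days] with marks 0, 67, 124]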
How to solve this problem without additional calculation?
Approximate reasoning and fuzzy logic
Data classification
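many-valued logic
classify_into [class Cx]
select [attributes]
from [tables, views]
$$\text{WHERE} \; \bigotimes_{k=1}^{K} \bigotimes_{i=1}^{n} (a_i \circ L_{ix})$$
The same GLC
[Figure: fuzzy classification into classes C1-C4; axis I1 with marks 25 and 35, axis I2 with marks 60 and 75, membership function µ(x)]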
Case study
In this case study, municipalities are classified according to their
need for winter road maintenance, expressed as a percentage.
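[Figure: membership functions µ(x) of the fuzzy sets Small (S) and Big (B) for P1, length of roads [km], with parameters 25 and 35, and for P2, number of days with snow, with parameters 60 and 75]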
This example contains the following fuzzy rules:
If Road is Small and Snow is Small Then Maintenance is Small; (0.1)
If Road is Small and Snow is Big Then Maintenance is Medium;
If Road is Big and Snow is Small Then Maintenance is Medium; (0.5)
If Road is Big and Snow is Big Then Maintenance is Big. (0.9)
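One possible way to evaluate these rules for a single municipality is sketched below. Several things here are assumptions, not taken from the slides: Small is taken as the complement of Big over the same parameters (25 and 35 km, 60 and 75 days), minimum is used for and, the numbers 0.1, 0.5 and 0.9 are read as the output values of Small, Medium and Big (with 0.5 applied to both Medium rules), and the result is a weighted average of the rule outputs.

```python
def mu_big(x, low, high):
    """Big: 0 at or below low, 1 at or above high, linear in between."""
    return min(1.0, max(0.0, (x - low) / (high - low)))


def maintenance(roads_km, snow_days):
    """Weighted average of the rule outputs (assumed aggregation)."""
    road_b = mu_big(roads_km, 25, 35)
    road_s = 1.0 - road_b  # assumption: Small is the complement of Big
    snow_b = mu_big(snow_days, 60, 75)
    snow_s = 1.0 - snow_b

    # (firing degree of the rule, assumed output value of its conclusion)
    rules = [
        (min(road_s, snow_s), 0.1),  # Small and Small -> Small
        (min(road_s, snow_b), 0.5),  # Small and Big   -> Medium
        (min(road_b, snow_s), 0.5),  # Big and Small   -> Medium
        (min(road_b, snow_b), 0.9),  # Big and Big     -> Big
    ]
    total = sum(degree for degree, _ in rules)
    return sum(degree * value for degree, value in rules) / total


print(round(maintenance(20, 50), 2))    # 0.1
print(round(maintenance(30, 67.5), 2))  # 0.5
print(round(maintenance(40, 80), 2))    # 0.9
```

Because Small and Big are complementary in this sketch, at least one rule always fires with a degree of 0.5 or more, so the division is safe.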
Case study
classify_into S
select *
from Table
where roads is Small and snow is Small;
classify_into M
select *
from Table
where (roads is Small and snow is Big) or (roads is Big and snow is Small);
classify_into B
select *
from Table
where roads is Big and snow is Big;
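The three classification queries can be read as membership degrees in the classes S, M and B. The sketch below is an illustration under assumptions: minimum is used for and, maximum for or (the slides do not name the or function), Small is taken as the complement of Big, and the function names are hypothetical.

```python
def mu_big(x, low, high):
    """Big: 0 at or below low, 1 at or above high, linear in between."""
    return min(1.0, max(0.0, (x - low) / (high - low)))


def classify(roads_km, snow_days):
    """Return the degree of membership in each class, rounded to 2 places."""
    road_b = mu_big(roads_km, 25, 35)
    road_s = 1.0 - road_b  # assumption: Small is the complement of Big
    snow_b = mu_big(snow_days, 60, 75)
    snow_s = 1.0 - snow_b

    degrees = {
        "S": min(road_s, snow_s),
        "M": max(min(road_s, snow_b), min(road_b, snow_s)),
        "B": min(road_b, snow_b),
    }
    return {name: round(value, 2) for name, value in degrees.items()}


print(classify(28, 72))  # {'S': 0.2, 'M': 0.7, 'B': 0.3}
print(classify(20, 50))  # {'S': 1.0, 'M': 0.0, 'B': 0.0}
```

A municipality near the class boundaries belongs partly to several classes, whereas one far from them belongs fully to a single class.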
Case study
If classical classification were used, this additional
valuable information would remain hidden (Softer
classification between objects T1-T4).
Implementation
[Figure: implementation diagram with the components User, Selection, Classification, Knowledge base (IF-THEN rules), Fuzzy SQL, Database, and output classes Ci, Cj]
SQL and fuzzy approach
SQL queries are useful when a clean and exact boundary
between selected and non-selected data is required (faster,
fewer calculations).
Fuzzy queries provide flexibility in the definition of the query
and include records that almost meet the query criterion
(more operations, more information).
The user decides which type of query is better for each task.
[Figure: tools based on HC and tools based on SC accessing the same database]
Conclusion
This approach allows users of statistical information systems
to use their approximate reasoning when working with data.
When users work with the usual software tools, they have to
translate their many-valued logical thinking (approximate
reasoning) into two-valued computer logic.
This fuzzy approach supports work with linguistic
expressions on the client side, yet it does not require
any modification of relational databases.
Thank you for your attention