fuzzy sql for statistical databases

Download Report

Transcript fuzzy sql for statistical databases

FUZZY SQL FOR
STATISTICAL DATABASES
Miroslav Hudec
INFOSTAT – Bratislava
MSIS 2008
Introduction
•
•
•
•
•
Classical SQL and its disadvantages
Fuzzy improvement
Generalised “where” clause
Case study for statistical database
Conclusion
Classical SQL
select n, a1, a2,…an
from T
where a1>A1 and a2<A2
a2
A2
0
A1
a1
Classical SQL
select n, a1, a2,…an
from T
where a1>A1-p
and a2<A2+q
a2
A2
0
A1
a1
Fuzzy improvement of the SQL
1.
Accesses relational databases in the unchanged
structure
2.
Supports queries based on linguistic expressions on
the client side
The query is modified as follows:
select n, a1, a2,…an
from T
where a1 is Big and a2 is Small
Fuzzy sets for queries
Big (greather than)
Small (less than)
attribute
attribute
1
1
0 Ld
Lp
B
0
Middle (equal,…)
attribute
1
Lp
Lg
0
Ld
Lp
Lq
S
Lg
M
___________
___________
___________
The query:
select n, a1, a2,…an
from T
where a1:>=Ld
The query:
select n, a1, a2,…an
from T
where a1:<=Lg
The query:
select n, a1, a2,…an
from T
where a1>=Ld and
a1<=Lg
Generalised logical condition
n
WHERE clause with fuzzy conditions only: where  (ai  Lix )
i 1
where n denotes number of attributes with fuzzy constraints in a WHERE
clause of a query,
and

 or
where and and or are fuzzy logical operators
 ai  Lid ,
a i is Big

ai  Lix  ai  Lig ,
a i is Small
 a  L and a  L , a is About
id
i
ig
i
 i
where ai is a database attribute and L is the parameter of a fuzzy set
WHERE clause with fuzzy and classical constraints
n
where  (ai  Lix ) [and/or][atribute_m LIKE “*String”][and/or] [atribute_l<Date]
i 1
Calculation of the QCI
The QCI values for selected records are calculated in next two steps:
1. Calculation of memebership degree to fuzzy sets
2. Calculation of query satisfaction:
- for logical And operator
min: QCI  min(  F (x ai )) i=1,...,n
- for logical Or operator
max: QCI  max(  F (x ai )) i=1,...,n
Example
select district, unemployment, area
from T
where unemployment is Big and area is Small.
Unemployment is described with„Big value“ fuzzy
set and its parameters are: Ld=8% and Lp =10%.
Area is represented with „Small value“ fuzzy set
with parameters Lp=300km2 and Lg =650km2.
The query has this form:
select district, unemployment, area
from T
where unemployment >8 and area <650
Solution
Structure of the fuzzy SQL
Data
base
Generating fuzzy
SQL query and
calculating the
lower bound of
query satisfaction
Selected
records from DB
Calculation the
QCI for selected
records
Solution
User Interface
Conclusion
This fuzzy approach supports work with linguistic expressions on the
client side, nevertheless it does not change structure and processes on the
server side of relational databases.
Fuzzy improving of SQL queries has advantages in cases when the user
can not unambiguously separate data he is interested in from data he is
not interested in by sharp boundaries or when the user wants to obtain
data that are very close to satisfy queries. In other cases classical SQL
fulfils the requirements for data.
In further use the meaning of the query is not changed only shapes and
boundaries of linguistic expressions are changed to catch new requests.
The state of art of this approach depends also on the theoretical and
practical development of fuzzy database management systems.
Thank you for your attention