Bitmap Index

Download Report

Transcript Bitmap Index

BITMAP INDEX
จัดทำโดย
นำยชนำกำนต์ สันติคุณำภรณ์
นำยธฤษพงศ์ ศิรบิ ูรณ์
นำงสำวศุภำภรณ์ ถ่ำนคำ
WHAT IS BITMAP INDICES ?
A bitmap index is a special kind of database
index that uses bit array.
Identifier
Gender
Bitmaps
F
M
1
Female
1
0
2
Male
0
1
3
Male
0
1
4
Unspecified
0
0
5
Female
1
0
Credit : Wikipedia.org
EXAMPLE
To illustrate how bitmap indices work, we
use the Titanic database as an example. Titanic
is a table containing 2,201 tuples described by
four attributes Class, Age, Gender and Survivor
(Table 1). A bitmap index built on the Survivor
attribute is presented in Table 2.
Table 1. Titanic Database
RowID
….
Class
Age
Gender
Survivor
1
1st
Adult
Female
Yes
2
3rd
Adult
Male
Yes
3
2nd
Child
Male
Yes
4
3rd
Adult
Male
Yes
5
1st
Adult
Female
Yes
6
2nd
Adult
Male
No
7
1st
Adult
Male
Yes
8
Crew
Adult
Female
No
9
Crew
Adult
Female
Yes
10
2nd
Adult
Male
No
11
3rd
Adult
Male
12
Crew
Adult
….
….
Male
Class
….
No
RowID
No
Crew
….
1st
Age
Gender
Survivor
Table 2. Bitmap Indices for Titanic’s Database
1
2
3
4
5
6
7
8
9
10
11
12
..
0
0
0
0
0
0
0
1
1
0
0
1
..
1
0
0
0
1
0
1
0
0
0
0
0
..
2nd
0
0
1
0
0
1
0
0
0
1
0
0
..
3rd
0
1
0
1
0
0
0
0
0
0
1
0
..
Child
0
0
1
0
0
0
0
0
0
0
0
0
..
Adult
1
1
0
1
1
1
1
1
1
1
1
1
..
Female
1
0
0
0
1
0
0
1
1
0
0
0
..
Male
0
1
1
1
0
1
1
0
0
1
1
1
..
No
0
0
0
0
0
1
0
1
0
1
1
1
..
Yes
1
1
1
1
1
0
1
0
1
0
0
0
..
PROPERTIES


Queries are answered using bit-wise operations
such as intersection (AND), and union (OR).
For some select queries "SELECT COUNT()...WHERE ... AND ...”
Bitmap(Survivor=”Yes”) AND Bitmap(Gender=”Male”)
Rowid
1
2
3
4
5
6
7
8
9
10
11
12
..
Survivor =“YES”
1
1
1
1
1
0
1
0
1
0
0
0
..
Gender = “Male”
0
1
1
1
0
1
1
0
0
1
1
1
..
AND
0
1
1
1
0
0
1
0
0
0
0
0
..
BITMAP-BASED DECISION TREES METHODS

The aim is to predict which classes of the Titanic
passengers are more likely to survive the wreck.
Those passengers are described by different
attributes which are:




Class={1st, 2nd, 3rd, Crew};
Age={Adult, Child};
Gender={Female, Male};
Survivor={No, Yes};
Bitmap Indices for Titanic’s Database
Class
Age
Gender
Survivor
Rowid
1
2
3
4
5
6
7
8
9
10
11
12
..
Crew
0
0
0
0
0
0
0
1
1
0
0
1
..
1st
1
0
0
0
1
0
1
0
0
0
0
0
..
2nd
0
0
1
0
0
1
0
0
0
1
0
0
..
3rd
0
1
0
1
0
0
0
0
0
0
1
0
..
Child
0
0
1
0
0
0
0
0
0
0
0
0
..
Adult
1
1
0
1
1
1
1
1
1
1
1
1
..
Female
1
0
0
0
1
0
0
1
1
0
0
0
..
Male
0
1
1
1
0
1
1
0
0
1
1
1
..
No
0
0
0
0
0
1
0
1
0
1
1
1
..
Yes
1
1
1
1
1
0
1
0
1
0
0
0
..
Survivor
COUNT1(Bitmap(Survivor = “Yes”))
COUNT1(Bitmap(Survivor = “No”))
Yes
No
711
1490
Gender
COUNT1(Bitmap(Gender = “Male”) AND (Bitmap(Survivor = “Yes”))
COUNT1(Bitmap(Gender = “Male”) AND (Bitmap(Survivor = “No”))
Yes 367
No 1364
Yes 344
No 126
Male
Female
Survivor =“YES”
1
1
1
1
1
0
1
0
1
0
0
0
..
Gender = “Male”
0
1
1
1
0
1
1
0
0
1
1
1
..
AND
0
1
1
1
0
0
1
0
0
0
0
0
..
Survivor = “No”
0
0
0
0
0
1
0
1
0
1
1
1
..
Gender = “Male”
0
1
1
1
0
1
1
0
0
1
1
1
..
AND
0
0
0
0
0
1
0
0
0
1
1
1
..
IMPLEMENTATION

The stored procedure allows us to create the necessary bitmap
indices for a given training set and then build the decision tree.

The nodes of the decision tree are built by using an SQL query
that is based on the AND operation applied on its own bitmaps
and its parent bitmaps.

Then, the obtained And_bitmaps are used to count the
population frequency of each class in the node with simple
COUNT queries.
COMPLEXITY STUDY

Let’s
N: the total number of tuples in the training set
 K the number of attributes
 L: the average length, in bits, of each attribute
 A: the average number of values of each attribute

The size of the initial training set is N ∗ L ∗ K bits

Thus K bitmap indices are created with an average
number of A bitmaps for each index. Each bitmap
has a size of N bits
The size of the training set is N ∗A∗K bits
COMPLEXITY STUDY

In terms of time spent to data reading, we
consider that a bit is read in one time unit.

The total number of nodes on the ith depth level
can be approximated by Ai−1. Then, build the
whole decision tree, in the classical approach,
the reading time is : KLA  ( K  i) * A
K
i
i 1

To evaluate the gain in time, we build the
following ratio :
G : The polynomials
of higher degree
R−1 is of complexity
1
K
: The insignificant
ADVANTAGES
• Fast operations
— The most common operations are the bit-wise logical
operations
— They are well supported by hardware
• Easy to compress, potentially small index size
• Each individual bitmap is small and frequently
used ones can be cached in memory
• Available in most major commercial DBMS
[Database Management System]