Transcript lecture 9

Index
Sen Zhang
INDEX
• When a table contains a lot of records, it
can take a long time for the search engine
of oracle (or other RDBMS) to look
through the table to locate specific
records.
Index Structures
• Secondary access structure used to speed
up the retrieval of records in response to
certain search conditions.
• Indexes are useful because they help you
to locate specific target within a large
amount of data without having to look
through every object.
A data table
First name
Last name
Resid
state
Year of
born
Year of
entering
John
Edward
VM
1989
2001
Kathy
Alex
NY
1987
2002
Joseph
bush
NJ
1983
2000
George
Clinton
CA
1983
1999
Alex
Jordan
VA
1991
2000
Bill
Herbert
AZ
1984
1999
James
Perl
SC
1986
1998
Narian
Geller
NC
1992
1999
Frank
Thomason
FL
1994
2005
Index table vs. data table
First
Index
pointer
First name
Last name
Resid
state
Year of
born
Year of
entering
Alex
5
John
Edward
VM
1989
2001
Bill
6
Kathy
Alex
NY
1987
2002
Frank
9
Joseph
bush
NJ
1983
2000
George
4
George
Clinton
CA
1983
1999
James
7
Alex
Jordan
VA
1991
2000
John
1
Bill
Herbert
AZ
1984
1999
joseph
3
James
Perl
SC
1986
1998
kathy
2
Narian
Geller
NC
1992
1999
Narian
8
Frank
Thomason
FL
1994
2005
• Logically speaking at a conceptual level, the
index is simply the row number.
• But physically, the index are pointers to the
precise position on external storage , i.e. disks
or memory (when loaded into memory.)
• They are pointers to the offsets of records in the
physical file.
• The file system actually will map these offsets to
a specific offsets in a specific sector in a specific
track on a specific disk.
Which attributes should be
indexed?
• A table could be associated with multiple
indexes.
• Which attribute(s) should be indexed?
The choice depends on user requirement,
what do you want from those indexes, and
depend on your applications, depend upon
each individual database designer or
database programmer’s understanding
toward the application.
How index will work in a dynamic
environment.
• Once you created an index on a table, oracle
automatically keeps the index synchronized with that
table.
• That means, when you insert a new record to the data
table, oracle will insert a pointer to the index table at the
right position at the same time. So every insertion will
take a little bit more time.
• Similarly, when delete, update are involved, the index
tables will also be involved, thus take a little bit more
time in this sense, but it is tricky.
• As a tradeoff, index will expedite (speed up) other data
manipulation operations such as delete, update and
select.
• Insertion to tables with indexes will take
longer time!
• So, it is not always a good idea to enforce
indexes to tables, especially for OLTP.
OLTP
• OLTP (On-Line Transaction Processing)
– supports a business? day-to-day activities
– high insertion rate, simple queries
– MB or GB
OLAP
• OLAP (On-Line Analytical Processing)
system or Data Warehouse
– analyzes operational data
– low update rate, complex queries
– GB or TB
– a strategic business decision: high rewards,
but low chance of success
DDL statement to create index
• Create index index_name on table_name(column_name_list);
• An index is an auxiliary way to organize
your data file based on the characteristics
of values contained in your data file.
Why index works
• Proper data structure
• Data manipulation related operations such
as inserting, deleting, updating, and
searching algorithm can achieve better
time complexities.
• For example, searching in an unsorted list
could be O(n); but searching in a sorted
list could be O(lgn), binary search for
example.
Binary search tree
• A binary search tree is a binary tree
where every node has a value, every
node's left subtree has values less than
the node's value, and every right subtree
has values greater. A new node is added
as a leaf.
Are these BSTs?
50
1 a BST?
25
12
75
45
66
50
2 a BST?
25
12
90
75
55
73
90
• Note that the worst case of this
build_binary_tree routine is O(n2) - if you feed it
a sorted list of values, it chains them into a
linked list with no left subtrees.
• For example, build_binary_tree([1, 2, 3, 4, 5])
yields the tree (None, 1, (None, 2, (None, 3,
(None, 4, (None, 5, None))))).
• There are a variety of balanced schemes for
overcoming this flaw with simple binary trees.
• Because databases cannot typically be maintained entirely in
memory (512M main memory is good ), b-trees or b* trees are often
used to index the data and to provide fast access.
• Theoretically speaking, searching an unindexed and unsorted
database containing n key values will have a worst case running
time of O(n); if the same data is indexed with a b-tree, the same
search operation will run in O(log n).
• For example, to perform a search for a single key on a set of one
million keys (1,000,000), a linear search will require at most
1,000,000 comparisons at the worst case. If the same data is
indexed with a b-tree of minimum degree 10, 114 comparisons will
be required in the worst case.
B*-tree
• It is a B-tree in which nodes are kept 2/3
full by redistributing keys to fill two child
nodes, then splitting them into three
nodes.
B-tree
Definition: A balanced search tree in which every
node has between m/2 and m children, where
m>1 is a fixed integer. m is the order. The root
may have as few as 2 children. This is a good
structure if much of the tree is in slow memory
(disk), since the height, and hence the number
of accesses, can be kept small, say one or two,
by picking a large m.
• Also known as balanced multiway tree.
• Generalization :balanced tree, search tree.
• Clearly, indexing large amounts of data
can significantly improve search
performance. Although other balanced tree
structures can be used, a b-tree also
optimizes costly disk accesses that are of
concern when dealing with large data sets.
view