CS257_Sec1_118_PPT2_Chapter_15.1

Download Report

Transcript CS257_Sec1_118_PPT2_Chapter_15.1

Query Execution
Section 15.1
Anilkumar Panicker
CS257: Database Systems
ID: 118
Section 1
Agenda
• Query Processor
• Query compilation
• Physical Query Plan Operators
– Scanning Tables
• Table Scan
• Index scan
•
•
•
•
•
Sorting while scanning tables
Model of computation for physical operators
Parameters for measuring cost
I/O cost for scan operators
Iterators
Query processor
• The query processor is the group of
components of a DBMS that turns user
queries and data-modification commands
into a sequence of database operations
and executes those operations
• query processor is responsible for
supplying details regarding how the query
is to be executed
The major parts of the query
processor
Query compilation
• Query compilation itself is a multi-step
process consisting of :
– Parsing: in which a parse tree representing
query and its structure is constructed
– Query rewrite: in which the parse tree is
converted to an initial query plan
– Physical plan generation: where the abstract
query plan is turned into a physical query plan
Outline of query compilation
Physical Query Plan Operators
• Physical query plans are built from
operators
• Each of the operators implement one step
of the plan.
• Physical operators can be
implementations of the operator of
relational algebra.
• They can also be non relational algebra
operators like “scan” which scans tables.
Scanning Tables
• One of the most basic things in a physical
query plan.
• Necessary when we want to perform join
or union of a relation with another relation.
Two basic approaches to locating
the tuples of a relation R
• Table-scan
– Relation R is stored in secondary memory
with its tuples arranged in blocks
– it is possible to get the blocks one by one
– This operation is called Table Scan
Two basic approaches to locating
the tuples of a relation R
• Index-scan
– there is an index on any attribute of Relation
R
– Use this index to get all the tuples of R
– This operation is called Index Scan
Sorting While Scanning Tables
• Why do we need sorting while scanning?
– the query could include an ORDER BY
clause. Requiring that a relation be sorted
– Various algorithms for relational-algebra
operations require one or both of their
arguments to be sorted relation
– physical-query-plan operator sort-scan takes
a relation R and a specification of the
attributes on which the sort is to be made, and
produces R in that sorted order
Model of Computation for
Physical Operators
• Choosing physical plan operators wisely is
an essential for a good query processor.
• Cost for an operation is measured in
number of disk i/o operations.
• If an operator requires the final answer to
a query to be written back to the disk, the
total cost will depend on the length of the
answer and will include the final write back
cost to the total cost of the query.
Improvements in cost
• Major improvements in cost of the physical
operators can be achieved by avoiding or
reducing the number of disk i/o operations
• This can be achieved by passing the
answer of one operator to the other in the
main memory itself without writing it to the
disk.
Parameters for Measuring Costs
• Parameters that affect the performance of
a query
– Buffer space availability in the main memory
at the time of execution of the query
– Size of input and the size of the output
generated
– The size of memory block on the disk and the
size in the main memory also affects the
performance
I/0 Cost for Scan Operators
• If Relation R is clustered, i.e. it is stored in
approximately B blocks then the total
number of disk operations required is B
• If R is clustered but requires a two phase
multi way merge sort then the total number
of disk i/o required will be 3B.
• However, if R is not clustered, then the
number of required disk I/0's is generally
much higher.
Iterators for Implementation of
Physical Operators
• Many physical operators can be
implemented as an iterator
• It is a group of three functions that allows
a consumer of the result of the physical
operator to get the result one tuple at a
time
Iterator
• The three functions forming the iterator
are:
Open:
• This function starts the process of getting
tuples.
• It initializes any data structures needed to
perform the operation
Iterator
GetNext
• This function returns the next tuple in the
result
• Adjusts data structures as necessary to
allow subsequent tuples to be obtained
• If there are no more tuples to return,
GetNext returns a special value NotFound
Iterator
Close
• This function ends the iteration after all
tuples
• it calls Close on any arguments of the
operator