Tree-Structured Indexes

Download Report

Transcript Tree-Structured Indexes

Introduction to Compilers
Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY
Computers run 0/1 strings
(machine language program)
0010001000000100
0010010000000100
0001011001000010
0011011000000011
1111000000100101
0000000000000101
0000000000000110
0000000000000000
A machine language program
that adds two numbers
•First 4 bits for opcode
•Last 12 bits for operands
Source:
Louden and Lambert’s book:
Programming Languages
Programmers write more readable
character strings
An assembly language program that adds two numbers, from Louden and Lambert’s book.
Even more readable character strings:
high-level languages

Imperative Languages: specifies HOW







Fortran
ALGOL
PASCAL
C
C++
Java
Declarative Languages: specifies WHAT

SQL, ML, Prolog
Models of Computation in Languages
Underlying most programming languages is a model of
computation:
Procedural: Fortran (1957)
Functional: Lisp (1958)
Object oriented: Simula (1967)
Logic: Prolog (1972)
Relational algebra: SQL (1974)
Source: A. V. Aho. Lectures of Programming
Languages and Translators
What is a compiler?

A Compiler is a translator


between computers and programmers
More generally speaking, a Compiler is a
translator between source strings and target
strings.



between assembly language and Fortran
between Java and Java Bytecode
between Java and SQL
Assembly language vs Fortran
Source: Stephen A. Edwards.
Lectures of Programming Languages
and Translators
The Structure of a Compiler
1.
2.
3.
4.
5.
6.
Lexical Analysis
Syntax Analysis (or Parsing)
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Translation of an assignment
statement (1)
Translation of an assignment
statement (2)
Translation of SQL query

Query can be converted to relational algebra
Relational Algebra converts to tree, joins form branches
Each operator has implementation choices

Operators can also be applied in different order!


SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5
(sname)(bid=100  rating > 5) (Reserves  Sailors)
sname
bid=100
rating > 5
sid=sid
Reserves
Sailors
Cost-based Query Sub-System
Queries
Select *
From Blah B
Where B.blah = blah
Query Parser
Usually there is a
heuristics-based
rewriting step before
the cost-based steps.
Query Optimizer
Plan
Generator
Plan Cost
Estimator
Catalog Manager
Schema
Query Executor
Statistics
Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
R.bid=100 AND S.rating>5




Cost: 500+500*1000 I/Os
By no means the worst plan!
Misses several opportunities:
 selections could be`pushed’
down
 no use made of indexes
Goal of optimization: Find faster
plans that compute the same
answer.
Plan:
(On-the-fly)
sname
bid=100
rating > 5
(On-the-fly)
(Page-Oriented
sid=sid Nested loops)
Sailors
Reserves
Alternative Plans – Push Selects
(No Indexes)
(On-the-fly)
sname
(On-the-fly)
sname
bid=100
bid=100
rating > 5
(On-the-fly)
(On-the-fly)
(Page-Oriented
sid=sid Nested loops)
(Page-Oriented
sid=sid Nested loops)
Sailors
rating > 5
(On-the-fly) Reserves
Reserves
500,500 IOs
Sailors
250,500 IOs
Alternative Plans – Push Selects
(No Indexes)
sname
(On-the-fly)
(On-the-fly)
sname
bid=100
(On-the-fly)
(Page-Oriented
sid=sid Nested loops)
(Page-Oriented
sid=sid Nested loops)
rating > 5
rating > 5
bid = 100
(On-the-fly)
(On-the-fly)
(On-the-fly) Reserves
Sailors
250,500 IOs
Sailors
Reserves
4250 IOs
500 + 1000 + 250 + 250*10