Transcript PPT

Real-time
Query Processing
Roger Blake
CSC 536
May 2, 2005
Real-time Query Processing - Agenda
•
•
•
•
Description of RTDBs
Aspects of real-time query processing
Real-time query approaches
Query processing in conventional and
real-time DBMS’s
Description of real-time databases
• In order to be a real-time database, it
needs to have:
– Timing constraints (deadlines), and
– Temporal data (data validity related to
time)
• But you already knew that…what
may not have been so clear are some
of the misconceptions about real-time
databases…
Description of real-time
databases
• Here are a few misconceptions about realtime databases that have persisted
– Must be special purpose and highly specialized
– Must be fast
– Must be in-memory
• There is not complete consensus,
especially in practice, about what a realtime database even is
Aspects of real-time query processing
Soft:
Firm:
Timing
A query missing a
A query missing a
constraints
deadline
deadline means the
results in less
results are useless
(deadlines)
value than if
Hard:
A query missing a
deadline means
system failure
the deadline
were met
Topology
Distributed: A Single node:
Periodicity
and
timings
query
references
data on more
than one
machine
Embedded:
A query uses data on a
single machine,
which is possibly at
least partially on
disk
The real-time database
are queries are
entirely in-memory
and make use of a
dedicated
processor
Dynamic:
Extended Static:
Static:
Queries and their
timings are
not known in
advance, and
can not be
estimated or
bounded
Queries and their timings
are not known in
advance but can
either be bounded or
estimated with a
degree of certainty
Queries and their
timings are known
in advance and are
deterministic
Real-time query approaches
• Priority Memory Management (PMM)
algorithm
–
–
–
–
Prioritizes queries by EDF
Queues and admits queries
Allocates memory to each query based on min/max
Adjusts memory allocations based on resource
utilization
Real-time query approaches
• B-tree indexing for real-time queries
– Starting point is a conventional B-tree
with queries prioritized by EDF
Real-time query approaches
• B-tree indexing for real-time queries (cont’d)
– As each query searches a node, it locks that node
and all its child nodes for either read or write.
– If there is a conflict, the query with the lowest
priority restarts at the top of the tree
– Rebalancing operations have the lowest priority
Real-time query approaches
• R-tree – similar to algorithms used
for non-real-time distributed queries
4 physical locations mapped by the R-tree
Real-time query approaches
• R-tree adapted for real-time queries:
Real-time query approaches
• Freshness for base
and derived real-time
data
– Prioritizes queries by
EDF
– Calculates Access
Update Ratio (AUR)
for each data item
(access frequency /
update frequency)
– Uses (and adjusts) an
AUR threshold for
when data is updated
Query processing in conventional
and real-time DBMS’s
• A conventional DBMS has several
cross purposes with a real-time
DBMS.
– Multiple platforms or versions 
abstraction from hardware/OS
– Measured by TPC-D  performance
rules
– ACID properties inviolate  may be
incompatible with real-time constraints
Query processing in conventional
and real-time DBMS’s
• But conventional DBMS’s are
sometimes used for data with
temporal aspects, e.g. decision
support systems (DSS)
select city.name,
sum(package.weight)from package
/*real-time*/, city/*stored*/where
package.weight > 100 and
package.service = 'priority' and
package.zip = city.zip group by
city.name
Query processing in conventional
and real-time DBMS’s
Conventional DBMS's
Real-time DBMS's
Timing constraints
(deadlines)
Soft
Firm
Hard
Topology
Distributed
Single node
Embedded
Periodicity and
timings
Dynamic
Extended Static
Static
Real-time query processing
• Algorithms have been developed for many
aspects of real-time query processing
• Conventional DBMS’s don’t handle realtime queries except by ‘brute force’ and
possibly some of the algorithms we’ve
seen
• Challenges remain for real-time query
processing
–
–
–
–
Security
Fault tolerance
Scalability
Generalization