Cache Tables: Paving the way for an Adaptive Database Cache
Download
Report
Transcript Cache Tables: Paving the way for an Adaptive Database Cache
Cache Tables: Paving
the way for an Adaptive
Database Cache
Mehmet Altınel, Christof Bornhövd, C. Mohan,
Hamid Pirahesh, Berthold Reinwald
(IBM Almaden Research Center)
Sailesh Krishnamurthy
(Computer Science Division,UC Berkeley)
Presented by:
Umar Farooq Minhas
October 04, 2006
Motivation
Wide-spread use of Transactional Web Applications (TWA) in enterprise
applications
Broad range of components e.g. network load balancers, HTTP servers,
application servers, … , databases etc.
Issues
Response time
Scalability
Solutions
Caching of static HTML pages
Multiple level caches
2
Motivation contd..
Static Caching, Drawbacks
Run business logic in remote application servers close to end users
TWAs tend to be more & more dynamic
High volumes of data
Highly personalized contents
Reduced response time
Reduced load on in-house systems
Benefits are limited by the frequency with which remote server needs
to access backend DB
Proposed Solution: DBCache
Allows DB caching at mid-tier nodes, remote data centers and edge
servers
3
DBCache: Overview
Built using full-fledged DBMS, DB2
Reduced development effort
Allows caching of related DB objects
Triggers, constraints, indices , stored procedures, …
Makes use of existing distributed query execution
Provides cache transparency
Supports both full-table and partial-table caching
On-demand caching
Adapts to dynamically changing loads
Exploits typical characteristics of TWA queries
4
DBCache: Contributions
Database cache model
Introduces a new DB object ‘Cache Table’
Dynamic/static caching support
Novel query re-write scheme
Cache load and maintenance mechanisms
5
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
6
Cache Tables
A Cache Table is a database object by which an end user can specify
that a table (cache table) in a database (cache database) is a cache of
a table (backend table) in another database (backend database)
Cache
Table
Back
end
Table
Cache DB
Backend DB
Two types of cache tables supported:
Declarative/Static Cache tables
Dynamic Cache tables
7
Declarative/Static Cache Tables
When table contents static and known upfront
Use declarative cache tables
Similar to materialized views
Entire table cached in absence of predicate definition
Exploits existing materialized view support in DB2
8
Dynamic Cache Tables
Populated on-demand
Provides adaptability
Can choose to cache only “hot” items
9
DBCache Schema Setup
Cache schema exact mirror of backend DB schema
Each backend DB table represented by
Cache Table or
Nickname (caching disabled)
Requires no change in existing queries
Allows caching of other relevant logical and physical objects
10
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
11
Dynamic Cache Model
Key concepts
Cache Keys
Defined on cache table column
Can be non-unique
Must be ‘domain-complete’
Unique/Primary key columns complete by definition
Guarantees correctness of equality predicates
12
Dynamic Cache Model
Key concepts contd..
Referential Cache Constraints (RCCs)
Defined between any cols of two cache tables
Creates a cache-parent/cache-child relationship
Guarantees the correctness of equi-join predicates
Somewhat similar to referential integrity constraints
13
Dynamic Cache Model
Key concepts contd..
Cache Groups
Set of related cache tables whose content is (directly or transitively)
populated by the values of one or more cache keys of a single cache
table, called the root table.
Tables reachable by RCC constraints from the root table are called
member tables
Advantages
Application context recognized more easily
Helps avoiding conflicting cache constraints
14
Dynamic Cache Model
Key concepts contd..
Cache Groups contd..
Represented by a directed graph called cache group graph, nodes
denote cache tables and edges denote RCCs
Direction of an edge for RCC is from a cache-parent to a cache-child
Bi-directional edges possible
Two or more groups can be overlapping
Captured in connectivity graphs
15
Dynamic Cache Model
Issues with Cache Constraints
Can cause unexpected cache loads resulting in a phenomena called
recursive cache load problem
A cache group is called safe if it avoids this problem
How to ensure group safety ?
16
Dynamic Cache Model
Rules for cache group safety
Rule-1: A cache group graph must not include any heterogeneous cycles.
Rule-2: A cache table must not have more than one non-unique domaincomplete column.
A new cache constraint is created only if it doesn’t violate Rule 1 and Rule 2.
17
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
18
Query Compilation
Declarative Cache Tables
Existing materialized view matching mechanism in DB2 is exploited
Name switching
Dynamic Cache Tables
Generate two plans local plan and remote plan
Choose at run-time through a switch operator which uses the probe query to
decide which leg to execute
Janus (two-headed) plan: derived from Roman Mythology
God of gates, doors, doorways, beginnings and endings. Month of January ?
19
http://en.wikipedia.org/wiki/Janus_%28mythology%29
Query Compilation
Constructing a Janus Plan:
1
Initial Query
Plan
Replace Cache Table
names with Nicknames
Remote Query
Plan
2 Generate a probe query by checking all equality predicates that
can potentially participate in probe query condition
if none found then ABORT ( remote query plan gets executed )
3
Cloned Input
Query Graph
Replace Nicknames with
eligible Cache Table names
from step - 2
Local Query
Plan
4 Insert switch operator on top of remote, local and probe query plans
20
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
21
Cache Table Population & Maintenance
Declarative Cache Tables
Relies on DPropR utility:
IBM’s asynchronous data
replication tool
Dynamic Cache Tables
On-demand loading
Cache key values failing
probe query are used to
extract data
Extracted data populated
asynchronously by a cache
daemon
Cache invalidation
Generate invalidation
messages and send to
cache daemon
Cache daemon generates
and executes deletes against
cacheDB
Updated rows get loaded
with new requests
22
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
23
Performance Evaluation
Focus: Evaluate overhead of Janus plans for dynamic tables
Overhead of probe query and switch operator
Overhead of on-demand loading
Experimental settings
24
Performance Evaluation
Cache Hit Case
Janus plan vs. pure local queries
Difference gives the overhead for probe query and the switch operator
Cache table loaded with all the data from backend table
25
Performance Evaluation
Cache Miss Case
Janus plan vs. pure remote queries
Difference gives the overhead
Cache table initially empty
26
Outline
Motivation
DBCache: Overview
Cache Tables
Dynamic Cache Model
Query Compilation
Cache Table Population and Maintenance
Performance Evaluation
Conclusions & Future Work
Discussion
27
Conclusions & Future Work
Significant contributions
Provides a new frame-work to implement DB caching for TWAs and tends to
provide:
Seamless integration with current applications
Supports static/dynamic cache tables
Adapts to the changing workloads in TWAs
Re-uses the functionality of a full-fledged DBMS i.e. DB2
What next ?
Provide efficient, scalable, zero-admin DBCache
Development of new tools to ease deployment
Improve adaptability and maintenance
28
Comparison
vs. amco05:
Relies on asynchronous data propagation utility
Not completely transparent
May not work for heterogeneous DBMSs
Allows stale data
vs. gula04:
Cache constraints against C&C constraints
Doesn’t provide any guarantees of freshness/consistency
Relatively more transparent
Maintenance-centric vs. query-centric
Both deployed as mid-tier level caches
Both use a full-fledged DBMS
Both use Materialized views
Both use two-headed query plans
29
Discussion
Is it really that good ?
Using full-fledged DBMS at each middle-tier node, drawbacks ?
How is data freshness specified/guaranteed ?
Is it adaptable ? Weakly ? Strongly ?
When can cache constraints become bottleneck ?
Size of dynamic cache tables ?
Caching of other physical & logical DB Objects ?
Updates to those objects in backend DB?
Message traffic between Cache Daemon & Backend DB ?
Cache replacement policies/cleansing mechanisms?
Very frequent updates in backend DB
Local updates ?
Flaws in performance evaluation ?
30