The Third Extended File System with Copy-on

Download Report

Transcript The Third Extended File System with Copy-on

Automated Physical Design in
Database Caches
T. Malik, X. Wang, R. Burns
Johns Hopkins University
D. Dash, A. Ailamaki
Carnegie Mellon University
Hopkins Storage Systems Lab, Department of Computer Science
Outline




Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
– Need for vertical partitioning
– Workload evolution
Online Vertical Partitioning
– Simple scenario: Two configuration
– General scenario: N configurations
Experiments
Hopkins Storage Systems Lab, Department of Computer Science
SkyQuery




Publicly accessible federation of sky surveys (a
virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and
geographically distributed sites
Data intensive, read-only workload
Scaling through proxy caching
–
–
Minimize network traffic
Offload query processing
Hopkins Storage Systems Lab, Department of Computer Science
Bypass Caching (Malik et al., ICDE’05)

Proxy database cache for SkyQuery
–
–
Brings columns closes to users
Economic model for bypassing queries
Hopkins Storage Systems Lab, Department of Computer Science
The Need for Vertical Partitioning

Poor I/O performance in the cache
–
Mirrors the backend DB design

–
Index-free environment


Largest relations groups 446 columns
Auxiliary data structures (indices/views) pollute cache
Offsets response time benefits from network savings
–
6x benefit with partitioning alone

Performance without redundant data
Hopkins Storage Systems Lab, Department of Computer Science
Is Partitioning Feasible?
Hopkins Storage Systems Lab, Department of Computer Science
Why Not Existing Solutions?
(ie. DB tuning advisor, Autopart)

Require representative workloads
–
–

Offline in nature
–
–
–

Not readily available
Astronomy workloads exhibit evolution
Invoked periodically
Costly to run
Ignore the cost of partitioning
Static design
–
–
Output a single configuration for the input workload
Ignores incremental changes within the workload
Hopkins Storage Systems Lab, Department of Computer Science
Workload Evolution
Hopkins Storage Systems Lab, Department of Computer Science
Workload Evolution
Hopkins Storage Systems Lab, Department of Computer Science
Online Vertical Partitioning Problem
Hopkins Storage Systems Lab, Department of Computer Science
Two Configuration Scenario


Algorithm: Given current config C and an alternative C’,
transition if remaining in C incurs substantial overhead
Capturing overhead
–
–
–


Penalty
:
Cumulative Penalty
Max cumulative penalty
:
:
Transition if
3-competitive
–
–
After k transitions, 2Conf incurs (3k/2)(d(C,C’)+d(C’,C))
OPT incurs at least (k/2)(d(C,C’)+d(C’,C))
Hopkins Storage Systems Lab, Department of Computer Science
NConf: Extending to N-Configurations

Let Cy be the current config., Cx be the previous
config., transition to the first Cz (Cz≠Cy) satisfying:

Number of configurations is exponential (51 trillion
ways to partition 20 attributes)
Pruning heuristics

–
–
Neighboring configurations
Attribute Groups
Hopkins Storage Systems Lab, Department of Computer Science
Neighboring Configurations
Curr. Config: Cy
A1
A2
A3
A4
A1
A2
A1
A3
A2
A4
A4
A3
Neighbors of Cy


Small, incremental partitions
Lower threshold to overcome
Hopkins Storage Systems Lab, Department of Computer Science
A1
A3
A4
A2
Attribute Groups
qk: {A1,A3,A4}
Curr. Config: Cy
A1
A2
A3
A4
Attr. Groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
A1
A2
A3
A4
weight+=qk(Cy)


A1
A2
A4
A3
weight+=qk(Cy)
A1
A2
A3
A4
weight+=qk(Cy)
A1
A2
A3
weight+=qk(Cy)
Candidate config if n.weight > d(Cx, Cy)+d(Cy,n)
Candidates with high weight benefits from repartitioning
Hopkins Storage Systems Lab, Department of Computer Science
A4
Experiments



TPC benchmark in SQL Server 2000
Partition orders relation using select queries
Two 10k workloads
–
–

WkldSky: Evolving access pattern that
approximates SkyQuery
WkldConst: Access pattern remains unchanged
AutoPart: an offline partitioning tool
(Papadomanolakis et al.)
Hopkins Storage Systems Lab, Department of Computer Science
Query Performance
Hopkins Storage Systems Lab, Department of Computer Science
Estimated I/O and Transitions
Hopkins Storage Systems Lab, Department of Computer Science
Future Work

Impact of cache replacement policy
–
–



Database state change periodically
Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches
Hopkins Storage Systems Lab, Department of Computer Science
Conclusion



Proxy caches present a dynamic environment
Vertical partitioning improves performance
without adding redundant data
Online vertical partitioning
–

Balances query execution performance with cost of
transitioning
Experiments show 17% improvement by
partitioning a single table
Hopkins Storage Systems Lab, Department of Computer Science
Questions
???
Hopkins Storage Systems Lab, Department of Computer Science