The Third Extended File System with Copy-on
Download
Report
Transcript The Third Extended File System with Copy-on
Automated Physical Design in
Database Caches
T. Malik, X. Wang, R. Burns
Johns Hopkins University
D. Dash, A. Ailamaki
Carnegie Mellon University
Hopkins Storage Systems Lab, Department of Computer Science
Outline
Target Application: Proxy caches for SkyQuery
Physical Design in Proxy Caches
– Need for vertical partitioning
– Workload evolution
Online Vertical Partitioning
– Simple scenario: Two configuration
– General scenario: N configurations
Experiments
Hopkins Storage Systems Lab, Department of Computer Science
SkyQuery
Publicly accessible federation of sky surveys (a
virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and
geographically distributed sites
Data intensive, read-only workload
Scaling through proxy caching
–
–
Minimize network traffic
Offload query processing
Hopkins Storage Systems Lab, Department of Computer Science
Bypass Caching (Malik et al., ICDE’05)
Proxy database cache for SkyQuery
–
–
Brings columns closes to users
Economic model for bypassing queries
Hopkins Storage Systems Lab, Department of Computer Science
The Need for Vertical Partitioning
Poor I/O performance in the cache
–
Mirrors the backend DB design
–
Index-free environment
Largest relations groups 446 columns
Auxiliary data structures (indices/views) pollute cache
Offsets response time benefits from network savings
–
6x benefit with partitioning alone
Performance without redundant data
Hopkins Storage Systems Lab, Department of Computer Science
Is Partitioning Feasible?
Hopkins Storage Systems Lab, Department of Computer Science
Why Not Existing Solutions?
(ie. DB tuning advisor, Autopart)
Require representative workloads
–
–
Offline in nature
–
–
–
Not readily available
Astronomy workloads exhibit evolution
Invoked periodically
Costly to run
Ignore the cost of partitioning
Static design
–
–
Output a single configuration for the input workload
Ignores incremental changes within the workload
Hopkins Storage Systems Lab, Department of Computer Science
Workload Evolution
Hopkins Storage Systems Lab, Department of Computer Science
Workload Evolution
Hopkins Storage Systems Lab, Department of Computer Science
Online Vertical Partitioning Problem
Hopkins Storage Systems Lab, Department of Computer Science
Two Configuration Scenario
Algorithm: Given current config C and an alternative C’,
transition if remaining in C incurs substantial overhead
Capturing overhead
–
–
–
Penalty
:
Cumulative Penalty
Max cumulative penalty
:
:
Transition if
3-competitive
–
–
After k transitions, 2Conf incurs (3k/2)(d(C,C’)+d(C’,C))
OPT incurs at least (k/2)(d(C,C’)+d(C’,C))
Hopkins Storage Systems Lab, Department of Computer Science
NConf: Extending to N-Configurations
Let Cy be the current config., Cx be the previous
config., transition to the first Cz (Cz≠Cy) satisfying:
Number of configurations is exponential (51 trillion
ways to partition 20 attributes)
Pruning heuristics
–
–
Neighboring configurations
Attribute Groups
Hopkins Storage Systems Lab, Department of Computer Science
Neighboring Configurations
Curr. Config: Cy
A1
A2
A3
A4
A1
A2
A1
A3
A2
A4
A4
A3
Neighbors of Cy
Small, incremental partitions
Lower threshold to overcome
Hopkins Storage Systems Lab, Department of Computer Science
A1
A3
A4
A2
Attribute Groups
qk: {A1,A3,A4}
Curr. Config: Cy
A1
A2
A3
A4
Attr. Groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
A1
A2
A3
A4
weight+=qk(Cy)
A1
A2
A4
A3
weight+=qk(Cy)
A1
A2
A3
A4
weight+=qk(Cy)
A1
A2
A3
weight+=qk(Cy)
Candidate config if n.weight > d(Cx, Cy)+d(Cy,n)
Candidates with high weight benefits from repartitioning
Hopkins Storage Systems Lab, Department of Computer Science
A4
Experiments
TPC benchmark in SQL Server 2000
Partition orders relation using select queries
Two 10k workloads
–
–
WkldSky: Evolving access pattern that
approximates SkyQuery
WkldConst: Access pattern remains unchanged
AutoPart: an offline partitioning tool
(Papadomanolakis et al.)
Hopkins Storage Systems Lab, Department of Computer Science
Query Performance
Hopkins Storage Systems Lab, Department of Computer Science
Estimated I/O and Transitions
Hopkins Storage Systems Lab, Department of Computer Science
Future Work
Impact of cache replacement policy
–
–
Database state change periodically
Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches
Hopkins Storage Systems Lab, Department of Computer Science
Conclusion
Proxy caches present a dynamic environment
Vertical partitioning improves performance
without adding redundant data
Online vertical partitioning
–
Balances query execution performance with cost of
transitioning
Experiments show 17% improvement by
partitioning a single table
Hopkins Storage Systems Lab, Department of Computer Science
Questions
???
Hopkins Storage Systems Lab, Department of Computer Science