Transcript 6.Peter_SW
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
Dr. Tomasz Imielinski
About Me
Peter Rosegger
5th year Computer Science
Specialization: Databases
Graduation: December 2007
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
Professor at George Mason University
Several patents associated with mobile caching
Dr. Tomasz Imielinski
Professor at Rutgers University
Senior VP: Search Technology at Ask.com
1994
16 million cellular subscribers in US
1994
The Future of Mobile Computing
Use Habits:
Large # of users
Check weather, stocks, scores, etc.
Mobile between cells (& wireless networks)
Hardware:
Low-powered palmtop machines
Poor battery life
Narrow bandwidth
The Future of Mobile Computing
Query complex databases, but…
Frequently powered off to save battery
Frequently changing cells
Network traffic must be minimized to
conserve bandwidth
Why Caching is Important
Conserve:
1. COMPUTATIONAL RESOURCES
2. BATTERY LIFE
3. BANDWIDTH
Traditional Strategies Fail
Server lacks knowledge of:
Which units are in its cell
Which units are powered ON
Client caches cannot be tracked
The Solution
Purpose of Sleepers & Workaholics:
"…to propose a taxonomy of different
cache invalidation strategies and study
the impact of clients' disconnection
times on their performance."
Strategies
Timestamps (TS)
Amnesic Terminals (AT)
Signatures (SIG)
Control Strategy:
No Cache (NC)
Timestamps
-Cache entries have timestamps
-Synchronous, history based, uncompressed reports
SERVER:
Notify clients of identifiers of items changed within last w
seconds
CLIENT:
For each item in cache:
If in report, purge from cache
If NOT in report, update timestamp to current time
Amnesic Terminals
-Cache entries have identifiers
-Synchronous, history based, uncompressed reports
SERVER:
Notify clients of identifiers of items changed within last w
seconds
CLIENT:
For each item in cache:
If in report, purge from cache
If NOT in report, do nothing
Signatures
-Checksums calculated over value of data to form Signature
-Signatures combined using XOR
-Synchronous, state based, compressed reports
SERVER:
Server broadcasts the set of combined signatures
CLIENT:
Item in cache is declared invalid if it belongs to “too many”
unmatching signatures (suspected of being out of date)
Analysis
Calculate THROUGHPUT for each strategy…
L = time between invalidation report broadcasts
W = bandwidth
B C = # bits in the broadcast (invalidation reports)
# bits available for answering queries (cache misses)
LW BC
Analysis
T = THROUGHPUT; queries per interval handled by the system
h = cache hit rate, expressed [0, 1]
b q = # bits for a query
b a = # bits to answer a query
Traffic (in bits) due to cache misses
T(1 h)(bq ba )
Throughput
T(1 h)(bq ba ) LW BC
LW BC
T
(1 h)(bq ba )
Effectiveness of a Strategy
T
e
Tmax
Maximal Throughput
Server knows:
-What units are in the cell
-What those units have in their caches
Server can:
-instantaneously notify units when an item changes
BC 0
h MaximalHitRatio
Maximal Hit Ratio
The Hit Ratio achieved in ideal conditions:
MHR
e
e d
0
MHR
Maximal Throughput
BC 0
Tmax
h MaximalHitRatio
LW
(1 M.H.R.)(bq ba )
No Caching
-No invalidation report
-No intervals
BC 0
h 0
LW
Tnc
(bq ba )
Timestamps
LW n c (log( n) bT )
TTS
(bq ba )(1 hts)
Amnesic Terminals
LW n L log( n)
TAT
(bq ba )(1 hat )
Signatures
Consider the probability of false diagnosis:
Probability of a false positive
Probability of a false negative
1
TSIG
LW 6g( f 1)(ln( ) ln( n))
(bq ba )(1 hsig )
Asymptotic Analysis
Analyze throughput in extreme cases:
As probability of sleeping s0, s1
Analyze throughput as system parameters vary:
Database size
Update frequency
Bandwidth
Etc.
Workaholics
Unit sleeps less and less: s0
All hit ratios approach the same value
SIG lags behind TS and AT by a factor of
BEST THROUGHPUT:
AT, because its report is the shortest
pnf
Sleepers
Unit sleeps more and more: s1
All hit ratios approach 0
BEST THROUGHPUT:
No Caching eventually wins as s becomes very large
For practical purposes, SIG is the best choice
Infrequent Updates
Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth
Effectiveness as s ranges from 0 to 1
Update Intensive
Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth
Effectiveness as s ranges from 0 to 1
Conclusions on Effectiveness
Strategy depends on circumstances:
SIG is best for sleepers
TS is best for query-intensive scenarios, but…
AT is best for workaholics
How can we improve effectiveness?
Relax: Consistency of the Cache
Depending on data type, data may not need to
be exact…
EX: stocks, weather, etc.
Makes shorter invalidation reports possible
How Do We Decide to Update?
- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency condition
attached to it
Coherency Conditions:
Delay Condition - updated based on time
Arithmetic Condition - updated based on difference
between data and quasi-copy
Adaptive Invalidation Reports
-Start with TS strategy
Use algorithms to optimize strategy.
Examples:
If an item is queried very often by units that sleep
a lot, include it in reports for longer
If an item changes frequently, do not bother
caching
Criticism
Units rarely powered down
Battery life better than predicted
Battery life does not dictate use
Units still lose reception frequently
Today’s most common “sleeper” condition -explicitly excluded from definition in S&W
Bandwidth better than predicted
However…
Adjust “sleeper” to include lost reception
Caching is still important
Endless demand for computational resources
Endless demand for battery life
Endless demand for more bandwidth