Transcript lecture13

S3: A Secure Scalability Service for Dynamic Content
Bruce Maggs
Carnegie Mellon University and Akamai Technologies
Joint work with Charlie Garrod and Amit Manjhi, and Natassa Ailamaki, Phil Gibbons, Todd Mowry, Chris Olston, and Anthony Tomasic.
Number of requests a website receives is unpredictable
• CNN, NY Times, ABC News unavailable from 9-10 AM (Eastern Time)
[Figure: CNN.com page views/day (in millions), y-axis from 0 to 150, comparing a usual day with 9/11]
Content providers’ dilemma: how many resources to provision?
Need on-demand scalability
Content Delivery Network (CDN) Solution
[Figure: CNN.com page views/day (in millions), y-axis from 0 to 800, for a normal day, 12-Sep-01, and Election day (Nov 2), 2004; bars annotated with page sizes of 50k and 1.2k]
• Page was 1.2k instead of 50k on 12 Sep, 01
• Used Akamai on Election day
Source: http://www.tcsa.org/lisa2001/cnn.txt
http://www.akamai.com/en/html/about/press/press479.html
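Typical Web-Site Architecture
[Figure: users send a request to the home server, where the web server hands it to the app server (execute code), which accesses the DB through the DB server; the response returns to the users]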
CDN Architecture
[Figure: users, CDN nodes, the Internet core, and content providers]
CDNs excel at delivering static content.
Advantages of CDNs
• Large infrastructure handles load spikes
• Clients charged on a per-usage basis
– no need to guess what resources to provision
• Moves data closer to end-users
– decreases latency and increases throughput
CDN Application Services
CDNs can also run applications
[Figure: users, the Internet, and a central DB]
but for data-intensive dynamic applications…
database server becomes the bottleneck!
Methods to scale the database component
• In-house database scalability: [DBCache, DBProxy,
MTCache, NEC Cache Portal]
– Must provision for peak load
• Database outsourcing: Database as a service
[Hacigumus+ ICDE ’02, SIGMOD ’02]
– Have to cede control of data
• Database Scalability Service (DBSS): Shared
infrastructure that caches applications’ data
[INRIA/LIP6, CIDR ’05, SIGMOD ’06, ICDE ’07]
S3 Database Scalability Service
• CDN-like proxy nodes cache results of
database queries
– reduces load on central database servers
• All database updates sent to central server
– clients don’t cede ownership of their data
• Uses publish/subscribe system to maintain
data consistency
– avoids additional load at the central server
• Content provider may encrypt database
requests/responses to protect sensitive data
Database Scalability Service
[Figure 1 of 3: users, Content Delivery Network with DBSS, Internet, home server databases]
[Figure 2 of 3: users, Internet, web and application servers with DBSS, home server databases]
[Figure 3 of 3: client apps with DBSS, Internet, home server databases]
Outline
• Need for on-demand scalability
• S3 invalidation mechanism
• Security-scalability tradeoff
• Reducing latency
Addressing consistency
• TTL is wasteful:
– Often refresh cached data unnecessarily (workloads
dominated by reads)
– Must set TTL=0 for strong consistency!
• Solution: update or invalidate cached data only
when affected by updates
– Naïve approach: home organizations notify proxy servers of relevant updates → not scalable
Our approach: fully distributed, proxy-to-proxy update notification mechanism
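As a rough illustration of why TTL is wasteful for read-dominated workloads, the sketch below replays a made-up trace (99 reads and one write) against a single cached item, once under a TTL policy and once under invalidate-on-update. The trace, the function name simulate, and the single-item cache are assumptions chosen for illustration; they are not part of S3.

```python
def simulate(trace, ttl=None):
    """Return (backend_fetches, stale_reads) for one cached item.

    ttl=None models invalidate-on-update; otherwise entries expire after ttl ticks.
    """
    fetches = stale = 0
    db_version = 0            # version of the item at the home server
    cached_version = None     # None means "not cached"
    cached_at = None
    for t, op in trace:
        if op == "write":
            db_version += 1
            if ttl is None:
                cached_version = None          # invalidation notification arrives
        else:
            expired = ttl is not None and cached_at is not None and t - cached_at >= ttl
            if cached_version is None or expired:
                cached_version, cached_at = db_version, t   # refetch from the home server
                fetches += 1
            elif cached_version != db_version:
                stale += 1                     # served an out-of-date result

    return fetches, stale

# Illustrative read-dominated trace: one write at t=45, reads at every other tick.
trace = [(t, "read") for t in range(100)]
trace[45] = (45, "write")

ttl_10 = simulate(trace, ttl=10)
ttl_0 = simulate(trace, ttl=0)
invalidation = simulate(trace)
print(ttl_10, ttl_0, invalidation)
```

On this trace, TTL=10 refetches ten times and still serves four stale reads, TTL=0 refetches on every read, and invalidation refetches twice with no stale reads.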
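Distributed Consistency Mechanism
[Figure: users send an update to a proxy node; the proxy node publishes an update notification into the multicast environment, which delivers the update notification to other proxy nodes]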
• Distributed app-level multicast environment, e.g., Scribe
• Forward all updates to backend home servers
Configuring Multicast Channels
• Key observation: Web applications typically
interact with DB via a small, fixed set of
query/update templates (usually 10-100)
• Example:
SELECT qty FROM inv WHERE id = ?
UPDATE inv SET qty = ? WHERE id = ?
Templates: natural way to configure channels
Options:
Channel-by-query or Channel-by-update
Channel-by-Query Option
• One channel per query template Q: C(Q)
Proxy event → action:
– Begin caching result(s) of query template Q → Subscribe to C(Q)
– Evict only query result for Q → Unsubscribe from C(Q)
– Issue update → Determine which query templates Q1, …, Qn affected; send notification on each C(Qi)
• Few subscriptions/cached result
• Many invalidation notifications/update
Conflicts determined lazily (upon update)
Channel-by-Update Option
• One channel per update template U: C(U)
Proxy event → action:
– Begin caching result(s) of query template Q → Determine which update templates U1, …, Un apply; subscribe to each C(Ui)
– Evict only query result for Q → Unsubscribe from all C(Ui) above
– Issue update using template U → Send notification on C(U)
• Many subscriptions/cached result
• Few invalidation notifications/update
Conflicts determined eagerly (when caching Q)
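The two channel options can be compared with a small in-process sketch. Everything here is hypothetical: the Bus class stands in for a multicast system such as Scribe, the CONFLICTS table stands in for the template analysis, and each proxy caches at most one result per template to keep the code short. It is a sketch of the idea under those assumptions, not the S3 implementation.

```python
from collections import defaultdict

# Hypothetical conflict table, as might be derived from the application's templates:
# update template -> query templates whose results it may change.
CONFLICTS = {"U1": {"Q1", "Q2"}, "U2": {"Q1"}}

class Bus:
    """In-process stand-in for an application-level multicast system."""
    def __init__(self):
        self.subs = defaultdict(set)   # channel -> subscribed proxies
        self.published = 0             # notifications sent

    def publish(self, channel):
        self.published += 1
        for proxy in list(self.subs[channel]):
            proxy.on_notification(channel)

def query_channels(q, mode):
    """Channels a proxy subscribes to when it caches a result of template q."""
    if mode == "by_query":
        return [("Q", q)]
    return [("U", u) for u in sorted(CONFLICTS) if q in CONFLICTS[u]]

def update_channels(u, mode):
    """Channels notified when an update of template u is issued."""
    if mode == "by_query":
        return [("Q", q) for q in sorted(CONFLICTS[u])]
    return [("U", u)]

class Proxy:
    def __init__(self, bus, mode):
        self.bus, self.mode = bus, mode
        self.cache = {}   # query template -> cached result

    def begin_caching(self, q, result):
        self.cache[q] = result
        for ch in query_channels(q, self.mode):
            self.bus.subs[ch].add(self)

    def evict(self, q):
        del self.cache[q]
        still_needed = {ch for other in self.cache
                        for ch in query_channels(other, self.mode)}
        for ch in query_channels(q, self.mode):
            if ch not in still_needed:
                self.bus.subs[ch].discard(self)

    def issue_update(self, u):
        # The update itself would also be forwarded to the home server (not modeled).
        for ch in update_channels(u, self.mode):
            self.bus.publish(ch)

    def on_notification(self, channel):
        kind, name = channel
        affected = {name} if kind == "Q" else CONFLICTS[name]
        for q in [q for q in self.cache if q in affected]:
            self.evict(q)

def run(mode):
    """Return (subscriptions held, notifications sent, reader's cache after the update)."""
    bus = Bus()
    reader, writer = Proxy(bus, mode), Proxy(bus, mode)
    reader.begin_caching("Q1", "cached rows")
    subscriptions = sum(len(s) for s in bus.subs.values())
    writer.issue_update("U1")
    return subscriptions, bus.published, reader.cache

print(run("by_query"))
print(run("by_update"))
```

With this conflict table, channel-by-query needs one subscription but two notifications for the update, while channel-by-update needs two subscriptions but one notification; in both cases the reader's cached result is invalidated.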
Parameter-Specific Channels
• Optimization: consider parameter bindings
supplied at runtime … for example:
• Q5: SELECT qty FROM inv WHERE id = ?
– When issued with id = 29, create extra parameter-specific channel C(5, 29)
– Subscribe to both C(5) and C(5, 29)
• Upon update:
– If update affects a single item with id = X, send
notification on channel C(5, X)
• Saves work if X ≠ 29
– Updates affecting multiple items sent to C(5)
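A minimal sketch of the channel selection described above, assuming channels are identified by tuples such as (5,) for C(5) and (5, 29) for C(5, 29). The function names and the proxy names are hypothetical.

```python
from collections import defaultdict

subscribers = defaultdict(set)   # channel -> names of subscribed proxies

def subscribe_for_query(proxy, template, param):
    """Caching template Q bound to `param`: join both C(Q) and C(Q, param)."""
    subscribers[(template,)].add(proxy)
    subscribers[(template, param)].add(proxy)

def notified_by_update(template, affected_ids):
    """Proxies reached by an update that changes the rows with the given ids."""
    if len(affected_ids) == 1:
        channel = (template, affected_ids[0])   # single item: C(Q, X)
    else:
        channel = (template,)                   # multiple items: C(Q)
    return sorted(subscribers.get(channel, set()))

subscribe_for_query("proxy_a", 5, 29)   # Q5 issued with id = 29
subscribe_for_query("proxy_b", 5, 7)    # Q5 issued with id = 7

print(notified_by_update(5, [7]))       # only the proxy caching id = 7
print(notified_by_update(5, [3]))       # nobody caches id = 3
print(notified_by_update(5, [7, 29]))   # multi-item update reaches all of C(5)
```

An update to a single item that no proxy has cached reaches no one, which is the work saved when X ≠ 29.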
S3 Prototype
• Tomcat as proxy web server/servlet container
• Proxy database cache written in Java
• Queries: access cached data when possible
– Cache JDBC query results (i.e., materialized views)
– Index results by JDBC query representation
• MySQL4 as back-end database
• Updates: sent to back-end database
• Invalidation notifications delivered via Scribe
• Experiments on Emulab (Utah) – Thanks!
Benchmark Applications
• Bookstore (TPC-W, from UW-Madison)
– Online bookseller, a standard web benchmark
– Changed the popularity of books
• Auction (RUBiS, from Rice)
– Modeled after Ebay
• Bulletin board (RUBBoS, from Rice)
– Modeled after Slashdot
Benchmarks model popular websites
Selective: cache queries only if subscribed to
parameter-dependent groups
Impact of Cooperative Caching
[Figure: bar chart of throughput (WIPS), y-axis from 0 to 250, for NoProxy, NoCache, SimpleCache, and Ferdinand on the bookstore browsing mix, bookstore shopping mix, and auction workloads]
Outline
• Need for on-demand scalability
• S3 invalidation mechanism
• Security-scalability tradeoff
• Reducing latency
Guaranteeing security in a DBSS setting
Limit ability to observe an application’s data by:
– DBSS administrator
– Unauthorized application through the DBSS
Security-Scalability tradeoff in the DBSS setting
Analyzing the code helps in managing this tradeoff
A simple solution for guaranteeing security
• Outsource database scalability
– Home server: master copies of all data—
handles updates directly
• No query execution on the DBSS
– DBSS caches query results (read-only)—kept
consistent by invalidation
All data passing through the DBSS can be encrypted:
Query, Update, Query results
A Simple Example
toys (toy_id, toy_name), stored at the home server database: (11, Barbie), (15, GI Joe)
Q1: SELECT toy_id FROM toys WHERE toy_name='GI Joe'
U1: DELETE FROM toys WHERE toy_id=5
• Nothing is encrypted: the DBSS can see the cached result Q1: toy_id=15, so U1 causes no invalidations
• Results are encrypted: the DBSS cannot inspect Q1's result, so U1 must invalidate it and the cache becomes empty
More encryption leads to more invalidations
Challenge: providing scalability while guaranteeing security
• When updates occur, DBSS needs to invalidate
• Application faces a dilemma in what data to encrypt (secure)
– More encryption → conservative invalidation → security
– Less encryption → precise invalidation → scalability
Security-scalability tradeoff
Opportunity for managing the tradeoff
Not all data is equally sensitive
Data sensitivity:
– Completely insensitive (bestsellers list): don’t care
– Moderately sensitive (inventory records, customer records): care but worried about scalability impact
– Extremely sensitive (credit card information): secure at all costs
But for most data, nontrivial to assess:
1. Data-sensitivity
2. Scalability impact of securing the data
Key Insight: arbitrary queries and
updates not possible
function get_toy_id ($toy_name) {
    $template := "SELECT toy_id FROM toys WHERE toy_name=?";
    $query := attach_to_template ($template, $toy_name);
    execute ($query);
    …
}
Given templates:
Can statically identify data
not needed for precise invalidation
Data not useful for invalidation: examples
Example 1:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT toy_name FROM toys WHERE toy_id=?
No data is needed for precise invalidation
Example 2:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
U1: DELETE FROM toys WHERE toy_id=?
Query parameters are not needed for precise invalidation
(the query result is needed though)
Security without hurting scalability
Data not needed for invalidation
Can secure “for free” (without hurting scalability)
Security Conscious Scalability Approach
[SIGMOD ’06]
As a result,
Tradeoff has to be only managed over remaining data
Sample experiment: methodology
• Scalability: max # concurrent users with
acceptable response times
• Security: # templates with encrypted results
[Figure: users reach the CDN and DBSS with 5 ms latency; the CDN and DBSS reach the home server with 100 ms latency]
• California Privacy Law determined sensitive data
• Non-transactional invalidation
• Start with a cold cache
Security-Scalability Tradeoff
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT qty FROM toys WHERE toy_id=?
Q3: SELECT cust_name FROM customers WHERE cust_id=?
U1: DELETE FROM toys WHERE toy_id=5

Strategy    Template  Parameters  Query result  Invalidations
Blind       x         x           x             All Q1, Q2, Q3
Template    -         x           x             All Q1, Q2
Statement   -         -           x             All Q1, Q2 with toy_id=5
View        -         -           -             Q1 with toy_id=5, Q2 with toy_id=5

x denotes encrypted, - denotes visible
Security increases toward Blind; scalability increases toward View
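The invalidation column can be reproduced with a short sketch. The cache contents below are invented for illustration (including the toy named Lego with toy_id 5), and the function invalidated is a hypothetical helper, not the S3 code. For each strategy it uses only the data that strategy leaves visible when U1 deletes a toy.

```python
# Hypothetical cache contents at the DBSS: (template, parameters, result).
CACHE = [
    ("Q1", ("Lego",), [5]),        # toy_id looked up by name
    ("Q1", ("GI Joe",), [15]),
    ("Q2", (5,), [3]),             # qty looked up by toy_id
    ("Q2", (15,), [8]),
    ("Q3", (1,), ["Ann"]),         # cust_name looked up by cust_id
]
TABLE = {"Q1": "toys", "Q2": "toys", "Q3": "customers"}

def invalidated(strategy, deleted_id):
    """Entries dropped for U1: DELETE FROM toys WHERE toy_id = deleted_id."""
    dropped = []
    for template, params, result in CACHE:
        if strategy == "blind":
            hit = True                        # nothing visible: drop everything
        elif TABLE[template] != "toys":
            hit = False                       # template visible: other table, no conflict
        elif strategy == "template":
            hit = True                        # parameters and results hidden
        elif template == "Q2":
            hit = params == (deleted_id,)     # visible parameter is the toy_id
        elif strategy == "statement":
            hit = True                        # Q1's result is hidden: be conservative
        else:
            hit = deleted_id in result        # view: inspect Q1's visible result
        if hit:
            dropped.append((template, params))
    return dropped

for strategy in ("blind", "template", "statement", "view"):
    print(strategy, invalidated(strategy, 5))
```

With this cache, the number of invalidated entries falls from five (blind) to four, three, and two (view) as more data is left visible.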
Magnitude of Security-Scalability tradeoff
[Figure: bar chart of scalability (number of concurrent users supported), y-axis from 0 to 900, for the View, Statement, Template, and Blind strategies on the Auction, Bboard, and Bookstore benchmark applications]
Security Results
Query data that can be encrypted “for free”
[Figure: stacked bar chart for Auction, Bboard, and Bookstore, with segments for parameters and result, result, and nothing; bar labels as extracted: 4, 6, 17 / 7, 7, 7 / 18, 12, 14]
Security Results in Detail
• Auction: The historical record of user bids was not
exposed
• Bboard: The rating users give one another based on
the quality of their posting
• Bookstore: Book purchase association rules
discovered by the vendor – customers who
purchase book A also purchase book B
Scalability Conscious Security Approach (SCSA) to managing the tradeoff
[Figure: scalability (number of concurrent users supported, 0 to 900) versus security (number of query templates with encrypted results, 0 to 30), with points marked Nothing encrypted, SCSA, and Everything encrypted]
1. Easy to either get good scalability or good security
2. SCSA presents a shortcut to manage the tradeoff
Outline
• Need for on-demand scalability
• S3 invalidation mechanism
• Security-scalability tradeoff
• Reducing latency
Contributors to User Latency
[Figure: traditional architecture, where the request and response between the user and the web server, app server, and database both incur high latency]
[Figure: DBSS architecture, where the high-latency path lies between the CDN and DBSS and the database]
A single HTTP request → multiple database requests
Sample Web Application Code
function find_comments ($user_id) {
    $template := "SELECT from_id, body FROM comments WHERE to_id=?";
    $query := attach_to_template ($template, $user_id);
    $result := execute ($query);
    foreach ($row in $result)
        // get_name issues one more query per row
        print (get_body ($row), get_name (get_id ($row)));
}
(N+1) queries are issued because:
• Convenient for programmers to abstract database values
• No effect in the traditional setting
Found many examples in the benchmark applications
Reducing User Latency in a DBSS
Setting
Transformations to reduce number of round-trips
1. Group execution of queries: MERGING transformation
2. Overlap execution of queries: NONBLOCKING
transformation
[Figure: web application code (procedural program with embedded SQL) → holistic transformations using src-to-src compilers → transformed code (transformed program and SQL)]
The MERGING Transformation
[Figure: John asks www.ebay.com for the names of users who have posted comments about John. The Content Delivery Network issues 1 query to find the user_ids who have made comments, then N queries to find the name of each user, all sent to the Database Scalability Service; high latency]
The MERGING Transformation
Find names of users who have commented about John
Before:
1. Find user_ids who have made comments
2. For each user_id, find name of the user
After:
SELECT from_id, u.name
FROM comments, users u
WHERE from_id = u.id AND to_id = ?
Assuming constant cache hit rate, the #round-trips to the database decreases by a factor of (N+1)
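The effect of MERGING on round-trips can be checked with a small sketch using Python's sqlite3 module and an in-memory database. The schema, sample rows, and helper names are assumptions made for illustration, and the merged query also selects the comment body so that both versions return the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
INSERT INTO users VALUES (1, 'John'), (2, 'Ann'), (3, 'Bob');
INSERT INTO comments VALUES (2, 1, 'great seller'), (3, 1, 'fast shipping');
""")

stats = {"round_trips": 0}

def execute(sql, params):
    """Run one query and count it as one round-trip to the database."""
    stats["round_trips"] += 1
    return conn.execute(sql, params).fetchall()

def find_comments_unmerged(user_id):
    # 1 query for the comments, then 1 query per comment for the author's name.
    rows = execute(
        "SELECT from_id, body FROM comments WHERE to_id = ? ORDER BY from_id",
        (user_id,))
    return [(body, execute("SELECT name FROM users WHERE id = ?", (from_id,))[0][0])
            for from_id, body in rows]

def find_comments_merged(user_id):
    # The join returns the same pairs in a single round-trip.
    return execute(
        "SELECT c.body, u.name FROM comments c, users u "
        "WHERE c.from_id = u.id AND c.to_id = ? ORDER BY c.from_id",
        (user_id,))

def count_round_trips(fn, user_id):
    stats["round_trips"] = 0
    return fn(user_id), stats["round_trips"]

unmerged, unmerged_trips = count_round_trips(find_comments_unmerged, 1)
merged, merged_trips = count_round_trips(find_comments_merged, 1)
print(unmerged_trips, merged_trips, unmerged == merged)
```

With N = 2 comments about John, the unmerged version makes N+1 = 3 round-trips and the merged version makes 1, for the same result.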
The NONBLOCKING Transformation
[Figure: John requests the www.amazon.com home page. The Content Delivery Network must (1) greet the user and (2) get names of related books, each requiring a request to the Database Scalability Service; high latency]
Issue queries concurrently to reduce latency
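A minimal sketch of the idea, using Python's concurrent.futures to overlap two independent queries. The functions greet_user and related_books, their return values, and the LATENCY constant are hypothetical stand-ins for database round-trips; the sleep only simulates waiting on the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed round-trip time in seconds (illustrative only)

def greet_user(user_id):
    time.sleep(LATENCY)                 # stands in for one database round-trip
    return f"Hello, user {user_id}"

def related_books(user_id):
    time.sleep(LATENCY)                 # an independent second round-trip
    return ["Book A", "Book B"]

def home_page_blocking(user_id):
    # Each query waits for the previous one: roughly 2 * LATENCY in total.
    return greet_user(user_id), related_books(user_id)

def home_page_nonblocking(user_id):
    # Both queries are in flight at the same time: roughly 1 * LATENCY in total.
    with ThreadPoolExecutor(max_workers=2) as pool:
        greeting = pool.submit(greet_user, user_id)
        books = pool.submit(related_books, user_id)
        return greeting.result(), books.result()

print(home_page_blocking(7))
print(home_page_nonblocking(7))
```

The transformation is only safe when the queries are independent, as here, where neither query uses the other's result.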
Applicability of the Transformations
[Figure: bar chart of the % of dynamic runtime interactions (0 to 100) to which MERGING, NONBLOCKING, and EITHER apply, for Auction, Bboard, and Bookstore]
Either transformation applies to 25% (Auction), 75% (Bboard), and 50% (Bookstore) of dynamic runtime interactions
BBOARD Application: Impact on Latency
[Figure: stacked bar chart of average latency in ms (0 to 1400), split into Database, DBSS-DB latency, and client-DBSS, with no transformations and with both transformations]
Overall latency decreases by 38%; the DBSS-DB latency decreases by 65%
Impact of Latency on Scalability
[Figure: latency versus simultaneous users supported; the reduced latency curve reaches the scalability threshold at a higher number of users than the original latency curve (improved scalability)]
Reducing latency improves scalability
Effect of the Transformations on Scalability
[Figure: scalability (number of concurrent users supported)]
Applying both transformations yields the best scalability