Data consistency for Applications
using FroNtier/Squid
Luis Ramos, CERN
3D Meeting, January 2006
Agenda
1. Frontier Basics
2. Cache consistency issues with Frontier/Squid
3. Inconsistency Scenarios
4. Application Restrictions Summary
5. Conclusions
6. Appendix: Invalidation Mechanism
Frontier Basics



Frontier servlet generates query results as XML documents from database queries submitted by clients
Frontier Client is a C/C++ API for sending requests to the Frontier servlet
FrontierAccess (the Frontier POOL “plug-in”) uses Frontier Client to access Frontier servlets
In this context, Frontier is a web-based approach to generic DB access
Example - Frontier servlet (HTTP)
QUERY:
http://pcitdb03.cern.ch:8080/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOB&p1=...(SQL query encoded in base64)
REPLY: XML document with the query result (not reproduced here)
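A minimal sketch of how a request URL like the QUERY line above could be assembled. It assumes the SQL text is carried as plain URL-safe base64 in `p1`; the real Frontier client may compress the query or use a different alphabet. The helper names `build_frontier_url` and `decode_sql` are hypothetical; the host and the fixed parameters are the ones shown on the slide.

```python
import base64
from urllib.parse import urlencode, urlsplit, parse_qs

# Servlet address taken from the slide; everything else is illustrative.
SERVLET = "http://pcitdb03.cern.ch:8080/Frontier/Frontier"

def build_frontier_url(sql: str, servlet: str = SERVLET) -> str:
    """Return a GET URL carrying the SQL text in the p1 parameter.

    Assumption: p1 is plain URL-safe base64 of the UTF-8 SQL text.
    """
    p1 = base64.urlsafe_b64encode(sql.encode("utf-8")).decode("ascii")
    params = {
        "type": "frontier_request:1:DEFAULT",
        "encoding": "BLOB",
        "p1": p1,
    }
    # Keep ':' literal so the URL looks like the one on the slide.
    return servlet + "?" + urlencode(params, safe=":")

def decode_sql(url: str) -> str:
    """Inverse of build_frontier_url, useful for logging and debugging."""
    p1 = parse_qs(urlsplit(url).query)["p1"][0]
    return base64.urlsafe_b64decode(p1).decode("utf-8")

url = build_frontier_url("select * from tab1, tab2")
print(url)
print(decode_sql(url))
```

Because the whole query lives in the URL, the URL itself is what a cache in front of the servlet will use as its key.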
Squid

Squid cache servers are placed between clients and the Frontier servlet

Squid caches query results (XML documents) and serves them to clients that ask for exactly the same query
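The "exactly the same query" condition can be illustrated with a toy cache keyed by the request URL. This is a sketch only: the dictionary stands in for Squid, `url_for` is a hypothetical helper, and plain base64 encoding of the query is an assumption.

```python
import base64

cache = {}          # stands in for Squid: request URL -> cached XML document
backend_hits = 0    # how many requests reached the (simulated) servlet

def url_for(sql: str) -> str:
    # Assumption: the query travels as base64 in a p1 parameter.
    return "/Frontier/Frontier?p1=" + base64.urlsafe_b64encode(sql.encode()).decode()

def fetch(sql: str) -> str:
    """Return the result for sql, going to the backend only on a cache miss."""
    global backend_hits
    key = url_for(sql)
    if key not in cache:
        backend_hits += 1
        cache[key] = f"<result for {sql!r}/>"
    return cache[key]

fetch("select * from tab1")
fetch("select * from tab1")      # identical text: served from cache
fetch("SELECT * FROM tab1")      # same meaning, different text: cache miss
print(backend_hits)              # prints 2
```

Two queries that mean the same thing but differ in case or whitespace are separate cache entries, so clients benefit from generating query text deterministically.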
Cache Consistency Problem

Squid caches database query results for a fixed time (HTTP time-to-live), set by the Frontier server (7 days)
 Time-based cache invalidation

When the backend database changes
 Squid keeps serving stale data to clients until the cached entry expires
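The effect of purely time-based invalidation can be sketched with a small TTL cache and an explicit clock. The class `TtlCache` and the one-value "database" are hypothetical stand-ins; only the 7-day lifetime comes from the slide.

```python
SEVEN_DAYS = 7 * 24 * 3600   # 604800 seconds, the lifetime quoted on the slide

class TtlCache:
    """Time-based cache: an entry is served until its lifetime runs out."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}   # key -> (value, time stored)

    def get(self, key, load, now):
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]                  # "fresh" by the clock alone
        value = load()                     # miss or expired: ask the backend
        self.entries[key] = (value, now)
        return value

database = {"tab1": 1}                     # hypothetical backend value
cache = TtlCache(SEVEN_DAYS)
read = lambda: database["tab1"]

first = cache.get("q", read, now=0)            # miss, loads 1
database["tab1"] = 2                           # backend changes
stale = cache.get("q", read, now=3600)         # hit one hour later: still 1
fresh = cache.get("q", read, now=SEVEN_DAYS)   # expired: reloads 2
print(first, stale, fresh)                     # prints 1 1 2
```

Nothing in the cache knows that the backend changed, so the stale window can last up to the full lifetime.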
Cache Consistency Problem

If tables are created in the database, new queries will refer to them; their results cannot be in the cache because the tables are new, so there is no problem

If tables are dropped, the cached results will be wrong

BUT if rows are inserted or updated in existing tables, the data cached in the Squids becomes stale!
Scenario: CREATE TABLE - OK

Cached query:
 Select * from tab1, tab2 where …
Database change:
 Create table tab3 (…);
New query:
 Select * from tab1, tab3 where …
Query not cached, OK
Scenario: DROP TABLE - KO

Cached query:
 Select * from tab1, tab2 where …
Database change:
 Drop table tab1;
New query:
 Select * from tab1, tab2 where …
If query is cached, KO: wrong result
Scenario: INSERT - KO

Cached query:
 Select * from tab1, tab2 where …
Database change:
 insert into tab1 values (…);
New query:
 Select * from tab1, tab2 where …
If query is cached, KO: stale data
Scenario: UPDATE - KO

Cached query:
 Select * from tab1, tab2 where …
Database change:
 Update tab1 set … where …;
New query:
 Select * from tab1, tab2 where …
If query is cached, KO: stale data
Scenario: new object - OK

Cached query:
 Select * from objs, attribs
 where objs.ID = attribs.OBJ_ID and objs.ID = X
Database change:
 insert into objs values (Y, …);
 insert into attribs values (.., Y, …);
New query:
 select * from objs, attribs
 where objs.ID = attribs.OBJ_ID and objs.ID = Y
Query for object Y is not cached, OK
Queries on IDs of static objects: static cache is OK
Scenario: new attribute - KO

Cached query:
 Select * from objs, attribs
 where objs.ID = attribs.OBJ_ID and objs.ID = X
Database change:
 insert into attribs values (.., X, …);
New query:
 select * from objs, attribs
 where objs.ID = attribs.OBJ_ID and objs.ID = X
Query for object X might be cached, KO
Queries on IDs of non-static objects: static cache is KO
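The "new object" and "new attribute" scenarios can be replayed with an in-memory SQLite database and a dictionary that never expires (a static cache). This is a sketch under assumptions: SQLite stands in for the Oracle backend, the column names `name`, `ATTR_ID` and `value` are invented, and the IDs 1 and 2 play the roles of X and Y.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table objs (ID integer primary key, name text);
    create table attribs (ATTR_ID integer primary key, OBJ_ID integer, value text);
    insert into objs values (1, 'X');
    insert into attribs values (1, 1, 'a1');
""")

cache = {}  # query text -> rows, never expires (static cache)

def cached_query(sql):
    """Serve from the cache when the exact query text was seen before."""
    if sql not in cache:
        cache[sql] = db.execute(sql).fetchall()
    return cache[sql]

def query_for(obj_id):
    return ("select attribs.value from objs, attribs "
            f"where objs.ID = attribs.OBJ_ID and objs.ID = {obj_id}")

x_before = cached_query(query_for(1))          # caches the result for X

# New object Y: its query was never cached, so the result is correct.
db.execute("insert into objs values (2, 'Y')")
db.execute("insert into attribs values (2, 2, 'b1')")
y_rows = cached_query(query_for(2))

# New attribute for X: the cached result for X is now stale.
db.execute("insert into attribs values (3, 1, 'a2')")
x_cached = cached_query(query_for(1))
x_actual = db.execute(query_for(1)).fetchall()
print(y_rows, x_cached, x_actual)
```

The cached answer for X is missing the second attribute, while the answer for Y is complete because its query text had never been seen.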
Restrictions with static cache



Table drops can lead to wrong query results
Data updates can lead to wrong query results
Inserts need special care
 ID-based queries are OK
 Otherwise, KO: when inserting into “attribs”, a forced refresh is needed at user application level for queries over “objs”
Will user applications respect these restrictions?
Present Status - problem

The POOL Frontier plug-in issues two types of queries: DB dictionary data and user data
To avoid stale cached data, the plug-in does a client-side cache refresh for metadata (dictionary) queries
Stale cached data may still appear in user data queries
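One way a client-side refresh could be requested is with standard HTTP request headers, sketched below. This is not taken from the plug-in's code: `make_request` is a hypothetical helper, the `p1` value in the example URL is made up, and whether a given Squid honours `Cache-Control: no-cache` from clients depends on its configuration.

```python
from urllib.request import Request

# Servlet URL from the example slide; the p1 value is a placeholder.
URL = ("http://pcitdb03.cern.ch:8080/Frontier/Frontier"
       "?type=frontier_request:1:DEFAULT&encoding=BLOB&p1=c2VsZWN0IDE")

def make_request(url: str, force_refresh: bool = False) -> Request:
    """Build a GET request; optionally ask caches on the path to refetch."""
    req = Request(url)
    if force_refresh:
        # Standard HTTP request directives; honouring them is up to the
        # cache configuration (an assumption in this sketch).
        req.add_header("Cache-Control", "no-cache")
        req.add_header("Pragma", "no-cache")
    return req

plain = make_request(URL)                        # may be answered from cache
forced = make_request(URL, force_refresh=True)   # asks for a fresh copy
print(forced.header_items())
```

Forcing a refresh on every request would defeat the cache, so a client would reserve it for queries known to be at risk, such as the metadata queries mentioned above.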
Invalidation Mechanism

Build a cache content invalidation mechanism over Squid/Frontier/Oracle DB
 A way to invalidate cached query results when the respective tables are changed

Basic steps of the invalidation mechanism:
1. Detect database changes
2. Detect which cache content is stale
3. Send invalidation messages to Squids
4. Purge cached content in Squids
Conclusions

Frontier alone does not guarantee data consistency

Applications must follow a set of rules to keep data consistent (see “Restrictions with static cache”)

An invalidation mechanism could be developed

Some ideas follow in the appendix
Appendix - Invalidation Steps

1. Database changes detection

2. Stale cached queries detection

3. Invalidation propagation to Squids

4. Purge cached content in Squids
1. Database changes detection

Options:

Database triggers
 Data manipulation (DML) triggers can only be set up at table level (not at database or schema level)

View ALL_TAB_MODIFICATIONS
 This view is updated off-line, with up to 3 hours' delay between a table update and its registration in ALL_TAB_MODIFICATIONS

Database auditing
 AUDIT INSERT TABLE, UPDATE TABLE, DELETE TABLE
 BY ACCESS
 WHENEVER SUCCESSFUL;

Oracle LogMiner
 More information available and less performance overhead than auditing
 Not as simple as database auditing, and implies setup time overhead
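The trigger option can be sketched with SQLite standing in for Oracle (the trigger syntax differs between the two). The `change_log` table and the trigger names are hypothetical; the point is that every watched table needs its own set of DML triggers, each recording the change where an invalidation process can pick it up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table tab1 (id integer, val text);
    -- Hypothetical change log that an invalidation process would poll.
    create table change_log (table_name text, operation text);

    -- DML triggers are per table, so each watched table needs its own set.
    create trigger tab1_ins after insert on tab1
    begin insert into change_log values ('tab1', 'INSERT'); end;
    create trigger tab1_upd after update on tab1
    begin insert into change_log values ('tab1', 'UPDATE'); end;
    create trigger tab1_del after delete on tab1
    begin insert into change_log values ('tab1', 'DELETE'); end;
""")

db.execute("insert into tab1 values (1, 'a')")
db.execute("update tab1 set val = 'b' where id = 1")
db.execute("delete from tab1 where id = 1")

changes = db.execute(
    "select table_name, operation from change_log order by rowid"
).fetchall()
print(changes)
```

With many tables this becomes a maintenance burden, which is one reason the slides weigh it against auditing and LogMiner.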
1. Database changes detection

Database auditing
Simple to configure
 A trigger on the table sys.aud$
 The trigger fires a stored procedure that starts the invalidation procedure

2. Stale cached queries detection

How to find the pages to invalidate in the Squids, given the name of a modified table?
 A mapping between tables and queries is needed
 Frontier servlet query strings could be modified to ease this mapping

Whenever the servlet receives a query, it must store the query and the tables it uses somewhere

When a table is modified, all queries that use that table are invalidated
 Danger of invalidating objects that are still valid (over-invalidation)
 The invalidation procedure can be tricky (invalidation rules)
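A minimal sketch of the table-to-query mapping described above. All names (`queries_by_table`, `tables_in`, `register`, `to_invalidate`) are hypothetical, and the table extraction is deliberately naive: it only handles simple `from a, b where ...` queries like those on the scenario slides.

```python
import re
from collections import defaultdict

# table name -> set of cached request URLs whose query reads that table
queries_by_table = defaultdict(set)

def tables_in(sql: str) -> set:
    """Naive extraction of table names from 'from a, b where ...'.

    Assumption: simple single-level SELECTs; JOIN syntax, subqueries
    and aliases are not handled.
    """
    match = re.search(r"\bfrom\s+(.*?)(?:\bwhere\b|$)", sql, re.I | re.S)
    if not match:
        return set()
    return {name.strip().lower() for name in match.group(1).split(",") if name.strip()}

def register(url: str, sql: str) -> None:
    """Called for every query the servlet serves."""
    for table in tables_in(sql):
        queries_by_table[table].add(url)

def to_invalidate(table: str) -> set:
    """All cached URLs that mention the modified table (may over-invalidate)."""
    return set(queries_by_table.get(table.lower(), ()))

register("/q1", "select * from tab1, tab2 where tab1.id = tab2.id")
register("/q2", "select * from tab2")
register("/q3", "select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = 1")
print(sorted(to_invalidate("tab2")))   # prints ['/q1', '/q2']
```

Invalidating by table name is coarse: `/q3` would be dropped on any insert into `attribs`, even one for a different object, which is the over-invalidation risk noted above.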
2. Stale cached queries detection

Logging queries, clients and tables affected

Two logging options:
 Log module in the Frontier servlet (as a servlet wrapper)
 OR
 A script running over the Apache logs
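The log-script option could look roughly like the sketch below, which pulls the client and the SQL text out of one access-log line. It assumes the Common Log Format and plain URL-safe base64 in `p1`; the host name in the sample line and the function name `parse_line` are made up for illustration.

```python
import base64
import re
from urllib.parse import urlsplit, parse_qs

# Assumed Common Log Format: client ident user [time] "request" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "GET (\S+) [^"]*" (\d{3}) \S+')

def parse_line(line: str):
    """Return (client, sql) for a successful Frontier request, else None."""
    m = LOG_LINE.match(line)
    if not m or m.group(3) != "200":
        return None
    params = parse_qs(urlsplit(m.group(2)).query)
    if "p1" not in params:
        return None
    # Assumption: p1 holds the SQL text as plain URL-safe base64.
    sql = base64.urlsafe_b64decode(params["p1"][0]).decode("utf-8")
    return m.group(1), sql

encoded = base64.urlsafe_b64encode(b"select * from tab1, tab2").decode()
sample = (
    'squid1.example.org - - [10/Jan/2006:12:00:00 +0100] '
    f'"GET /Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOB&p1={encoded} HTTP/1.0" '
    '200 512'
)
print(parse_line(sample))
```

Compared with a servlet wrapper, a log script needs no change to the servlet but only sees what the log format records.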
3. Invalidation propagation to Squids

After having a list of queries to invalidate, we need to know:

Which caches requested the query?
 Easy to register, except with hierarchical caches
Where are those caches?
 Caches must be registered in the server
 The cache hierarchy (topology) must also be registered
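A sketch of what a registered topology could be used for. The cache names and the `parents` table are hypothetical. Given the caches known to have requested a query, it walks up the hierarchy to collect every cache the response passed through, then orders the purge parents first; that ordering is a design choice of this sketch, so that a child refetching after its purge does not pick up a stale copy from its parent.

```python
# Hypothetical topology: each cache names its parent (None = talks to the servlet).
parents = {
    "squid-tier0": None,
    "squid-tier1-a": "squid-tier0",
    "squid-tier1-b": "squid-tier0",
    "squid-tier2-a1": "squid-tier1-a",
}

def caches_holding(requesters):
    """Caches that may hold a copy, given the caches that requested it.

    A request from a child passes through all of its ancestors, so each
    of them may have stored the response as well.
    """
    holders = set()
    for cache in requesters:
        while cache is not None:
            holders.add(cache)
            cache = parents[cache]
    return holders

def purge_order(holders):
    """Order caches by depth in the hierarchy, parents first."""
    def depth(cache):
        d = 0
        while parents[cache] is not None:
            cache = parents[cache]
            d += 1
        return d
    return sorted(holders, key=lambda c: (depth(c), c))

holders = caches_holding({"squid-tier2-a1"})
print(purge_order(holders))
```

The harder direction, hinted at on the slide, is that the server normally sees only the top-level cache as the requester, so lower tiers have to be known from the registered topology or report their own requests.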

4. Purge cached content in Squids

Two options:

HTTP PURGE command
 One object at a time
Squid purge tool
 Regular expressions for purging multiple objects with one command

Performance tests could be done
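The HTTP option can be sketched with Python's standard `http.client`, which accepts arbitrary method names. The sketch assumes the target Squid is configured to accept `PURGE` from the sending host; `purge` and `select_urls` are hypothetical helpers, and `select_urls` only imitates in client code the regular-expression selection that the purge tool offers.

```python
import http.client
import re

def purge(squid_host: str, squid_port: int, url: str, timeout: float = 5.0) -> int:
    """Send an HTTP PURGE for one cached URL and return the status code.

    Assumption: the Squid at squid_host:squid_port is configured to
    accept the PURGE method from this client.
    """
    conn = http.client.HTTPConnection(squid_host, squid_port, timeout=timeout)
    try:
        conn.request("PURGE", url)
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()

def select_urls(cached_urls, pattern: str):
    """Pick the URLs matching a regular expression, to purge one by one."""
    rx = re.compile(pattern)
    return [u for u in cached_urls if rx.search(u)]

known = [
    "http://pcitdb03.cern.ch:8080/Frontier/Frontier?p1=aaa",
    "http://pcitdb03.cern.ch:8080/Frontier/Frontier?p1=bbb",
    "http://pcitdb03.cern.ch:8080/Other/Servlet?p1=ccc",
]
targets = select_urls(known, r"/Frontier/Frontier\?")
print(len(targets))   # prints 2
# Each target would then be purged with purge(host, port, url).
```

Purging one object per request costs a round trip per URL and per cache, which is where the performance tests mentioned above would be informative.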
Questions