Transcript Database

PicoDBMS: Scaling down
Database Techniques for the
Samartcard
Abstract
• Smartcards are the morst secure computing device today
• As smartcadrs get more powerful, and become multi
application, the need for database management arises.
• However smartcards have severe hardware limitations
The mojor problem is
SCALING DOWN DATABASE
TECHNIQUES SO THEY PERFORM
WELL UNDER THESE LIMITATIONS
Outline
• Introduction
• Demand for database in smartcard
applications
• Analysis of smartcard hardware
limitations&problem definition
• PicoDBMS storage model and query
execution model
• Conclusion
Introduction
• Smartcards are the most secure portable device today
• The first smartcard was developed by BULL for the French
banking system
• Smartcards can be used in many applications involving
money, proprietary data, personal data, TV or GSM
subsciber, identification, healthcare,etc...
• Standards for multiapplicacion support
– JavaCard
– Microsoft’s smartcard for Windows
Why do we need DBMS on smartcards?
•
•
•
•
•
Multiapplication, versatility
Improvement of prosessor power
Volume of data
Complexity of queries
Meet Transaction ACID properties
More generally;
DATABASE MANAGEMENT HELPS TO
SEPERATE DATA MANAGEMENT CODE FROM
APPLICATION CODE, THEREBY MAKING
APPLICATION CODE SMALLER
ACID properties
• Atomicity: In a transaction involving two or more discrete
pieces of information, either all of the pieces are
committed or none are.
• Consistency: A transaction either creates a new and valid
state of data, or, if any failure occurs, returns all data to its
state before the transaction was started
• Isolation: A transaction in process and not yet committed
must remain isolated from any other transaction
• Durability: Committed data is saved by the system such
that, even in the event of a failure and system restart, the
data is available in its correct state
Hardware limitations due to
•
•
•
•
Small size
Flexible plastic card
Low cost
Security
Sample Smartcard Block Diagram
Hardware Limitations
• Memory bottleneck (96Kb of ROM, 4Kb of RAM,
128Kb of stable storage like EEPROM)
• EEPROM has a very fast read time but very slow
write time (10msword)
• Lack of autonomy, no asenchronious and
disconnected prosessing
With these extreme constraints of the smartcard,
the major problem is scaling down database
techniques
Some solutions to scaling down
•
•
•
•
•
Sysbase Adaptive Server Anywere
Oracle 8i Lite
PDA
DB2 Anywhere
ISOL’s SQL Java machine DBMS
SCQL, The standart for smartcard database
language
Smartcard Applications
• Database Management Requirements
• Case Study:The Healthcard application
Types of smartcard applications
• Money and identification (Credit card)
Tiny volume of data , automicity requirement
• Downloadable databases (list of hotels)
High volume of data, availibility requirement
• User environment (address book)
Medium volume of data, access right, automicity,durability
requirement
• Personal folders (health care, car maintenance)
High volume of data, access rights
The Health Card application
• Personal Folder Application
• High volume of data is required ( Doctor
info, blood type, prescriptions)
• Different users can query, modify, create
data (Doctors, insurance agents)
Requirements due to observations
• Amount of data is significant
• Queries can be rather complex (Ex: doctor asks for
the last antibiotic prescribed to the patient)
• Sophisticated access right management
(aggregation, views)
• Automicity must be preserved
• Durability is mandatory
Storing database in a smart card rather than centralized
database contributes data availability, security and simplicity
Problem formulation
• Smartcard constraints
• Impact on the PicoDBMS Architecture
• Problem Statement
Smartcard Constraints
•
•
•
•
•
•
•
•
32 bir RISC processor about 30 MIPS (monolithic chip)
Very limited storage capacity
Very slow write time in EEPROM
2
Extremely reduced size of RAM
Lack of autonomy
High security level
Tiny die size (about 25 mm²)
Low cost (less than 5 dollars)
As the chip size is limited MORE RAM means LESS EEPROM
How Smartcard properties impact on
DBMS Architecture
• Highly Secure – access right management
• Highly portable- high availability
• Limited storage resources-memory
consumption during execution
• Stable storage is not main main memory
• Non autonomous
Design Components for a PicoDBMS
•
•
•
•
Storage manager
Query manager
Transaction manager
Access right manager
Desin criterias for Smartcard DBMS
•
•
•
•
•
•
Compactness rule (data,index)
RAM Rule
Write Rule
Read Rule
Access Rule
Security Rule
PicoDBMS Storage Model
• Existing Storage Models
• Storage Cost Evaluation
Existing Storage Models
• Flat storage
• Domain Storage
• Ring Storage
Flat Storage
• Simplest way to
organize data
• Tuples are stored
sequentially
• Has 2 drawbacks
– Space consuming
– Inefficient
• Indexed FS
Relation R
A xxxx
A yyyy
Domain Storage
• Data Compactness
• The amount of data to be
written is much smaller
• The length of the attribute
and length of the pointer
• Cardinality of the relation
• All tuples of all relations
become fixed size
• Tuple to value pointers
Relation R
Relation S
Value 1
Value 2
Value n
Ring Storage
• Index compactness + data
compactness
• Storing Value to Tuple
pointers instead of Tuple
to Value Pointers
• One pointer per domain
value whatever the
cardinality of the relation
is
Index on
S.a
Value 1
Value 2
Value n
Join Indices
• Typically based on key
attributes
• Ring index on a foreign
key attribute
• Unidirectional storage
model (Domain Storage)
• Bidirectional storage
Model (Ring Index)
• The cost of bi-directional
join is restricted to a
single pointer per R tuple
Storage Cost Evaluation
PicoDBMS storage model combines FS, DS,
and RS.
-Need for indexing attributes - the choice is
between RS and Indexed FS.
- No need for indexing attributes- the choice is
between FS and DS.
Comparison of the storage cost for a single
attribute
•Cost(FS) =Cardrel*a
•Cost(DS) =cardRel*p+
S*cardRel*a
•Cost(Indexed-FS) =Cost(FS)+
S*Cardrel*a+ Cardrel*p
•Cost(RS) =Cost(DS)+
S*cardRel*P
• CardRel: cardinality of the
relation holding the attribute
• A: Average length of attribute
(bytes)
• P: Pointer size (3 bytes..)
• S:selectivity factor of the
attribute =<1 (redundancy of the
attribute) S=CardDom/CardRel
• CardDom=Cardinality of the
attribute domain extensions
Domain Storage Cost
Relation R
Relation S
Value 1
Value 2
Value n
•Cost(DS) =cardRel*p+ S*cardRel*a
Ring Storage Cost
Relation S
S.a
Index on
S.a
Value 1
Value 2
Value n
•Rind index on a regular attribute
•Cost(RS) =Cost(DS)+ S*cardRel*P
FS=DS Curve (without ring index)
Cost(FS)=Cost(DS)
S=(a-p)/a
Where p=3
RS=DS Curve with Index
Cost(rS)=Cost(DS)
S=a/p
S 1
FS
0.8
RS more compact
Where p=3
0.6
0.4
0.2
0
0
2
4
6
8
10
13
16
19
22
25
28
31
Length
in bytes
Query Processing
Traditional Query Processing uses large main
memory for storing
• Intermediate Results
• Temporary data structures
They use state of the art algorithms to resort
metarialization on disk to avoid memory
overflow
Query Processing
• Basic Query Execution without RAM
• Complex Query Execution withot RAM
Why PicoDBMS cannot use those state
of the art algorithms
• Lifetime of stable memory
• Unable to estimate RAM size
• Those algorithms are complex
•TO SOLVE THIS PROBLEM, WE PROPOSE
QUERY PROCESSING TECHNIQUES THAT DO
NOT USE ANY WORKING RAM NOR INCUR ANY
WRITES IN THE STABLE MEMORY
Basic Query Exeqution
• Query Optimizer generates Optimal Query
Execution Plan (QEP)
• Query Engine
uses a library of relational operators
•
implements
Execution model
Basic Query Execution
query
parser and
translator
relational algebra
expression
optimizer
query
output
evaluation engine
data
execution plan
statistics
about data
Query Optimizer can consider different
shapes of query execution plan
•
•
•
•
Left-deep tree
Right-deep tree
Bushy trees
Extreme right-deep tree (PicoDBMS
offered)
Example:Healtcare relations
•Doctor(DocId,name,specialty,...)
•Prescription(VisitId,DrugId,qty,...)
•Visit(VisitId,DocId,date,diagnostics,...)
•Drug(DrugId,name,type)
Q1:WHO PERSCRIBEN ANTIBIOTICS IN 1999
Left-deep tree
⋈
• Operators are
executed
sequentially
• Each intermediate
result is metarialized
doc
⋈
⋈
Ϭ
Ϭ
presc
visit
drug
Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999
Right-deep tree
•Execute operators in a
pipelined fashion (avoid
intermediate metarialization
•All left- operands must be
metarialized
⋈
⋈
doc
Ϭ
⋈
visit
Ϭ
drug
Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999
presc
Bushy trees
⋈
Offer opurtinities to deal with the
size of intermediate results and
memory consumption
⋈
⋈
Ϭ
Ϭ
visit
doc drug
Q1:WHO PERSCRIBEDANTIBIOTICS IN 1999
presc.
Iterator model
• Pipeline execution can be easily achieved
using Iterator Model.
• Each operator is an iterator that supports 3
procedure calls
– Open-prepare an operator for producing an item
– Next- to produce an item
– Close-perform final clean up
• Starts from the root.
Extreme right deep trees
•Uses pipelining on every
operator(including select)
•As left operands are always base
relations, they are already
metarialized in stable memory.
⋈
doc
Ϭ
⋈
visit
⋈
NO RAM CONSUMPTION
presc
Q1:WHO PERSCRIBED ANTIBIOTICS IN 1999
Ϭ
drug
The cost of select,project and join
with extreme right deep trees
• These operators(SPJ) can be executed either sequentially
or with a ring index
• Select-ring index on select is allowed in the first base
operation.(drug.type)
• Project-project operator are pushed up to the tree since no
metarialization is required
• Join-The cost of the join depends on the way indices are
traversed.
– Unidirectional index
– Bidirectional index
Unidirectional index join cost
•Doctor
Doc.Id
•Doc.Id Visit
•n
•m
•Start with Doctor--m*n
•Start with Visit--m
Bidirectional index join cost
•Doctor
Doc.Id
•Doc.Id Visit
•n
•m
•Start with Doctor--m+n
•Start with Visit--m*m/2n
Complex Query Execution without RAM
Comlex Query Operators:
• Aggregate(count,max,min)
• Sort
• Duplicate removal
The intermediate results need to be
metarialized.Due to the No RAM rule such
metarialization cannot occur in PicoDBMS.
PicoDBMS solution
• Aggregate and duplicate removal can be done in pipeline if
the incoming tuples are yet grouped by gistinct values
• Pipeline operators are order-preserving since they consume
tupls in the arrival order.
THUS, ENFORCING AN ADEQUATE CONSUMPTION AT
THE LEAF OF THE EXECUTİON TREE ALLOWS
PIPELIED AGGREGATION AND DUPLICATE
REMOVAL
Q2:Number of Antibiotics prescribed per
doctor in1999 count
Ϭ
•Doctor contains distinct doctors
•Aggregation
⋈
drug
⋈
Ϭ
presc
⋈
visit
doc
Q3:Number of prescription per type of
drug
count
•An additional join is
required between relation
Drug and domain
Drug.type
⋈
⋈
presc
drug
drug.type
Q4:Number of prescriptions per doctor and
type of drug
count
•Cartesian make n times
more response time where n
is the number of distinct
type of drugs
•“group by”
⋈
⋈
drug
⋈
presc
x
visit
doc
drug.type
Q5:Distinct doctors and type of drug
distinct
•Q5 retreives the distinct
couples of doctor and type
of precsribed drugs
⋈
⋈
drug
⋈
presc
x
visit
doc
drug.type
Performance Evaluation
• Reasonable response time is the main concern
• Evaluating response time in samrtcards is complex
depending on platform parameters (Processor speed,
caching strategy, RAM and EEPROM speed)
• Which acceleration techniques are really mandatory
– ring indices
– query optimization
Test Platform
• Intel 486 (with no cache, CISC)
• 3 types of data volumes is considered (Small,
medium,large)
• for the platform, the system parameters (clock frequency,
cache) are varied and experimentation test is performed)
(best/worst configuration)
• Ring indices are tested, how it makes change in the
response time
• 2 query executions are tested
– Q1:Who prescribes antibiotics in 1999
– Q4:Number of prescriptions per doctor and type of
drugs
Performance results of Q1:Who prescribes
antibiotics in 1999
1
0.8
no ring-worst
no ring-best
ring-worst
ring-best
0.6
0.4
0.2
0
Small DB
Medium DB
Large DB
Performance results of Q4: Number of
prescriptions per doctor and type of drugs
25
20
15
no ring-worst
no ring-best
ring-worst
ring-best
10
5
0
Small DB Medium Large DB
DB
Test Results
• Join indices are mandatory in all classes
• Query optimiztion is useful only with large
databases and complex queries
CONCLUSION
As smartcards becomebecome more and more
versatile, multiapplications and powerful,
the need for database techniques arises.
Performance evaluations show that
PicoDBMS’s query and execution models
meet the smartcard requirments.