RAW tests - hep-proj-database Site

Download Report

Transcript RAW tests - hep-proj-database Site

CERN/IT/DB
An evaluation of Data
Storage and Analysis With
Oracle DBMS
Eric Grancher, CERN IT/DB
Database Workshop, CERN
July 11th-13th 2001
CERN/IT/DB
Outline
Initial study, using Oracle 8i
RAW data tests
 Various methods
 Bandwidth
 Overhead
TAG data analysis
 Type of queries
 Bitmap indexes
 Binning and function based indexes
CERN/IT/DB
RAW Data Tests
Purpose
 Verify on a reduced scale that oracle is not
too consuming when stressed to acquire a
large data volume
 First design options
 Ideas about the overhead
CERN/IT/DB
Model
Typical model (NA 45) and random generator
thanks to Dirk Düllmann
 Run, burst, event and label types
Many ways to map it in an ORDBMS
*
RUN
1
*
BURST
1
1
*
*
EVENT
1
1
*
1
*
LABEL
1
*
Vector<long>
CERN/IT/DB
Storage
We implemented:
 Run, Burst and Event: relational tables
 Label: one Binary Large Object in order to
store the doubles
There is more than one way to do it !
CERN/IT/DB
Tests
We used a workgroup server, sun E420R with
a D1000 (ultra SCSI) disk subsystem
Veritas volume manager (manageability and
tuning)
RAW devices tests show that disk subsystem
saturates with 4 disks stripe (38MBytes/s
compared to the 40MBytes/s specification)
CERN/IT/DB
Host Language
There are many ways to connect to
Oracle DBMS, supported ones:
C/C++ and Oracle8i: OCI, Pro*C and
ODBC
Pro*C: pre-compiler (embedded SQL)
ODBC: generic interface
CERN/IT/DB
Choice: OCI
The low level C native interface, all
facilities
Encapsulated into C++
Typically quite complex (callback and
C++) and “verbose”
CERN/IT/DB
Setup
Linux (lxplus)
Sun E420R
+ D1000
Oracle
100Mbit/s
Random
CERN/IT/DB
Optimisation
We use “NO CACHE” tables,
“NOLOGGING” hint
Using VM, we could see that control
files were the hot spot -> we moved
them to free disks, factor of 5 gain in
speed
Networking subsystem is the bottleneck
(100 Mbit/s), several Linux clients
CERN/IT/DB
Data files and redo log files
Log history
Current log sequence
Checkpoint information
Control Files
Instance
SGA
Shared pool
Server
process
LGWR CKPT
DBWR
User
process
Parameter
file
Modified slide from
Bjørn Engsig / Oracle
Redo log
files
Data files
Database
Control
files
CERN/IT/DB
Overhead
Overhead: 4.1% over the C++ data
NUMBER type in Oracle
LOB are multiple of the block size
(8kB), size of the Label was often
around 20k long
(ie 80kB on Linux/i386+)
CERN/IT/DB
Results and perspectives
Sustained 10.88 MB/s during 2000s,
8 Linux client machines
The server networking sub-system is the
bottleneck
Study during this summer with Gbit/s
Ethernet and large Linux PC
Larger machines later (how does “Oracle Real
Application Cluster” scales for such an application?)
CERN/IT/DB
Conclusion
At this scale, it works
LOB -> byte ordering dependency
Other tests with non-LOB types
 more overhead
 better facilities to analyse the data
Overhead has to be further studied
Other bindings (OCCI and refs)