Transcript PPT - S3Lab
An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid
Im Young Jung
Seoul National University
Introduction
Transaction Management for a safe data update and insertion on e-Science DataGrid
Heterogeneous storages according to the characteristics and the size of data
Based on workflow, the storing precedence of data across heterogeneous storages in a transaction
In this paper
An efficient and transparent transaction management on HVEM DataGrid
Dividing the transaction into sub-transactions according to the transaction states and classifying them
Transaction hierarchy and parallelism provide
efficient and safe large data upload to HVEM DataGrid
transparency in the transaction including simultaneous access to heterogeneous storages
Automatic garbage collection
HVEM Grid
High Voltage Electron Microscope(HVEM)
Lets scientists carry out 3D structure analysis of new materials at micrometer scale
HVEM Grid
Remote users can perform the same tasks as on-site scientists.
Remote controlling of HVEM
Storing, retrieving and searching data through HVEM DataGrid
Processing data through HVEM Computational Grid
HVEM DataGrid
Designed for biological experiments using HVEM
A single logical storage view over the DB and the file storage
The small metadata is stored in the DB
Information for materials, material handling methods, HVEM experiments, images, experimenters
The large files are stored in file storages
2D or 3D image files, the documents related to HVEM experiments
Internal process to find files
After finding their logical path in the file storage by searching the DB, users can
retrieve the files they want in the file storage
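The lookup described above (search the DB for the logical path, then fetch the file from the file storage) can be sketched as follows. This is a minimal illustration only: the table name `image_meta`, its columns, and the dictionary standing in for the file storage are assumptions, not part of HVEM DataGrid.

```python
import sqlite3


def find_logical_path(conn, image_name):
    """Search the metadata DB for the logical path of an image (hypothetical schema)."""
    row = conn.execute(
        "SELECT logical_path FROM image_meta WHERE name = ?", (image_name,)
    ).fetchone()
    return row[0] if row else None


def retrieve(conn, file_storage, image_name):
    """Resolve the logical path through the DB, then read the file storage."""
    path = find_logical_path(conn, image_name)
    if path is None:
        raise KeyError(image_name)
    return file_storage[path]


def demo_catalog():
    """Build an in-memory DB and a dict that stands in for the file storage."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_meta (name TEXT PRIMARY KEY, logical_path TEXT)"
    )
    conn.execute(
        "INSERT INTO image_meta VALUES (?, ?)",
        ("cell_01", "/proj1/exp2/cell_01.tif"),
    )
    file_storage = {"/proj1/exp2/cell_01.tif": b"image bytes"}
    return conn, file_storage
```

The user only names the image; the two storages behind the lookup stay hidden, which is the single logical view the slide describes.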
HVEM DataGrid
A unified data management
The storing precedence among data
When storing all biological information for the images, we should keep the images in HVEM Grid at the same time
The relational semantics between various data stored in distributed heterogeneous storages
To upload many large files to HVEM DataGrid efficiently and safely
Upload dependency & Serialization
Ensure the transactions for safe parallel uploads
An efficient and transparent transaction management
Requirement for the transactions on HVEM DataGrid
Consider the semantics of HVEM DataGrid
A project is composed of several experiments
The data for an experiment should be inserted according to its data workflow
The file and its metadata should be stored to HVEM DataGrid simultaneously.
Otherwise, all of them should be deleted
Support
the long lifetime transaction according to the timelimit of experiment or project
the short lifetime transaction which stores the data to HVEM DataGrid physically
The optimization for the upload of large files to reduce the blocking time should ensure safe transactions
An asynchronous and parallel upload scheme should protect upload dependency and ensure safe transactions
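The all-or-nothing requirement above (a file and its metadata are stored together, otherwise both are deleted) can be sketched as follows. This is a minimal sketch under stated assumptions: plain dictionaries stand in for the DB and the file storage, and `store_pair` and its `fail_file` switch are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def store_pair(db, files, key, metadata, payload, fail_file=False):
    """Store metadata and file in parallel; undo both if either side fails."""

    def put_meta():
        db[key] = metadata

    def put_file():
        if fail_file:  # simulated failure of the file storage
            raise OSError("simulated file storage failure")
        files[key] = payload

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(put_meta), pool.submit(put_file)]
        # exception() waits for completion and returns None on success
        errors = [f.exception() for f in futures]

    if any(e is not None for e in errors):
        # Rollback: remove whatever was written on either side
        db.pop(key, None)
        files.pop(key, None)
        return False
    return True
```

Running both writes concurrently keeps the blocking time close to that of the slower write, while the rollback branch keeps the two storages consistent.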
An efficient and transparent transaction management
Transaction hierarchy
TnP: For Project
TnE: For Experiment
TnW: For a group of TnSs
TnS: For storing data to physical storage
The transaction units as checkpoints on incomplete data insertion
Confine the rollback extent
Parallel Processing
When the data for an experiment or a project is not inserted into HVEM DataGrid by its timelimit, the experiment or the project should be removed by the rollback of TnE or TnP
TnS((((1)2)5)2)
(1) represents the identity of the TnP it belongs to
The next index ‘2’ indicates the identity of TnE and so on
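The nested index can be unpacked mechanically. The sketch below assumes a four-level hierarchy in the order TnP, TnE, TnW, TnS; the slide confirms only the first two positions ("and so on"), so the mapping of the third index to TnW is an assumption, and `parse_index` is a hypothetical helper.

```python
import re

# Assumed order of the hierarchy levels, outermost first
LEVELS = ("TnP", "TnE", "TnW", "TnS")


def parse_index(label):
    """Split a label such as 'TnS((((1)2)5)2)' into one identity per level."""
    match = re.fullmatch(r"TnS(\(+)((?:\d+\))+)", label)
    if match is None:
        raise ValueError(f"not a TnS index: {label!r}")
    opens, body = match.groups()
    ids = [int(part) for part in body.split(")") if part]
    if len(opens) != len(ids) or len(ids) != len(LEVELS):
        raise ValueError(f"unbalanced or unexpected depth: {label!r}")
    return dict(zip(LEVELS, ids))
```

Two TnSs that share the same leading identities belong to the same project or experiment, so a rollback of a TnE or TnP can select its sub-transactions from the index alone.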
Support Autonomous garbage collection
It is up to users to insert or delete data on HVEM DataGrid.
When they stop inserting experimental data for any reason without deleting the related data, HVEM DataGrid would accumulate a large amount of garbage.
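Combined with the timelimit rule for TnE and TnP, garbage collection can be pictured as a periodic sweep that rolls back every incomplete transaction whose timelimit has passed. The record layout below (`id`, `deadline`, `complete`, `data`, `state`) is a hypothetical one chosen for illustration.

```python
def collect_garbage(transactions, now):
    """Roll back incomplete TnE/TnP records whose timelimit has expired."""
    rolled_back = []
    for tx in transactions:
        if not tx["complete"] and now > tx["deadline"]:
            tx["data"].clear()          # discard the partially inserted data
            tx["state"] = "rolled_back"
            rolled_back.append(tx["id"])
    return rolled_back
```

Users never have to delete abandoned data themselves; the expired experiment or project disappears through the rollback.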
Transaction management Scheme
HVEM DataGrid forks two processes, one to connect to the DB and one to the file storage.
When the connections succeed, it gets the next requests and so on.
In a light failure (LF) due to temporary failures on the network or server, retry the transaction a fixed number of times
When the retries fail, a serious failure (SF) is assumed: rollback process
The state change of TnS(((())j)i)
States: jSiS, jSiD (the notification from DB), jSiF (the notification from the file storage), jSiE (both of them arrive): TnS completes
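The failure handling of a TnS can be sketched as a bounded retry followed by rollback. This is a simplified sketch, not the scheme itself: the two stores run one after the other here rather than in two forked processes, and `LightFailure`, `run_with_retry`, `run_tns`, and the return labels `"E"` and `"SF"` are hypothetical names.

```python
class LightFailure(Exception):
    """Temporary network or server failure (LF)."""


def run_with_retry(action, max_retries):
    """Try the action once, then retry up to max_retries times on LF."""
    for _ in range(1 + max_retries):
        try:
            action()
            return True
        except LightFailure:
            continue
    return False


def run_tns(store_db, store_file, rollback, max_retries=2):
    """Return 'E' when both notifications arrive, 'SF' after a rollback."""
    arrived = set()
    for name, action in (("D", store_db), ("F", store_file)):
        if run_with_retry(action, max_retries):
            arrived.add(name)   # notification from the DB or the file storage
        else:
            rollback()          # retries exhausted: serious failure assumed
            return "SF"
    return "E"                  # both notifications arrived: TnS completes
```

A store that fails once and then succeeds still ends in the completed state, which is why a successful retry preserves the gain from parallel uploads.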
Evaluation
Analysis
Transparency
Through the transaction hierarchy and fine-grained state management, the transaction manager in HVEM DataGrid enables a transparent transaction that uploads the image files to the file storage and stores their metadata in the DB simultaneously.
Serializability
Many TnSs are upload serializable because their state changes are logged through the transaction index.
To keep the upload dependency, the transaction manager protects the first user entering TnW.
If that user withdraws the TnW, then another user can initiate the TnW
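The protection of the first user entering a TnW behaves like an ownership guard. The class below is a minimal sketch of that idea; `TnWGuard` and its methods are hypothetical and are not taken from the HVEM DataGrid implementation.

```python
import threading


class TnWGuard:
    """Grant a TnW to the first user who enters until that user withdraws."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def enter(self, user):
        """Return True if the user holds the TnW after this call."""
        with self._lock:
            if self._owner is None:
                self._owner = user
            return self._owner == user

    def withdraw(self, user):
        """Release the TnW; only the current owner may do so."""
        with self._lock:
            if self._owner == user:
                self._owner = None
                return True
            return False
```

A second user calling `enter` is refused while the first user holds the TnW and succeeds only after the first user calls `withdraw`.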
Transaction performance
The transaction scheme supports asynchronism and parallelism
Experiment Setting
Because the sub-transaction time on the DB is negligible compared with that on the file storage due to data size, we only considered the upload time for image files
Considering the semantics of the data workflow in HVEM DataGrid
For an asynchronous file transfer, the request intervals for file transfer are chosen randomly within 50 sec
The physical locations of the file storages are assumed to be distributed
Evaluation
Overhead
Log management cost
The added cost is the log for TnP, TnE and TnW; general transaction management already requires the log for TnS
The log size for TnP, TnE and TnW is smaller than that for TnS because they function as checkpoints rather than real transaction units.
Rollback cost
The cascade rollback of TnS in TnW due to the upload dependency on parallel processing of TnS
At LF, if the retry succeeds, the gain from transaction parallelism can be very large, especially for large file handling
There are not many SFs or LFs because e-Science DataGrid is not as popular as multimedia storage
Conclusion
A transaction management on HVEM Grid
Safety
Ensure a safe transaction considering the data workflow in HVEM DataGrid
Efficiency
Improve the performance to upload large files by asynchronism and parallelism
Transparency
Data management across the heterogeneous storages
Automatic garbage collection
Reduce garbage