slides - Computer Science & Engineering


Non-Traditional Databases
Farkas
CSCE 824 - Spring 2011

Reading
1. Y. Ahmad, R. Burns, M. Kazhdan, C. Meneveau, A. Szalay, A. Terzis, "Scientific data management at the Johns Hopkins institute for data intensive engineering and science," SIGMOD Record, Volume 39, Issue 3, February 2011, http://dl.acm.org/citation.cfm?id=1942776.1942782&coll=DL&dl=ACM&CFID=66206057&CFTOKEN=48992457
2. A. Thakar, A. Szalay, "Migrating a (large) science database to the cloud," HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010, http://dl.acm.org/citation.cfm?id=1851539&bnc=1
3. M. Stonebraker, U. Cetintemel, "'One Size Fits All': An Idea Whose Time Has Come and Gone," ICDE '05: Proceedings of the 21st International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, 2005, http://www.computer.org/portal/web/csdl/abs/proceedings/icde/2005/2285/00/22850002abs.htm

Traditional Database Management Systems
Focus on business data management
Provide uniform capabilities regardless of the data characteristics
Need: capabilities to meet new application requirements

Examples of New Needs
Stream data processing
Large-scale scientific databases
Data warehousing

Streaming Data
Sensor-based applications
– Real-time systems: sophisticated alerting, location-based services
– Historical data
Financial applications
– Support for applications such as electronic trading, legal compliance, real-time market analysis, etc.
Performance requirements: high data rates and low-latency processing

Performance: SDMS vs. RDMS
Empirical results (see reference paper #3)
Issues:
– Inbound processing model
– Correct primitives for stream processing (aggregates, “timeout,” “slack”)
– Seamless integration of DBMS processing with application processing (client-server vs. embedded applications)
– Transactional behavior (weaker notion of recovery, tolerance, no ACID requirements)
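To make the "inbound processing" and "slack" primitives more concrete, here is a minimal Python sketch of a push-based tumbling-window average. It assumes tuples of the form (event time, value), a time-based notion of slack (a window is emitted once the stream has advanced `slack` time units past its end), and an explicit `flush()` call as a stand-in for a timeout. The class name and these semantics are illustrative assumptions, not the operators defined in reference paper #3. Tuples that arrive after their window was emitted are simply dropped, which mirrors the weaker, non-ACID guarantees listed above.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


class TumblingAverage:
    """Push-based ("inbound") tumbling-window average with a slack allowance.

    Each tuple is processed on arrival; only partial aggregates of windows
    that are still open are kept in memory, never the raw stream.
    """

    def __init__(self, width: int, slack: int) -> None:
        self.width = width    # window length in event-time units
        self.slack = slack    # assumed time-based slack for out-of-order tuples
        self.open = defaultdict(lambda: [0, 0.0])  # window start -> [count, sum]
        self.max_time: Optional[int] = None         # latest event time seen
        self.closed_up_to: Optional[int] = None     # windows below this were emitted
        self.dropped = 0      # tuples that arrived too late and were discarded

    def push(self, event_time: int, value: float) -> List[Tuple[int, float]]:
        start = event_time - event_time % self.width
        if self.closed_up_to is not None and start < self.closed_up_to:
            self.dropped += 1  # its window was already emitted: drop, no retraction
            return []
        acc = self.open[start]
        acc[0] += 1
        acc[1] += value
        if self.max_time is None or event_time > self.max_time:
            self.max_time = event_time
        return self._emit_ready()

    def _emit_ready(self) -> List[Tuple[int, float]]:
        # Window [s, s + width) is final once max_time >= s + width + slack.
        results = []
        for start in sorted(self.open):
            if self.max_time < start + self.width + self.slack:
                break
            count, total = self.open.pop(start)
            results.append((start, total / count))
            self.closed_up_to = start + self.width
        return results

    def flush(self) -> List[Tuple[int, float]]:
        # Stand-in for a timeout: emit all open windows when the stream goes quiet.
        results = [(s, total / count) for s, (count, total) in sorted(self.open.items())]
        if self.open:
            self.closed_up_to = max(self.open) + self.width
        self.open.clear()
        return results
```

A larger slack tolerates more disorder at the cost of later results; a real engine would also drive timeouts from a clock rather than from an explicit call.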
Security for Streaming Data?
What is the difference between the security needs of streaming vs. traditional (e.g., relational) data?
How to enforce security?
– Security punctuation
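Security punctuations embed access-control information in the stream itself, so the policy travels with the data it governs. The Python sketch below assumes a deliberately simplified model: a punctuation lists the roles allowed to see the data tuples that follow it, until the next punctuation replaces it. The names `SecurityPunctuation`, `DataTuple`, and `secure_filter` are hypothetical; published punctuation frameworks are richer (for example, a policy can be scoped to particular streams, tuples, or attributes).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, Union


@dataclass(frozen=True)
class SecurityPunctuation:
    """Policy marker embedded in the stream (hypothetical, simplified format)."""
    allowed_roles: FrozenSet[str]


@dataclass(frozen=True)
class DataTuple:
    sensor_id: str
    reading: float


StreamItem = Union[SecurityPunctuation, DataTuple]


def secure_filter(stream: Iterable[StreamItem],
                  user_roles: FrozenSet[str]) -> Iterator[DataTuple]:
    """Forward only the tuples the user may see under the latest punctuation.

    Before the first punctuation arrives nothing is released (deny by default).
    """
    allowed: FrozenSet[str] = frozenset()
    for item in stream:
        if isinstance(item, SecurityPunctuation):
            allowed = item.allowed_roles  # policy applies to the tuples that follow
        elif user_roles & allowed:
            yield item
```

Because enforcement happens inside the operator as tuples flow past, a policy change takes effect mid-stream without stopping the query.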
Scientific Databases
Massive amounts of data
Heterogeneous data
– Sensor data, satellite data, scientific simulation data, etc.
Goal: better understanding of physical phenomena
– Genomic databases, geological exploration, astronomy, etc.

Scientific Databases
Need efficient analysis and querying capabilities
– Multi-dimensional indexing (e.g., genomic sequence indexing)
– Specific applications (e.g., visualization of seismic data)
– Specific aggregations (e.g., data mining for biological correlation)
– Efficient data archiving, staging, lineage, and error propagation techniques
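One common way to support multi-dimensional range queries on top of an ordinary one-dimensional sorted index is a Z-order (Morton) key, which interleaves the bits of the coordinates so that nearby grid cells tend to receive nearby keys. This Python sketch assumes non-negative integer 3-D grid coordinates of at most 10 bits each; it illustrates the general technique and is not the indexing scheme of any system in the readings. The box query scans the key range between the two corners' keys and then filters out false positives.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]


def morton3(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


class ZOrderIndex:
    """Points kept sorted by Morton key, much like a B-tree over a 1-D key."""

    def __init__(self, points: Iterable[Point]) -> None:
        # Coordinates are assumed to be non-negative and below 2**10.
        self.entries = sorted((morton3(*p), p) for p in points)
        self.keys = [k for k, _ in self.entries]

    def box_query(self, lo: Point, hi: Point) -> List[Point]:
        """Return all points with lo <= p <= hi in every dimension."""
        # Every point inside the box has a key between the corners' keys...
        start = bisect_left(self.keys, morton3(*lo))
        end = bisect_right(self.keys, morton3(*hi))
        # ...but that key range can also contain points outside the box.
        return [p for _, p in self.entries[start:end]
                if all(low <= c <= high for low, c, high in zip(lo, p, hi))]
```

Bit interleaving is monotone in each coordinate, so no point inside the box is missed; production schemes split the key range into tighter sub-ranges to scan fewer false positives.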
Example Scientific Data Management
Reference #1
Basic research:
1. formation of hypotheses and theories
2. designing experiments for their validation
3. collecting data by experimentation
4. analyzing data to guide new insights for further research

Scientific Computing
Steps 3 and 4 are data-intensive
Need to improve computational power
– Parallel processing
– Grid and supercomputers
– Special application logic
– Preservation of scientific data
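Parallel processing of large scientific data sets typically partitions the data, computes a partial result per partition, and merges the partial results. The Python sketch below shows that pattern for mean and variance using combinable (count, sum, sum of squares) aggregates. The function names and the default partition count are illustrative assumptions, and a thread pool is used only to keep the example self-contained; in CPython, CPU-bound work would normally go to separate processes or machines to get a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Partial:
    """Combinable partial aggregate: enough state to merge partitions."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


def aggregate_partition(values: Sequence[float]) -> Partial:
    # Computed independently for each partition.
    return Partial(len(values), sum(values), sum(v * v for v in values))


def parallel_mean_var(values: Sequence[float], partitions: int = 4) -> Tuple[float, float]:
    if not values:
        raise ValueError("need at least one value")
    size = -(-len(values) // partitions)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Threads keep the sketch self-contained; real speedup for CPU-bound
    # CPython code would need processes or separate nodes.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(aggregate_partition, chunks))
    result = Partial()
    for p in partials:  # merge step
        result = result.merge(p)
    mean = result.total / result.count
    variance = result.total_sq / result.count - mean * mean  # population variance
    return mean, variance
```

Mean itself cannot be merged directly, but (count, sum) can, which is why the partitions exchange these small summaries instead of raw data. The sum-of-squares formula is simple but can lose precision when values are large relative to their spread; numerically stabler merge formulas exist and are preferable in practice.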
Current Technologies and Scientific Databases
Reference #2: How to migrate a large-scale scientific database to a cloud environment?
Difficult engineering process
Limited capabilities for the database user
Study based on commercial cloud offerings

Data Warehousing
Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated data marts)
– Metadata
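Getting data from several sources into one cleaned, standardized format is the extract-transform-load (ETL) step of building a warehouse. The Python sketch below assumes two hypothetical sources with different date and amount formats; the field names, formats, and de-duplication key are invented for illustration. The `source` field stands in for the kind of lineage information a warehouse keeps as metadata.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def from_source_a(row: dict) -> Record:
    # Hypothetical source A: "MM/DD/YYYY" dates, amounts like "$1,234.50".
    amount = Decimal(row["amount"].strip().lstrip("$").replace(",", ""))
    return {
        "sale_date": datetime.strptime(row["date"].strip(), "%m/%d/%Y").date().isoformat(),
        "amount_cents": int(amount * 100),
        "store": row["store"].strip().upper(),
        "source": "A",  # lineage metadata
    }


def from_source_b(row: dict) -> Record:
    # Hypothetical source B: ISO dates, amounts already in cents.
    return {
        "sale_date": date.fromisoformat(str(row["day"])).isoformat(),
        "amount_cents": int(row["cents"]),
        "store": str(row["store_code"]).strip().upper(),
        "source": "B",
    }


def load(sources: Iterable[Tuple[Callable[[dict], Record], Iterable[dict]]]
         ) -> Tuple[List[Record], List[dict]]:
    """Transform rows to the standard format, reject bad rows, drop duplicates."""
    warehouse: List[Record] = []
    rejected: List[dict] = []
    seen = set()
    for transform, rows in sources:
        for row in rows:
            try:
                rec = transform(row)
            except (KeyError, ValueError, InvalidOperation):
                rejected.append(row)  # cleaning: keep malformed rows out
                continue
            key = (rec["sale_date"], rec["store"], rec["amount_cents"])
            if key not in seen:  # the same sale may arrive from several sources
                seen.add(key)
                warehouse.append(rec)
    return warehouse, rejected
```

Keeping rejected rows rather than silently discarding them lets the cleaning step be audited later.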
Data Warehousing
Difference between OLTP (online transaction processing) and OLAP (online analytical processing)
Data management: updates, indexing, dependencies, etc.
OLAP: needs read-optimized storage
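The contrast between write-oriented OLTP storage and read-optimized OLAP storage can be illustrated with a toy comparison of a row layout and a column layout for the same hypothetical sales table. Each method returns a simple count of write operations or values read, as a rough stand-in for I/O; this is a teaching model under those assumptions, not how any particular DBMS measures cost.

```python
from typing import Dict, List, Tuple

# Hypothetical sales table: (region, product, amount)
COLUMNS = ("region", "product", "amount")
Row = Tuple[str, str, int]


class RowStore:
    """Records stored whole, one after another (typical OLTP layout)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> int:
        self.rows.append(row)
        return 1  # write operations: the whole record in one place

    def sum_amount(self) -> Tuple[int, int]:
        total = sum(r[2] for r in self.rows)
        # Model a row scan: whole rows are read, so every field counts as read.
        return total, len(self.rows) * len(COLUMNS)


class ColumnStore:
    """Each attribute stored contiguously (read-optimized layout)."""

    def __init__(self) -> None:
        self.cols: Dict[str, list] = {name: [] for name in COLUMNS}

    def insert(self, row: Row) -> int:
        for name, value in zip(COLUMNS, row):
            self.cols[name].append(value)
        return len(COLUMNS)  # write operations: one per column

    def sum_amount(self) -> Tuple[int, int]:
        amounts = self.cols["amount"]
        # Only the "amount" column is read.
        return sum(amounts), len(amounts)
```

The row layout needs one write per insert but an aggregate reads every field, while the column layout pays one write per attribute but the aggregate reads only the `amount` column; this trade-off is why column-oriented layouts are a common choice for read-optimized warehouse storage.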
Next Class
Geographical Databases