Research Projects in DSRG Lab

Elke A. Rundensteiner
Database Systems Research Group
Email: [email protected]
Office: Fuller 238
Phone: Ext. 5815
Web pages: http://www.cs.wpi.edu/~rundenst, http://davis.wpi.edu/dsrg
Project Topics in a Nutshell:
• Distributed Data Sources [NSF’96, NSF02, NSF05?]
–EVE: Data Warehousing over Distributed Data
–TOTAL-ETL: Distributed Extract Transform Load
• XML/Web Data Systems [Verizon, IBM, NSF05, NSF05?]
–RAINBOW: XML to Relational Databases
–MASS: Native XQuery Processing System
• Databases & Visualization [NSF’97, NSF01, NSF05]
–Scalable Visual High-Dimensional Data Exploration
–Data and Visual Quality Support in XMDV
• Stream Monitoring System [NSF05a?, NSF05b?]
–Scalable Query Engine for Data Streams
–Fire Prediction and Monitoring Applications
CAPE: Engine for Querying and Monitoring Streaming Data
Examples of Stream Data Applications:
• Market Analysis
–Streams of stock exchange data (get rich)
• Critical Care
–Streams of vital sign measurements (save lives)
• Physical Plant Monitoring
–Streams of environmental readings (protect the environment)
Databases Upside Down
[Figure: a traditional database runs one-time queries over static data; a stream system runs standing queries over streams of data.]
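The following is a minimal Python sketch of the “upside down” contrast. It assumes tuples are plain dicts and queries are simple filter predicates. The names `one_time_query`, `StandingQuery`, and `run_stream` are hypothetical and are not CAPE’s API. A one-time query scans stored data once and returns. A standing query is registered once and then sees every tuple as it arrives.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # hypothetical tuple format, e.g. {"sensor": "s1", "temp": 72.0}


def one_time_query(table: Iterable[Row], pred: Callable[[Row], bool]) -> list:
    """Traditional DBMS style: the data is already stored; the query runs once and returns."""
    return [row for row in table if pred(row)]


class StandingQuery:
    """Stream style (hypothetical class): registered once, stays resident,
    and is applied to each tuple as it arrives."""

    def __init__(self, pred: Callable[[Row], bool]):
        self.pred = pred

    def push(self, row: Row) -> list:
        return [row] if self.pred(row) else []


def run_stream(stream: Iterable[Row], queries: list) -> Iterator[tuple]:
    """Push every arriving tuple through every registered query and
    yield (query index, answer) pairs as soon as they are produced."""
    for row in stream:
        for i, query in enumerate(queries):
            for answer in query.push(row):
                yield i, answer
```

The same predicate serves both styles. What differs is the direction of the flow: in the first function the query goes to the data, and in the second the data goes to the query.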
Stream Query Processing
[Figure: users register continuous queries with a distributed stream query engine, which consumes streaming data and sends back a streaming result of answers.]
• High workload of queries
• Real-time and accurate responses required
• Streaming data may have time-varying rates and high volumes
• Memory and CPU resource limitations
• Available resources for executing each operator may vary over time
Run-time distribution and adaptations required.
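Streams are unbounded but memory is not, so continuous queries are commonly evaluated over windows. The slide does not say which window semantics the engine uses. The sketch below assumes a hypothetical time-based sliding window and timestamps that arrive in order. It shows how evicting expired tuples keeps an operator’s state bounded, however long the stream runs.

```python
from collections import deque


class SlidingWindowAvg:
    """Continuous AVG over the last `width` time units (hypothetical operator).
    Assumes timestamps arrive in non-decreasing order."""

    def __init__(self, width: float):
        if width <= 0:
            raise ValueError("window width must be positive")
        self.width = width
        self.buf = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def insert(self, ts: float, value: float) -> float:
        """Add one reading and return the average over the window (ts - width, ts]."""
        self.buf.append((ts, value))
        self.total += value
        # Evict tuples that have fallen out of the window; memory stays bounded.
        while self.buf and self.buf[0][0] <= ts - self.width:
            _, old = self.buf.popleft()
            self.total -= old
        return self.total / len(self.buf)
```

Under a high input rate the window still holds only the tuples from the last `width` time units. When even that is too much, the run-time adaptations listed on the slide come into play.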
Good news … for a research student
We can lean on the oldies and goodies,
yet so many new and unsolved problems are at our fingertips, seen in a new light!
Interesting (yet doable) research challenges
Even possibilities for a start-up (if you are so inclined)
Research Contributions
Scalable Query Operators (Punctuations)
Adapt and select among tasks such as memory purging, stream reading, memory-to-disk shuffling, punctuation propagation, index selection, etc.
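In the stream-processing literature, a punctuation is a marker embedded in the stream that asserts no more tuples matching some pattern will arrive. The following is a minimal sketch of how such a marker enables memory purging and punctuation propagation. It makes these assumptions for illustration: tuples are dicts with a `key` field, each punctuation names a single key, and the operator is a grouped count. `Punctuation` and `PunctuatedCount` are hypothetical names, not CAPE classes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical marker: no more tuples with this key will arrive."""
    key: str


class PunctuatedCount:
    """Streaming GROUP BY key, COUNT(*) (hypothetical operator).
    Without punctuations the per-group state grows without bound; a
    punctuation lets the operator emit the group's final count and purge it."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, item) -> list:
        if isinstance(item, Punctuation):
            # The group is closed: emit its final result, free its memory, and
            # propagate the punctuation so downstream operators can purge too.
            final = self.counts.pop(item.key, 0)
            return [(item.key, final), item]
        self.counts[item["key"]] += 1
        return []
```

Here the punctuation for `room1` releases that group's state immediately, while the open group `room2` stays in memory.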
Synchronized Plan Spilling
Operators selectively spill data to disk to offset system overload, then adaptively reload it to improve performance.
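The sketch below covers only the mechanics of spilling and reloading inside a single operator. It does not show how spilling is synchronized across the operators of a plan. All details are hypothetical assumptions: state is hash-partitioned by key, the memory budget is counted in tuples, the largest partition is chosen as the spill victim, and `pickle` files in a temporary directory stand in for the disk store.

```python
import os
import pickle
import tempfile


class SpillingState:
    """Hash-partitioned operator state with a memory budget (hypothetical).
    When the budget is exceeded, the largest partition is written to disk;
    a spilled partition can be reloaded later, e.g. once the load drops."""

    def __init__(self, budget: int):
        self.budget = budget  # maximum number of tuples kept in memory
        self.mem = {}  # partition key -> list of tuples in memory
        self.disk = {}  # partition key -> path of its spill file
        self.spill_dir = tempfile.mkdtemp(prefix="spill_")  # not cleaned up here

    def _in_memory(self) -> int:
        return sum(len(part) for part in self.mem.values())

    def add(self, key, tup) -> None:
        self.mem.setdefault(key, []).append(tup)
        # Relieve memory pressure by spilling the largest partitions first.
        while self._in_memory() > self.budget and self.mem:
            victim = max(self.mem, key=lambda k: len(self.mem[k]))
            self._spill(victim)

    def _spill(self, key) -> None:
        path = self.disk.get(key) or os.path.join(self.spill_dir, f"part_{key}.pkl")
        older = self._read(path) if key in self.disk else []
        with open(path, "wb") as f:
            pickle.dump(older + self.mem.pop(key), f)
        self.disk[key] = path

    @staticmethod
    def _read(path: str) -> list:
        with open(path, "rb") as f:
            return pickle.load(f)

    def reload(self, key) -> list:
        """Bring a spilled partition back into memory, older tuples first.
        This sketch does not re-check the budget after reloading."""
        if key in self.disk:
            self.mem.setdefault(key, [])[:0] = self._read(self.disk.pop(key))
        return self.mem.get(key, [])
```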
Adaptive Operator Scheduling
A selector scores alternative scheduling algorithms by their effect on the QoS requirements and selects the best candidate.
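Below is a sketch of the selector idea, with hedged assumptions. Each candidate scheduling algorithm is assumed to have been observed at run time and to report throughput, latency, and memory use. The metric names, the min-max normalization, and the weighted-sum score are illustrative choices, not the scoring function actually used in CAPE.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Run-time statistics for one scheduling algorithm (hypothetical metrics)."""
    name: str
    throughput: float  # tuples/sec, higher is better
    latency: float  # ms, lower is better
    memory: float  # MB, lower is better


def score(cand: Observed, pool: list, weights: dict) -> float:
    """Weighted sum of metrics, each min-max normalized over the pool so 1.0 is best."""

    def norm(metric: str, higher_better: bool) -> float:
        vals = [getattr(o, metric) for o in pool]
        lo, hi = min(vals), max(vals)
        if hi == lo:
            return 1.0  # every candidate ties on this metric
        x = (getattr(cand, metric) - lo) / (hi - lo)
        return x if higher_better else 1.0 - x

    return (weights["throughput"] * norm("throughput", True)
            + weights["latency"] * norm("latency", False)
            + weights["memory"] * norm("memory", False))


def select(pool: list, weights: dict) -> Observed:
    """Pick the candidate whose observed behavior best matches the QoS weights."""
    return max(pool, key=lambda c: score(c, pool, weights))
```

Changing the weights, for example from throughput-oriented to latency-oriented QoS, can change which algorithm wins without changing the observed statistics.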
On-line Query Plan Migration
On-line plan restructuring, followed by on-line migration to the new plan, even for stateful operators.
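Migrating a plan with stateful join operators is hard because the new plan needs intermediate state that the old plan never built. One strategy from the plan-migration literature, sometimes called moving state, works like this: pause the plan, keep the base window states, recompute the new plan's intermediate state from them, and resume. Below is a simplified sketch of that strategy, not CAPE's implementation. It assumes three windowed streams A, B, and C that are equi-joined on a field `k`, with distinct payload fields. `join`, `PlanState`, and `migrate_moving_state` are hypothetical names. `results()` recomputes the full window join only so that the two plans can be compared. A real engine would process new arrivals incrementally.

```python
def join(left: list, right: list, key: str = "k") -> list:
    """Equi-join two lists of dict tuples on `key`, merging matching pairs."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left for right_row in index.get(left_row[key], [])]


class PlanState:
    """States of a two-join plan over windows A, B, C (hypothetical)."""

    def __init__(self, windows: dict, shape: str):
        self.windows = windows  # "A"/"B"/"C" -> current window contents
        self.shape = shape  # "(AB)C" or "A(BC)"
        self.inter = self._build_intermediate()

    def _build_intermediate(self) -> list:
        w = self.windows
        return join(w["A"], w["B"]) if self.shape == "(AB)C" else join(w["B"], w["C"])

    def results(self) -> list:
        w = self.windows
        return (join(self.inter, w["C"]) if self.shape == "(AB)C"
                else join(w["A"], self.inter))


def migrate_moving_state(old: PlanState, new_shape: str) -> PlanState:
    """Keep the old plan's base windows, rebuild the new intermediate state
    from them, and discard the old intermediate state."""
    return PlanState(old.windows, new_shape)
```

After migration, the new plan produces the same join results from the same windows, even though its intermediate state (B joined with C) did not exist before.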
Distributed Plan Execution
Adaptively distribute computations across multiple machines to optimize QoS requirements without information loss.
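Placing operators on machines is a load-balancing problem, and optimal balancing is NP-hard in general. The sketch below substitutes a classic greedy heuristic: longest-processing-time-first (LPT) assignment. It is not the distribution algorithm from the slide. It assumes each operator has a known, fixed cost estimate. The function name and cost units are hypothetical. It also ignores communication cost and run-time redistribution.

```python
import heapq


def assign_operators(costs: dict, machines: list) -> dict:
    """Greedy LPT placement: take operators in decreasing estimated cost and put
    each one on the currently least-loaded machine. Returns machine -> operators."""
    heap = [(0.0, m) for m in machines]  # (current load, machine name)
    heapq.heapify(heap)
    placement = {m: [] for m in machines}
    for op, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, machine = heapq.heappop(heap)  # least-loaded machine
        placement[machine].append(op)
        heapq.heappush(heap, (load + cost, machine))
    return placement
```

An adaptive system could rerun a placement like this whenever observed operator costs drift. It would then migrate operator state as on the previous slide.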
We’ve got it all . . . and more!
• If you like theory
–algorithms for NP-complete optimization, graph theory
• If you like systems
–distributed allocation, scheduling, and parallelism of query execution
• If you like networking
–quality of query, load shedding, grid computing
• If you like AI
–learning of scheduling selection, run-time adaptation
• If you like software engineering
–a huge query engine code base; we really need you!
So where is the database in this stuff?
One answer: Who cares? If it’s fun, it’s database stuff!
Second answer: Development of a new generation of “data query engines”.
A driving application: FIRE
[Figure: sensors in rooms; engineering data for fire science]
Futuristic Monitoring Queries?
• Track a smoke cloud (moving cluster) in terms of its speed and severity?
• Find the scope and direction of the fire’s spread?
• Match given sensor readings of a fire against a fire stream simulation to determine similarity?
• Is this a prank (outlier), or are we dealing with an actual fire?
• What path should people take to leave this building?
• Are any sensor readings faulty, and should they be ignored?
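The last two monitoring questions, "prank or real fire?" and "faulty sensor?", can be approximated with a simple spatial outlier check. Below is a sketch under hypothetical assumptions: each sensor has a known list of neighbors, and a reading is suspect when it deviates from the median of its neighbors' readings by more than a fixed tolerance. The function name and the tolerance rule are illustrative and do not come from the FireEngine project.

```python
from statistics import median


def suspect_sensors(readings: dict, neighbors: dict, tolerance: float) -> list:
    """Flag sensors whose reading differs from the median of their neighbors'
    readings by more than `tolerance` (hypothetical rule). A lone outlier among
    calm neighbors may be a faulty sensor or a prank; a real fire should
    produce agreement among nearby sensors over time."""
    flagged = []
    for sensor, value in readings.items():
        nearby = [readings[n] for n in neighbors.get(sensor, []) if n in readings]
        if nearby and abs(value - median(nearby)) > tolerance:
            flagged.append(sensor)
    return flagged
```

In a streaming setting, a check like this would run as a standing query over each new round of readings rather than once over a snapshot.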
FireEngine: Fire Stream Processing
If you have questions, email me: [email protected]
Better yet, drop by:
DSRG Labs: Fuller 319 & 318
My office: Fuller 238