Research Projects in DSRG Lab
Elke A. Rundensteiner
Database Systems Research Group
Email: [email protected]
Office: Fuller 238
Phone: Ext. 5815
Web pages: http://www.cs.wpi.edu/~rundenst
http://davis.wpi.edu/dsrg
Project Topics in a Nutshell:
Distributed Data Sources:
• EVE : Data Warehousing over Distributed Data
• TOTAL-ETL : Distributed Extract Transform Load
[NSF’96,NSF02,NSF05?]
XML/Web Data Systems:
• RAINBOW : XML to Relational Databases
• MASS : Native XQuery Processing System
[Verizon,IBM,NSF05, NSF05?]
Databases & Visualization:
• Scalable Visual High-Dim. Data Exploration
• Data and Visual Quality Support in XMDV
[NSF’97,NSF01,NSF05]
Stream Monitoring System:
• Scalable Query Engine for Data Streams
• Fire Prediction and Monitoring Appl.
[NSF05a?, NSF05b?]
CAPE : Engine for Querying and Monitoring Streaming Data
Examples of Stream Data Applications:
• Market Analysis
– Streams of Stock Exchange Data – get rich
• Critical Care
– Streams of Vital Sign Measurements – save lives
• Physical Plant Monitoring
– Streams of Environmental Readings – protect the environment
Databases Upside Down
[Figure: a traditional database stores static data and answers one-time queries; a stream system stores standing queries and evaluates them over streams of data.]
Stream Query Processing
• Register Continuous Queries, Receive Answers: high workload of queries; real-time and accurate responses required
• Streaming Data: may have time-varying rates and high volumes
• Distributed Stream Query Engine: memory and CPU resource limitations
• Streaming Result
• Available resources for executing each operator may vary over time.
Run-time Distribution and Adaptations required.
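The register-then-receive flow of continuous queries can be illustrated with a minimal, single-process sketch. This is not CAPE code: the `StreamEngine` class, its `register` and `push` methods, and the sliding-window average query are all hypothetical names chosen for illustration. Unlike the engine described on the slide, the sketch performs no distribution and no adaptation.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical toy engine: queries are registered once and stay "standing",
# while the data flows past them (the reverse of a one-time query).
class StreamEngine:
    def __init__(self) -> None:
        self._queries: Dict[str, Tuple[int, Deque[float], Callable[[str, float], None]]] = {}

    def register(self, name: str, window_size: int,
                 on_answer: Callable[[str, float], None]) -> None:
        """Register a continuous query: average over the last `window_size` values."""
        self._queries[name] = (window_size, deque(maxlen=window_size), on_answer)

    def push(self, value: float) -> None:
        """Feed one stream element to every standing query."""
        for name, (size, window, on_answer) in self._queries.items():
            window.append(value)  # deque(maxlen=...) drops the oldest value
            if len(window) == size:
                on_answer(name, sum(window) / size)

answers: List[Tuple[str, float]] = []
engine = StreamEngine()
engine.register("avg_last_3", 3, lambda name, avg: answers.append((name, avg)))
for reading in [10.0, 20.0, 30.0, 40.0]:
    engine.push(reading)
print(answers)  # [('avg_last_3', 20.0), ('avg_last_3', 30.0)]
```

Each new reading triggers a fresh answer once the window is full, which is why results arrive as a stream rather than as a single result set.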
Good news … for a research student
We can lean on the oldies but goodies,
Yet so many new and unsolved problems are at our fingertips in this new light!
Interesting (yet doable) research challenges
Even possibilities for a start-up (if you are so inclined)
Research Contributions
Scalable Query Operators (Punctuations)
Adapt and select among tasks such as memory purging, stream reading, memory-to-disk shuffling, punctuation propagation, index selection, etc.
Synchronized Plan Spilling
Operators selectively spill data to disk to offset system overload, with adaptive reload to improve performance.
Adaptive Operator Scheduling
A selector scores alternative scheduling algorithms based on their effect on QoS requirements and selects a candidate.
On-line Query Plan Migration
On-line plan restructuring, followed by on-line migration to the new plan, even for stateful operators.
Distributed Plan Execution
Adaptively distribute computations across multiple machines to optimize QoS requirements without information loss.
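To make the first contribution concrete, here is a minimal sketch of how a punctuation can drive memory purging in a stream join. It assumes a simplified form of punctuation ("no more tuples with this key will arrive on this stream") and a symmetric hash join over two streams labeled A and B. The `PunctuatedJoin` class and its methods are hypothetical and do not reproduce the group's actual operator, which also adapts among the other tasks listed.

```python
from typing import Dict, List, Set, Tuple

class PunctuatedJoin:
    """Toy equi-join of streams "A" and "B" that purges state on punctuations."""

    def __init__(self) -> None:
        self.state: Dict[str, Dict[str, List[str]]] = {"A": {}, "B": {}}
        self.closed: Dict[str, Set[str]] = {"A": set(), "B": set()}

    @staticmethod
    def _other(side: str) -> str:
        return "B" if side == "A" else "A"

    def on_tuple(self, side: str, key: str, payload: str) -> List[Tuple[str, str]]:
        other = self._other(side)
        partners = self.state[other].get(key, [])
        # Results are always reported as (A payload, B payload).
        matches = [(payload, p) if side == "A" else (p, payload) for p in partners]
        # Keep the tuple only if the other stream may still send this key.
        if key not in self.closed[other]:
            self.state[side].setdefault(key, []).append(payload)
        return matches

    def on_punctuation(self, side: str, key: str) -> None:
        # Assumed meaning: `side` will never again deliver a tuple with `key`,
        # so tuples stored for the other stream under `key` can no longer
        # find new partners and are safe to purge.
        self.closed[side].add(key)
        self.state[self._other(side)].pop(key, None)

    def stored(self) -> int:
        return sum(len(v) for s in self.state.values() for v in s.values())

join = PunctuatedJoin()
join.on_tuple("B", "k1", "b1")
first = join.on_tuple("A", "k1", "a1")   # joins with the stored b1
join.on_punctuation("A", "k1")           # purges b1 from B's state
late = join.on_tuple("B", "k1", "b2")    # still joins with a1, but is not stored
print(first, late, join.stored())
```

Without the punctuation, the join would have to keep `b1` and `b2` indefinitely, which is the unbounded-state problem that punctuations address.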
We got it all . . . and more
If you like theory
algorithms for NP-complete optimization, graph theory
If you like systems
distributed allocation, scheduling, and parallelism of query execution
If you like networking
quality-of-query, load-shedding, grid-computing
If you like AI
learning of scheduling selection, run-time adaptation
If you like software engineering
huge query engine code base, we really need you
So where is the database in this stuff?
One answer:
Who cares? If it’s fun, it’s database stuff
Second answer:
Development of a new generation of “data query engine”
A driving application: FIRE
Sensors in Rooms
Engineering Data for Fire Science
Futuristic Monitoring Queries?
Track a smoke cloud (moving cluster) in terms of its speed and severity?
Find the scope and direction of the fire's spread?
Match given sensor readings of a fire with a fire stream simulation to determine similarity?
Is this a prank (outlier), or are we dealing with an actual fire?
What path should people take to leave this building?
Are any sensor readings faulty, so that they should be ignored?
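One of these questions, telling an isolated outlier apart from a real fire, can be sketched with a simple rule: flag a sensor whose reading deviates strongly from the median of the other sensors in the same room. This is only an illustrative heuristic under assumed inputs; the function name `flag_outliers`, the `tolerance` threshold, and the sample temperatures are hypothetical and do not come from the FIRE project.

```python
from statistics import median
from typing import Dict, List

def flag_outliers(readings: Dict[str, float], tolerance: float) -> List[str]:
    """Return sensors whose reading differs from the median of the
    remaining sensors by more than `tolerance` (same units as readings)."""
    flagged = []
    for sensor, value in readings.items():
        others = [v for s, v in readings.items() if s != sensor]
        if others and abs(value - median(others)) > tolerance:
            flagged.append(sensor)
    return sorted(flagged)

# Hypothetical room temperatures: one sensor disagrees with its neighbors.
prank = {"s1": 21.0, "s2": 22.5, "s3": 21.5, "s4": 95.0}
# Hypothetical room temperatures: all sensors agree that the room is hot.
fire = {"s1": 90.0, "s2": 95.0, "s3": 92.0}

print(flag_outliers(prank, tolerance=10.0))  # ['s4']
print(flag_outliers(fire, tolerance=10.0))   # []
```

A lone disagreeing sensor is flagged as a possible prank or fault, while consistently high readings across the room are not flagged and would instead point to an actual fire.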
FireEngine : Fire Stream Processing
If you have questions, email me:
[email protected]
Better yet, drop by
DSRG Labs : Fuller 319 & 318
My office : Fuller 238