20160102r1-BigDawgOverview
The BigDAWG Polystore System
Database Challenges
• Enterprises encounter many databases and data models.
• Specialized systems provide performance, but add complexity.
• BigDAWG goals:
– Provide as much location (database) transparency as possible
– Support a single query notation and interface with limited extensions
BigDAWG
BigDAWG Design
• Many “Sizes”: Support for heterogeneous storage and database engines
• Low Latency: Support for real-time streaming databases for the Internet of Things
• Location Transparency: Allow users to operate on data without explicit knowledge of location
• Semantic Completeness: Support the widest number of database operations with efficient connectors
Semantic Islands as the Tradeoff
• Islands are the trade-off between functionality
and location transparency.
• Islands have:
- A Data Model
- A Language or Set of Operators
- A Set of Candidate Database Engines
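The three parts of an island listed above can be pictured as a small record. This is only a minimal sketch: the `Island` class, its field names, the operator names, and the engine assignments are assumptions made for illustration, not the BigDAWG implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Island:
    """Hypothetical record of the three things an island has."""
    data_model: str       # e.g. "relational" or "array"
    operators: frozenset  # the language or set of operators
    engines: tuple        # candidate database engines


# Illustrative values only; operator names and engine choices are assumptions.
RELATIONAL = Island("relational", frozenset({"select", "avg"}), ("PostgreSQL",))
ARRAY = Island("array", frozenset({"multiply"}), ("SciDB",))
```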
User specifies the Island:
- RELATIONAL(select avg(temp) from device)
- ARRAY(multiply(A,B))
• Islands do the intersection of engines
• BigDAWG does the union of Islands
• Islands are logical
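One reading of the intersection and union points, together with the island-scoped query notation, can be sketched as follows. Everything here is hypothetical: the engine names, the operator sets, and the helper functions are invented for illustration and are not part of any BigDAWG API.

```python
import re

# Hypothetical operator sets per engine; all names are illustrative only.
ENGINE_OPS = {
    "engine_a": {"select", "avg", "join"},
    "engine_b": {"select", "avg", "window"},
    "engine_c": {"multiply", "transpose"},
}
ISLAND_ENGINES = {
    "RELATIONAL": ["engine_a", "engine_b"],
    "ARRAY": ["engine_c"],
}


def island_ops(island):
    """An island offers only what all of its engines support (intersection)."""
    return set.intersection(*(ENGINE_OPS[e] for e in ISLAND_ENGINES[island]))


def polystore_ops():
    """The polystore offers whatever any island offers (union)."""
    return set().union(*(island_ops(i) for i in ISLAND_ENGINES))


def split_query(query):
    """Split 'ISLAND(body)' into (island, body)."""
    m = re.fullmatch(r"\s*([A-Z]+)\((.*)\)\s*", query, re.DOTALL)
    if m is None or m.group(1) not in ISLAND_ENGINES:
        raise ValueError(f"no known island specified in: {query!r}")
    return m.group(1), m.group(2)
```

Under these assumptions, `split_query("RELATIONAL(select avg(temp) from device)")` yields the island name and the inner query, which a server could then hand to one of that island's candidate engines.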
Hackathon to Prototype BigDAWG
• BigDAWG goal: harness the power of advanced database engines through a unified interface
• BigDAWG is the vision of the ISTC for Big Data: to develop future technologies and interfaces that support knowledge extraction from big data
• A recent hackathon at MIT BeaverWorks produced a BigDAWG prototype
Using BigDAWG Polystore for Medical Big Data
• Data Explorer
• Tell Me Something Interesting
• Text Analytics
• Heavy Analytics
• Streaming Analytics
S-PI Overview Screen
BigDAWG Prototype - Island Types
- Text Analytics: D4M
- Explorer: ScalaR
- Tell Something: SeeDB, Searchlight
- Heavy Analytics: Myria
- Streaming: S-Store
- Watch: Wearables
S-PI
[Figure: BigDAWG prototype architecture]
Layers: Client, BigDAWG API, Server, Islands, Engines
Islands (each with its own data model, e.g. ARRAY, TEXT):
- D4M: associative arrays
- Myria: iterative
- Streams
Engines and the data they hold:
- PostgreSQL: tabular clinical data
- SciDB: historical waveform data
- Accumulo: text clinical data (e.g. chart notes)
- MyriaX: intermediate results
- S-Store: streaming waveform data