MapReduce and Parallel DBMSs: Friends or Foes?
Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin
Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
Presentation and slides by Elisa Tvete, Jim Avery
PARALLEL DBMS ARCHITECTURE
Multiple nodes running database software
“Shared-nothing” nodes: each has its own CPU, memory, and disks
Data horizontally partitioned across all nodes
Each node runs query on own data
Results returned to central processing node
Central node calculates final result
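The shared-nothing flow above can be simulated in a single Python process. This is a minimal sketch, not how any particular DBMS is implemented: the two-field rows, the hash partitioning on the grouping key, the SUM-per-key query, and the names `NUM_NODES`, `partition`, `local_query`, `coordinator`, and `run_query` are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical cluster size; a real parallel DBMS would run one process per node.
NUM_NODES = 3


def partition(rows, num_nodes):
    """Horizontally partition (key, value) rows across nodes by hashing the key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[0]) % num_nodes].append(row)
    return nodes


def local_query(node_rows):
    """Each node runs the query (SUM grouped by key) on its own data only."""
    partial = defaultdict(float)
    for key, value in node_rows:
        partial[key] += value
    return dict(partial)


def coordinator(partials):
    """Central node merges the per-node partial results into the final result."""
    final = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            final[key] += value
    return dict(final)


def run_query(rows, num_nodes=NUM_NODES):
    nodes = partition(rows, num_nodes)
    partials = [local_query(node_rows) for node_rows in nodes]
    return coordinator(partials)
```

Because rows are partitioned on the grouping key, each key lives on exactly one node; the coordinator still sums partials so the same merge step would work under a different partitioning scheme (for example, round-robin).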
MAPREDUCE ARCHITECTURE
Several computing nodes used
Data not pre-loaded
Query has “Map” and “Reduce” components
Key/value data is distributed to nodes
Nodes perform “Map” step
Results are returned to central processing node
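The Map and Reduce components can be sketched as plain functions. This single-process simulation is only an illustration under stated assumptions: the helper name `map_reduce` and its signature are hypothetical, and everything runs in one process. In Hadoop, map tasks run on many nodes and their intermediate (key, value) pairs are shuffled by key to reduce tasks.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, map_fn, reduce_fn):
    """Hypothetical helper: run map, group by key (shuffle), then reduce."""
    # Map: each input record yields zero or more (key, value) pairs.
    mapped = chain.from_iterable(map_fn(record) for record in records)
    # Shuffle: group all intermediate values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: combine the list of values for each key into one result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}
```

For example, a word count over `["a b a", "b c"]` with `map_fn=lambda line: [(w, 1) for w in line.split()]` and `reduce_fn=lambda key, values: sum(values)` returns `{'a': 2, 'b': 2, 'c': 1}`. Note that no schema or loading step is needed: the map function interprets raw records directly.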
PERFORMANCE TRADE-OFFS
DEMONSTRATION
• Three systems:
  – Hadoop MR Framework
  – Vertica, a column-store relational database
  – DBMS-X, a row-based database
• Three tasks:
  – Original MR Grep task
    • SELECT * FROM Data WHERE field LIKE '%XYZ%';
  – Web log task
    • SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;
  – Join task
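The Grep and Web log SQL queries above can be expressed as MapReduce-style functions. This is a sketch with assumed, simplified inputs: Grep records are `(key, field)` tuples and UserVisits records are reduced to `(sourceIP, adRevenue)` tuples, while the function names are hypothetical. The substring match is case-sensitive, whereas case sensitivity of SQL `LIKE` depends on the DBMS. The Join task is omitted because its query is not given here.

```python
from collections import defaultdict


# Grep task: SELECT * FROM Data WHERE field LIKE '%XYZ%';
def grep_map(record, pattern="XYZ"):
    """Emit the record unchanged if its field contains the pattern (map-only job)."""
    key, field = record
    if pattern in field:
        yield key, field


# Web log task: SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;
def weblog_map(visit):
    """Emit (sourceIP, adRevenue) for each simplified UserVisits record."""
    source_ip, ad_revenue = visit
    yield source_ip, ad_revenue


def weblog_reduce(source_ip, revenues):
    """Sum the ad revenue collected for one sourceIP."""
    return sum(revenues)


def run_grep(records):
    return [pair for record in records for pair in grep_map(record)]


def run_weblog(visits):
    groups = defaultdict(list)  # shuffle step: group revenues by sourceIP
    for visit in visits:
        for ip, revenue in weblog_map(visit):
            groups[ip].append(revenue)
    return {ip: weblog_reduce(ip, revs) for ip, revs in groups.items()}
```

Grep needs no reduce phase, since each record is kept or dropped independently. The Web log task needs the shuffle and reduce to implement `GROUP BY` and `SUM`.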
DEMONSTRATION RESULTS
[Figure: execution time (in seconds) of Hadoop, DBMS-X, and Vertica on the Grep, Web Log, and Join tasks; y-axis 0 to 1400 s]
MR COMPLEMENTS PARALLEL DBMS
MR is good at extract-transform-load (ETL) tasks
Can perform complex analytics more easily
Extract raw data, process it, load into DBMS
Handles analyses not suited to a single SQL query
Can use data without a strictly defined schema
MR functions can enhance parallel DBMS!
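The extract-transform-load pattern above can be sketched with Python's built-in `sqlite3` module standing in for the DBMS. This is an illustrative assumption, not the paper's setup: the `|`-delimited raw log format, the sample lines, and the function names are hypothetical, and `UserVisits` is a two-column simplification of the table named in the Web log query.

```python
import sqlite3

# Hypothetical raw log lines with no declared schema: "ip|revenue|free text".
RAW_LOG = [
    "10.0.0.1|0.50|homepage visit",
    "10.0.0.2|1.25|search",
    "malformed line",
    "10.0.0.1|2.00|checkout",
]


def extract_transform(lines):
    """MR-style step: parse raw text and drop records that do not parse."""
    for line in lines:
        parts = line.split("|")
        if len(parts) != 3:
            continue  # skip malformed input instead of failing the load
        ip, revenue, _ = parts
        try:
            yield ip, float(revenue)
        except ValueError:
            continue


def load_and_query(lines):
    """Load the cleaned rows into a relational table, then answer with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE UserVisits (sourceIP TEXT, adRevenue REAL)")
    conn.executemany("INSERT INTO UserVisits VALUES (?, ?)", extract_transform(lines))
    rows = conn.execute(
        "SELECT sourceIP, SUM(adRevenue) FROM UserVisits "
        "GROUP BY sourceIP ORDER BY sourceIP"
    ).fetchall()
    conn.close()
    return rows
```

Calling `load_and_query(RAW_LOG)` returns `[('10.0.0.1', 2.5), ('10.0.0.2', 1.25)]`. The split of responsibilities mirrors the slide: schema-free parsing and cleanup happen in the MR-style step, and the repeated querying happens in the DBMS.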
CONCLUSION
• Architectural differences
  – Repetitive record parsing
  – Compression
  – Pipelining
  – Scheduling
• Discussion
• Coexistence
RESOURCES
• M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?," Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
• A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker, "A Comparison of Approaches to Large-Scale Data Analysis," Brown University Data Management Research Group, 26 Feb. 2013. Web. 24 Aug. 2011. <http://database.cs.brown.edu/projects/mapreduce-vs-dbms/>