MapReduce and Parallel DBMSs: Friends or Foes?
Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin
Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
Presentation and slides by Elisa Tvete, Jim Avery
PARALLEL DBMS ARCHITECTURE
• Multiple nodes running database software
– “Shared-nothing nodes”: separate CPU, memory, disks
• Data horizontally partitioned across all nodes
– Each node runs query on own data
– Results returned to central processing node
– Central node calculates final result
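The partition, local-query, and central-merge steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the "nodes" are plain lists in one process, rows are (key, amount) pairs, and the helper names (partition, local_sum, merge, parallel_sum) are hypothetical, not taken from the paper or from any DBMS.

```python
from collections import defaultdict

def partition(rows, num_nodes):
    """Horizontally partition rows across nodes (round-robin for simplicity)."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes

def local_sum(rows):
    """Each node runs the aggregate on its own partition only."""
    partial = defaultdict(int)
    for key, amount in rows:
        partial[key] += amount
    return dict(partial)

def merge(partials):
    """The central node combines the per-node partial results."""
    total = defaultdict(int)
    for partial in partials:
        for key, amount in partial.items():
            total[key] += amount
    return dict(total)

def parallel_sum(rows, num_nodes=3):
    """Simulate the whole query: partition, aggregate locally, merge centrally."""
    return merge(local_sum(part) for part in partition(rows, num_nodes))
```

Calling parallel_sum(rows, num_nodes=2) on a list of (key, amount) pairs returns one total per key, regardless of how the rows were split across nodes.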
MAPREDUCE ARCHITECTURE
• Several computing nodes used
• Data not pre-loaded
• Query has “Map” and “Reduce” components
• Key/value data is distributed to nodes
• Nodes perform “Map” step
• Results are returned to central processing node
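As a rough illustration of the “Map” and “Reduce” components, the sketch below runs both phases in a single Python process. It is not Hadoop's API: run_mapreduce, map_words, and reduce_count are hypothetical names, and the grouping step in the middle stands in for the distribution of key/value data between nodes.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every record, group the emitted values by key,
    then apply reduce_fn once per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def map_words(line):
    """Map step: emit (word, 1) for each word in a line of text."""
    for word in line.split():
        yield word.lower(), 1

def reduce_count(word, counts):
    """Reduce step: sum the counts emitted for one word."""
    return sum(counts)
```

For example, run_mapreduce(lines, map_words, reduce_count) returns a dictionary mapping each word to the number of times it appears in lines.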
PERFORMANCE TRADE-OFFS DEMONSTRATION
• Three systems:
– Hadoop MR Framework
– Vertica, a column-store relational database
– DBMS-X, a row-based database
• Three tasks:
– Original MR Grep task
• SELECT * FROM Data WHERE field LIKE '%XYZ%';
– Web log task
• SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;
– Join task
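For comparison with the SQL above, here is one way the Grep and Web log tasks might be written in MapReduce style. This is a sketch in plain Python, not the benchmark code: records are assumed to be dictionaries keyed by the column names from the SQL, and grep_task and web_log_task are hypothetical helper names.

```python
from collections import defaultdict

def grep_task(records, pattern="XYZ"):
    """Map-only job: keep records whose field contains the pattern,
    mirroring WHERE field LIKE '%XYZ%'."""
    return [record for record in records if pattern in record["field"]]

def web_log_task(user_visits):
    """Map emits (sourceIP, adRevenue); reduce sums the revenue per sourceIP,
    mirroring SUM(adRevenue) ... GROUP BY sourceIP."""
    shuffled = defaultdict(list)
    for visit in user_visits:  # map phase
        shuffled[visit["sourceIP"]].append(visit["adRevenue"])
    return {ip: sum(revenues) for ip, revenues in shuffled.items()}  # reduce phase
```

The Grep task needs no reduce step because each record is accepted or rejected on its own, while the Web log task needs one because rows with the same sourceIP must be brought together before they can be summed.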
DEMONSTRATION RESULTS
[Bar chart: time (in seconds, axis 0 to 1400) by task (Grep, Web Log, Join) for Hadoop, DBMS-X, and Vertica]
MR COMPLEMENTS PARALLEL DBMS
• MR good at extract-transform-load queries
– Extract raw data, process it, load into DBMS
• Can perform complex analytics more easily
– Queries not suitable for single SQL query
• Can use data without strictly defined schema
• MR functions can enhance parallel DBMS!
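The extract-transform-load pattern can be sketched as follows. Everything specific here is an assumption made for illustration: the pipe-delimited log format, the sample lines, the visitDate column, and the use of an in-memory SQLite database as a stand-in for the parallel DBMS. Only the UserVisits, sourceIP, and adRevenue names come from the slides.

```python
import sqlite3

# Hypothetical raw input: "sourceIP|visitDate|adRevenue", with one bad line.
RAW_LOG = [
    "10.0.0.1|2010-01-01|0.50",
    "malformed line",
    "10.0.0.2|2010-01-02|1.25",
    "10.0.0.1|2010-01-03|0.25",
]

def extract_transform(lines):
    """Extract fields from raw lines and skip any that do not parse."""
    for line in lines:
        parts = line.split("|")
        if len(parts) != 3:
            continue
        ip, date, revenue = parts
        try:
            yield ip, date, float(revenue)
        except ValueError:
            continue

def load(rows):
    """Load cleaned rows into a table so they can be queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserVisits (sourceIP TEXT, visitDate TEXT, adRevenue REAL)"
    )
    conn.executemany("INSERT INTO UserVisits VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def run_etl(lines):
    """Run the pipeline, then answer the web log query inside the database."""
    conn = load(extract_transform(lines))
    query = (
        "SELECT sourceIP, SUM(adRevenue) FROM UserVisits "
        "GROUP BY sourceIP ORDER BY sourceIP"
    )
    return conn.execute(query).fetchall()
```

The division of labor is the point: the schema-free parsing and cleaning happens before loading, and the structured query runs in the database afterwards.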
CONCLUSION
• Architectural Differences
– Repetitive record parsing
– Compression
– Pipelining
– Scheduling
• Discussion
• Coexistence
RESOURCES
• M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?," Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
• A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A Comparison of Approaches to Large-Scale Data Analysis. Brown University Data Management Research Group, 26 Feb. 2013. Web. 24 Aug 2011. <http://database.cs.brown.edu/projects/mapreduce-vs-dbms/>