Dryad: Distributed Data-Parallel Programs from Sequential Building

Download Report

Transcript Dryad: Distributed Data-Parallel Programs from Sequential Building

Dryad: Distributed Data-Parallel
Programs from Sequential
Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu,
Andrew Birrell, Dennis Fetterly
Microsoft Research, Silicon Valley
Eurosys 2007
Overview

Mechanism to express parallel computation



Large scale internet services
Chip multiprocessors
Drawing on previous work




Condor
GPU shader languages
MapReduce
Parallel databases
Main Idea

Represent the computation as a DAG
of communicating sequential
processes
Claims

The Dryad execution engine:






Schedules across resources
Optimizes the level of concurrency in a node
Manages failure resilience
Delivers data where needed
Performance is good and scales
The programming abstraction is at the right
level


API mastery only takes a couple of weeks
Higher level abstractions have been built on
Dryad
Dryad System

Name server enumerates all resources


Including location relative to other
resources
Daemon running on each machine for
vertex dispatch
Communication
Constructing the Job

Use graph operators implemented in
C++ to describe the graph.
Database Query Example
Execution


Job manager not currently fault tolerant
Vertices may be scheduled multiple times







Each execution versioned
Execution record kept- including versions of
incoming vertices
Outputs are uniquely named (versioned)
Final outputs selected if job completes
Non-file communication may cascade failures
Vertices specify hard constraints or
preferences for placement
Scheduling is greedy assuming only one job
Run-time Graph Refinement
Results – I


SQL Query
10 Machines





2 dualcore 2 GHz
8 GB Mem
1 Gb Ethernet
4x400GB disks
Winows Server
2003
Results – II





Map then reduce style
Builds histogram of MSN Search query
frequency
1800 Machines
10.2 TB source data
11072 Vertices
Refinement
Dryad