LoadAtomizer: A Locality and I/O Load
aware Task Scheduler for MapReduce
M. Asahara, S. Nakadai, and T. Araki,
in Proc. of the IEEE 4th International Conference on Cloud Computing Technology and Science, 2012
2015.11.26
Network & Database Lab.
김민수
Index
• Background
• Introduction
• LoadAtomizer
• Simulation and Evaluation
• Conclusions and Future Work
Background – Hadoop
• Hadoop = HDFS + MapReduce
◦ HDFS (Hadoop Distributed File System) – stores the data
Background – Hadoop
• Hadoop = HDFS + MapReduce
◦ HDFS components: a NameNode that manages file system metadata and DataNodes that store the data blocks
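Reading a file goes through the NameNode for block locations and then streams the blocks from the DataNodes. A minimal client-side sketch using the standard Hadoop FileSystem API (the NameNode address and input path below are placeholder values, not from the paper):

```java
// Minimal sketch of an HDFS read with the Hadoop client API.
// "namenode:9000" and "/data/input.txt" are placeholder values.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path input = new Path("/data/input.txt");

        // The client asks the NameNode where the blocks are, then reads
        // the block contents directly from the DataNodes holding them.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(input)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```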
Background – Hadoop
• Hadoop = HDFS + MapReduce
◦ MapReduce – processes the data
Background – Hadoop
• Hadoop = HDFS + MapReduce
◦ MapReduce components: a master that schedules jobs and slaves that execute the map and reduce tasks
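A map task turns input records into intermediate key/value pairs, and a reduce task aggregates the values grouped by key after the shuffle. As a minimal illustration, the classic word-count mapper and reducer (a standard Hadoop example, not code from the paper):

```java
// Minimal word-count sketch illustrating the Map and Reduce components.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map: split each input line into words and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word after the shuffle phase.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```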
Background – Hadoop
• Hadoop – workflow: input data stored in HDFS is processed by map tasks, their outputs are shuffled to reduce tasks, and the final results are written back to HDFS
Introduction
• Data-intensive computing problem: I/O bottlenecks
• Data-locality-based task scheduling policy
◦ Places a computing task near its input data
• Data locality alone is NOT good enough
◦ A task can still suffer from heavy I/O load caused by other tasks
LoadAtomizer – Motivation
• When all jobs running on a cluster have the same or similar I/O characteristics
◦ Locality-aware task schedulers work effectively
• When the I/O characteristics of jobs differ from each other
◦ Locality-aware task schedulers do not always work well
LoadAtomizer – Motivation
• When the I/O characteristics of jobs differ from each
other
LoadAtomizer – Motivation
• If the schedulers were also aware of the I/O loads on the storages and the network, they could keep tasks away from heavily loaded resources
LoadAtomizer – Motivation
• Two MapReduce jobs on a cluster with eight slaves
◦ TeraSort job: s1 is a reduce task (shuffle and merge phase)
◦ Grep job: s2 is a map task
(Figure: resulting I/O loads and network traffic)
LoadAtomizer – Design Issues
• I/O Load aware Task Assignment
• I/O Load aware Storage Selection
• Network Load aware Scheduling
• Locality Awareness
LoadAtomizer
• Maintaining I/O Load Information
◦ Collects the I/O load information from slaves and network switches
◦ The state of storages and the network is classified as heavily loaded or lightly loaded
◦ The information is organized as a topology-aware load tree (see the sketch below)
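The slides do not show the load tree data structure itself; the following is only a rough sketch of the idea, assuming switches/racks as inner nodes, slaves and their storages as leaves, and a binary load state per node derived from the collected statistics:

```java
// Rough sketch (not the paper's actual data structure) of a
// topology-aware load tree node.
import java.util.ArrayList;
import java.util.List;

class LoadTreeNode {
    enum LoadState { LIGHTLY_LOADED, HEAVILY_LOADED }

    final String name;                          // e.g. switch, rack, or slave name
    LoadState state = LoadState.LIGHTLY_LOADED; // set from monitored I/O / network stats
    final List<LoadTreeNode> children = new ArrayList<>();

    LoadTreeNode(String name) {
        this.name = name;
    }

    void addChild(LoadTreeNode child) {
        children.add(child);
    }

    // The scheduler can walk the tree from a slave up to the root to see
    // whether the path toward a storage crosses a heavily loaded node.
    boolean isLightlyLoaded() {
        return state == LoadState.LIGHTLY_LOADED;
    }
}
```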
LoadAtomizer
• Maintaining I/O Load Information
LoadAtomizer
• Storage Selection and Task Scheduling
◦ The job scheduling policy is independent of the locality- and I/O-load-aware task scheduling, so it can be combined with other job schedulers
◦ Selects a lightly loaded storage
◦ Connects through a lightly loaded network path
◦ The storage to read from is chosen per map task
LoadAtomizer
• The algorithm quickly chooses a lightly loaded storage that meets three conditions (see the sketch below):
◦ the storage stores input data of at least one map task
◦ the slave can connect to the storage through a lightly loaded network path
◦ the storage is closer to the slave than the other candidates
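A rough sketch of this selection step, assuming hypothetical helper types and predicates (Storage, Slave, MapTask, hasInputDataFor, isLightlyLoadedPathTo, distanceTo); the authors' actual implementation is not shown in the slides:

```java
// Sketch of the storage-selection step under the three conditions above.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class StorageSelector {
    // Pick, for a free slot on 'slave', a lightly loaded storage that
    // (1) holds input data of at least one pending map task,
    // (2) is reachable over a lightly loaded network path,
    // (3) and is as close to the slave as possible.
    Optional<Storage> selectStorage(Slave slave, List<Storage> storages,
                                    List<MapTask> pendingMaps) {
        return storages.stream()
                .filter(Storage::isLightlyLoaded)                               // storage not busy
                .filter(s -> pendingMaps.stream().anyMatch(s::hasInputDataFor)) // condition 1
                .filter(s -> slave.isLightlyLoadedPathTo(s))                    // condition 2
                .min(Comparator.comparingInt(slave::distanceTo));               // condition 3
    }
}

// Hypothetical supporting types, only to keep the sketch self-contained.
interface Storage {
    boolean isLightlyLoaded();
    boolean hasInputDataFor(MapTask task);
}
interface Slave {
    boolean isLightlyLoadedPathTo(Storage s);
    int distanceTo(Storage s);
}
interface MapTask {}
```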
LoadAtomizer
• Storage Selection and Task Scheduling
IMPLEMENTATION
• Prototyped LoadAtomizer on Apache Hadoop 0.23.1 on Linux 2.6.26
• Modified some modules of the Hadoop MapReduce framework and HDFS so that a slave can be told to read from the storage selected by LoadAtomizer
• Storage monitor: /proc/diskstats
• Network monitor: /proc/net/dev
• A threshold approach determines whether the load state is heavily or lightly loaded (see the sketch below)
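A minimal sketch of how such a threshold check on /proc/diskstats might look; interpreting the "100 I/Os" threshold as in-flight requests, and the parsing details, are assumptions, and /proc/net/dev can be sampled analogously for the 80 Mbit/s network threshold:

```java
// Sketch: classify a disk as heavily loaded when the number of I/Os
// currently in progress (12th field of /proc/diskstats, index 11)
// reaches a threshold. Illustrative only; not the authors' code.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProcLoadMonitor {
    private static final long DISK_IO_THRESHOLD = 100; // "100 I/Os" from the slides

    static boolean diskHeavilyLoaded(String device) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/diskstats"));
        for (String line : lines) {
            // fields: major minor name reads ... ios_in_progress (index 11) ...
            String[] f = line.trim().split("\\s+");
            if (f.length > 11 && f[2].equals(device)) {
                long iosInProgress = Long.parseLong(f[11]);
                return iosInProgress >= DISK_IO_THRESHOLD;
            }
        }
        return false; // device not found: treat as lightly loaded
    }
}
```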
IMPLEMENTATION
• I/O-heavy job: TeraSort (40 GB)
• I/O-light jobs: grep (64 GB), word count (32 GB)
• Hardware: 2 GHz quad-core CPU, 12 GB RAM, 300 GB 10k-rpm SAS disk, gigabit Ethernet port
• Storage load threshold: 100 I/Os
• Network load threshold: 80 Mbit/s
• HDFS block size: 256 MB
• Eight reduce tasks
• Three replicas
• 1 GB of memory allocated to each map and reduce task
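For reference, these settings could be expressed through the Hadoop configuration API roughly as follows; the property names are the usual Hadoop 0.23/2.x keys, and the slides do not state which keys the authors actually set:

```java
// Sketch of the evaluation settings as Hadoop configuration properties.
import org.apache.hadoop.conf.Configuration;

public class EvaluationJobConfig {
    static Configuration build() {
        Configuration conf = new Configuration();
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB HDFS blocks
        conf.setInt("dfs.replication", 3);                 // three replicas
        conf.setInt("mapreduce.job.reduces", 8);           // eight reduce tasks
        conf.setInt("mapreduce.map.memory.mb", 1024);      // 1 GB per map task
        conf.setInt("mapreduce.reduce.memory.mb", 1024);   // 1 GB per reduce task
        return conf;
    }
}
```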
IMPLEMENTATION