
FP7 HARNESS: Managing Heterogeneous Resources for the Cloud
Hardware- and Network-Enhanced Software Systems for Cloud Computing
EGI Conference 2015
Cloud PaaS
http://www.harness-project.eu/
Gabriel Figueiredo
Imperial College London
United Kingdom
20th May 2015
PaaS Design Drivers

Tenant perspective
– State-of-the-art: optimised for horizontal scale-out over homogeneous resources
– minimise development costs
– minimise operating costs
– maximise performance
– Examples: standardised APIs, optimal deployment, scale out / scale up, specialised resources

Provider perspective
– minimise ownership costs (commoditised resources)
– maximise usage (virtualised resources)
– maximise market growth (data-centre expansion, specialised resources)

Application requirements
– Fast job completion time with interdependent “big data” (examples: scientific computing, time-series analysis)
– Fresh results within seconds (examples: on-line information retrieval, on-line data analytics)
Driving Use Cases
Basis for demonstration and validation:

– Delta Merge for SAP HANA: in-memory OLTP and OLAP queries for “big data” analytics; locking takes 20% of cycles and 10s of seconds (shared memory, CPUs, caches, I/O).
– Reverse Time Migration (RTM): scientific computation for the geosciences; two weeks on 300 multi-core nodes.
– AdPredictor Machine Learning: open-source “map/reduce” distributed data-flow computation over O(10^9) entries in a daily Web visit log; stages: Preprocess, Parallelize (features f1 … fn), Predict (y ∈ {−1, 1}), Iterate, Share state, Update, Aggregate.
Goal: Programmable and Manageable
The HARNESS challenge set:
– GPU-based parallel-thread engines
– Solid-state disk drives
– Middleboxes for in-network aggregation and storage
– ASIC-based OpenFlow switching fabric
– FPGA-based shared dataflow engines
State-of-the-Practice: OpenStack

1. Data-centres are built around commodity resources.
2. Tenants create server instances (VMs) based on pre-defined configurations.
3. Resource extensions must fit the “server-oriented” view.

(Figure: servers with 16 CPU cores / 32 GB RAM and 24 CPU cores / 64 GB RAM, the latter with an attached GPGPU, hosting VM flavours such as 1 vCPU / 256 MB and 3 vCPUs / 512 MB.)
Modeling with OpenStack
The HARNESS challenge set:

1. The MPC-X is not a “Server” node: it has a different resource characterisation (# DFEs: 8, model: MAIA, versus a server's CPU cores: 24, RAM: 64 GB).
2. The MPC-X is not part of a “Server” node: it is shared by multiple hosts (servers A, B and C connected over InfiniBand).
3. The MPC-X is a different class of resource!

(MPC-X: FPGA-based shared data-flow engines.)
Big Idea
Support different types of resources as first-class resources.

Cloud tenant: “I want the following resources: 3 VMs, 1 vDFE, 1 vGPGPU, 1 Storage volume”
Request: VM1-3 [4 cores, 512MB] | vDFE [2 DFE, MaxRing] | Storage [3GB, SEQ:100MB/s]
HARNESS Cloud: creates the resource instances, then deploys and launches the application.
Response: “6 resource instances created. Cost: 3€/s”
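To make the request format concrete, here is a minimal Python sketch of what such an explicit, resource-centric request might look like; the dictionary fields and the submit() helper are illustrative assumptions, not the actual HARNESS API.

```python
# Hypothetical sketch of the explicit request above; field names and the
# submit() helper are assumptions for illustration, not the HARNESS API.
request = {
    "resources": [
        {"type": "VM",      "count": 3, "cores": 4, "ram_mb": 512},
        {"type": "vDFE",    "count": 1, "dfes": 2, "topology": "MaxRing"},
        {"type": "vGPGPU",  "count": 1},
        {"type": "Storage", "count": 1, "size_gb": 3, "seq_mb_per_s": 100},
    ]
}

def submit(req):
    """Pretend submission: returns one instance record per requested resource."""
    instances = []
    for spec in req["resources"]:
        for i in range(spec["count"]):
            instances.append({"type": spec["type"], "id": f"{spec['type'].lower()}-{i}"})
    return instances

print(f"{len(submit(request))} resource instances created.")  # -> 6 resource instances created.
```
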
Big Idea
Support different types of resources as first-class resources.

Cloud tenant: “I want to run my job in less than 10s”
HARNESS Cloud: selects the resource mix, creates the instances, then deploys and launches the application.
Allocation: VM [12 cores, 1GB] | vDFE [4 DFE, MaxRing] | Storage [3GB, SEQ:100MB/s]
Response: “3 resource instances created. Cost: 2€/s”
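As a contrast to the explicit request on the previous slide, here is a minimal sketch of the objective-driven variant, assuming the platform has runtime predictions and prices for a few candidate configurations; the candidates, runtimes and costs below are invented illustration values, not HARNESS measurements.

```python
# Sketch of the SLO-driven variant: the platform, not the tenant, picks the
# resource mix. Candidate configurations, predicted runtimes and costs are
# made-up illustration values.
candidates = [
    {"config": "3 small VMs",                         "runtime_s": 14.0, "cost_per_s": 1.5},
    {"config": "VM [12 cores, 1GB] + vDFE + Storage", "runtime_s": 8.5,  "cost_per_s": 2.0},
    {"config": "6 small VMs",                         "runtime_s": 9.5,  "cost_per_s": 2.4},
]

deadline_s = 10.0  # "I want to run my job in less than 10s"
feasible = [c for c in candidates if c["runtime_s"] < deadline_s]
best = min(feasible, key=lambda c: c["cost_per_s"])
print(f'{best["config"]} selected. Cost: {best["cost_per_s"]}€/s')
```
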
HARNESS challenges

(Figure: a resource-management layer creating resource instances (RIs) from a pool of heterogeneous resources.)

1. How to make effective use of specialised resource instances?
2. How to allow specific resources to be effectively shared in a multi-tenant environment?
3. How to integrate different resource managers into a coherent cloud platform?
HARNESS: Integration
How do we integrate different runtime management systems in the cloud?

– CRS: Cross-Resource Scheduler
– IRM: Infrastructure Resource Manager
– The CRS sends discovery and reservation requests to the IRMs (IRM-X, IRM-Y, IRM-Z); this is the HARNESS integration layer.
– IRMs translate agnostic discovery and reservation requests into specific requests to the underlying resource managers.
– Each resource management system X, Y, Z exposes its own resource-management API and runtime drivers on top of the operating system and physical devices.
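A minimal Python sketch of this integration pattern, assuming only the method names shown on the slides (getResourceTypes, reserveResources); the class layout and the example IRM are hypothetical.

```python
# Illustrative sketch of the IRM idea: a common, resource-agnostic interface
# that each IRM implements by translating calls into requests its own
# resource manager understands.
from abc import ABC, abstractmethod

class IRM(ABC):
    @abstractmethod
    def get_resource_types(self):
        """Discovery: describe the resource types this IRM manages."""

    @abstractmethod
    def reserve_resources(self, spec):
        """Reservation: create a resource instance and return its identifier."""

class IRMX(IRM):
    """Stand-in for one concrete IRM (e.g. IRM-X in the figure)."""
    def get_resource_types(self):
        return [{"type": "X-resource", "attributes": {"size": "integer"}}]

    def reserve_resources(self, spec):
        # A real IRM would call resource manager X here; we fake an instance id.
        return f"x-instance-{spec['size']}"

# The CRS only ever talks to the IRM interface:
irm = IRMX()
print(irm.get_resource_types())
print(irm.reserve_resources({"size": 2}))
```
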
Agnostic Management
Make HARNESS resource-agnostic:

1. Scheduling requires intimate knowledge about resources and their characterisation.
2. Resources expose different levels of abstraction and interfaces.
3. Capacity is interpreted differently.
Scheduling Process: Heterogeneous Resources
(RI = Resource Instance)

Resource capacity:
– is a representation of resource state at the appropriate level of abstraction
– is characterised differently for each resource type
– changes when reserving and releasing instances

Examples (capacity before → after reservation):
– Server hosting a VM: CPU cores 16 → 15, RAM 32 GB → 31 GB
– RI: processors 400 → 300, memory 8 GB → 6 GB
– Storage RI: storage 250 GB → 150 GB, performance 25,000 IOPS → 15,000 IOPS, concurrent users: …
– DFE RI: #DFEs: 6, Available_dfes: [[1,2,5],[4,6,4]]
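A small sketch of the point that capacity is represented and updated per resource type; the field names mirror the examples above, while the scalar reserve/release rules shown are the simplest illustrative case, not the HARNESS data model.

```python
# Sketch: capacity is represented differently per resource type and is
# updated when instances are reserved or released.
server_capacity  = {"cpu_cores": 16, "ram_gb": 32}
storage_capacity = {"storage_gb": 250, "iops": 25_000}

def reserve(capacity, **usage):
    """Scalar capacities simply shrink when an instance is reserved..."""
    for key, amount in usage.items():
        capacity[key] -= amount
    return capacity

def release(capacity, **usage):
    """...and grow again when the instance is released."""
    for key, amount in usage.items():
        capacity[key] += amount
    return capacity

print(reserve(server_capacity, cpu_cores=1, ram_gb=1))         # {'cpu_cores': 15, 'ram_gb': 31}
print(reserve(storage_capacity, storage_gb=100, iops=10_000))  # {'storage_gb': 150, 'iops': 15000}
```
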
IRM-SHEPARD: Hardware Accelerators

Discovery (resource characterisation): the CRS calls getResourceTypes on IRM-SHEPARD, which describes the MPC-Xs as:
  Type DFECluster
  attributes:
    size: <integer>
    topology: <[SINGLETON, GROUP, MAXRING]>

Reservation (creating resource instances): the CRS calls reserveResources with requests such as
  size: 2, SINGLETON
  size: 2, GROUP
  size: 4, MAXRING

1. Exploit the specialised features of the MPC-X.
2. The CRS is oblivious to the nature of the MPC-X DFECluster.
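A brief sketch of how the CRS can forward the discovered attribute values without interpreting them; the reserve_resources() stand-in below is hypothetical, not the real IRM-SHEPARD endpoint.

```python
# The CRS forwards the attribute values it discovered (size, topology) without
# interpreting them; only IRM-SHEPARD knows what SINGLETON, GROUP or MAXRING
# mean on the MPC-X. This function is a stand-in for the real reservation call.
def reserve_resources(resource_type, attributes):
    """Pretend IRM-SHEPARD endpoint returning a fake reservation record."""
    return {"type": resource_type, **attributes, "status": "reserved"}

requests = [
    {"size": 2, "topology": "SINGLETON"},
    {"size": 2, "topology": "GROUP"},
    {"size": 4, "topology": "MAXRING"},
]
for attrs in requests:
    print(reserve_resources("DFECluster", attrs))
```
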
Resource-Agnostic Allocation

CRS to IRM-XtreemFS: “If an XtreemFS storage resource has 500MB available, and we reserve 150MB while releasing 100MB, how much storage would we have left?”
IRM-XtreemFS: “The answer is 450MB”

CRS to IRM-SHEPARD: “If the MPC-X cluster has 2 DFEs available, and we reserve a vDFE with 5 DFEs, how many DFEs would we have left?”
IRM-SHEPARD: “Not possible”

How can the CRS perform allocation without knowing the semantics of resource capacity? The IRMs help the CRS to solve resource-specific problems during the allocation process.
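A minimal sketch of this delegation, with hypothetical per-IRM helper functions answering the two questions above; the function names and signatures are assumptions for illustration.

```python
# Rather than assuming capacity is a number it can subtract, the CRS asks each
# IRM to compute the remaining capacity (or to reject the request).
def xtreemfs_remaining(available_mb, reserve_mb, release_mb):
    """Storage capacity composes arithmetically: 500 - 150 + 100 = 450 MB."""
    return available_mb - reserve_mb + release_mb

def shepard_remaining(available_dfes, requested_dfes):
    """DFE capacity does not: a 5-DFE vDFE cannot be carved out of 2 DFEs."""
    if requested_dfes > available_dfes:
        return None  # "Not possible"
    return available_dfes - requested_dfes

print(xtreemfs_remaining(500, 150, 100))  # 450
print(shepard_remaining(2, 5))            # None -> not possible
```
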
Network Management
IRM-NET: expose network services as resource instances – a unified interface?

Expose to cloud tenants:
1. General network functions: L2 switching, L3 forwarding, security groups (sets of ACLs), load balancing (supported by OpenStack).
2. Network links with a specific transmission rate and propagation delay (not supported by OpenStack).
3. Application-specific network functions, e.g. in-network aggregation (not supported by OpenStack).

(Figure: network resource instances (RIs) connecting storage/compute resource instances.)
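A hypothetical sketch of how a network-link resource request (item 2 above) might be expressed; the field names are assumptions, not the IRM-NET schema.

```python
# Hypothetical shape of a network-link resource request: a link between two
# already-reserved instances with a required transmission rate and a maximum
# propagation delay. Instance ids and field names are illustrative only.
link_request = {
    "type": "NetworkLink",
    "endpoints": ["vm-1", "dfe-cluster-1"],  # hypothetical instance ids
    "bandwidth_mbps": 1000,                  # required transmission rate
    "max_latency_ms": 0.5,                   # propagation delay bound
}
print(link_request)
```
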
OpenStack View

(Figure: the data centre as OpenStack sees it. All resource instances (RIs) are tied to servers identified by their IP addresses (192.168.0.32, .35, .36, .38).)
HARNESS View

(Figure: the same data centre as HARNESS sees it. Alongside the servers (192.168.0.32, .35, .36, .38), a GPGPU, a DFE cluster, XtreemFS storage and a Device X are exposed directly as resource instances (RIs).)
Placement Constraints

(Figure: resource instances (RIs) on hosts 192.168.0.32–.38, including a GPGPU, a DFE cluster, XtreemFS storage and a Device X, annotated with placement constraints.)

Pairwise placement constraints between requested resources R1 and R2:

       R1   R2
  R1    -    5
  R2    2    -
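A small sketch of how such a pairwise constraint matrix could be checked during allocation; the interpretation of the bounds (an upper limit on some distance metric between the chosen hosts) and the host distances themselves are assumptions for illustration.

```python
# Check a candidate placement against a pairwise constraint matrix like the
# R1/R2 table above. Bounds and distances are illustrative assumptions.
constraints = {("R1", "R2"): 5, ("R2", "R1"): 2}

def satisfies(placement, distance):
    """True if every constrained pair stays within its bound."""
    return all(distance[(placement[a], placement[b])] <= bound
               for (a, b), bound in constraints.items())

# Hypothetical per-direction distances between candidate hosts.
distance = {("hostA", "hostB"): 3, ("hostB", "hostA"): 1,
            ("hostA", "hostC"): 6, ("hostC", "hostA"): 6}

print(satisfies({"R1": "hostA", "R2": "hostB"}, distance))  # True
print(satisfies({"R1": "hostA", "R2": "hostC"}, distance))  # False
```
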
Application Deployment

(Figure: the deployed application (APP) runs on a host and accesses each reserved resource instance, i.e. the GPGPU, the DFE cluster, the XtreemFS storage and Device X, through that resource's API.)
HARNESS Architecture

Platform Layer:
– Users submit an application, manifest and SLO through ConPaaS and receive feedback.
– ConPaaS submits the application to the Application Manager (AM), which returns feedback and passes the configuration to the Cross-Resource Scheduler.
– The Cross-Resource Scheduler drives IRM-NET (VMs + switches), IRM-NOVA (VMs), IRM-SHEPARD (hardware accelerators) and IRM-XtreemFS (storage devices).

Infrastructure Layer:
– OpenStack Nova Controller and Nova Compute manage VMs on the servers.
– OpenStack Neutron Controller and Neutron Agent manage the switches and interconnect.
– The MaxelerOS Orchestrator tracks the available DFEs and handles DFE reservation on the MPC-X boxes (InfiniBand devices).
– SHEPARD Compute handles PCIe device reservation for local PCIe devices (GPGPU, FPGA) and executes tasks via AlphaData OpenCL and the MaxelerOS Executive.
– The XtreemFS Reservation Scheduler and XtreemFS Directory track the available OSDs and their status and handle volume reservation on the OSD/MRC servers.

Service Layer:
– ConPaaS agents and application modules run inside virtual machines to deploy and execute services and applications.
– Read/write operations (e.g. from the application's DB) go through the XtreemFS client (POSIX).
HARNESS Platform

(Screenshot of the platform front-end: login to the PaaS, application profiling, and production.)
Conclusion

Unique features of the HARNESS cloud:
– A glue-logic API that combines different types of resource managers
– Different types of heterogeneous resources treated as first-class resources
– Allocation algorithms that are agnostic to resource types
– Reservation requests with placement constraints

Planned results by the end of the project (end of 2015):
– HARNESS deployments on Grid5K and EGI
– Effective management of batch jobs under resource contention
– Demonstration of the effectiveness of the allocation algorithms in large-scale, heterogeneous data-centres