Transcript Slides
Building Virtual Scientific Computing
Environment with Openstack
Yaodong Cheng, CC-IHEP, CAS
[email protected]
ISGC 2015
Contents
Requirements of scientific computing
IHEP cloud platform
Virtual machine types
Virtual computing cluster
Dirac distributed computing
Conclusion
International Symposium on Grids and Clouds (ISGC) 2015
Large science facilities
IHEP: The largest fundamental research center in
China
IHEP serves as the backbone of China’s large
science facilities
Beijing Electron Positron Collider BEPCII/BESIII
Yangbajing Cosmic Ray Observatory: ASγ & ARGO
Daya Bay Neutrino Experiment
China Spallation Neutron Source (CSNS)
Hard X-ray Modulation Telescope(HXMT)
Accelerator-driven Sub-critical System (ADS)
Jiangmen Neutrino Underground Observatory (JUNO)
Under planning: BAPS, LHAASO, XTP, HERD, …
BEPCII/BESIII
36 institutions from China, US, Germany, Russia, Japan, …
> 5PB in next 5 years
~ 5000 CPU cores
simulation, reconstruction, analysis, …
long-term data preservation
data sharing between partners
Other experiments
Daya Bay Neutrino Experiment
~200TB per year
JUNO: Jiangmen Neutrino Experiment
~500TB per year
LHAASO
2PB per year after 2017,
accumulate 20PB+ in 10 years
ATLAS and CMS Tier2 site
940TB disk, 1088 CPU cores
CSNS, HXMT, …
5PB data one year!!
Computing resources status
~ 12000 CPU cores
~ 50 queues, managed by Torque/PBS
difficult to share
~ 5PB disk
Lustre, Gluster, dCache/DPM, …
~ 5PB LTO4 tape
two IBM 3584 tape libraries
modified CERN CASTOR 1.7
[Photos: tape libraries; PC farm built with blades]
In the future, …
More HEP experiments: need to manage twice as many servers as today, or more
but no possibility of a significant increase in staff numbers
Is cloud a good solution?
Is cloud suitable for Scientific Computing?
Time to change IT strategy!!
What is Cloud?
NIST: Best Definitions
Essential characteristics
On-demand self-service
Broad network access
Resource pooling
Rapid elasticity
Measured service
Service models
IaaS, PaaS, SaaS
Deployment models
Public, private, hybrid
http://csrc.nist.gov/publications/nistpubs/800145/SP800-145.pdf
Is cloud beneficial to scientific computing?
Easy to Maintain
Hardware: services become independent of
underlying physical machine
Cloud services: single set of services for
managing access to computing resources
Scientific platforms: become separate layer
deployed, controlled and managed by domain
experts
Customized Environment
Operating systems suited to your application
Your applications preinstalled and preconfigured
CPU, memory, and swap sized for your needs
Dynamic Provisioning
New storage and compute resources in minutes
(or less)
Resources freed just as quickly to facilitate
sharing
Create temporary platforms for variable
workloads
IHEPCloud: a Private IaaS platform
Launched in May 2014
http://cloud.ihep.ac.cn
Three usage scenarios
User self-service virtual machine platform
Users register and destroy VMs on demand
Virtual computing cluster
Jobs are allocated to a virtual queue automatically when the physical queue is busy
Distributed computing system
Works as a cloud site: Dirac calls the cloud interface to start or stop virtual worker nodes
IHEPCloud services
Who can use?
any user who has an IHEP email account
How many resources for user?
By default, each user has 3 CPU cores and 15GB memory
VM types
testing machine
full root privilege, no public storage
UI node
AFS authentication, No root privilege, public storage
No limitations such as memory, CPU time, or number of processes
OS types
SL 5.5, SL 5.8, SL 6.5, SL 7 64-bit, SL 6.5 32-bit, Win7 64-bit
new types are added according to user requirements
VM IP address
internal IP address (192.168.*.*) is allocated automatically
external IP address (202.122.35.*) needs the approval of an administrator
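The default per-user allowance above (3 CPU cores, 15GB memory) can be read as a simple admission check before a VM is created. The following is a minimal sketch under that assumption; the `Usage` record and the `fits_quota` function are hypothetical names for illustration and are not part of IHEPCloud or OpenStack.

```python
from dataclasses import dataclass

# Default per-user quota quoted on the slide: 3 CPU cores and 15GB memory.
DEFAULT_CORES = 3
DEFAULT_MEMORY_GB = 15


@dataclass
class Usage:
    """Resources a user already holds (hypothetical bookkeeping record)."""
    cores: int = 0
    memory_gb: int = 0


def fits_quota(usage: Usage, req_cores: int, req_memory_gb: int,
               max_cores: int = DEFAULT_CORES,
               max_memory_gb: int = DEFAULT_MEMORY_GB) -> bool:
    """Return True if the requested VM keeps the user within the quota."""
    return (usage.cores + req_cores <= max_cores
            and usage.memory_gb + req_memory_gb <= max_memory_gb)


current = Usage(cores=2, memory_gb=8)
print(fits_quota(current, 1, 4))  # a 1-core, 4GB VM still fits
print(fits_quota(current, 2, 4))  # a 2-core VM would exceed 3 cores
```

In a real deployment this kind of limit would normally be enforced by the cloud middleware's own quota mechanism rather than by user code.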
Why do end users need IHEPCloud?
Virtual testing machine
Develop programs or run tests
A VM is generated in a few minutes
Log in to the VM via ssh, VNC, or remote desktop
Virtual UI node
Debug programs in the computing environment
Login node: lxslcxx.ihep.ac.cn
Limitations: memory, CPU time, user processes, …
A process with cputime > 45m && %CPU > 60% is killed
Affected by other users
VMs: owned by only one user; no limitations
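The login-node rule quoted above (cputime > 45m && %CPU > 60%) is a two-condition test: both thresholds must be exceeded before a process is killed. A minimal sketch of that predicate follows; the function name and the sample process list are hypothetical, and how the real login nodes collect these numbers is not stated in the slides.

```python
CPU_TIME_LIMIT_MIN = 45.0   # "cputime > 45m" from the slide
CPU_PERCENT_LIMIT = 60.0    # "%CPU > 60%" from the slide


def should_kill(cpu_time_min: float, cpu_percent: float) -> bool:
    """Both conditions must hold, matching the && in the slide's rule."""
    return cpu_time_min > CPU_TIME_LIMIT_MIN and cpu_percent > CPU_PERCENT_LIMIT


# Hypothetical sample: (name, accumulated CPU minutes, current %CPU)
processes = [
    ("analysis", 50.0, 85.0),   # long-running and CPU-heavy
    ("editor", 120.0, 2.0),     # long-running but mostly idle
    ("compile", 10.0, 95.0),    # CPU-heavy but short
]

for name, minutes, percent in processes:
    action = "kill" if should_kill(minutes, percent) else "keep"
    print(f"{name}: {action}")
```

Only the first sample process meets both conditions. On a personal VM no such rule applies, which is the contrast the slide draws.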
Virtual computing cluster
If a job queue is busy, its jobs can be allocated to a virtual queue
Planned to enter service this year
[Diagram: Cloud Scheduler. Labels: Submit job; Check queue load (junoq: 128 CPU cores); Start/stop VM; Forward job; IHEPCloud; Virtual queue]
Details: see Haibo's talk
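The routing idea on this slide (check the queue load, then forward the job to a virtual queue if the physical queue is busy) can be sketched as below. This is an illustration only: the `Queue` class, the definition of "busy" as "no free core", and the queue name `junoq_virtual` are assumptions, not the actual Cloud Scheduler described in the referenced talk.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    total_cores: int
    busy_cores: int

    def is_busy(self) -> bool:
        # Assumption: "busy" means every core in the queue is occupied.
        return self.busy_cores >= self.total_cores


def route_job(physical: Queue, virtual_queue_name: str) -> str:
    """Return the name of the queue a newly submitted job is forwarded to."""
    if physical.is_busy():
        return virtual_queue_name
    return physical.name


# junoq with 128 CPU cores is the example queue shown on the slide.
full = Queue(name="junoq", total_cores=128, busy_cores=128)
free = Queue(name="junoq", total_cores=128, busy_cores=100)

print(route_job(full, "junoq_virtual"))  # forwarded to the virtual queue
print(route_job(free, "junoq_virtual"))  # stays on the physical queue
```

A production scheduler would also have to start VMs before forwarding and stop them when the virtual queue drains, as the Start/stop VM label indicates.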
Distributed computing
Distributed computing has integrated cloud resources based on a pilot scheme, implementing dynamic scheduling
The cloud resources in use can shrink and grow dynamically according to job requirements
[Diagram: VM life cycle between Distributed Computing and the Cloud. Labels: User Job Submission (Job1, Job2, Job3…); Create Cloud VM1, VM2, …; Get Job; Job Finished; No Jobs; Delete Cloud VM]
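The shrink-and-grow behaviour on this slide (create VMs when jobs are waiting, delete them when there are no jobs) can be expressed as one scaling decision per scheduling cycle. The sketch below is a simplified assumption of how such a decision might look; the function name, its parameters, and the one-job-per-VM model are hypothetical and do not reproduce Dirac's actual logic.

```python
from typing import Tuple


def scaling_decision(waiting_jobs: int, running_vms: int,
                     idle_vms: int, max_vms: int) -> Tuple[int, int]:
    """Return (vms_to_create, vms_to_delete) for one scheduling cycle.

    Assumptions: one job per VM, idle VMs pick up waiting jobs first,
    and running_vms counts idle VMs as well.
    """
    # Jobs that cannot be served by VMs that are already idle.
    unserved = max(0, waiting_jobs - idle_vms)
    # Grow, but never beyond the site limit.
    to_create = max(0, min(unserved, max_vms - running_vms))
    # Shrink: idle VMs with no job left to fetch are deleted.
    to_delete = max(0, idle_vms - waiting_jobs)
    return to_create, to_delete


print(scaling_decision(waiting_jobs=3, running_vms=0, idle_vms=0, max_vms=10))
print(scaling_decision(waiting_jobs=0, running_vms=4, idle_vms=4, max_vms=10))
print(scaling_decision(waiting_jobs=20, running_vms=8, idle_vms=0, max_vms=10))
```

The three calls cover the cases in the diagram: jobs submitted to an empty site, no jobs left, and demand capped by the site limit.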
Cloud sites
5 cloud sites from Torino, JINR, CERN and IHEP have been set up and connected to the distributed computing system
About 320 CPU cores, 400GB memory, 10TB disk
Cloud tests
More than 4500 jobs have been run with a 96% success rate
Failures were caused by lack of disk space
Disk space will be extended in the IHEP cloud
Performance and Physics validation
Performance tests have shown that running times at the cloud sites are comparable with those at other production sites
[Plot labels: Simulation, Reconstruction, Download random trigger data]
Physics validation has shown that physics results are highly consistent between clusters and cloud sites
Architecture
[Architecture diagram. Recoverable labels: Dashboard; Dirac; Virtual Cluster; API; OpenStack; interactive; Storage path; Get info.; Push info.; Host Monitor; Log Analysis; Service monitor; Nagios; authentication: LDAP, UMT (IHEP EMAIL), UMT (CAS CLOUD); Interoperation: DNS, NetworkDB, Register, Get VM info.; Configuration management: Puppet; Backend storage: CEPH]
IHEPCloud components
Core middleware: OpenStack
open source cloud management system
most popular
Configuration management tool: Puppet
create VM images
manage applications in VMs
keep VMs consistent with the computing environment
Authentication
IHEP email account and password
AFS authentication for UI node
Network management
central NetworkDB records MAC, IP, hostname, user, …
each VM has a hostname, *.v.ihep.ac.cn
network traffic accounting
External storage
Currently, images and instances are stored on local disk
evaluating Ceph to support Glance, Nova and Cinder
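The network-management bullets above say that a central NetworkDB records MAC, IP, hostname and user, and that every VM gets a hostname under *.v.ihep.ac.cn. A minimal sketch of such a record is shown below. The `VmRecord` class, the `make_record` helper and all sample values are hypothetical; the real NetworkDB schema is not given in the slides.

```python
import ipaddress
from dataclasses import dataclass

VM_DOMAIN = "v.ihep.ac.cn"  # from the slide: VM hostnames are *.v.ihep.ac.cn


@dataclass(frozen=True)
class VmRecord:
    """Fields named on the slide: MAC, IP, hostname, user."""
    mac: str
    ip: str
    hostname: str
    user: str


def make_record(short_name: str, mac: str, ip: str, user: str) -> VmRecord:
    """Build a record, normalising the MAC and validating the IP address."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    return VmRecord(
        mac=mac.lower(),
        ip=ip,
        hostname=f"{short_name}.{VM_DOMAIN}",
        user=user,
    )


# Sample values are invented; 192.168.*.* is the internal range on the slides.
record = make_record("testvm01", "FA:16:3E:00:00:01", "192.168.1.10", "alice")
print(record.hostname)
print(record.mac)
```

Keeping one authoritative record per VM is what makes the DNS registration and traffic accounting mentioned on the slide possible.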
Network in IHEPCloud
Multiple IP subnets on one physical machine
VLAN mode (L2 only, no router) in OpenStack Neutron
IP gateway and 802.1Q in the hardware switch
Problem: trunk
Big MAC table
Pre-configuration
Risk of broadcast storm
Future network
Core layer: VXLAN (hardware-based)
Access layer: OpenStack VLAN mode
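As general background to the VLAN and VXLAN layers named above (this comparison is not on the slide), the two technologies differ in how many network segments they can address: an 802.1Q VLAN ID is 12 bits wide with two reserved values, while a VXLAN network identifier is 24 bits wide. The short calculation below only illustrates that difference; the variable names are arbitrary.

```python
VLAN_ID_BITS = 12    # 802.1Q VLAN ID field width
VXLAN_VNI_BITS = 24  # VXLAN network identifier (VNI) field width

# VLAN IDs 0 and 4095 are reserved, so two values are not usable.
usable_vlans = 2 ** VLAN_ID_BITS - 2
vxlan_segments = 2 ** VXLAN_VNI_BITS

print(f"Usable 802.1Q VLAN IDs: {usable_vlans}")
print(f"VXLAN segment IDs:      {vxlan_segments}")
```

Whether segment count was a factor in the planned design is not stated; the problems the slide lists are trunk pre-configuration, large MAC tables and broadcast storms.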
IHEPCloud Current status
Released on 18 November 2014
Built on OpenStack Icehouse
1 control node
7 computing nodes
112 physical CPU cores / 224 virtual CPU cores, 896GB memory in total
96 active VMs using 172 CPU cores and 628GB memory as of 11 March
[Diagram: Control Node, Computing Node, VMs]
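The figures on this slide imply a CPU overcommit ratio and a current utilisation level. The short calculation below works them out from the quoted numbers only; the variable names are arbitrary, and the per-node values assume the seven computing nodes are identical, which the slide does not state.

```python
# Figures quoted on the slide.
computing_nodes = 7
physical_cores = 112
virtual_cores = 224
total_memory_gb = 896
used_vcpus = 172
used_memory_gb = 628

overcommit = virtual_cores / physical_cores
vcpu_use = used_vcpus / virtual_cores
memory_use = used_memory_gb / total_memory_gb

print(f"CPU overcommit ratio: {overcommit:.1f}")
print(f"vCPUs in use:  {vcpu_use:.1%}")
print(f"Memory in use: {memory_use:.1%}")

# Assumption: identical nodes, so totals divide evenly.
print(f"Cores per node:  {physical_cores // computing_nodes}")
print(f"Memory per node: {total_memory_gb // computing_nodes} GB")
```

Two virtual cores per physical core, with roughly three quarters of the vCPUs and 70% of the memory already allocated, is consistent with the plan in the conclusion to add more resources.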
Conclusion
Cloud computing is widely accepted in both industry and science
Scientific computing is preparing to move to the cloud
IHEPCloud aims to provide a self-service virtual machine platform for IHEP users
IHEPCloud also supports virtual clusters and distributed computing
A small cloud platform has been built and is open to IHEP users free of charge
More resources (1000+) will be added to IHEPCloud this year
Investigating Shibboleth to build a federated cloud
Thank you!
Any Questions?