Running Statistical Analysis with R on Longhorn


Introduction to Data Analysis with R on HPC
Texas Advanced Computing Center
Feb. 2015
Agenda
• 8:30–9:00 Welcome and introduction to TACC resources
• 9:00–9:30 Getting started with running R at TACC
• 9:30–10:00 Practice and coffee break
• 10:00–11:00 R basics
• 11:00–11:30 Data analysis support in R
• 11:30–1:00 Lunch break
• 1:00–1:30 Scaling up R computations
• 1:30–2:00 A walkthrough of the parallel package in R
• 2:00–3:00 Hands-on lab session
• 3:00–4:00 Understanding the performance of R programs
Introduction to TACC Resources
About TACC
• TACC is a research division at The University of Texas at Austin
– Origins go back to supporting the Cray-designed CDC 6600 in the 1960s
– TACC was established in 2001 to support research needs beyond UT
• TACC is a service provider for XSEDE on several key systems
– Currently provides 80–90% of the HPC cycles in XSEDE
– Not limited to supporting NSF research
• TACC is also supported by partnerships with UT Austin, the UT System, industrial partners, multi-institutional research grants, and donations
• TACC is 110+ people (40+ PhDs) bringing enabling technologies and techniques to drive digital research
– Many collaborative research projects and mission-specific proposals to support open research
– Consulting to bring TACC expertise to other communities
TACC systems at a glance:
• Stampede: HPC jobs (6,400+ nodes, 10 PFLOPS, 14+ PB storage)
• Maverick: vis & analysis, interactive access (132 K40 GPUs)
• Lonestar: HTC jobs (1,800+ nodes, 22,000+ cores, 146 GB/node)
• Wrangler: data-intensive computations (10 PB storage, high IOPS)
• Stockyard: shared workspace (20 PB storage, 1 TB per user)
• Corral: project workspace and data collections (6 PB storage, databases, iRODS)
• Rodeo/Chameleon: cloud services, user VMs
• Vis Lab: immersive vis, collaborative touch screens, 3D
• Ranch: tape archive (160 PB tape, 1+ PB access cache)
Stampede
• Base cluster (Dell/Intel/Mellanox):
– 6,400 nodes
– Intel Sandy Bridge processors
– Dell dual-socket nodes w/ 32 GB RAM (2 GB/core)
– 56 Gb/s Mellanox FDR InfiniBand interconnect
– More than 100,000 cores, 2.2 PF peak performance
• Max total concurrency:
– Exceeds 500,000 cores
– 1.8M threads
• #7 in the HPC Top500 list
• 90% allocated through XSEDE
Additional Features of Stampede
• 6,800 Intel Xeon Phi "MIC" Many Integrated Core coprocessors
– Special release of "Knights Corner" (61 cores)
– 10+ PF peak performance
• Stampede includes 16 1 TB Sandy Bridge shared-memory nodes with dual GPUs
• 128 of the compute nodes are also equipped with NVIDIA Kepler K20 GPUs
• Storage subsystem driven by Dell storage nodes:
– Aggregate bandwidth greater than 150 GB/s
– More than 14 PB of capacity
– Disk space partitioned into multiple Lustre filesystems ($HOME, $WORK and $SCRATCH), as on previous TACC systems
What does this mean?
• Faster processors
• More memory per node
• The ability to start hundreds of analysis jobs in batch (see the job-script sketch after this list)
• Access to the latest massively parallel hardware
– Intel Xeon Phi
– GPGPU
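
As an illustration of running many R analyses in batch, here is a minimal Slurm job-script sketch; the queue name, the Rstats module name, and the analysis.R / data/*.csv file names are assumptions that will differ by system and allocation:

    #!/bin/bash
    #SBATCH -J r-analysis          # job name
    #SBATCH -o r-analysis.%j.out   # output file (%j expands to the job ID)
    #SBATCH -p normal              # queue name (assumed; check your system)
    #SBATCH -N 1                   # number of nodes
    #SBATCH -n 16                  # number of tasks
    #SBATCH -t 01:00:00            # wall-clock limit

    module load Rstats             # assumed name of the R module at TACC

    # Run one R process per input file in the background; for larger
    # batches, a job array or the TACC launcher utility is a better fit.
    for f in data/*.csv; do
        Rscript analysis.R "$f" &
    done
    wait                           # wait for all background R runs to finish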
Automatic offloading with the latest hardware
• R was originally designed for single-threaded execution
– Slow performance
– Not scalable to large data
• R can be built and linked against libraries that exploit multi-core hardware, giving automatic parallel execution for some operations, most commonly linear-algebra computations (see the sketch below)
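
A minimal sketch of checking whether R's linear algebra is picking up a multithreaded BLAS such as MKL; the matrix size and the thread-count environment variables are illustrative, and which BLAS your R build actually links against depends on how it was built:

    # Set the BLAS thread count before starting R, e.g.
    #   export MKL_NUM_THREADS=16   (MKL)  or  export OMP_NUM_THREADS=16
    n <- 4000
    A <- matrix(rnorm(n * n), n, n)
    B <- matrix(rnorm(n * n), n, n)
    # %*% calls the BLAS dgemm routine; with a tuned multithreaded BLAS
    # this runs many times faster than with R's reference BLAS.
    system.time(C <- A %*% B)
    # On recent R versions, sessionInfo() also reports which BLAS/LAPACK
    # libraries are in use.
    sessionInfo()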
Getting more from R
• Optimizing R performance on Stampede
– Intel compiler vs. gcc gave a factor-of-2 improvement
– MKL significantly improved performance
– Some Xeon Phi performance enhancement too
– Supporting common parallel packages (see the sketch below)
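
One of the common parallel packages is the parallel package that ships with R; a minimal sketch of replacing a serial lapply() with mclapply() is shown below (the toy task and the mc.cores value are illustrative):

    library(parallel)

    # Stand-in for real per-parameter work (a model fit, a simulation, ...).
    slow_task <- function(x) { Sys.sleep(0.1); x^2 }
    params <- 1:64

    # Serial version
    system.time(res_serial <- lapply(params, slow_task))

    # Parallel version: forked workers, one per requested core (Unix only)
    system.time(res_parallel <- mclapply(params, slow_task, mc.cores = 16))

    identical(res_serial, res_parallel)   # same results, less wall-clock time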
Maverick Hardware
• 132-node dual-socket Ivy Bridge-based cluster
– Each node has an NVIDIA Kepler K40 GPU
– 128 GB of memory
– FDR InfiniBand interconnect
– Shares the Work file system with Stampede (26 PB unformatted)
– Users get 1 TB of Work to start
• Intended for real-time analysis
• TACC system; 50% provided to XSEDE in kind, 50% discretionary
Visualization and Analysis Portal
R and Python
• Can launch RStudio Server and IPython Notebook
– Introducing capabilities, best practices, and forms of parallelism to users
– Simplifying the UI with a web interface
– Improving visualization capabilities with the Shiny package and googleVis (see the sketch below)
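
Since Shiny is highlighted for visualization, here is a minimal Shiny app sketch; the slider, the random sample, and the histogram are purely illustrative and not part of the portal:

    library(shiny)

    # A slider controls the sample size; the histogram redraws reactively.
    ui <- fluidPage(
      sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
      plotOutput("hist")
    )
    server <- function(input, output) {
      output$hist <- renderPlot(hist(rnorm(input$n), main = "Random sample"))
    }

    shinyApp(ui, server)   # served in the browser by RStudio Server / runApp()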
Hadoop Cluster: Rustler
• A Hadoop cluster with 64 Hadoop data nodes
– 2 x 10-core Ivy Bridge processors
– 128 GB memory
– 16 x 1 TB disks (1 PB usable disk, 333 TB replicated)
• Login node, 2 name nodes, 1 web proxy node
• 10 Gb/s Ethernet network with 40 Gb/s connectivity to the TACC backbone
• Currently in its early-user period
• A pure TACC resource (all discretionary allocations)
Wrangler
• Three primary subsystems:
– A 10 PB disk storage system, Lustre-based (2 R720 dual E5-2680 MDS servers, 45 C8000 OSS servers with 6 TB drives)
– An embedded analytics capability of several thousand cores: 96 Dell R620 Haswell E5-2680 v3 nodes with dual IB FDR / 40 Gb/s Ethernet
– A high-speed global object store: 500 TB of usable flash via PCIe to all 96 analytics nodes, 1 TB/s I/O rate & 250M+ IOPS
[Diagram: Wrangler architecture – a replicated 10 PB mass storage subsystem; a 96-node access & analysis system (Haswell CPUs, 128 GB+ memory each) connected by a 120-lane non-blocking 56 Gb/s IB interconnect and a 40 Gb/s Ethernet fabric; and a high-speed storage system (500+ TB, 1 TB/s, 250M+ IOPS) on an all-to-all PCI Gen 3 fabric.]
Data Intensive Computing Support at TACC
• Data Management and Collection group
– Providing data storage services: files, databases, iRODS
– Collection management and curation
• Data Mining and Statistics group
– Collaborating with users to develop and implement scalable algorithmic solutions
– In addition to general data mining and analysis methods, also expertise in R, Hadoop, and visual analytics
• We are here to help:
– [email protected]