Transcript Slide 1
Best Practices for Setting Up
Computer Hardware in a Grid
Environment
Tom Keefer
Performance Analyst, SAS
Copyright © 2007, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Cheryl Doninger
R&D Director, SAS
Recipe for Success
SAS Grid Computing
lots of SAS users
review different grid architectures
• different operating systems, network connectivity, storage solutions
show scalable throughput and sustained
I/O as the number of grid nodes increases
create reference architectures of successful grid
configurations to help answer your questions
What is Grid Computing?
“Grid computing integrates, virtualizes, and
manages resources (software and hardware) to
provide a much larger, powerful distributed
computing infrastructure.”
Benefits of SAS on a Grid
increases scalability
increases availability
facilitates provisioning
increases flexibility
reduces costs
= Virtual Data Center
Running SAS on a Grid
SAS Grid Manager
Distributed Enterprise Scheduling
• Distribute jobs within workflows to a range of hosts.
• Automatically find and use the best available resource for each job.
Workload Balancing
• Distribute workloads to a shared pool of resources.
• Automatically find and use the best available resource.
Parallelized Workload Balancing
• Distribute parallelized SAS workloads to a shared pool of resources.
• Automatically find and use the best available resource.
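The idea of sending each job to the "best available resource" can be illustrated with a small sketch. This is not how SAS Grid Manager or Platform LSF actually chooses a host; it is a minimal least-loaded dispatcher, and the job and host names are hypothetical.

```python
import heapq


def dispatch(jobs, hosts):
    """Assign each job to the host that currently has the fewest jobs.

    jobs  -- list of job names (hypothetical)
    hosts -- list of host names (hypothetical)
    Returns a dict mapping host name -> list of assigned jobs.
    """
    # Min-heap of (current job count, host name); ties break on host name.
    heap = [(0, host) for host in hosts]
    heapq.heapify(heap)
    assignment = {host: [] for host in hosts}
    for job in jobs:
        load, host = heapq.heappop(heap)      # least-loaded host right now
        assignment[host].append(job)
        heapq.heappush(heap, (load + 1, host))
    return assignment
```

A real scheduler would weigh CPU, memory, and I/O load rather than a simple job count, but the shape of the decision is the same: pick the least busy host, assign, update, repeat.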
What products can leverage SAS Grid Manager?
SAS Grid Manager
Distributed Enterprise Scheduling
SAS Data Integration Studio
SAS Web Report Studio
Workload Balancing
Any SAS program (with wrapper), including stored processes and SAS Enterprise Guide programs
SAS Marketing Automation
SAS Marketing Optimization
Any SAS program
Parallelized Workload Balancing
SAS Data Integration Studio
SAS Enterprise Miner
SAS Risk Dimensions
Any SAS program (with modification)
SAS Grid Architecture Topology
[Diagram: SAS grid architecture topology; application server context SASApp]
Grid Client: Management Console (Grid Manager plug-in), DIS or EM, SAS Program, Platform LSF
Metadata Server
Grid Control Machine: Platform Grid Management Service, Platform LSF, Platform Process Mgr, Base SAS, SAS/Connect, SAS Workspace Server, SAS Grid Server, SAS Data Step Batch Server
Grid Node 1, Grid Node 2, ... Grid Node n: Platform LSF, Base SAS, SAS/Connect, SAS Grid Server, SAS Data Step Batch Server
Central File Server for:
• Job Deployment Directories
• Source and Target Data
• SAS Log files
Keys To Success – Areas To Focus
node configuration
• heterogeneous or homogeneous
number and type of processors
memory
storage/data access
no different than a single server, just more systems.
Data Storage is The Key
sharable
throughput across the grid
scalable
locality of data
• input files
• output files
• temporary files
• external data access
Shared File System Testing Efforts
Operating System | File Sharing Technology
Red Hat Linux (RHEL 4) | EMC Celerra Multi-Path File System on iSCSI (MPFSi)
Red Hat Linux (RHEL 4) | Network Appliance (NFS)
Sun Solaris 10 | Sun StorageTek QFS
Red Hat Linux (RHEL 4)* | Global File System (GFS)
Windows* | PolyServe / HP Matrix
AIX* | IBM General Parallel File System (GPFS)
HP-UX* | Veritas Cluster File System (CFS)
*Efforts ongoing
Steps to Success With Grid
determine your system requirements
• what does your application do?
• data flow diagram
architect your system
test throughput outside of SAS first
• third party tools
• replicate your application's behavior (I/O pattern)
single node SAS tests, then scale out
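As a rough illustration of testing throughput outside of SAS, the sketch below times a sequential write and read of one file on a chosen directory. It is only a stand-in for a proper third-party I/O tool: the function name, file size, and block size are assumptions, and a realistic test would mimic the application's own I/O pattern and run from many nodes at once.

```python
import os
import tempfile
import time


def sequential_throughput(directory, total_mb=64, block_kb=64):
    """Write, then read, one file sequentially.

    Returns (write_MB_per_s, read_MB_per_s) for the given directory.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to storage before stopping the clock
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(block)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return written_mb / write_s, written_mb / read_s
```

Calling `sequential_throughput("/data")` on each shared file system gives a first baseline. Note that the read pass may be served from the operating system's cache unless the file is larger than memory, so the read figure from a small file is optimistic.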
EMC MPFSi Architecture
[Diagram labels: IP Traffic, Switch, NAS, “The Directory”, Conversion, Fibre Channel, EMC Storage, /work, /data]
Notes:
• MPFSi client on nodes
• network “managers”
• leverage existing net
EMC MPFSi Discussion Points
based on previous “Highroad” product
SAS data integration benchmarking scenario
40 Linux grid nodes
• dual core, dual Ethernet per node for data
• up to 160 simultaneous SAS processes
performance tips:
• analyze throughput from node to storage – data flow!!
• watch placement of disk volumes for performance
• don’t allow non-grid activity on network
• separate client and admin network
• monitor director and data mover throughput
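The "data flow" tip amounts to finding the slowest stage between a node and the storage. The sketch below uses the node, link, and process counts from this slide; the gigabit link speed and the data mover and storage figures are placeholder assumptions, not measured values.

```python
def slowest_stage(stages):
    """Return (name, capacity_MBps) of the stage that limits the data path."""
    return min(stages.items(), key=lambda item: item[1])


# From the slide: 40 nodes, 2 data Ethernet links per node, up to 160 processes.
NODES, LINKS_PER_NODE, PROCESSES = 40, 2, 160

# Assumption: gigabit Ethernet, about 125 MB/s per link at line rate.
LINK_MBPS = 125

stages = {
    "node NICs (aggregate)": NODES * LINKS_PER_NODE * LINK_MBPS,
    "data movers (placeholder)": 4000,    # hypothetical capacity
    "storage array (placeholder)": 3000,  # hypothetical capacity
}

name, capacity = slowest_stage(stages)
per_process = capacity / PROCESSES  # MB/s available to each SAS process
print(f"bottleneck: {name}, {capacity} MB/s, {per_process} MB/s per process")
```

With these placeholder numbers the storage array, not the network, limits the grid, which is why each stage needs to be measured rather than assumed.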
Network Appliance NFS Architecture
[Diagram: Linux Nodes, Network Switch, NAS: NetApp FAS6030 (network storage) holding /data and /work]
Notes:
• NFS client on nodes
• ALL Ethernet
• leverage existing network
• NFS everywhere
NetApp NFS Discussion Points
pure network file system implementation (NFS)
SAS data integration benchmarking scenario
10 Linux grid nodes
• quad core* - single Ethernet per node for data
performance tips:
• check throughput from node to storage – data flow!!!
• don’t allow non-grid activity on network
• separate client and admin network
• watch placement of disk volumes for performance
* important note: watch the ratio of cores to I/O throughput per node
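The core-to-throughput note can be made concrete with simple arithmetic. The sketch assumes gigabit Ethernet (about 125 MB/s per link at line rate) and treats the core counts as per node; neither detail is stated on the slides.

```python
def mbps_per_core(cores, links, link_mbps=125.0):
    """Network bandwidth available to each core on one node, in MB/s."""
    return links * link_mbps / cores


# Quad core with a single data Ethernet link (this slide).
quad_single = mbps_per_core(cores=4, links=1)

# Dual core with dual data Ethernet links (the EMC MPFSi test configuration).
dual_dual = mbps_per_core(cores=2, links=2)

print(quad_single, dual_dual)  # 31.25 125.0
```

Under these assumptions each core in the quad-core node has a quarter of the network bandwidth of a core in the dual-core, dual-link node, so adding cores without adding I/O paths can leave SAS processes waiting on data.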
Sun QFS Architecture
[Diagram: server nodes, fibre channel, FC Switch, SAN: Sun storage holding /data and /work]
Notes:
• QFS software on nodes
• QFS server “master”
• fibre channel – node to disk
Sun QFS Discussion Points
pure fibre channel (SAN)
SAS data integration benchmarking scenario
up to 4 Solaris server nodes
• 48 to 64 core grid nodes (144 total on grid)
• up to 180 simultaneous SAS processes
• up to 20 fibre channel connections per server
performance tips:
• check throughput from node to storage – data flow!!!
• watch placement of disk volumes for performance
• setup of QFS master server
Other Shared File System Technologies
SAN based – fibre channel
• Multi-Path File System (MPFS)
− NOT iSCSI
• IBM General Parallel File System (GPFS)
• PolyServe / HP Matrix
− only one available for Windows!!
• Linux Global File System (GFS)
• Veritas Cluster File System (CFS)
NAS - Ethernet
• NetApp with iSCSI
SAS is continuing its testing efforts with various partners.
Overall Best Practices for Shared File Systems
data flow diagram
• understand your application's throughput requirements
before you talk to a storage vendor
monitoring and management tools are a must!
test throughput OUTSIDE of SAS first!
some technologies have volume placement limitations!
• e.g., can you span all the arrays with a single volume?
analyze throughput per $ before you buy
availability… backups… future scalability…
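Comparing throughput per dollar is a one-line calculation once each option has a measured sustained throughput and a quoted cost. The option names and figures below are invented for illustration only.

```python
def rank_by_throughput_per_dollar(options):
    """Rank storage options by sustained MB/s per dollar, best first.

    options -- dict mapping option name -> (sustained_MBps, cost_dollars)
    """
    return sorted(
        options,
        key=lambda name: options[name][0] / options[name][1],
        reverse=True,
    )


# Hypothetical quotes: (measured sustained MB/s, total cost in dollars).
quotes = {
    "option A": (800.0, 200_000.0),
    "option B": (1200.0, 400_000.0),
}

ranking = rank_by_throughput_per_dollar(quotes)
print(ranking)  # ['option A', 'option B']
```

Here the faster option is not the better value, although availability, backup, and future scalability still have to be weighed alongside the ratio.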
SAS Scalable Performance Data Server on a Grid
each server / grid node runs its own instance of SAS and SPDS Server
[Diagram: server / grid nodes attached to shared file systems on SAN or NAS holding the SPDS directories /spds/meta, /spds/index, /spds/data1, /spds/data2]
bottom line: myspdslib.mysastable is available on any server!
SAS Really Scales in a Grid
scalable I/O throughput
lots of choices for OS, storage solution, etc.
our work will continue...
More to See and Do...
“A Throughput-Intensive Compute and Storage
Grid Using SAS® Grid Manager”
• Somantak Chanda, American Express
• Tues 1:30-2:20, Northern Hemisphere E-2
SAS Grid demo booth #16
IT Intelligence for Grid Optimization – demo booth #53
Platform Computing – Alliance Café booth #87
various storage partners – Alliance Café
For More Information...
scalability website:
http://support.sas.com/rnd/scalability/grid
today’s presentation:
http://support.sas.com/rnd/scalability/grid/gridpapers.html