
Computing & Networking
User Group Meeting
Roy Whitney
Andy Kowalski
Sandy Philpott
Chip Watson
17 June 2008
Users and JLab IT
• Ed Brash is the User Group Board of Directors’ representative
on the IT Steering Committee.
• Physics Computing Committee (Sandy Philpott)
• Helpdesk and CCPR requests and activities
• Challenges
– Constrained budget
• Staffing
• Aging infrastructure
– Cyber Security
Computing and Networking
Infrastructure
Andy Kowalski
CNI Outline
• Helpdesk
• Computing
• Wide Area Network
• Cyber Security
• Networking and Asset Management
Helpdesk
• Hours: 8am-12pm M-F
– Submit a CCPR via http://cc.jlab.org/
– Dial x7155
– Send email to [email protected]
• Supported Desktops: Windows XP, Vista, and RHEL5
– Migrating older desktops
• Mac Support?
Computing
• Email Servers Upgraded
– Dovecot IMAP Server (Indexing)
– New File Server and IMAP Servers (Farm Nodes)
• Servers Migrating to Virtual Machines
• Printing
– Centralized Access via jlabprt.jlab.org
– Accounting Coming Soon
• Video Conferencing (working on EVO)
Wide Area Network
• Bandwidth
– 10Gbps WAN and LAN backbone
– Offsite Data Transfer Servers (example commands below)
• scigw.jlab.org (bbftp)
• qcdgw.jlab.org (bbcp)
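For illustration, typical invocations of the two transfer tools (a sketch only; the username jdoe and the file paths are hypothetical):

    # bbftp via scigw.jlab.org (multi-stream, FTP-like transfer)
    bbftp -u jdoe -e "get /scratch/run123.dat run123.dat" scigw.jlab.org

    # bbcp via qcdgw.jlab.org, using 8 parallel streams
    bbcp -s 8 run123.dat jdoe@qcdgw.jlab.org:/scratch/run123.dat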
Cyber Security Challenge
• The threat: sophistication and volume of attacks continue
to increase.
– Phishing Attacks
• Spear Phishing/Whaling are now being observed at JLab.
• Federal (including DOE) requirements for meeting the cyber
security challenge mandate additional measures.
• JLab uses a risk-based approach that balances accomplishing
the mission with addressing the threat.
Cyber Security
• Managed Desktops
– Skype Allowed From Managed Desktops On Certain Enclaves
• Network Scanning
• Intrusion Detection
• PII/SUI (CUI) Management
Networking and IT Asset Management
• Network Segmentation/Enclaves
– Firewalls
• Computer Registration
– https://reggie.jlab.org/user/index.php
• Managing IP Addresses
– DHCP
• Assigns all IP addresses (most static)
• Integrated with registration
• Automatic Port Configuration
– Rolling out now
– Uses registration database
Scientific Computing
Chip Watson & Sandy Philpott
Farm Evolution Motivation
• Capacity upgrades
– Re-use of HPC clusters
• Movement to Open Source
– O/S upgrade
– Change from LSF to PBS
Farm Evolution Timetable
Nov 07: Auger/PBS available – RHEL3, 35 nodes
Jan 08: Fedora 8 (F8) available – 50 nodes
May 08: Friendly-user mode; IFARML4,5
Jun 08: Production
– F8 only; IFARML3 + 60 nodes from LSF IFARML alias
Jul 08: IFARML2 + 60 nodes from LSF
Aug 08: IFARML1 + 60 nodes from LSF
Sep 08: RHEL3/LSF → F8/PBS migration complete
– No renewal of LSF or RHEL for cluster nodes
Farm F8/PBS Differences
• Code must be recompiled
– 2.6 kernel
– gcc 4
• Software installed locally via yum
– cernlib
– MySQL
• Time limits: 1 day default, 3 days max
• stdout/stderr written to ~/farm_out
• Email notification (see the sample script below)
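For orientation, a minimal PBS batch script exercising these settings (a sketch only; the job name, program, and address are hypothetical, and production farm jobs go through Auger rather than raw PBS):

    #!/bin/sh
    #PBS -N sample_job            # hypothetical job name
    #PBS -l walltime=24:00:00     # request the 1-day default (3-day max)
    #PBS -m ae                    # mail on abort and on end
    #PBS -M jdoe@jlab.org         # hypothetical address
    cd $PBS_O_WORKDIR             # run from the submission directory
    ./my_analysis                 # code recompiled for the 2.6 kernel / gcc 4

Under Auger, stdout/stderr then land in ~/farm_out as noted above.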
Farm Future Plans
• Additional nodes
– from HPC clusters
• CY08: ~120 4g nodes
• CY09-10: ~60 6n nodes
– Purchase as budgets allow
• Support for 64 bit systems when feasible & needed
Storage Evolution
• Deployment of Sun x4500 “thumpers”
• Decommissioning of Panasas (old /work server)
• Planned replacement of old cache nodes
Tape Library
• Current STK “Powderhorn” silo is nearing end-of-life
– Reaching capacity & running out of blank tapes
– Doesn’t support upgrade to higher density cartridges
– Is officially end-of-life December 2010
• Market trends
– LTO (Linear Tape Open) Standard has proliferated since 2000
– LTO-4 has 4x the density, capacity/$, and bandwidth of the 9940B:
800 GB/tape, $100/TB, 120 MB/s
– LTO-5, out next year, will double capacity and deliver 1.5x the bandwidth:
1600 GB/tape, 180 MB/s
– LTO-6 will be out prior to the 12 GeV era:
3200 GB/tape, 270 MB/s
Tape Library Replacement
• Competitive procurement now in progress
– Replace old system, support 10x growth over 5 years
• Phase 1 in August
– System integration, software evolution
– Begin data transfers, re-use 9940B tapes
• Tape swap through January
• 2 PB capacity by November
• DAQ to LTO-4 in January 2009
• Old silo gone in March 2009
End result: breakeven on cost by the end of 2009!
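For scale, using the LTO-4 numbers from the previous slide: 2 PB ÷ 800 GB/tape ≈ 2,500 cartridges, i.e. roughly $200K of media at $100/TB.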
Long Term Planning
• Continue to increase compute & storage capacity
in the most cost-effective manner
• Improve processes & planning
– PAC submission process
– 12 GeV Planning…
E.g.: Hall B Requirements
Event Simulation                   2012      2013      2014      2015      2016
SPECint_rate2006 sec/event         1.8       1.8       1.8       1.8       1.8
Number of events                   1.00E+12  1.00E+12  1.00E+12  1.00E+12  1.00E+12
Event size (KB)                    20        20        20        20        20
% Stored Long Term                 10%       25%       25%       25%       25%
Total CPU (SPECint_rate2006)       5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
Petabytes / year (PB)              2         5         5         5         5

Data Acquisition                   2012      2013      2014      2015      2016
Average event size (KB)            20        20        20        20        20
Max sustained event rate (kHz)     0         0         10        10        20
Average event rate (kHz)           0         0         10        10        10
Average 24-hour duty factor (%)    0%        0%        50%       60%       65%
Weeks of operation / year          0         0         0         30        30
Network (n*10gigE)                 1         1         1         1         1
Petabytes / year                   0.0       0.0       0.0       2.2       2.4

1st Pass Analysis                  2012      2013      2014      2015      2016
SPECint_rate2006 sec/event         1.5       1.5       1.5       1.5       1.5
Number of analysis passes          0         0         1.5       1.5       1.5
Event size out / event size in     2         2         2         2         2
Total CPU (SPECint_rate2006)       0.0E+00   0.0E+00   0.0E+00   7.8E-03   8.4E-03
Silo Bandwidth (MB/s)              0         0         900       900       1800
Petabytes / year                   0.0       0.0       0.0       4.4       4.7

Totals                             2012      2013      2014      2015      2016
Total SPECint_rate2006             5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
SPECint_rate2006 / node            600       900       1350      2025      3038
# nodes needed (current year)      95        63        42        28        19
Petabytes / year                   2         5         5         12        12
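Consistency check (Data Acquisition, 2016): 20 KB/event × 10 kHz average rate × 65% duty factor × 30 weeks (≈1.8E+07 s) ≈ 2.4 PB/year, matching the table.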
LQCD Computing
• JLab operates 3 clusters with nearly 1100 nodes,
primarily for LQCD plus some accelerator modeling
• National LQCD Computing Project
(2006-2009: BNL, FNAL, JLab; USQCD Collaboration)
• The LQCD II proposal (2010-2014) would double the hardware
budget to enable key calculations
• JLab Experimental Physics & LQCD computing share staff
(operations & software development) and the tape silo,
providing efficiencies for both