SLAC Networking
Les Cottrell
for the SLAC network group
SLAC
<[email protected]>
Presented by Charley Granieri at the SLAC Computing External Review, June 1999
6/24/99
1
Outline of talk
• LAN - architecture, assets, monitoring
• Residential access to SLAC
• WAN - connectivity and monitoring
• Email - servers, spam, majordomo etc.
• Other network services such as News ...
• Advanced technology pilots
• Summary - challenges etc.
Mission etc.
• Provide leadership and support in data
communications to the Laboratory as a whole and to
physics research in particular.
– Network engineering & management - 4.3 FTEs, 1 open
slot
– Network monitoring:
• LAN 1.5 FTEs
• WAN 2.7 FTEs
– Network services (email, news, VMS etc.): 2.5 FTEs
– NetOps: 3 FTEs + 1 open slot
• Telecommunications also under same hat (helps
coordination and convergence):
– 2.5 FTEs + contractor
Network Drivers
• Deployment of computers to new areas/farms/people
• Faster interfaces, more capable, easier to use
computers
• New applications (BaBar, BSD, multimedia, VoIP
…)
• Increased reliance
• Increased security
• World wide collaborations - distance independent
• New technologies (media, interfaces, protocols,
applications)
Growth of SLAC LAN
Principles for LAN design
• Simplicity
– Ethernet 10/100/1000 Mbps; phase out FDDI, LocalTalk etc.
– Reduce number of protocols in core to IP only, limit
bridging, keep smart stuff at edges
• Stay away from edges of performance envelope
– over-provision, double aggregate every 18 months
• shared to switched 10 Mbps => 100 Mbps for desktop
• 100 Mbps switched => 1 Gbps for core & high-volume servers
• Provide high availability
– redundancy of core components so can schedule outages
– UPS
• Invest in network management tools
Network Architecture
• Structured wiring started 1995, complete outside
radiation fence this fiscal year, i.e. 90% completed
• Increasingly switched network (from shared media)
– Based on mass market Ethernet
– improved error isolation, ability to know where assets
are, and security
– scalable
SLAC Switched LAN Summer 1999
[Figure: network diagram. Labels: Internet, DMZ, modems / ISDN / xDSL, ESA, SSRL, BSD, BaBar, IR2, MCC1, MCC2, MCC3, old servers, legacy FDDI ring and concentrator, routers, switches, core Gigaswitch, 4 farms, 3 servers, 16 building switches. Legend: 10BaseT, 100BaseT, 100BaseFL, FDDI/CDDI, 1Gbit FL, 4Gbit FL; router, switch, hub.]
Current state - availability
• Switched segmentation reduces impact of many
problems, simplifies identification
• UPS for core components
• Redundant core devices
• Redundant power supplies on core switches &
routers
• Redundant trunks
• Cisco Hot Standby Routing Protocol
Current state - performance
• Just been through major upgrade, switch fabric
occupancy ~ 60%, 1000Mbps in core + high
performance servers
– 46% of available bandwidth in 1000Mbps links, 47% in
100Mbps links
– 2.6 hosts / collision domain (down from 3.6 at last
review)
• Close collaboration with BaBar & systems to
improve/optimize performance for trigger farms and
data collection
BaBar
• Make sure network is not the bottleneck
• Measured > 400 Mbps (UDP, or TCP with extended
windows) Gbps to Gbps
• Measured ~ 400 Mbps aggregate from Gbps to 4 *
100 Mbps between CC & IR2
• Provide real-time web accessible monitoring page
showing throughputs for various components & drill
down
Real time BaBar thruput Monitoring
LAN assets inventory
• Oracle Database of network equipment, linked to
property control, phone etc.
• Much of network info gleaned automatically and
entered into dB:
– connectivity from router ARP tables, from bridge/switch
CAM tables, from CDP
• gives MAC level addresses etc
• create “model” of router/switch/hub & host connections
– MIBs in nodes provide make & model, S/N, swr/hdw rev
level, port type, speed
• Other info is entered manually:
– when host registered it gets property control number, IP
address, owner, admin
– DNS entered into dB then automatically updates DNS tables
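The automatic part of the inventory amounts to a join between what routers know (IP to MAC, from ARP tables) and what switches know (MAC to port, from CAM tables). A minimal sketch of that join follows; the function name, record layout and sample addresses are all hypothetical, and how the tables are actually collected and loaded into Oracle is not shown on the slide.

```python
from collections import defaultdict


def build_connection_model(arp_entries, cam_entries):
    """Join ARP data (ip, mac) with CAM data (switch, port, mac).

    Returns one record per CAM entry with the IP addresses, if any,
    that routers have seen for that MAC address.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        ips_by_mac[mac.lower()].add(ip)

    model = []
    for switch, port, mac in cam_entries:
        mac = mac.lower()  # normalise case so the two sources match
        model.append({
            "mac": mac,
            "switch": switch,
            "port": port,
            "ips": sorted(ips_by_mac.get(mac, ())),
        })
    return model


# Hypothetical sample data, not taken from the SLAC network.
arp = [("10.0.0.5", "00:A0:C9:11:22:33"), ("10.0.0.9", "08:00:2B:AA:BB:CC")]
cam = [
    ("sw-bldg50", "3/12", "00:a0:c9:11:22:33"),
    ("sw-bldg50", "3/14", "08:00:20:00:00:01"),
]
model = build_connection_model(arp, cam)
for record in model:
    print(record)
```

A MAC address seen on a switch port but absent from every ARP table (the second record here) is left with an empty address list rather than dropped, since unknown devices are exactly what an asset inventory wants to surface.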
LAN performance monitoring
• Read MIBs from routers & switches & plot:
– octets, errors
– generate alerts (outside thresholds, e.g. heavy
multi/broadcast activity, heavy utilization, high error
rates)
– graphical Web reports using Java & other (MRTG) tools
with history for baselines
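As an illustration of turning MIB octet counters into utilization figures and alerts, here is a minimal sketch. It assumes 32-bit octet counters (as in the standard MIB-II interface table) that may wrap once between polls; the threshold values and function names are hypothetical, not the ones used at SLAC.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for a single wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Average link utilization over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * link_bps)


def check_thresholds(util_pct, error_rate, max_util_pct=60.0, max_error_rate=0.01):
    """Return alert strings; both thresholds are illustrative assumptions."""
    alerts = []
    if util_pct > max_util_pct:
        alerts.append("heavy utilization")
    if error_rate > max_error_rate:
        alerts.append("high error rate")
    return alerts


# Two polls 300 s apart on a 100 Mbps link; the counter wrapped in between.
util = utilization_percent(4_000_000_000, 1_580_032_704, 300, 100_000_000)
print(f"{util:.1f}% utilization, alerts: {check_thresholds(util, 0.0)}")
```

The modulo in `counter_delta` only handles one wrap per interval, so the polling period has to be short enough for the link speed; on faster links a 32-bit octet counter can wrap in well under a minute.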
DMZ monitoring
• FDDI probe monitors traffic coming in via ESnet,
data is read out at intervals (typically once/hour) and
logged to database.
• Reports are generated daily.
– Report on common protocol utilization and suspicious
use
– top 20 nodes, conversation pairs, reports by domain,
complete list of conversations
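The daily "top 20 nodes" and "conversation pairs" reports can be sketched as a simple aggregation over logged records. The record layout (source, destination, bytes) and the host names below are assumptions for illustration; the actual probe output and database schema are not described on the slide.

```python
from collections import Counter


def summarize_conversations(records, top_n=20):
    """Aggregate (src, dst, nbytes) records into top nodes and top pairs."""
    bytes_by_node = Counter()
    bytes_by_pair = Counter()
    for src, dst, nbytes in records:
        bytes_by_node[src] += nbytes
        bytes_by_node[dst] += nbytes
        # Treat A->B and B->A as the same conversation.
        pair = tuple(sorted((src, dst)))
        bytes_by_pair[pair] += nbytes
    return bytes_by_node.most_common(top_n), bytes_by_pair.most_common(top_n)


# Hypothetical sample records.
records = [("a", "b", 100), ("b", "a", 50), ("c", "a", 10)]
top_nodes, top_pairs = summarize_conversations(records, top_n=2)
print(top_nodes)
print(top_pairs)
```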
Residential & dialup services
Dialup/ Dialup /
ARA
PPP
DSLISDN Covad
144k,
384k,
128 1.5M /
kbps 384k
Max speed 33kbps 56kbps
Inside SLAC
Yes
Yes
Yes
Firewall
Clients
Mac
Any
Any
Opt.
Opt.
Location
Local Opt. Local Local
70
150
80
Users
12
46=>69
Ports
No
Any
~80%
BayArea
14
Campus
DSL-PBI
384k /
128k, 1.5M
/ 384k
No
Any
~60%
BayArea
2
ISP
• Use PPTP VPN for security, have NT, Win98,
Mac clients, also useful for travelers
Utilization
Tracking use, keeping logs for more detailed auditing
WAN Challenges
• No single management responsible for Internet
• Exponential growth
• HEP critically dependent on WAN for collaborations
• HEP/Research & Education competing with
commodity usage in many cases
• Internet extremely complex, changing rapidly,
internal behavior hard to predict
• HEP use is very diverse, collaborators, vendors,
services
Connectivity
• Use of ESnet link (43Mbps) up by factor of 2 in last
4 months
– 5 minute averages of up to 10-15 Mbps per direction are
fairly typical each day
– In process of upgrading to 155 Mbps
• 40Gbytes/day IP traffic, roughly 50% TCP, 50%
UDP
– FTP, AFS, ssh, http, xwin are top protocols
• Campus link just upgraded from 10 Mbps to 155
Mbps
• Working to get NTON reconnected
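To put the daily volume next to the link speed, the 40 Gbytes/day figure can be converted to an average bit rate. This is only a back-of-the-envelope sketch: it assumes 1 Gbyte = 10^9 bytes and compares against the nominal 43 Mbps ESnet rate, and the slide does not say how the volume splits between directions or links.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_mbps(gbytes_per_day):
    """Average rate in Mbps for a daily volume, taking 1 Gbyte = 1e9 bytes."""
    bits_per_day = gbytes_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


avg = average_mbps(40)
share = avg / 43  # fraction of the nominal 43 Mbps link rate
print(f"{avg:.1f} Mbps average, {share:.1%} of a 43 Mbps link")
```

The daily average comes out well below the 10-15 Mbps five-minute peaks quoted above, which is why peaks rather than averages are the better guide to when a link needs upgrading.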
Internet End-to-end Monitoring
www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
• Within ESnet connectivity excellent, Internet 2
good, after that only acceptable to poor
• Monitor to set “user” expectations, help with
problem detection, get planning information &
trends, identify problem areas, optimize routes
• Collaborative effort to provide HEP-wide & ESnet
wide monitoring requested by ICFA, ESnet
– Partially funded by DOE/MICS FWP
– Involves many HEP sites, led by SLAC & HEPNRC
Main tool (PingER) currently uses Ping
• Treats Internet as black box
• Provides useful real world measures of network
round trip response time, loss, reachability, jitter
• Low cost/lightweight tool
– ping “universally available”, easy to understand
• no software for clients to install
• no special privileges needed for monitor sites
– resources: 100bps/link, ~600kBytes/month/link
• Agrees well with more complex measurements
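The metrics listed above can all be derived from a batch of ping results. The sketch below is not PingER's code: the input format (round trip times in ms, with None for a lost probe) is an assumption, and jitter is taken here as the mean absolute difference between consecutive round trip times, which is only one of several common definitions.

```python
from statistics import median


def summarize_pings(rtts_ms):
    """Summarize one batch of probes; None marks a probe with no reply."""
    sent = len(rtts_ms)
    received = [r for r in rtts_ms if r is not None]
    summary = {
        "sent": sent,
        "loss_pct": 100.0 * (sent - len(received)) / sent if sent else 0.0,
        "reachable": bool(received),  # at least one reply came back
        "median_rtt_ms": median(received) if received else None,
        "jitter_ms": None,
    }
    if len(received) >= 2:
        diffs = [abs(b - a) for a, b in zip(received, received[1:])]
        summary["jitter_ms"] = sum(diffs) / len(diffs)
    return summary


# Hypothetical batch: five probes, one lost.
result = summarize_pings([100, None, 104, 110, 102])
print(result)
```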
Extent of measurements
• 18 monitoring sites - 7 in US (5 ESnet, 2
vBNS), 2 in Canada, 7 in Europe (ch, de,
dk, hu, it, uk(2)), 2 in Asia (jp, tw)
• 1261 monitoring-remote-site pairs
• 379 unique hosts, 272 sites
• 50 beacon sites, 27 countries
• Metrics include response, jitter, loss,
reachability
• Data goes back > 4 years
• 1 Million probes of Internet / day
[Figure: pie chart "PingER pair distribution by global area", with segments for Edu, Com, Gov, Mil, Org, Europe, Canada, Japan, China, Asia, Australasia, South America and Russian Fed.]
Results 1/2
[Figure: comparison of median packet loss for Mar-99 for various communities. Y axis: % median monthly packet loss, log scale from 0.01 to 10, with the 25%, median and 75% values marked. X axis: community, with ESnet (31), vBNS (18), XIWT (140), ELab (14).]
Results 2/2
TCP bandwidth < (1470/RTT) * (1/sqrt(loss))
[Figure: bandwidth in kbytes/sec (log scale, 10 to 10000) versus date (Jun-94 to Dec-99), with exponential fits for Canada (18 pairs), Edu/US (138 pairs), ESnet (31 pairs), Japan (12 pairs) and Europe (95 pairs), and a reference line for 100% improvement / year.]
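The bound on this slide, TCP bandwidth < (1470/RTT) * (1/sqrt(loss)), can be evaluated directly from measured round trip time and loss. In the sketch below the 1470 is taken to be a segment size in bytes, RTT is in seconds and loss is a fraction between 0 and 1; those units are assumptions, chosen so the result comes out in bytes/sec to match the kbytes/sec axis of the plot.

```python
import math


def tcp_bandwidth_bound(rtt_s, loss_fraction, mss_bytes=1470):
    """Upper bound on TCP throughput in bytes/sec from RTT and packet loss."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_fraction <= 1:
        raise ValueError("loss_fraction must be in (0, 1]")
    return (mss_bytes / rtt_s) * (1 / math.sqrt(loss_fraction))


# Hypothetical path: 100 ms round trip time, 1% packet loss.
bound = tcp_bandwidth_bound(rtt_s=0.1, loss_fraction=0.01)
print(f"about {bound / 1000:.0f} kbytes/sec")
```

Because loss enters as an inverse square root, cutting loss by a factor of four only doubles the bound, whereas halving the round trip time doubles it directly. The function rejects zero loss because the bound is unbounded there and another limit, such as window size or link speed, takes over.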
Email
• Gateway processes about 40K msg/day (growing
25% / year, doubled since last review)
– monitor & alerts (email, pager) on exceptions
– 95% trivial email delivered in < 1 min
• ~ 2700 email users
– Support generic addresses fname.lname or [email protected]
– 700 POP users, 30 IMAP, Quickmail gone, VM gone
– separated IMAP & POP servers
– dedicated internal SMTP server
– IMAP pilot - Netscape & pine most popular clients
• See www.slac.stanford.edu/comp/net/email/futures.html
Current Email system
[Figure: diagram of the current email system. Labels: offsite, screening router, mail gateways, redundant mail servers (Serv01..2), SMTPserv (non-authenticated relay), Listserv, VAX clusters (SLACAX, SSRL, SLD, SLC), POPserv, IMAPserv, NFS server, backup; PC/Mac email users (Eudora, Netscape, Outlook) using SMTP and POP & IMAP; Unix email users in NFS mode; Unix IMAP users (Pine).]
Problem areas in red: cleartext passwords, NFS mounted spool, non-authenticated SMTP
Proposed Email System
[Figure: diagram of the proposed email system. Labels: offsite, screening router, mail gateways, redundant mail servers, SMTPserv (authenticated SMTP), Listserv, VAX clusters (SLACAX, SSRL, SLD, SLC), POPserv, IMAPserv, NFS server; PC/Mac email users (Eudora, Netscape, Outlook) using SMTP and POP & IMAP over SSL; Unix email users in NFS mode; Unix IMAP users (Pine).]
Mail list server (majordomo)
• 215 lists (up from 155 last review)
• On a separate server
• Have web forms for requesting lists, maintaining
subscriptions and querying the lists
Spam / Viruses
• Actively provide anti-spam support:
– at the last review spam was growing (factor of 16 in 9 months), up to
40 spam actions/week
– now stable at ~ 10 actions/week
– ~ 2100 sites blocked (was 90 two years ago)
– prepared to restore domain upon user request
• Since Melissa, remove any Excel or Word
attachment containing a macro from incoming SLAC email
• Also strip out well known viruses / worms (e.g.
happy99, explore.zip.exe)
Dynamic Host Configuration Protocol
• Provide DHCP for fixed hosts & roamers
• Tension between easy walk-up use & security
– require registration for accountability
• this is for connection inside the site firewall
• an issue is whether to provide anonymous DHCP outside
firewall (i.e. what you are using today)
– seek guidance on how to strike the balance
DHCP
• Is in production but barely
• Web forms for adding to DHCP database
– needs to allow editing, deleting, more restrictive
availability, better integration with Enterprise DB
• Work in progress or queued
– automate log file pruning, restrict who can register hosts
– convert to use Enterprise DB as master
– convert DHCP server from SunOS to Solaris
– increase information logged about user, location etc.
• Needs resources (i.e. part of a new hire) to focus on it
and fix current problems
Lightweight Directory Access Protocol
• Microsoft is embracing LDAP in Windows 2000
• Email vendors are migrating towards password DBs
in LDAP
• Have an LDAP-v3 server
– loaded with the SLAC user directory information,
– read only at the moment
• Starting to coordinate with other HEP labs (e.g.
CERN), there is a HEP LDAP email list
News, NTP, DNS
• News down to 20 groups, out-sourced to campus
• NTP: driven from GPS on-site
• DNS: driven from Oracle network database of hosts
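Driving DNS from a host database means the zone data is generated rather than edited by hand. The sketch below shows the idea for address records only; the row layout, host names, domain and addresses are hypothetical placeholders, and the real Oracle schema and update mechanism are not described in the talk.

```python
import ipaddress


def zone_lines(hosts, domain):
    """Render one A record per (hostname, ip) row, sorted by hostname."""
    lines = []
    for name, ip in sorted(hosts):
        addr = ipaddress.IPv4Address(ip)  # raises ValueError on a bad address
        lines.append(f"{name}.{domain}. IN A {addr}")
    return lines


# Hypothetical rows as they might come back from a database query.
rows = [("ntp1", "192.0.2.20"), ("mailhost", "192.0.2.10")]
records = zone_lines(rows, "example.org")
print("\n".join(records))
```

Validating each address while rendering means a bad database entry stops the run instead of reaching the published zone.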
VMS central support
• Driven by SLD, has SLACVX for SLD offline
– AlphaServer 8400 + 10 smaller Alphas + 6 VAXes (for
X support hosts & legacy code)
– ~ 6000 SpecInt92
– 500 Gbytes disk, RAID controller, STK connection
– HSM, Oracle etc.
– Software & hardware basically stable
– Supported by SCS staff (~0.5FTE)
– Support folks autopaged
Advanced technology exploration
• ESnet IPv6 collaborator
• NGI proposal (Particle Physics Data Grid) and high
performance WAN networking (China Clipper)
• NTON project (480 Mbps disk to application
SLAC<>LBNL)
• Internet monitoring (IEPM)
• VoIP pilot with CERN, FNAL, DESY, ESnet/LBNL
Major challenges
• Tracking topology & configuration
• Monitoring a switched network
• Staying at right point in technology curve
• Constraining complexity
– phase out of legacies, Appletalk, Macs, DECnet IV,
FDDI (user resistance)
– embracing new needs: e.g. VPNs, xDSL, IPv6, video,
VoIP, IMAP, DHCP, QoS, new routing protocols
Major challenges
• Balancing security vs. usability & simplicity
• Increasing purposes for and dependence on the net
– video, VoIP, multicast
– outages hard to schedule, upgrades hard to do
• Finding & keeping staff
Summary
• LAN: well positioned, architecture scales, follows
industry practices, will need continued growth
• WAN: little control, yet must understand, track,
monitor and collaborate with others inside & outside
HEP, nationally & internationally
• VMS: central support reducing, stable, goes away
with SLD
• Network services, technologies & protocols keep
emerging
• People / skills resources are major gating factor