Networking Update for ITSC

University of Washington
Computing & Communications
Networking Update
Terry Gray
Director, Networks & Distributed Computing
University of Washington
UW Medicine IT Steering Committee
16 January 2004
20 February 2004
Outline
• In our last episode…
– Context
– Expanded Partnership
– Recent Problems
• Today
– Systemic Problems and Progress
– Network Security Chronology
– Design Issues
Context: A Perfect Storm
• Increased dependency on network apps
• Decreased tolerance for outages
• Decades of deferred maintenance...
• Inadequate infrastructure investment
• Some old/unfortunate design decisions
• Some extraordinarily fragile applications
• Fragmented host management
• Increasingly hostile security environment
• Increasing legal/regulatory liability
• Importance of research/clinical leverage
Key Elements of the Partnership
• Changed: C&C now responsible for...
– In-building network implementation and operational support for med ctrs, clinics
– Med center network design “for real”
• Not Changed: C&C still responsible for...
– Network backbone, routers
– Regional and Internet connectivity
– SoM and Health Sciences networking
Why the Partnership Makes Sense
• Consistency, interoperability, manageability
• Leverage C&C networking expertise
• Clinical/research hi-performance network needs
• 24x7 Network Operations Center (NOC)
• Advanced network management tools
• Avoid design/build organizational conflicts
• Beyond the network... hope to share distributed system architecture and network computing expertise
Recent Problems
• Oct 29: Partial router failure reveals escalation procedure problems
• Oct 30: Security breach triggers connectivity and server problems
• Nov 12: 13-minute power outage triggers extended server outage
• Dec 12: Router upgrade uncovers wiring error, which triggers multicast storm
(None of these were related to the network transition, save perhaps timing of #4)
System Elements
• Environmentals (Power, A/C, Physical Security)
• Network
• Client Workstations
• Servers
• Applications
• Personnel, Procedures, Policy, and Architecture
Failures at one level can trigger problems at another level; need Total System perspective
Reasonable Questions
• What’s up with C&C’s alarm system vendor?
• If power was out for only 14 minutes, why was service out for multiple hours?
• What can we say about an app so fragile that a net interruption of a few seconds requires a server reboot?
• What can we say about thin clients built on top of thick (WinXP) operating systems?
• What can we say about a network where one wiring fault can disable most of the net?
Systemic Problems and Progress
Systemic Network Problems
(NB: these pre-date Tom et al.)
• Old infrastructure (e.g. Cat 3 wire)
• Non-supportable technologies (e.g. FDDI)
• Non-supportable (non-geographic) topology
• Expensive shortcuts (e.g. mis-terminated Cat 5)
• Security based on individual IP addresses
• Subnets with clients and critical servers
• Documentation deficiency
– Contact database
– Device location database
– Critical device registry
Systemic General Problems
• Ever-increasing system complexity, dependencies
• Departmental autonomy
• Uncontrolled hosts
• Unreliable power and A/C in equipment rooms
• No net-oriented application procurement standards
– Are HA and DRBR expectations realistic?
– Are backup plans workable?
Some Numbers
            UW Total               Health Sciences    Medical Centers
            (incl UW Medicine)                        (incl SoM)
Subnets     1022                   52                 145
Devices     70,000                 >8,000             10,000
Network Device Growth
Note: Most dips reflect lower summer use; last one is a measurement anomaly
Network Traffic Growth (linear)
Network Traffic Growth (log)
Near-term Progress and Plans
• Agreement on standard maintenance window
• Created “Top 10” list -- creeping to Top 20 :)
• Static addressing work-around (success!)
• FDDI, VLAN elimination
• Subnet splits/upgrades (1500 computers)
• Equipment upgrades
• Router consolidation, dedicated subnets, separate med center backbone
• Equipment, outlet location database updates
• Initial wireless deployment
Design Review and Cost Estimates
• Biggest cost: physical infrastructure & wireplant upgrades
• NetVersant engaged for cost estimation project
• Cisco engaged for network architecture review
• We recommend similar reliability/design assessment for servers, apps & procedures
Design Issues
Design Tradeoffs
• Networks = Connectivity; Security = Isolation
• Fault Zone size vs. Economy/Simplicity
• Reliability vs. Complexity
• Prevention vs. (Fast) Remediation
• Security vs. Supportability vs. Functionality
• Differences in NetSec approaches relate to:
– Balancing priorities (security vs. ops vs. function)
– Local technical and institutional feasibility
Tradeoff Examples
• Defense-in-depth conjecture (for N layers)
– Security: MTTE (exploit) ∝ N**2
– Functionality: MTTI (innovation) ∝ N**2
– Supportability: MTTR (repair) ∝ N**2
• Perimeter Protection Paradox (for D devices)
– Firewall value ∝ D
– Firewall effectiveness ∝ 1 / D
• Border blocking criteria
– Threat can’t reasonably be addressed at edge
– Won’t harm network (performance, stateless block)
– Widespread consensus to do it
• Security by IP address
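One way to build intuition for the Perimeter Protection Paradox is a toy probability model. The sketch below is not the slide's 1/D relation; it assumes, purely for illustration, that each of D devices behind a firewall can be compromised independently with some small probability p through a path the firewall does not cover (a laptop carried in, an email attachment). The function name and the value of p are hypothetical.

```python
def p_perimeter_intact(devices: int, p_compromise: float) -> float:
    """Probability that none of `devices` hosts behind the firewall is compromised.

    Assumption (hypothetical): each host is compromised independently with
    probability `p_compromise` via a path the perimeter firewall does not see.
    """
    if devices < 0:
        raise ValueError("devices must be non-negative")
    if not 0.0 <= p_compromise <= 1.0:
        raise ValueError("p_compromise must be between 0 and 1")
    return (1.0 - p_compromise) ** devices


if __name__ == "__main__":
    # p = 0.001 is an illustrative value, not a measurement.
    for d in (10, 100, 1_000, 10_000):
        intact = p_perimeter_intact(d, 0.001)
        print(f"D={d:>6}: P(no inside compromise) = {intact:.4f}")
```

Under these assumptions the chance that the perimeter still means anything shrinks as D grows, which is the qualitative point of the paradox: the more devices a firewall protects, the more likely one of them has already been compromised from the inside.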
Network Security Credo
• Focus first on the edge (Perimeter Protection Paradox)
• Add defense-in-depth as needed
• Keep it simple (e.g. Network Utility Model)
• But not too simple (e.g. offer some policy choice)
• Avoid
– one-size-fits-all policies
– cost-shifting from “guilty” to “innocent”
– confusing users and techs (“broken by design”)
Preserving the Net Utility Model
• What is it?
• Why important?
• Incompatible with perimeter security?
• Too late to save?
• NUM-preserving perimeter defense
– Logical Firewalls
– Project 172
• Foiled by static IP addressing…
– Requires all hosts be reconfigured
Lines of Defense
• Network isolation for critical services.
• Host integrity. (Make the OS net-safe.)
• Host perimeter. (Add host firewalling.)
• Server sanctuary perimeter.
• Network perimeter defense.
• Real-time attack detection and containment.
Network Security Chronology
• 1990: Five anti-interoperable networks
• 1994: Nebula shows network utility model viable
• 1998: Defined border blocking policy
• 2000: Published Network Security Credo
• 2000: Added source address spoof filters
• 2000: Proposed med ctr network zone
• 2000: Proposed server sanctuaries
• 2001: Banned clear-text passwords on C&C systems
• 2001: Proposed pervasive host firewalls
• 2001: Developed logical firewall solution
• 2002: Developed Project-172 solution
• 2003: Slammer, Blaster… death of the Internet
• 2003: Developed flex-net architecture
Next-Gen Network Architecture
• Parallel networks; more redundancy
• Supportable (geographic) topology
• Med center subnets = separate backbone zone
• Perimeter, sanctuary, and end-point defense
• Higher performance
• High-availability strategies
– Workstations spread across independent nets
– Redundant routers
– Dual-homed servers
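The payoff from redundant routers and dual-homed servers can be estimated with the standard parallel-availability formula. This is a minimal sketch that assumes failures are independent; the function name and the 99% figure are illustrative, not measurements from this network.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one of them suffices.

    Assumption: components fail independently of each other. Shared power,
    shared wiring, or a common software fault breaks that assumption.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    # 0.99 is a hypothetical single-router availability.
    for n in (1, 2, 3):
        print(f"{n} router(s): availability = {parallel_availability(0.99, n):.6f}")
```

The independence assumption is the weak point. A single wiring fault that triggers a multicast storm across most of the net is a common-mode failure, and redundancy within the same fault zone does not help against it, which is one argument for parallel, independent networks.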
Success Metrics
• Tom’s
– Nobody gets hurt
– Nobody goes to jail
• Terry’s
– “Works fine, lasts a long time”
– Low ROI (Risk Of Interruption)
• Steve’s
– Four Nines or bust!
Success Metrics II
• We all want:
– High MTTF, Performance and Function
– Low MTTR and support cost
• The art is to balance those conflicting goals
– we are jugglers and technology actuaries
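MTTF and MTTR combine into a single availability figure through the usual steady-state relation, availability = MTTF / (MTTF + MTTR). The sketch below uses hypothetical numbers to show why lowering MTTR matters as much as raising MTTF.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time to failure and mean time to repair."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be positive and mttr_hours non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical figures: the same failure rate with two different repair times.
    for mttr in (4.0, 0.5):
        print(f"MTTF=2000h, MTTR={mttr}h -> availability={availability(2000.0, mttr):.5f}")
```

Anything that lengthens repair time, such as complex access lists or layers that confuse the people doing the troubleshooting, lowers availability even if failures become no more frequent.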
Success Metrics III
• How many nines?
• Problem one: what to measure?
– How do you reduce behavior of a complex net to a single number?
– Difficult for either uptime or utilization metrics
• Problem two: data networks are not like phone or power services…
– Imagine if phones could assume anyone’s number
– Or place a million calls per second!
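For a sense of scale, the "nines" can be converted into a yearly downtime budget. This sketch assumes a 365-day year and treats availability as a single uptime fraction, which is exactly the simplification the slide questions; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (e.g. 4 -> 99.99%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes of downtime per year")
```

Four nines leaves a budget of roughly 53 minutes a year, so a single outage lasting multiple hours exceeds it several times over.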
Concerns, Future Challenges
• Mitigating impact of closed networking:
– Needs of the many vs. needs of the few
– Pressure to make network topology match administrative boundaries
– Complex access lists
– False sense of security
– Increased MTTR
• Next-generation threats: firewalls won’t help
• Security vs. High-Performance
• Wireless
• Balancing innovation, operations, & security
Lessons
• Five 9s is hard (unless we only attach phones?)
• Even host firewalls don’t guarantee safety
• Perimeter firewalls may increase user confusion, MTTR
• Nebula existence proof: security in an open network
• Even so… defense-in-depth is a Good Thing
• It only takes one compromise inside to defeat a firewall
• Controlling net devices is hard -- hublets, wireless
• The cost of static IP configuration is very high
• Net reliability & host security are inextricably linked
• Never underestimate non-technical barriers to progress
Questions? Comments?