Networking Update for ITSC
University of Washington
Computing & Communications
Networking Update
Terry Gray
Director, Networks & Distributed Computing
University of Washington
UW Medicine IT Steering Committee
16 January 2004
20 February 2004
Outline
• In our last episode…
– Context
– Expanded Partnership
– Recent Problems
• Today
– Systemic Problems and Progress
– Network Security Chronology
– Design Issues
Context: A Perfect Storm
Increased dependency on network apps
Decreased tolerance for outages
Decades of deferred maintenance...
Inadequate infrastructure investment
Some old/unfortunate design decisions
Some extraordinarily fragile applications
Fragmented host management
Increasingly hostile security environment
Increasing legal/regulatory liability
Importance of research/clinical leverage
Key Elements of the Partnership
Changed: C&C now responsible for...
In-building network implementation and
operational support for med ctrs, clinics
Med center network design “for real”
Not Changed: C&C still responsible for...
Network backbone, routers
Regional and Internet connectivity
SoM and Health Sciences networking
Why the Partnership Makes Sense
Consistency, interoperability, manageability
Leverage C&C networking expertise
Clinical/research hi-performance network needs
24x7 Network Operations Center (NOC)
Advanced network management tools
Avoid design/build organizational conflicts
Beyond the network...
hope to share distributed system architecture
and network computing expertise
Recent Problems
Oct 29: Partial router failure reveals
escalation procedure problems
Oct 30: Security breach triggers
connectivity and server problems
Nov 12: 13 minute power outage triggers
extended server outage
Dec 12: Router upgrade uncovers wiring
error, which triggers multicast storm
(None of these were related to the network
transition, save perhaps timing of #4)
System Elements
Environmentals (Power, A/C, Physical Security)
Network
Client Workstations
Servers
Applications
Personnel, Procedures, Policy, and Architecture
Failures at one level can trigger problems at
another level; need Total System perspective
Reasonable Questions
What’s up with C&C’s alarm system vendor?
If power was out for only 14 minutes, why
was service out for multiple hours?
What can we say about an app so fragile that
a net interruption of a few seconds requires a
server reboot?
What can we say about thin clients built on
top of thick (WinXP) operating systems?
What can we say about a network where one
wiring fault can disable most of the net?
Systemic Problems and Progress
Systemic Network Problems
(NB: these pre-date Tom et al)
Old infrastructure (e.g. cat 3 wire)
Non-supportable technologies (e.g. FDDI)
Non-supportable (non-geographic) topology
Expensive shortcuts (e.g. cat5 mis-terminated)
Security based on individual IP addresses
Subnets with clients and critical servers
Documentation deficiency
Contact database
Device location database
Critical device registry
Systemic General Problems
Ever-increasing system complexity, dependencies
Departmental autonomy
Un-controlled hosts
Un-reliable power and A/C in equipment rooms
No net-oriented application procurement standards
Are HA and DRBR expectations realistic?
Are backup plans workable?
Some Numbers
                              Subnets   Devices
UW Total (incl UW Medicine)   1022      70,000
Health Sciences               52        >8,000
Medical Centers (incl SoM)    145       10,000
Network Device Growth
Note: Most dips reflect lower summer use; last one is a measurement anomaly
Network Traffic Growth (linear)
Network Traffic Growth (log)
Near-term Progress and Plans
Agreement on standard maintenance window
Created “Top 10” list --creeping to Top 20 :)
Static addressing work-around (success!)
FDDI, VLAN elimination
Subnet splits/upgrades (1500 computers)
Equipment upgrades
Router consolidation, dedicated subnets,
separate med center backbone
Equipment, outlet location database updates
Initial wireless deployment
Design Review and Cost Estimates
Biggest cost: physical infrastructure &
wireplant upgrades
NetVersant engaged for cost estimation project
Cisco engaged for network architecture review
We recommend similar reliability/design
assessment for servers, apps & procedures
Design Issues
Design Tradeoffs
Networks = Connectivity; Security = Isolation
Fault Zone size vs. Economy/Simplicity
Reliability vs. Complexity
Prevention vs. (Fast) Remediation
Security vs. Supportability vs. Functionality
Differences in NetSec approaches relate to:
Balancing priorities (security vs. ops vs. function)
Local technical and institutional feasibility
Tradeoff Examples
• Defense-in-depth conjecture (for N layers)
– Security: MTTE (exploit) ∝ N**2
– Functionality: MTTI (innovation) ∝ N**2
– Supportability: MTTR (repair) ∝ N**2
• Perimeter Protection Paradox (for D devices)
– Firewall value ∝ D
– Firewall effectiveness ∝ 1 / D
• Border blocking criteria
– Threat can’t reasonably be addressed at edge
– Won’t harm network (performance, stateless block)
– Widespread consensus to do it
• Security by IP address
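A minimal sketch of the two scaling claims on this slide, assuming they are meant as simple proportionalities: each defense-in-depth metric grows as N squared, firewall value grows with D, and firewall effectiveness falls as 1/D. The function names and the constants of proportionality (all set to 1.0) are hypothetical and chosen only for illustration; the slide gives no numbers.

```python
def depth_scaling(layers: int, base: float = 1.0) -> float:
    """Assumed model: MTTE, MTTI and MTTR each scale as N**2 for N layers.

    'base' is a hypothetical value of the metric with a single layer.
    """
    if layers < 1:
        raise ValueError("need at least one layer")
    return base * layers ** 2


def firewall_value(devices: int, k: float = 1.0) -> float:
    """Assumed model: the value of a perimeter firewall grows linearly with D."""
    if devices < 1:
        raise ValueError("need at least one device")
    return k * devices


def firewall_effectiveness(devices: int, k: float = 1.0) -> float:
    """Assumed model: effectiveness falls as 1/D, since any one of the D
    devices inside the perimeter can be the compromise that defeats it."""
    if devices < 1:
        raise ValueError("need at least one device")
    return k / devices


if __name__ == "__main__":
    for n in (1, 2, 3):
        # The same factor applies to time-to-exploit (good) and to
        # time-to-innovate and time-to-repair (bad).
        print(f"{n} layers: relative MTTE/MTTI/MTTR = {depth_scaling(n)}")
    for d in (10, 1_000, 10_000):
        print(f"{d} devices: value={firewall_value(d)}, "
              f"effectiveness={firewall_effectiveness(d)}")
```

Under these assumed forms, the product of value and effectiveness stays constant as D grows, which is one way to read the "paradox": the perimeters that protect the most devices are the ones least likely to hold.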
Network Security Credo
• Focus first on the edge
(Perimeter Protection Paradox)
• Add defense-in-depth as needed
• Keep it simple (e.g. Network Utility Model)
• But not too simple (e.g. offer some policy choice)
• Avoid
– one-size-fits-all policies
– cost-shifting from “guilty” to “innocent”
– confusing users and techs (“broken by design”)
Preserving the Net Utility Model
• What is it?
• Why important?
• Incompatible with perimeter security?
• Too late to save?
• NUM-preserving perimeter defense
– Logical Firewalls
– Project 172
• Foiled by static IP addressing…
– Requires all hosts be reconfigured
Lines of Defense
• Network isolation for critical services.
• Host integrity. (Make the OS net-safe.)
• Host perimeter. (Add host firewalling.)
• Server sanctuary perimeter.
• Network perimeter defense.
• Real-time attack detection and containment.
Network Security Chronology
• 1990: Five anti-interoperable networks
• 1994: Nebula shows network utility model viable
• 1998: Defined border blocking policy
• 2000: Published Network Security Credo
• 2000: Added source address spoof filters
• 2000: Proposed med ctr network zone
• 2000: Proposed server sanctuaries
• 2001: Ban clear-text passwords on C&C systems
• 2001: Proposed pervasive host firewalls
• 2001: Developed logical firewall solution
• 2002: Developed Project-172 solution
• 2003: Slammer, Blaster… death of the Internet
• 2003: Developed flex-net architecture
Next-Gen Network Architecture
Parallel networks; more redundancy
Supportable (geographic) topology
Med center subnets = separate backbone zone
Perimeter, sanctuary, and end-point defense
Higher performance
High-availability strategies
Workstations spread across independent nets
Redundant routers
Dual-homed servers
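As a rough illustration of why the high-availability strategies above rely on redundancy, the sketch below combines component availabilities using the textbook series and parallel formulas. It assumes failures are independent, which is a strong assumption: the incidents described earlier in this deck (a wiring error triggering a multicast storm, a power outage triggering an extended server outage) are examples of correlated failures that this model does not capture. The function names and the sample availability figures are hypothetical.

```python
from math import prod
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """All components must be up: multiply the individual availabilities."""
    return prod(availabilities)


def parallel_availability(availabilities: Iterable[float]) -> float:
    """At least one component must be up: 1 minus the product of the
    individual unavailabilities (assumes independent failures)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    router = 0.99   # hypothetical availability of one router
    server = 0.999  # hypothetical availability of one server

    single_path = series_availability([router, server])
    redundant_routers = parallel_availability([router, router])
    redundant_path = series_availability([redundant_routers, server])

    print(f"single router + server:     {single_path:.6f}")
    print(f"redundant routers + server: {redundant_path:.6f}")
```

With these made-up inputs, adding a second router moves the router stage from two nines to four, after which the server becomes the limiting element of the path.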
Success Metrics
Tom’s
Nobody gets hurt
Nobody goes to jail
Terry’s
“Works fine, lasts a long time”
Low ROI (Risk Of Interruption)
Steve’s
Four Nines or bust!
Success Metrics II
We all want:
High MTTF, Performance and Function
Low MTTR and support cost
The art is to balance those conflicting goals
we are jugglers and technology actuaries
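One common way to relate the two quantities named above is the steady-state availability formula, MTTF / (MTTF + MTTR). The sketch below applies it; the formula is standard reliability arithmetic rather than something stated on the slide, and the function name and sample figures are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up.

    mttf_hours: mean time to failure (average uptime between failures)
    mttr_hours: mean time to repair (average time to restore service)
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical figures: one failure per 1000 hours of uptime.
    for mttr in (4.0, 1.0, 0.25):
        print(f"MTTF=1000h, MTTR={mttr}h -> {availability(1000.0, mttr):.5f}")
```

The formula shows why both goals matter: for a fixed MTTF, anything that lengthens repair time, such as added layers or complex access lists, lowers availability directly.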
Success Metrics III
How many nines?
Problem one: what to measure?
How do you reduce behavior of a complex net to a
single number?
Difficult for either uptime or utilization metrics
Problem two: data networks are not like phone
or power services…
Imagine if phones could assume anyone’s number
Or place a million calls per second!
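To make "how many nines?" concrete, the sketch below converts a number of nines into the downtime it allows per year. It assumes a 365-day year and treats availability as a single uptime fraction, which is exactly the simplification the slide warns is hard to justify for a complex network. The names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines.

    Two nines means 99%, four nines means 99.99%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Four nines works out to a budget of about 52.6 minutes per year and five nines to about 5.3 minutes, so a single outage lasting multiple hours uses up several years of a four-nines budget.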
Concerns, Future Challenges
Mitigating impact of closed networking:
Needs of the many vs. needs of the few
Pressure to make network topology match
administrative boundaries
Complex access lists
False sense of security
Increased MTTR
Next-generation threats: firewalls won’t help
Security vs. High-Performance
Wireless
Balancing innovation, operations, & security
Lessons
Five 9s is hard (unless we only attach phones?)
Even host firewalls don’t guarantee safety
Perimeter firewalls may increase user confusion, MTTR
Nebula existence proof: security in an open network
Even so… defense-in-depth is a Good Thing
It only takes one compromise inside to defeat a firewall
Controlling net devices is hard --hublets, wireless
The cost of static IP configuration is very high
Net reliability & host security are inextricably linked
Never underestimate non-technical barriers to progress
Questions? Comments?