Caging the Wild Resnet

Dennis Rice, Director, Infrastructure Support Group
Dave Edick, Systems Administrator
Computer And Technology Services,
Saint Mary’s College of California
Overview
• Our Residential Network suffered frequent
outages locally and globally due to the actions of
users, and infected and defective computers
• User equipment and practices difficult to control.
• We evaluated many solutions; most required client
installation, which was intrusive, assumed technical
skills our users lacked, and needed resources we didn’t have
• Novel solution proposed by one of our senior
engineers: Use capabilities of Layer 3 switching
to eliminate the problems of shared networks.
Saint Mary’s College of California
• Located in Moraga, California (East Bay).
• Founded in San Francisco in 1863 by
Archbishop Joseph Alemany.
• Operated by the De La Salle Christian Brothers
since 1868.
• Current Campus in Moraga built and occupied in
1927 and 1928.
• Current student enrollment approximately 4400
full and part time undergraduate and graduate
students
• Full- and part-time staff and faculty number
around 980
SMC Residence Halls
• 1660 Residential Beds
• 22 Residence Halls
• Most have two students per room.
• Freshmen guaranteed housing, Sophomores
can renew; Juniors and Seniors in a Lottery for
remaining beds
• Students buy and maintain (or more regularly fail
to maintain) their own computers – no in-house
service for non-SMC owned computers
Residential Network (ResNet)
• Until 2000, network access in the residence
halls was via dial-up
• Summer of 2000:
– First fiber optic infrastructure and internal
Ethernet wiring installed to two newly built
residence halls and two older dorms – only
dorms with internal Ethernet wiring.
– The rest: TUT
Residential Network (ResNet)
• Tut Systems Long Run Technology: permitted using the
existing copper phone line infrastructure to carry data, along
with voice, to each telephone jack. 1 Mbps bandwidth
• Gift to College
• Required modem
• 1350 beds outfitted with TUT modems
that summer
• Better than dial-up, but failure prone
• Very time-consuming to service and
keep running
• We used TUT High-speed DSL
modems for uplinks to network core
Residential Network (ResNet)
• After 2000, gradually started replacing
TUT with Ethernet and fiber optic
infrastructure in residence halls.
• Started wiring residential halls one by one
for Fast Ethernet.
• Extended fiber cable runs area by area as
the work fit into other improvement
projects funded by Bond.
Problems
• User-generated problems started showing up in
this mixed architecture. Entire residence halls or
larger areas of ResNet would experience
outages for a variety of reasons. This kept us
hopping.
– User connects a wireless router to ResNet
backwards, paralyzing the hall network because two
DHCP servers are giving out addresses. Difficult to
find in a 100-bed dorm
– User loops a Layer 2 switch by plugging the same
cable into both data jacks in their room, causing traffic
overloads in network backbone.
– Illegal routing broadcasts caused switches to
intermittently stop passing traffic.
Problems
• Effects of Malware Infections
– Broadcast and ARP storms from one or two
infected computers could fill up the bandwidth
of a TUT DSL uplink, blocking all other traffic.
– Easy worm propagation over shared network
• Students unaware of measures to protect their
computers
– Authentication Appliance (Bluesocket) also
vulnerable to actions of malware exploits,
affecting the whole Residential Network
Welchia
• August 2003: “Blaster” worm and variant
“Welchia” released on an unsuspecting world.
• August - September 2003: SMC residential
students arrive with computers already infected
with Welchia, bypassing firewall etc.
– Worm rapidly spreads over entire Residential network
and to academic-staff network
– Infected computers generate ever larger packet
storms.
– ResNet grinds to a halt.
Welchia
• Severe restrictions (and patches from
Bluesocket) enabled us to return some limited
network functioning after a one day complete
outage, but insufficient for academic mission.
• All other IT projects and most services halted to
devote time of entire staff to cleanup of
residential students’ computers.
• Cleanup lasted into November.
• Damaged the confidence of faculty in web and
network based learning technologies as
students struggled with poorly functioning
computers and network during half a semester.
What to do?
• Welchia outbreak and difficult recovery
focused our attention on the vulnerability of
ResNet to interruption by unprotected and
unmanaged student computers, the majority
of which are owned and maintained by
users with minimal knowledge or interest in
properly securing them.
What to do?
• Three paths clearly identified for action
– Policy
– Publicity
– Technology
• Several vendor and open source solutions
explored
– Client-based Network Access Control systems
• Example: Cisco Secure Access
– IPS/IDS systems for perimeter security
• Example: Tipping Point
What to do?
• Compliance solutions presented many
difficulties
– Disruption at beginning of Fall term
• A client would have to be installed on every computer
• Large additional support load on Help Desk at
busiest time of year.
– Computers that did not comply would have to
be fixed – no staff to do this task.
– Potential for frustration and more bad publicity
high.
Inspiration
• One of our Senior Systems Administrators,
Dave Edick, thought about these problems
and came up with a unique solution, one
that promised:
– elimination of the means by which our
network could be made to fail or be
compromised.
– no reliance on any compliance by the users and
no disruption to their use of the
Residential Network.
The Idea
• Use the capabilities of Layer 3 switching to
subnet each jack in every ResNet room
– Broadcast traffic limited to jack network and
local switch; packet storms eliminated
– DHCP provided in local switch
– ACLs block dangerous ports
– Any misconnected personal equipment (like a
wireless router) can only affect the local jack,
rather than a whole building.
Other Benefits of Layer 3 switching
• L3 switching makes sniffing useless. A sniffer
sees only its own traffic. Not even broadcasts
from other machines in the ResNet can be seen
• L3 switching prevents one user from
impersonating another user.
• L3 blocks non-IP traffic (IPX, NetBIOS, etc.) which
would be considered "unauthorized".
• L3 prevents users accessing open file shares on
other users' machines.
• Hackers can’t capture packets by overloading
switch tables (common trick to compromise L2
switches)
Network Design
• Criteria for switch selection
– Layer 3 capable
– Able to provide local DHCP and VLANs
– Robust and reliable
– 24 10/100 ports, two gigabit ports (must have LX, SX
and gigabit twisted pair GBICs available) in a 1RU
package as the basic building block
– Stackable at gigabit speeds
– Able to store and run a large amount of code (current
configuration is over 400 lines).
– Must fit in a “Hubbell can.”
Switch selected: Cisco 3550
[Photo: Hubbell ReBox Wall Enclosure with Mounted Cisco Switch]
Network Design
• Additional elements
– Leverage installed and planned fiber optic
cable to every residence hall to provide gigabit
speed uplink connections to core.
– Eliminate choke point of Bluesocket authentication
appliance - Authentication no longer needed.
• Ports and room jacks to be renumbered to be identical with
the assigned phone extension
• IP address ranges assigned to each jack can be
cross checked to identify user for DMCA etc.
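One way such a cross-check could work is sketched below in Python. It assumes the addressing scheme shown in the configuration slides (each port owns an eight-address block under 10.1.[subnet], with gateway = (port-1)*8+1). The function name `locate_jack` and the sample `EXTENSIONS` table are hypothetical and are not taken from the original scripts.

```python
# Hypothetical sketch: map a ResNet IP address back to the switch port
# (and so to the jack and phone extension) that owns its address block.

# Assumed sample data: (subnet, port) -> phone extension. Illustrative only.
EXTENSIONS = {(5, 1): "4101", (5, 24): "4124"}


def locate_jack(ip: str) -> tuple[int, int]:
    """Return (subnet, port) for an address in the 10.1.x.x ResNet range."""
    octets = [int(part) for part in ip.split(".")]
    if len(octets) != 4 or octets[:2] != [10, 1]:
        raise ValueError("not a 10.1.x.x address")
    subnet, host = octets[2], octets[3]
    port = host // 8 + 1  # each port owns 8 consecutive addresses
    if port > 24:
        raise ValueError("host octet is outside the 24 per-port blocks")
    return subnet, port


if __name__ == "__main__":
    jack = locate_jack("10.1.5.186")
    print(jack, EXTENSIONS.get(jack, "unknown extension"))
```

Because ports are numbered to match the phone extension, a lookup like this leads straight from a logged address to a room.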
Switch Configuration
• Five sections of code:
– IP address range configuration
– Per port DHCP configuration
• Each port has five usable addresses
– Port IP configuration
– Incoming packet access control list
– Outgoing packet access control list
• Configuration is scripted
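The "five usable addresses" figure follows from the 255.255.255.248 mask used in the configuration examples: eight addresses per block, minus the network and broadcast addresses, minus the gateway held by the switch port. A minimal Python sketch of that arithmetic is below; the function name `port_block` is hypothetical, and the block layout assumes gateway = (port-1)*8+1 as in the scripting outline.

```python
import ipaddress


def port_block(subnet: int, port: int) -> dict:
    """Describe the /29 block assigned to one switch port (1-24)."""
    if not 1 <= port <= 24:
        raise ValueError("port must be between 1 and 24")
    base = (port - 1) * 8  # network address of this port's block
    net = ipaddress.ip_network(f"10.1.{subnet}.{base}/29")
    hosts = list(net.hosts())  # a /29 has six host addresses
    return {
        "network": str(net.network_address),
        "gateway": str(hosts[0]),               # held by the switch port
        "usable": [str(h) for h in hosts[1:]],  # five left for the room
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    print(port_block(5, 1))
```

With 24 ports at eight addresses each, one switch consumes 192 of the 256 addresses in its 10.1.[subnet].0 range.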
Configuration Scripting (in English)
• get switch name, subnet number, and list of
phone extensions
• write out basic IP config substituting subnet (for
management IP 4th octet) and switch name
• write out both ACLs
• loop for each port (1-24)
– gateway=(port-1)*8+1
– get phone extension from list
– write port config substituting port, subnet, gateway,
and phone number
– write dhcp config substituting port, subnet and
gateway
• repeat
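The outline above could be sketched in Python roughly as follows. This is an illustration, not the original script: the function name `render_switch_config` is hypothetical, the `hostname` line is an assumed place to substitute the switch name, the ACL section is left out for brevity, and the per-port templates mirror the configuration examples in this deck.

```python
def render_switch_config(name: str, subnet: int, extensions: list[str]) -> str:
    """Build the text of a configuration for one 24-port switch."""
    if len(extensions) != 24:
        raise ValueError("expected 24 phone extensions, one per port")
    lines = [
        f"hostname {name}",  # assumption: where the switch name is used
        "ip subnet-zero",
        "ip routing",
        "!",
        "interface Vlan1",
        f" ip address 10.1.127.{subnet} 255.255.255.0",
        "!",
    ]
    # Both access lists would be written here; omitted in this sketch.
    for port, phone in enumerate(extensions, start=1):
        gateway = (port - 1) * 8 + 1
        lines += [
            f"interface FastEthernet0/{port}",
            f" description {phone}",
            " no switchport",
            f" ip address 10.1.{subnet}.{gateway} 255.255.255.248",
            " ip access-group 101 in",
            " ip access-group 102 out",
            "!",
            f"ip dhcp pool port-{port}.dhcp",
            # Mirrors the DHCP template shown in the configuration examples.
            f" network 10.1.{subnet}.{gateway} 255.255.255.248",
            f" default-router 10.1.{subnet}.{gateway}",
            "!",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [str(4000 + i) for i in range(24)]  # made-up extensions
    print(render_switch_config("hall-a-sw1", 5, sample))
```

Generating the text from three inputs keeps more than 100 switches consistent, and a changed template can be re-rendered for every switch.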
Configuration examples:
• IP block configuration
ip subnet-zero
ip routing
ip domain-name stmarys-ca.edu
ip name-server 10.1.1.101
ip name-server 10.1.1.102
!
interface Vlan1
ip address 10.1.127.[subnet] 255.255.255.0
!
ip classless
ip route 0.0.0.0 0.0.0.0 10.1.127.254
Configuration examples:
• DHCP configuration
ip dhcp excluded-address 10.1.[subnet].1
!
ip dhcp pool port-[port].dhcp
network 10.1.[subnet].[gateway] 255.255.255.248
default-router 10.1.[subnet].[gateway]
domain-name stmarys-ca.edu
dns-server 10.1.1.101 10.1.1.102
Configuration examples:
• Port IP Configuration
interface FastEthernet0/[port]
description [phone]
no switchport
ip address 10.1.[subnet].[gateway] 255.255.255.248
ip access-group 101 in
ip access-group 102 out
Configuration examples:
• Incoming packet access control list
access-list 102 remark access list for incoming packets
access-list 102 permit ip host [insert file server address] any
access-list 102 deny udp any any eq netbios-ns
access-list 102 deny udp any any eq netbios-dgm
access-list 102 deny udp any any eq netbios-ss
access-list 102 deny udp any any eq 445
access-list 102 deny tcp any any eq 137
access-list 102 deny tcp any any eq 138
access-list 102 deny tcp any any eq 139
access-list 102 deny tcp any any eq 445
access-list 102 permit ip any any
Configuration examples:
• Outgoing packet access control list
access-list 101 remark outgoing packet access list
access-list 101 permit tcp any any established
access-list 101 permit tcp any host [insert mail server address] eq smtp
access-list 101 permit ip any host [insert file server address]
access-list 101 deny tcp any any eq smtp
access-list 101 deny tcp any any eq 137
access-list 101 deny tcp any any eq 138
access-list 101 deny tcp any any eq 445
access-list 101 deny udp any any eq netbios-ns
access-list 101 deny udp any any eq netbios-dgm
access-list 101 deny udp any any eq netbios-ss
access-list 101 permit ip any any
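The order of entries in these lists matters because IOS access lists are evaluated top to bottom, the first matching entry decides, and anything that matches no entry is denied. The Python sketch below illustrates that behaviour on a simplified subset of the outgoing list. It ignores source addresses and the `established` keyword; the `Rule` type, the `evaluate` function, and the mail server address 192.0.2.25 are hypothetical stand-ins.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str          # "permit" or "deny"
    proto: str           # "tcp", "udp", or "ip" (matches any protocol)
    dst: Optional[str]   # destination host, or None for "any"
    port: Optional[int]  # destination port, or None for any port


MAIL = "192.0.2.25"  # hypothetical stand-in for the mail server address

# Simplified subset of the outgoing list, in the same order.
OUTGOING = [
    Rule("permit", "tcp", MAIL, 25),   # SMTP to the campus mail server
    Rule("deny", "tcp", None, 25),     # all other SMTP
    Rule("deny", "tcp", None, 445),
    Rule("permit", "ip", None, None),  # everything else
]


def evaluate(rules: list[Rule], proto: str, dst: str, port: int) -> str:
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if rule.proto not in ("ip", proto):
            continue
        if rule.dst is not None and rule.dst != dst:
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.action
    return "deny"  # implicit deny at the end of every access list


if __name__ == "__main__":
    print(evaluate(OUTGOING, "tcp", MAIL, 25))
    print(evaluate(OUTGOING, "tcp", "198.51.100.7", 25))
```

Placing the specific SMTP permit ahead of the general SMTP deny is what lets students reach the campus mail server while direct outbound mail from infected machines is blocked.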
Monitoring and Backup
• Scripts running on Network Services
server
– Status Page with a table of ports for each
switch.
• Port number and corresponding 4-digit data and
phone number assigned to jacks in room.
• Port link status (active connection or not)
• Port IP info - Gateway and usable addresses
– Configuration backup script – TFTP server
• Telnets into each switch and TFTPs the
configuration back to the server.
Monitoring and Backup
• Port status page
Monitoring and Backup
• Port and Switch on-line reset utility
Finances
• The project required over 100 switches, plus
GBICs and other equipment, cables etc.
• With the help of our Advancement office and led
by our CTO, Ed Biglin, we wrote a grant request
to the Fletcher-Jones Foundation in June of
2004. Fortunately, the project was funded by
them at $250,000.
• This amount purchased the needed equipment,
but there was no funding for implementation
labor.
Installation
• Because of the small amount of internal
resources available to this project, installation
was one residence hall at a time
• Installation of switches was originally scheduled
to be launched and completed over the Summer
break of 2005
– Problems and scheduling conflicts were encountered
– Twelve residence halls completed by beginning of Fall
term 2005
– Fiber optic infrastructure build out completed and the
last TUT equipped dorms rewired for Fast Ethernet by
the beginning of Fall term 2005 as well.
– Switch installation in all residence halls was finally
completed during Spring term 2006
Results
• The stability and reliability of the Residential
Network has improved dramatically, as has
customer satisfaction
• Time spent on ResNet network problems by
network engineers and techs has been reduced
tremendously.
– One tech said, “ResNet used to run me ragged. This
year, most of the calls have gotten me to help
individual students having trouble connecting, but
none of them involved tracking down dozens of
infected computers or bad router connections – the
sort of problem that used to take days or a week to
fix.”
Results
• Comparison of Trouble tickets from Fall of 2004
and Fall 2006.
– In Fall 2004, we had 36 tickets for building or larger
area outages – many of these represented multiple
customer calls.
– In Fall 2006, we had only two similar tickets, for a
floor rather than a building outage, both caused by
the failure of a switch that was quickly replaced.
• Comparison of student surveys also indicated a
large increase in satisfaction with network
performance.
Questions?