Firewalls at Rhodes
David Siebörger
System Administrator, I.T. Division
• Why use firewalls, where?
• Open source firewall technologies
– About ipfw
– About dummynet
– About pf
– About altq
• Lessons learnt from ResNet
Where the firewalls are
[Diagram: the firewalls sit between the Internet, the admin and server networks, the campus network, and ResNet]
Why firewall the Internet?
• Force users to use our HTTP proxies and
SMTP relays
• Protect users’ PCs from the Internet
• Prevent users from using certain
applications or providing services
• Monitor traffic use
ResNet firewalls
• Each residence is a separate subnet on a
separate VLAN.
• A FreeBSD machine routes between the subnets, so its firewall can control all inter-residence traffic.
Why firewall ResNet?
• Stop the spread of viruses/worms that use
network exploits to propagate.
– e.g., Nimda, Code Red, Sasser
– Though we try to force students to use
antivirus software and SUS, many don’t use it,
or don’t keep it up-to-date.
– So when a virus breaks out, it’s worst on
ResNet.
– We can’t predict when the next outbreak will
happen, but we’re better prepared for it.
Why firewall ResNet?
• Keep untrusted users away from important
machines.
• Prevent users running their own proxy
servers.
• Allow more detailed monitoring.
What we’re using
• FreeBSD as the
operating system
• Firewalls:
– ipfw
– pf
• Traffic shaping:
– dummynet
• Monitoring:
– home-grown scripts +
rrdtool
– net-snmpd + rrdtool +
Cacti
ipfw
• Appeared in FreeBSD 2.0 (1994)
• Quite easy to use:
– Numbered ruleset; first match wins
– deny tcp from any to me http
• Disadvantages:
– IPv6 support isn’t quite there yet.
– Some strange new syntax.
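• To illustrate the numbered, first-match-wins style, a minimal sketch (the rule numbers and addresses are placeholders, not our actual ruleset):
  # pass packets on established TCP connections quickly
  ipfw add 100 allow tcp from any to any established
  # block inbound HTTP to this host
  ipfw add 200 deny tcp from any to me http
  # everything else is permitted
  ipfw add 65000 allow ip from any to any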
dummynet
• Originally intended as a network simulator.
• Creates pipes with bandwidth, delay,
packet loss parameters.
• ipfw rules direct traffic flows through the
pipes.
• For high-bandwidth (ab)users, we force their traffic through a low-bandwidth pipe.
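• As a rough sketch of that setup (the bandwidth figure and address are illustrative, not our actual policy): a pipe is configured once, and ipfw rules then steer matching traffic through it.
  # create a pipe limited to 128 kbit/s
  ipfw pipe 1 config bw 128Kbit/s
  # force the abuser's traffic through the pipe
  ipfw add 300 pipe 1 ip from 192.0.2.66 to any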
pf
• OpenBSD dropped ipf, and developed pf
to replace it. pf has developed
substantially since.
• pf has since been imported into FreeBSD.
• Slightly counter-intuitive: last match wins.
• Otherwise elegant, regular syntax.
– block proto tcp to self port http
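• For comparison with the ipfw example, a small pf sketch (addresses are placeholders): because the last matching rule wins, the default goes first, and quick short-circuits the search.
  # default deny; later matches override this
  block all
  # allow web traffic to the server
  pass proto tcp to 192.0.2.10 port http
  # quick makes this rule final for matching packets
  block quick from 192.0.2.66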
pf
• Interesting features:
– pfsync – synchronises dynamic rules across two
firewalls.
– carp – two firewalls bind the same IP address (a
patent-free alternative to VRRP).
– Combined, they allow failover within seconds.
– pfflowd – provides NetFlow traffic accounting.
– authpf – network access after authentication.
– scrubbing – tidy up fragments, etc.
– All-in-one configuration file, with macros, etc.
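• A hedged sketch of how the failover pieces fit together (the interface names, vhid, password and address are assumptions):
  # synchronise state tables over a dedicated link on em1
  ifconfig pfsync0 syncdev em1
  # both firewalls bind 192.0.2.1 via carp; the higher
  # advskew makes this machine the backup
  ifconfig carp0 create
  ifconfig carp0 vhid 1 pass secret advskew 100 192.0.2.1/24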
altq
• Handles queuing on particular interfaces.
• Has a number of algorithms:
– CBQ: Class-Based Queuing
• RED: Random Early Detection
• RIO: RED with In/Out
– HFSC: Hierarchical Fair Service Curve
– PRIQ: Priority Queuing
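• A minimal altq sketch in pf.conf using CBQ (the interface, bandwidth and queue split are illustrative):
  altq on em0 cbq bandwidth 100Mb queue { ssh, bulk }
  queue ssh bandwidth 10% priority 7
  queue bulk bandwidth 90% cbq(default)
  # keep interactive SSH traffic in the high-priority queue
  pass proto tcp to any port ssh queue ssh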
Firewall performance: a case study
• The problem: our ResNet firewall is a chokepoint (by design), and we have lots of
bandwidth-hungry users on ResNet.
• When the firewall can’t handle the traffic,
network bandwidth lies idle and everyone
suffers.
• It’s most noticeable for interactive applications
– And unfair: the SSH users aren’t causing the problem.
Relevance in other situations
• But does firewall performance matter
when your Internet connection is < 10
Mbps?
• Yes:
– Local users might be able to DoS your
firewall.
– Some day, we will have more bandwidth
(SANREN?) and we’d rather design our
firewalls now to handle higher volumes.
Possible solutions
• Ignore the problem. We’d be providing
poor service to our paying customers.
• Reduce the demand for bandwidth. It
means creating restrictive rules, and it
starts an arms-race.
• Increase the available bandwidth. It’s
taken some work, but we’re ahead of
demand now.
Results
• From… (April 2005)
[Graphs: traffic flat-topped all day at ~13.5 kpps in both directions; flat-topped at ~22 kpps in either direction]
Results
• To… (now)
[Graphs: no flat-topping; short spikes up to ~75 kpps in some cases, roughly bell-shaped over the day]
Lessons learnt
1. Monitor traffic for informed decisions
2. Look for bottlenecks elsewhere
3. Watch for interface errors
4. Reduce interrupt rate…
5. … but increase HZ
6. Consider PCI bandwidth
7. Reduce rule set size
8. Stateful rules save CPU
9. Performance testing is hard
10. Design for scalability
1. Monitor traffic to inform decision making
• We’ve used a home-grown system and,
more recently, Cacti to monitor network
traffic flows.
• Cacti gathers data from a number of
sources (primarily SNMP) and stores and
graphs the data using rrdtool.
– http://www.cacti.net/
– http://www.rrdtool.org/
1. Monitor traffic to inform decision making
• For the ResNet firewall, we graph
– bits/second,
– packets/second,
– errors/second on the Ethernet interface,
– bits/second on each VLAN,
– CPU utilisation,
– number of connected PCs.
• As well as the aggregation switches.
• (and much more)
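• For illustration, a minimal rrdtool database for one interface (the step, data-source names and update values are assumptions, not our actual Cacti configuration):
  # two 5-minute COUNTER data sources, one per direction,
  # keeping 600 averaged samples
  rrdtool create em0.rrd --step 300 \
      DS:inoctets:COUNTER:600:0:U \
      DS:outoctets:COUNTER:600:0:U \
      RRA:AVERAGE:0.5:1:600
  # feed it the current interface counters
  rrdtool update em0.rrd N:123456789:987654321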
2. Look for bottlenecks elsewhere
• We found that some aggregation switches were saturating their 100 Mbps uplinks.
• We’ve upgraded a number of them to 1 Gbps.
3. Watch for interface errors
• We were seeing up to 800 input errors per
second on the Ethernet interface.
systems@slug$ netstat -I em0
Name  Mtu   Network   Address            Ipkts       Ierrs  Op
em0   1500  <Link#1>  00:0c:f1:c3:78:99  2065687990  8457   154
• The machine wasn’t processing packets
fast enough, so the NIC was discarding
incoming packets.
3. Watch for interface errors
• Packet loss causes TCP to throttle back
large transfers, and makes SSH really
jittery.
• With every performance improvement
we’ve made, we’ve checked to see that
the number of input errors decreases.
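• A convenient way to watch this live is netstat’s interval mode, which prints the packets and errors seen in each sample period:
  # one-second samples of traffic and errors on em0
  netstat -w 1 -I em0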
4. Reduce interrupt rate…
• Generating an interrupt per packet is too
inefficient at > 100 Mbps.
– Can lead to live-lock.
• Device polling sets the NIC to not
generate interrupts, and the OS polls on
every clock interrupt.
• Certain device drivers (fxp, em, etc.) can
delay interrupt generation. Sometimes
called interrupt mitigation.
4. Reduce interrupt rate…
• The result is that the OS can receive
multiple frames from the NIC and process
them as a batch.
• Which saves context switch overheads.
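• On the FreeBSD versions discussed here, polling is compiled into the kernel and switched on at runtime; a sketch:
  # kernel configuration
  options DEVICE_POLLING
  # FreeBSD 5.x: enable globally
  sysctl kern.polling.enable=1
  # FreeBSD 6.x: enable per interface
  ifconfig em0 polling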
5. … but increase HZ
• Clock interrupt frequency is determined by
the “HZ” value compiled into the FreeBSD
kernel.
• In FreeBSD 5.X, the default is 100.
• We found that polling reduced CPU
utilisation, but maximum PPS was still
limited.
– Either because the NIC’s buffers were full,
– Or the OS would only fetch so many at a time.
5. … but increase HZ
• In FreeBSD 6.X, the default has been
increased to 1000. Consensus seems to
be that 100 is too low for modern
computers, never mind routers.
• We’ve increased HZ to 2500 on our
firewall.
– Though we may be able to lower that now
since we’ve made other improvements.
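• HZ is likewise a kernel configuration option, e.g.:
  # clock interrupts (and hence polls) 2500 times per second
  options HZ=2500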
6. Consider PCI bandwidth
• Using an Intel 1000baseSX NIC, we found
that we were limited to ~ 400 Mbps
inbound and ~ 400 Mbps outbound
(simultaneously).
• The NIC has a 64-bit, 66 MHz PCI-X
interface…
• But the Intel S875WP1 board only has 32-bit, 33 MHz PCI slots.
6. Consider PCI bandwidth
• 32 bits × 33,000,000 Hz = 1056 Mbps
• That’s the theoretical maximum, but bandwidth
is shared with everything else on the bus.
• The onboard NIC is connected via a
“Communications Streaming Architecture” link –
effectively a dedicated 32-bit, 66 MHz PCI bus.
• Using that interface has allowed us to peak at up
to 730 Mbps in both directions – possibly higher.
7. Reduce rule set size
• In ipfw, every packet is evaluated against
every rule until it matches.
– Which uses a lot of CPU time when you’re
handling tens of thousands of packets a
second.
• Reduce the number of rules.
• Optimise the rule set by placing frequently
matched rules higher, and/or skipping over
less frequently matched rules.
7. Reduce rule set size
• We started with 200 rules…
• Using skipto rules, we reduced the number
of rules that most packets would be tested
against to about 50.
• We’re now down to 28 rules, and almost
all packets will have matched after about
12 evaluations.
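• A sketch of the skipto idea (rule numbers and the ResNet prefix are invented): packets jump straight to a short block of relevant rules instead of walking the whole set.
  # send ResNet-sourced packets straight to rule 1000
  ipfw add 100 skipto 1000 ip from 10.1.0.0/16 to any
  # non-ResNet traffic never sees the ResNet rules
  ipfw add 200 allow ip from any to any
  # ResNet block starts here
  ipfw add 1000 deny tcp from any to any 135-139,445
  ipfw add 1100 allow ip from any to any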
8. Stateful rules save CPU time
• Stateful/dynamic firewall rules create a
dynamic rule to allow a specific connection
as that connection is established.
• The firewall can search the dynamic rule
set far quicker than it can evaluate the
static rules.
• So the firewall only evaluates the static
rules once per connection.
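• A minimal stateful ipfw sketch: check-state matches packets against the dynamic table first, and keep-state creates an entry when a new connection is allowed (the prefix is a placeholder).
  # match packets belonging to known connections
  ipfw add 100 check-state
  # allow new outbound TCP connections, creating dynamic rules
  ipfw add 200 allow tcp from 10.1.0.0/16 to any setup keep-state
  ipfw add 300 deny tcp from any to any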
9. Performance testing is hard
• Generating traffic with iperf, netperf,
tcpblast, etc. doesn’t do a very good job of
simulating the usage patterns of 1000s of
PCs.
• We have to try our changes on the live
system (but be fairly careful).
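• For example, even many parallel iperf streams (the host name is a placeholder) produce a handful of large, well-behaved flows, nothing like thousands of PCs with mixed traffic:
  # 50 parallel TCP streams to a test host for 60 seconds
  iperf -c test-firewall -P 50 -t 60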
10. Design for scalability
• Sooner or later, our one machine will reach its limits (probably PCI).
• We’re about to deploy a second firewall.
• The current firewall will advertise certain
prefixes into iBGP; the new firewall will
handle the rest.
• And we get better redundancy too.
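• As a hedged sketch (assuming a Quagga/Zebra-style bgpd; the AS number, neighbour and prefix are placeholders), each firewall advertises the prefixes it serves into iBGP:
  router bgp 64512
   neighbor 192.0.2.2 remote-as 64512
   ! advertise the ResNet prefixes this firewall handles
   network 10.1.0.0/17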