FlowScan - LIVE! A Network Traffic Reporting and Visualization Tool


UW-Madison - FlowScan and
Rate Limiting Adventures
I2 Techs Conference
May 17, 2001
Michael Hare
Presentation Overview
FlowScan
Controlling ResNet traffic: Some experiences
with rate limits
FlowScan: A Network Traffic
Reporting and Visualization Tool
FlowScan is a software package for open systems that is freely
available under the terms of the GNU General Public License.
Primarily developed by Dave Plonka of UW-Madison.
FlowScan analyses and reports on flow data exported by IP routers.
FlowScan produces graph images which provide a continuous,
near real-time view of network traffic across a network's border.
Background on Flows
• The notion of flow profiling was introduced by
the research community.
• Today, for performance and accounting reasons,
flow profiling is built into some networking
devices.
• Flow export is not yet standards-based; FlowScan utilizes
flows defined and exported by Cisco's NetFlow
feature.
Sample Flows - FTP
An IP flow is a unidirectional series of IP packets of a
given protocol, travelling between a source and
destination, within a certain period of time.
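To make the definition concrete, here is a minimal Python sketch of how packets might be grouped into unidirectional flows. The key fields, the 30-second idle timeout, and the names `Packet` and `group_into_flows` are illustrative assumptions, not a description of how NetFlow is implemented on a router.

```python
from collections import namedtuple

# Hypothetical packet record; the field names are assumptions for illustration.
Packet = namedtuple("Packet", "time src dst proto sport dport size")

def group_into_flows(packets, idle_timeout=30.0):
    """Group packets into unidirectional flows.

    Packets belong to the same flow when they share the key
    (src, dst, proto, sport, dport) and arrive within idle_timeout
    seconds of the previous packet with that key.
    """
    active = {}    # key -> flow currently being built
    finished = []  # flows closed by the idle timeout
    for p in sorted(packets, key=lambda p: p.time):
        key = (p.src, p.dst, p.proto, p.sport, p.dport)
        flow = active.get(key)
        if flow is not None and p.time - flow["end"] > idle_timeout:
            finished.append(flow)  # too long since the last packet: close it
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
        flow["bytes"] += p.size
    return finished + list(active.values())
```

Because the key is directional, the reply packets of a session form a separate flow from the request packets, which is why a single FTP session shows up as several flows.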
FlowScan
• FlowScan maintains counters based upon flow classifications and
periodically exports information into databases.
• Counters are currently maintained based on these flow attributes:
– Protocols (ICMP, TCP, UDP)
– Services (FTP, SMTP, HTTP, P2P Apps)
– Subnets (if desired)
– AS pairs
• Works with most Cisco and Riverstone RS routers
• Compatibility with Juniper's routers and packet-sampling-based flows is
in the planning stages (more on this later)
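The counter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the port-to-service table is heavily simplified, the flow dictionaries and function names are hypothetical, and FlowScan itself is written in Perl and classifies traffic in more detail.

```python
from collections import defaultdict

# Simplified port-to-service table; an assumption for illustration only.
SERVICES = {20: "ftp-data", 21: "ftp", 25: "smtp", 80: "http"}
INTERVAL = 300  # seconds, i.e. five-minute bins

def classify(flow):
    """Return (protocol, service) labels for a flow dictionary."""
    service = (SERVICES.get(flow["dport"])
               or SERVICES.get(flow["sport"])
               or "other")
    return flow["proto"], service

def count_flows(flows):
    """Sum bytes per (interval start, protocol, service)."""
    counters = defaultdict(int)
    for f in flows:
        bin_start = int(f["time"] // INTERVAL) * INTERVAL
        proto, service = classify(f)
        counters[(bin_start, proto, service)] += f["bytes"]
    return dict(counters)
```

Each key of the result corresponds to one data point that could be written to a database for a given five-minute interval.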
Some Uses for FlowScan
• Short term network analysis lets you discover recent
changes in network behavior. Graphs over a short time
frame are based upon five-minute intervals.
• Long term network analysis aids in capacity planning
and traffic shaping efforts.
Short-Term Network Analysis:
Red Hat 7.1 Release
Events, such as the release of Red Hat 7.1, are visible as jumps in
outbound traffic patterns. Outbound Computer Science traffic
increased from 10 Mb/s to nearly 80 Mb/s almost instantly.
Short-Term Network Analysis:
DoS Detection
Network abuse, such as flood-based Denial of Service attacks, is visible as
"stalagmites" and "stalactites". These would be hidden in coarser-grained long-term
graphs. Since one flow is created for each series of packets between a source
and a destination, portscans are common culprits for these "Flow Explosions".
Short-Term Network Analysis:
DoS Detection (cont)
Difference in the number of hosts talking out vs. being talked to in a
5 minute period. Another scheme for detecting portscans unearths
the huge number of probes initiated in a 48 hour period.
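One possible reading of this scheme, sketched in Python: for each five-minute batch of flows, count the distinct campus addresses that appear as sources and the distinct campus addresses that appear as destinations. The campus prefix `10.10.0.0/16` and the function name are assumptions for illustration.

```python
import ipaddress

def host_counts(flows, local_net="10.10.0.0/16"):
    """Return (hosts talking out, hosts being talked to) for one batch.

    flows is an iterable of (src, dst) address strings. Only addresses
    inside local_net are counted.
    """
    net = ipaddress.ip_network(local_net)
    talking_out = set()
    talked_to = set()
    for src, dst in flows:
        if ipaddress.ip_address(src) in net:
            talking_out.add(src)
        if ipaddress.ip_address(dst) in net:
            talked_to.add(dst)
    return len(talking_out), len(talked_to)
```

An inbound scan touches many campus addresses that never send anything themselves, so the second count jumps far above the first.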
Long Term Analysis
Input/Output totals, 730 days prior to 12 May 2001
• The academic calendar year dramatically influences campus traffic levels, most
notably in ResNet. Since the beginning of data collection in early 1999, ResNet users
have typically been larger providers than consumers of Internet content.
• Outbound traffic consistently exceeds our inbound traffic level, but this academic
year’s inbound / outbound traffic patterns haven’t experienced the typical ‘doubling’
effect, because access links are at or near capacity.
Long Term Analysis
Application totals, 365 days prior to 12 May 2001
Here, we get a glimpse of the rise and fall of Napster, the first ‘killer’ p2p app.
Although Napster usage has declined, outbound traffic from ResNet has not.
UW-Madison
Napster vs. Gnutella Usage
Mid-Dec through Mid-Jan was a quiet time on campus for Napster, as the primary
users are not utilizing the network. Here, we clearly illustrate the declining usage of
Napster and the increased usage of Gnutella. As with Napster, the campus
appears to be a larger provider than consumer of Gnutella data.
UW-Madison
Napster vs Gnutella Usage (cont)
For the first time, Gnutella overtakes Napster as UW-Madison’s
most popular P2P file swapping application.
Long Term Analysis
Peering, 730 days prior to 12 May 2001
FlowScan lets you monitor the effectiveness of your peering by reporting the
next-hop source or destination ASes of your traffic. Our biggest peers are
WiscNet and Abilene.
CampusIO Extension Modules
Top ASNs
FlowScan can help you make informed peering and provisioning decisions by reporting
the amount of traffic that other ASes source, sink, or carry for your institution.
Above, our most popular origin (endpoint) peer is @Home. We are currently working on
a peering arrangement.
CampusIO Extension Modules
Alerts
To deal with DoS floods, alerts via pager and email were introduced. They are currently
based on tolerances set in a configuration file. We are looking for ways to use AI-type
heuristics to set tolerances automatically.
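A tolerance check of this kind might look like the following Python sketch. The metric names and limits are hypothetical, and the delivery of the alert by pager or email is left out.

```python
def check_alerts(sample, tolerances):
    """Return one message per metric in sample that exceeds its tolerance.

    sample and tolerances are dictionaries keyed by metric name. Metrics
    missing from sample are skipped.
    """
    alerts = []
    for metric, limit in tolerances.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds tolerance {limit}")
    return alerts
```

In use, `tolerances` would be loaded from the configuration file and `sample` would hold the counters from the most recent five-minute interval.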
CampusIO Extension Modules
Top Talkers
This output of FlowScan’s Top Talkers module (anonymized sample shown
here) lets you see top bandwidth consumers and providers.
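A report of this shape can be approximated with a short Python sketch. The `(src, dst, nbytes)` tuple layout and the function name are assumptions for illustration, not the module's actual interface.

```python
from collections import Counter

def top_talkers(flows, n=10):
    """Return (providers, consumers) for an iterable of (src, dst, nbytes).

    Each result is a list of (address, bytes) pairs, largest first:
    providers are ranked by bytes sent, consumers by bytes received.
    """
    sent, received = Counter(), Counter()
    for src, dst, nbytes in flows:
        sent[src] += nbytes
        received[dst] += nbytes
    return sent.most_common(n), received.most_common(n)
```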
Implementing FlowScan in
Large Scale (ISP) Networks
WiscNet, Wisconsin’s
statewide educational
network, is currently
researching several
challenges of utilizing
FlowScan in a large
environment.
– Limitations of the flow
processor itself
(FlowScan)
– Limitations of the
exporting hardware
(Routers)
Limitations of Flow Processing:
Flow Processing
UW-Madison campus collects flows from a Cisco 7507
and processes them on a 700 MHz Pentium III. FlowScan almost
falls behind during peak usage times, because there are
too many flows to process.
WiscNet handles 2 to 3 times the traffic of
Madison, and will be collecting flows from multiple
border routers and processing them on a 1 GHz machine.
Without some course of action, it is doubtful that the
processor will be able to keep up.
Limitations of Flow Processing:
Flow Exporting
Large ISPs tend to have devices with high-speed
interfaces. Because of router CPU utilization, current
hardware is not able to support full flow export on heavily
utilized high-speed interfaces (OC12+).
Running FlowScan in an environment with multiple edge
routers, possibly with mixed vendors, adds complexity.
Juniper routers do not support full scale flow exporting,
but they do support a concept known as packet sampling.
Packet Sampling
In order to reduce the CPU demands on their routers,
Juniper utilized the concept of packet sampling; instead of
considering each packet for flow export, they only
examine a configurable percentage.
UW-Madison campus recently evaluated a Juniper router,
and found that with its current interfaces and amount of
traffic processed, a sampling rate of 1 out of every 96
packets had to be set, otherwise the Juniper would become
overburdened in flow export duties.
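A minimal Python sketch of 1-in-96 sampling and of scaling the sampled byte count back up to an estimate of the total. Whether the router samples deterministically or at random is not stated here, so the every-nth-packet scheme and the function names are assumptions for illustration.

```python
def sample_every_nth(packet_sizes, n=96):
    """Keep one packet size out of every n (deterministic 1-in-n sampling)."""
    return packet_sizes[n - 1::n]

def estimate_total_bytes(sampled_sizes, n=96):
    """Scale the sampled byte count by n to estimate the unsampled total."""
    return sum(sampled_sizes) * n
```

The estimate is only as good as the sample is representative: totals over many packets scale back well, while a flow made of a single packet is usually missed entirely.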
Packet Sampling (cont)
With packet sampling, the produced graphs looked similar to the
graphs produced during non-sampled periods.
The Bright Side of Packet Sampling
As an added bonus, we saw the number of flows being exported
from our router drop nearly 90%. FlowScan could easily keep
up with this level of flows.
The Ugly Side of Packet Sampling
Packet sampling broke some things we expected, and more.
• Our security team relied on the logs produced by the 1-to-1
flow exporting when investigating network abuse and techno-crimes.
We could no longer provide a completely accurate
view of our network traffic.
• We lost the ability to detect DoS attacks based on the
"stalagmites" and "stalactites" in the flow graphs, because we
were only catching about 1/96th of the usually single-packet
flow portscans.
More Ugly Things about Packet
Sampling
• FlowScan itself relies on the 1-to-1 flow exporting for
application classification. The Napster and Passive FTP
detection modules identify users by looking for
patterns in the packets:
• For Napster, look for a client talking to an index server before
counting port 6699 traffic as Napster data.
• For Passive FTP, look for established port 20 connections.
Packet sampling gives us no guarantee that these packets
will be sampled.
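The dependence on state can be illustrated with a Python sketch of the Napster rule as described above. The flow dictionaries, the index server address set, and the function name are hypothetical, and the real module is more involved.

```python
def count_napster_bytes(flows, index_servers, data_port=6699):
    """Count bytes on data_port only for clients already seen talking
    to an index server. flows must be in time order."""
    seen_clients = set()
    total = 0
    for f in flows:
        if f["dst"] in index_servers:
            # Remember the client: its later port 6699 traffic now counts.
            seen_clients.add(f["src"])
        elif data_port in (f["sport"], f["dport"]) and (
                f["src"] in seen_clients or f["dst"] in seen_clients):
            total += f["bytes"]
    return total
```

If the short flow to the index server is never sampled, the client is never remembered, and all of its port 6699 traffic goes uncounted.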
Statistical Accuracy of Flow
Sampling: Non Sampled Model
I was surprised to find that
more than 88% of our flows
consisted of only twelve
packets or less. 76% were
six packets or less.
Upon investigation, these
were typically SYNs, ACKs,
UDP, and small bits of web
content.
Statistical Accuracy of Flow
Sampling: Sampled Model
In a sampled model,
only 39% of our flows
consisted of twelve
packets or less, and
only 27% of our flows
were six packets or
less.
This was compared to
88% and 76%
respectively in our
non-sampled model.
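A rough model helps explain the shift. Assuming each packet is sampled independently with probability 1/96 (a simplification; the router's actual sampling method may differ), the chance that a flow is seen at all grows with its packet count:

```python
def detection_probability(packets_in_flow, sampling_rate=1 / 96):
    """Probability that at least one packet of a flow is sampled,
    assuming independent sampling of each packet."""
    return 1.0 - (1.0 - sampling_rate) ** packets_in_flow
```

Under this assumption a six-packet flow is seen roughly 6% of the time and a twelve-packet flow roughly 12% of the time, while a flow of a thousand packets is almost always seen, so small flows drop out of the sampled data far more often than large ones.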
Conclusions on Sampling
• In our short eval period, the sampled application and input/output graphs appeared
representative of the campus traffic, but the nature of the traffic being reported
dramatically shifted.
• Larger flows were over-represented, and smaller flows were under-represented.
• Longer studies need to be done.
The Future of Flow Accounting:
– FlowScan is currently coded in Perl for easy maintenance
and portability. Further speed improvements may come from
a rewrite in C, or from creating a codebase that can utilize
multiple processors.
– Running multiple FlowScan instances and aggregating
totals collected by each flow processor.
• Breaks stateful inspection.
– Vendor support in hardware for 1-to-1 flow accounting.
FlowScan
Information on FlowScan can be found here:
http://net.doit.wisc.edu/~plonka/FlowScan/
The UW-Madison Campus uses FlowScan to
graph traffic patterns. The live site is available
here.
http://wwwstats.net.wisc.edu
FlowScan
This concludes this portion of the
presentation.
Controlling ResNet Traffic
We started investigating rate limits in order
to get a handle on ResNet usage.
Outbound Napster traffic at times comprised
50% of our outbound traffic. We first tried
educating users to remove server functions
of their Napster clients, but no change in
network behavior was observed.
Rate Limiting
• Once UW-Madison had FlowScan in place
for measurement instrumentation, it became
a great tool by which to gauge the
effectiveness of configuration changes.
• We needed to attain predictability for
network costs, including bandwidth,
engineering, and equipment resources.
Basic Types of Rate Limiting:
Traffic Shaping
• Traffic shaping - Traffic comes into a queue
and is released at a specified rate, thereby
smoothing the flow of traffic. This queuing
introduces latency into the flow.
(Juniper Networks)
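The queuing behavior can be sketched in Python as a single queue drained at a fixed rate. This is a generic illustration with hypothetical names, not any vendor's shaper.

```python
def shape(arrivals, rate_bps):
    """Return the departure time of each packet from a shaper.

    arrivals is a list of (arrival_time_s, size_bytes) in time order.
    A packet starts transmission when it has arrived and all earlier
    packets have been sent at rate_bps.
    """
    departures = []
    link_free_at = 0.0
    for t, size in arrivals:
        start = max(t, link_free_at)          # wait in the queue if busy
        link_free_at = start + size * 8 / rate_bps
        departures.append(link_free_at)
    return departures
```

A burst of packets arriving together leaves evenly spaced, and the gap between a packet's arrival and its departure is the latency that the queue adds.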
Basic Types of Rate Limiting:
Traffic Shaping
– Advantages
• Prevents congestion at aggregation points.
• Available in a number of routers.
– Disadvantages
• Doesn't necessarily allow all available network
capacity to be utilized.
• Doesn’t allow "bursting" beyond the configured rate-limit, even if the average rate
would conform to the limit.
Basic Types of Rate Limiting:
Traffic Policing
Traffic policing - Traffic comes into an interface,
and a decision is made either to drop, pass, or
mark the traffic (best effort/less than best effort).
Queuing is not involved so it doesn't degrade
performance of conformant traffic.
(Juniper Networks)
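Policers are commonly built on a token bucket. The Python sketch below shows a generic single-rate version with hypothetical names; it is not the algorithm of any particular router.

```python
def police(arrivals, rate_bps, burst_bytes):
    """Return "transmit" or "drop" for each packet.

    arrivals is a list of (arrival_time_s, size_bytes) in time order.
    Tokens (in bytes) refill at rate_bps up to burst_bytes. A packet
    passes only if enough tokens are available when it arrives.
    """
    tokens = float(burst_bytes)
    last = 0.0
    decisions = []
    for t, size in arrivals:
        tokens = min(burst_bytes, tokens + (t - last) * rate_bps / 8)
        last = t
        if size <= tokens:
            tokens -= size
            decisions.append("transmit")
        else:
            decisions.append("drop")  # no queue: excess traffic is discarded
    return decisions
```

Conformant packets pass through without waiting, which is the contrast with shaping: the cost of exceeding the rate is a dropped packet rather than added delay.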
Basic Types of Rate Limiting:
Traffic Policing
Hard policing causes an immediate drop of the
packet, which causes retransmissions.
Soft policing is the ability to defer the decision about
whether or not to drop a given packet until that
packet reaches a downstream router which is better
informed as to whether or not congestion currently
exists.
Practical Rate-Limit Methods:
Aggregate Rate-Limiting
Aggregate rate limits are usually enforced at
some central point in the network. The rate-limit
is applied to either a physical interface
or to traffic defined by addressing or by
application, for example, as can be defined
using a Cisco Access Control List (ACL).
Aggregate limits can be implemented with
policing and/or shaping techniques.
Aggregate Rate-Limiting:
Pros and Cons
– Advantages
• Relatively simple to configure.
• Simple to enforce for the router hardware because most rate-limit
implementations of this sort do not need to track the state
of individual connections.
– Disadvantages
• Inability to track individual users, hosts, or application
sessions. As such they can unfairly punish some users or
applications by indiscriminately dropping their packets rather
than others.
• Decreases goodput by causing retransmissions
Aggregate Rate-Limiting:
Experiences
• UW-Madison has experimented with Cisco's
Committed Access Rate (CAR) limits on a
7507 router at our campus border. Although
it effectively limited traffic to the specified
level, it was reported that FTP users in the
outside world were unable to even establish
a connection to the rate-limited FTP servers
because all of the returning ACK packets
were dropped during high congestion.
Aggregate Rate-Limiting:
Example
• The following commands configured CAR on our
Cisco border router to limit a user population's
outbound traffic to 10Mb/s:
access-list 125 permit tcp 10.10.0.0 0.0.255.255 any
interface (your interface)
rate-limit output access-group 125 10000000 1000000 1000000 conform-action transmit exceed-action drop
Practical Rate-Limit Methods:
Flow-Based Rate-Limiting
• Flow-based rate limits conform individual
traffic flows to a predetermined allocation of
bandwidth. They are most effective nearest the
population one wishes to control. As with
aggregate limits, the rate-limit is applied to either
a physical interface or to traffic defined by
addressing or by application, and can also be
implemented with policing and/or shaping
techniques.
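A per-flow policer can be sketched in Python by keeping one token bucket per flow key. The names and the tuple layout are assumptions for illustration; hardware implementations differ.

```python
from collections import defaultdict

def per_flow_police(packets, rate_bps, burst_bytes):
    """Return the bytes passed per flow under a per-flow token bucket.

    packets is a list of (time_s, flow_key, size_bytes) in time order.
    Every flow gets its own bucket of burst_bytes refilled at rate_bps.
    """
    buckets = {}               # flow_key -> (tokens, time of last update)
    passed = defaultdict(int)
    for t, key, size in packets:
        tokens, last = buckets.get(key, (float(burst_bytes), t))
        tokens = min(burst_bytes, tokens + (t - last) * rate_bps / 8)
        if size <= tokens:
            tokens -= size
            passed[key] += size
        buckets[key] = (tokens, t)
    return dict(passed)
```

Because the allocation is per flow, one flow exhausting its bucket does not affect another, and a host that opens more flows receives more aggregate bandwidth.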
Flow-Based Rate-Limiting:
Pros and Cons
– Advantages
• Somewhat fair in that they distribute packet drops across individual
application sessions of users.
• One user's session doesn't impinge on another's since each flow gets its
own allocation.
• There is a fine level of granularity of control, because each direction of
individual streams can be affected.
– Disadvantages
• Individual users can't burst traffic within a single session.
• Retransmissions are caused when packets are dropped, leading to
decreased goodput.
• Users that create more simultaneous sessions get more bandwidth. A
local server can get a large percentage of available bandwidth.
• There are some scalability issues, but this is improving with application-specific
hardware support.
• Doesn’t set any ‘hard’ limits. Bandwidth usage not guaranteed.
Flow-Based Rate-Limiting:
Experiences
• The UW-Madison campus currently has this implemented on a
Riverstone RS to limit residence hall network (ResNet)
traffic. This can also be done with Cisco gear such as
the Catalyst 65xx with the requisite additional cards.
• It was reported that ResNet users had difficulty with
UDP based applications, although we had per-flow
UDP limiting set to 10Mb/s. The problems disappeared
after completely removing the limits.
Flow-Based Rate-Limiting:
Example
Example: configuring rate-limits on a Riverstone aggregation router to limit a user
population's outbound TCP flows to 100 Kb/s each. Also, limit flows that are
likely to be Napster to 33.6 Kb/s. Consider outbound flows to campus destinations and
to web servers to be "preferred", so only limit those to 10 Mb/s.
acl resnet_napdata permit tcp 10.10.0.0/16 any 6699 any
acl resnet_napdata permit tcp 10.10.0.0/16 any any 6699
acl resnet_napdata permit tcp 10.10.0.0/16 any 6688 any
acl resnet_napdata permit tcp 10.10.0.0/16 any any 6688
acl resnet_tcp permit tcp 10.10.0.0/16 any
acl resnet_tcp_preferred permit tcp 10.10.0.0/16 any any http
acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.1.0.0/16
acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.2.0.0/16
acl resnet_tcp_preferred permit tcp 10.10.0.0/16 10.3.0.0/16
acl resnet_tcp_preferred permit tcp any 10.10.0.0/16
rate-limit resnet_tcp input acl resnet_tcp rate 100000 exceed-action drop-packets sequence 1
rate-limit resnet_napdata input acl resnet_napdata rate 33600 exceed-action drop-packets sequence 3
rate-limit resnet_tcp_preferred input acl resnet_tcp_preferred rate 10000000 exceed-action drop-packets sequence 4
rate-limit resnet_napdata apply interface backbone
rate-limit resnet_tcp apply interface backbone
rate-limit resnet_tcp_preferred apply interface backbone
Results of Rate-Limit
Implementations
• Aggregate-based hard policing causes steady 20 Mb output from ResNet
• Winter break: ResNet traffic very low
• Flow-based hard policing lowers traffic amount, but level is not steady.
Practical Rate-Limit Methods:
TCP Rate Control
• TCP rate control shapes flows of TCP traffic at the same
fine granularity available with other flow-based
rate limits by manipulating TCP header fields,
which are used to negotiate window size
information, and by pacing response ACKs.
• Sites such as UW-Whitewater have experience
with the Packeteer PacketShaper product. Last
weekend, UW-Madison began experimenting
with a PacketShaper.
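The window manipulation rests on the fact that a TCP sender can have at most one window of data in flight per round trip, so throughput is roughly window size divided by round-trip time. The Python sketch below computes the window that would hold a flow near a target rate. The function name and the one-segment minimum are assumptions for illustration, not PacketShaper's algorithm.

```python
def window_for_rate(target_bps, rtt_ms, mss=1460):
    """Return the advertised window (bytes) that caps a flow near target_bps.

    Uses throughput ~ window / RTT. The window is never set below one
    segment (mss), an assumption made here so the flow can still progress.
    """
    window = target_bps * rtt_ms // 8000  # bits/s * ms -> bytes
    return max(mss, window)
```

Slowing the sender this way reduces its rate without dropping packets, which is why this approach avoids the retransmissions caused by hard policing.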
TCP Rate Control:
Pros and Cons
– Advantages:
• Maximizes goodput by minimizing packet retransmissions.
• A mature commercial product implementing it is available.
• TCP rate control could offer some protection against some
obscure denial-of-service attacks which generate nonconforming TCP packets.
– Disadvantages:
• As the name implies, TCP rate control is TCP specific, and
therefore must be augmented with other rate-limiting
mechanisms.
• There are scalability issues. PacketShaper, for example, must
track state of connections and manipulate packets on the fly.
• Patented, closed-source implementations.
TCP Rate Control:
Experiences
– At UW-Whitewater, connected via DS3 to WiscNet,
the transfer rate with the PacketShaper set to 45Mb
was roughly 2/3 of the rate measured without the
PacketShaper in the network. Simply
having the device in the network slowed transfer
rates.
– Current maximum physical connection rate is
100Mb Ethernet. Using PacketShaper in large
networks is tricky.
– PacketShaper is basically a flow-based rate controller, so those
advantages and disadvantages apply as well.
Rate-Limit Alternatives
• A number of institutions have implemented quota
systems. These limits help ensure limited bandwidth is
shared fairly among all residents.
• University of Texas implements weekly bandwidth
quotas for their ResNet.
http://resnet.utexas.edu/meter.html
Adaptive Rate Limiting
• Also, some adaptive rate-limit systems have been
prototyped and appear to be somewhat effective. For
example:
http://www.ncne.nlanr.net/training/techs/2001/0128/presentations/200101-kline1_files/v3_document.htm
• "Top Talker" reports have been added to FlowScan to
facilitate the enforcement of adaptive rate-limit
policies. These systems are complicated to implement
because they need to manipulate router configurations
programmatically via CLI.
Rate Limit Links
Policing and Shaping overview:
http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/12cgcr/qos_c/qcpart4/qcpolts.htm
Rate-limiting and Traffic-policing Features
http://www.juniper.net/techcenter/techpapers/200005.html
Committed Access Rate
http://www.cisco.com/warp/public/732/Tech/car/
TCP Rate Control
http://www.cs.rpi.edu/~karans/report/
Generic Traffic Shaping
http://www.cisco.com/warp/public/732/Tech/gts/
The End…?
Thanks to Dave Plonka, UW Madison