The AARNet NOC
Network Measurement and Security
APAN Bangkok 2005
AARNet’s International Connections
Measurement
• SNMP interface counters measuring bits per
second, packets per second, errors and discards
on all interfaces
• SNMP router CPU utilisation, BGP peers etc.
• Active measurements: performance metrics measuring round-trip times, and throughput testing with iperf.
• NetFlow measurements down to individual IP flow
based metrics (approx 60Gb of data a day).
NetFlow Measurements
• NetFlow measurement migrated to customer edge equipment.
• Every flow (IP address/port -> IP address/port combination) is logged.
• Information on source and destination ports, interfaces, ASs and ToS settings is kept.
• Hooks into MRTG/RRD for graphing/visualisation.
• Very useful in logging network activity.
You can monitor all the data you
like but…
• Visibility of data is the key issue
– Alarms generated by processes.
– Images generated of network activity.
• From there the ability to drill down to get
relevant data.
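As an illustration of an alarm generated by a process, here is a minimal sketch that flags a flows-per-second sample when it exceeds a multiple of the recent median. The function name, the window of 12 samples and the factor of 5 are assumptions chosen for the example, not values from the AARNet tooling.

```python
from statistics import median

def flow_rate_alarms(samples, window=12, factor=5.0):
    """Return the indexes of samples that exceed `factor` times the
    median of the preceding `window` samples (hypothetical thresholds)."""
    alarms = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            alarms.append(i)
    return alarms

# Twelve quiet samples, one spike, then back to normal.
rates = [100, 110, 95, 105, 98, 102, 101, 99, 104, 97, 103, 100, 2500, 101]
print(flow_rate_alarms(rates))
```

A median baseline is used here so that a single spike does not immediately raise the threshold for the samples that follow it.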
Worm/DOS/DDOS Impacts
A worm or DOS/DDOS attack can initially manifest itself in many ways:
• Congestion due to high byte throughput of attack.
• High Packet Rate on an interface/s.
• High Packet Loss for normal network traffic.
• High Router CPU utilisation.
• BGP/OSPF routing flaps.
• NetFlow information accumulates rapidly.
Network Impact CodeRed v2
July 20 2001
• Bits per second
• Packets per second
• Flows per second
Network Impact
• No backbone packet loss
• No huge impact on backbone latency
• The impact on the backbone was excessive flows due to TCP port 80 scans. Within NetFlow data these are generally seen as three packets totalling 144 bytes, outbound from particular hosts (infected machines). Not fully accurate, but a very useful indicator.
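The three-packet, 144-byte indicator can be sketched as a filter over flow records. This is only an illustration: the `Flow` record layout, the function name and the `min_hits` threshold are assumptions, and the addresses are documentation ranges rather than real data.

```python
from collections import Counter
from typing import NamedTuple

class Flow(NamedTuple):
    """Hypothetical, simplified flow record."""
    src: str
    dst: str
    proto: int      # IP protocol number (6 = TCP)
    dst_port: int
    packets: int
    octets: int

def codered_like_sources(flows, min_hits=3):
    """Count flows per source that match the indicator: TCP, destination
    port 80, exactly 3 packets totalling 144 bytes."""
    hits = Counter(
        f.src for f in flows
        if f.proto == 6 and f.dst_port == 80
        and f.packets == 3 and f.octets == 144
    )
    return {src: n for src, n in hits.items() if n >= min_hits}

flows = [Flow("192.0.2.10", f"198.51.100.{i}", 6, 80, 3, 144) for i in range(1, 6)]
flows.append(Flow("192.0.2.20", "198.51.100.9", 6, 80, 12, 9000))  # ordinary web flow
print(codered_like_sources(flows))
```

Requiring several matching flows per source reflects the caveat that the indicator is not fully accurate on its own.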
Slammer/Sapphire Worm
• 24 July 2002 – Microsoft releases notice and patch for Buffer Overruns in SQL Server 2000.
• 25 January 2003 – the Saturday of a long weekend in Australia.
• 13:40 – First noticed via a Nagios message that a link had been marked down after an ICMP ping failure.
• Checking link utilisation showed a huge amount of traffic congesting the link.
• NetFlow showed a huge flow rate; mail was sent by our daemon process to inform us of this.
• A quick look at the NetFlow logs showed what appeared to be outbound scanning on UDP port 1434.
Slammer Impact
• Bits per
second
• Packets per
second
• Flows per
second
Slammer Impact
• High backbone packet loss
• Increased latency
Slammer Response
• The effect of Slammer was to congest the network and degrade performance. An infected host on a 100Mb connection could produce over 30,000 scans/second, limited by bandwidth rather than network latency.
• As a result, UDP port 1434 traffic was blocked at the edge to protect other traffic.
– deny udp any gt 1023 any eq 1434
• With the public holiday a number of sites did not
have any staff available.
Slammer Response
• Infected hosts could be identified using NetFlow
logs and that information was propagated to the
sites.
• Where the sites could not respond immediately
these hosts were blocked from sending Port 1434
UDP traffic.
• Within 3 hours most of the problem was relatively
under control.
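Identifying infected hosts from NetFlow logs could look roughly like the sketch below: group UDP port 1434 flows by source and report sources that contact many distinct destinations. The tuple layout, the function name and the `min_targets` threshold are assumptions for illustration only.

```python
from collections import defaultdict

def slammer_suspects(flows, min_targets=100):
    """flows: iterable of (src, dst, proto, dst_port) tuples (hypothetical
    layout). Returns sources that sent UDP (protocol 17) to port 1434 on at
    least `min_targets` distinct destinations."""
    targets = defaultdict(set)
    for src, dst, proto, dst_port in flows:
        if proto == 17 and dst_port == 1434:
            targets[src].add(dst)
    return sorted(src for src, dsts in targets.items() if len(dsts) >= min_targets)

# One scanning host and one host making a single SQL Server lookup.
sample = [("192.0.2.10", f"198.51.100.{i}", 17, 1434) for i in range(150)]
sample.append(("192.0.2.20", "198.51.100.5", 17, 1434))
print(slammer_suspects(sample))
```

Counting distinct destinations rather than raw flows keeps a legitimate client that talks repeatedly to one server off the list.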
Slammer – why so much impact?
• Slammer/Sapphire contains a simple, fast
scanner in a small worm with a total size of only
376 bytes. With the requisite headers, the payload
becomes a single 404-byte UDP packet.
• Slammer used UDP, so a single packet could infect a host; there was no need to wait for a three-way TCP handshake as with CodeRed.
• Two orders of magnitude faster than CodeRed.
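The 404-byte packet size is consistent with the figure of over 30,000 scans/second quoted for a 100Mb connected host. A back-of-the-envelope check, assuming the link is completely filled with worm packets and ignoring Ethernet framing overhead (which would lower the result somewhat):

```python
LINK_BPS = 100_000_000   # 100 Mb/s access link
PACKET_BYTES = 404       # one Slammer UDP packet including headers

# Packets per second if the link carries nothing but worm traffic.
packets_per_second = LINK_BPS / (PACKET_BYTES * 8)
print(f"{packets_per_second:,.0f} scans/second")
```

Because each packet is a complete infection attempt, the scan rate scales directly with the sender's bandwidth rather than with round-trip time.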
Slammer vs CodeRed Propagation
• Slammer
• CodeRedv2
Graphs courtesy of CAIDA
DDOS Attacks
• Often the result of IRC botnets.
• TFN, Trinoo, Stacheldraht and other root kits.
• Often short lived – but don’t count on it!
• Hard to protect against.
• Important to keep a good track of unusual activity on the network – being a good netizen.
• Isolate your compromised hosts quickly.
• Analyse and report to upstreams.
DOS/DDOS Attacks
• TCP SYN attacks.
• UDP flood.
• ICMP echo request/reply flood.
• Amplification attacks.
• Source IP address spoofing.
Normal Patterns…
• A lot of packets are junk.
– 90% of packets destined to AARNet are
dropped at the upstream edge!
– 60% of this is NTP requests to non-operational
NTP servers.
– 30% of packets are common scans and
probes.
• A lot of packets are threatening.
• This is “normal” behaviour.
• So, how to distinguish an abnormal pattern?
NTP Services
• CSIRO offers NTP services to Australian users.
• Three servers in three states.
• CSIRO pays differential traffic charges between
international and domestic sources.
• An ADSL router vendor hard-coded the IP addresses of the servers into its product.
• The router is distributed particularly to Japanese/Korean customers, where ADSL uptake is high.
Effect
• Normally NTP hosts sync every 2 hours
• ACL is put on international connections against
NTP traffic.
• The router has no back-off algorithm: it retries every 30 seconds against all 3 servers!
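To put the retry behaviour in proportion, the following arithmetic compares one router's daily request count with a host that syncs every 2 hours. It assumes each retry is a single request per server and that the normal host polls only one server; both are simplifications made for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

normal_per_day = SECONDS_PER_DAY // (2 * 60 * 60)   # one sync every 2 hours
router_per_server = SECONDS_PER_DAY // 30           # one retry every 30 seconds
router_total = router_per_server * 3                # against all 3 servers

print(normal_per_day, router_per_server, router_total)
print("per-server ratio:", router_per_server // normal_per_day)
```

Under these assumptions a single unanswered router generates 240 times the per-server load of a well-behaved host, before multiplying by the number of deployed units.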
Normal?
• Darknets provide useful analysis of the background radiation; see:
http://www.cymru.com/Darknet/index.html
The normal day…
• A quiet day in the University break…
• BPS
– SNMP
• PPS
– SNMP
• FPS
– NetFlow
Another day…
Another Day – some explanation
• Generally SNMP interface statistics are collected
at five minute intervals.
• NetFlow has a default cache timeout of 30
minutes.
• Using defaults, NetFlow accentuates particularly lengthy single transactions (possibly from a single machine) as spikes.
• NetFlow flow counts are particularly sensitive to scan and strobe attacks covering many hosts/ports, which makes them useful for identifying such attacks.
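The spike effect can be illustrated with a small sketch: a single flow that runs for the full 30 minutes is only exported when the cache times out, so crediting all its bytes to the export time puts them in one five-minute bin, whereas SNMP counters would have seen them spread across six. The tuple layout and function names are assumptions, and `n_bins` is assumed to cover the whole period.

```python
BIN_SECONDS = 300  # five-minute bins, matching the SNMP polling interval

def bins_at_export(flows, n_bins):
    """Credit each flow's bytes to the bin containing its end (export) time."""
    bins = [0.0] * n_bins
    for start, end, octets in flows:
        bins[min(int(end // BIN_SECONDS), n_bins - 1)] += octets
    return bins

def bins_spread(flows, n_bins):
    """Spread each flow's bytes evenly over the bins it overlaps."""
    bins = [0.0] * n_bins
    for start, end, octets in flows:
        first = int(start // BIN_SECONDS)
        last = min(int(end // BIN_SECONDS), n_bins - 1)
        if end <= start:            # zero-length flow: nothing to spread
            bins[last] += octets
            continue
        for b in range(first, last + 1):
            overlap = min(end, (b + 1) * BIN_SECONDS) - max(start, b * BIN_SECONDS)
            bins[b] += octets * max(overlap, 0) / (end - start)
    return bins

# One transfer of 1,800,000 bytes lasting exactly 30 minutes: (start, end, octets).
long_flow = [(0, 1800, 1_800_000)]
print(bins_at_export(long_flow, 7))
print(bins_spread(long_flow, 7))
```

Spreading by duration is one way to make NetFlow-derived graphs comparable with SNMP graphs; shortening the cache timeout on the router is another.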
Inbound DDOS
• Total flows – metric is file size of collected UDP NetFlow data
• Individual flows – metric is processed transmitted/received flows per institution
• Now know where to look!
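A per-institution flow count of the kind described above might be produced along these lines. The institution names and prefixes are hypothetical placeholders (documentation address ranges), not AARNet allocations.

```python
import ipaddress
from collections import Counter

# Hypothetical mapping of institutions to address prefixes.
INSTITUTIONS = {
    "uni-a": ipaddress.ip_network("192.0.2.0/24"),
    "uni-b": ipaddress.ip_network("198.51.100.0/24"),
}

def flows_per_institution(dst_addresses):
    """Count received flows per institution by destination address."""
    counts = Counter()
    for addr in dst_addresses:
        ip = ipaddress.ip_address(addr)
        for name, net in INSTITUTIONS.items():
            if ip in net:
                counts[name] += 1
                break
        else:
            counts["other"] += 1   # destination not in any known prefix
    return counts

dsts = ["192.0.2.5"] * 3 + ["198.51.100.7", "203.0.113.1"]
print(flows_per_institution(dsts))
```

Once the totals show an anomaly, a breakdown like this narrows the search to the institution receiving the excess flows.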
Particular DOS Attacks
• Universities Admission Centre on TEE results day.
• TCP SYN attack.
• Filters placed on international links at 7:45 – fine because services offered were primarily domestic.
DOS/SYN Attack
• Bytes
• Packets
• Flows
Unusual activity
• Unsolicited ICMP echo replies
– Can indicate machines are using a control
channel after being infected by a root kit.
– Stacheldraht/TFN.
– Can easily check for this type of infection with
NetFlow records.
– Attacks from these machines will generally
spoof addresses within their subnet so
compromised machine(s) are hard to find
during an outbound attack.
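Checking flow records for unsolicited ICMP echo replies can be sketched as follows: an echo reply (ICMP type 0) is treated as unsolicited if no echo request (type 8) was seen in the opposite direction. The record layout with an explicit `icmp_type` field is an assumption; real NetFlow exports encode the ICMP type differently depending on version.

```python
ECHO_REPLY = 0
ECHO_REQUEST = 8

def unsolicited_echo_replies(records):
    """records: list of (src, dst, icmp_type) tuples (hypothetical layout).
    Returns (src, dst) pairs of echo replies with no matching request."""
    requested = {(src, dst) for src, dst, t in records if t == ECHO_REQUEST}
    return [
        (src, dst) for src, dst, t in records
        if t == ECHO_REPLY and (dst, src) not in requested
    ]

records = [
    ("192.0.2.10", "198.51.100.1", ECHO_REQUEST),
    ("198.51.100.1", "192.0.2.10", ECHO_REPLY),   # answers the request above
    ("203.0.113.9", "192.0.2.77", ECHO_REPLY),    # nobody asked for this one
]
print(unsolicited_echo_replies(records))
```

In practice the matching would also need a time window, since a request and its reply can fall into different collection files.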
Some conclusions…
• Try and ensure early patching of machines!
– Users are still deploying operating systems
and network applications in an insecure
fashion.
• Effective and visible measurement and
monitoring infrastructure needs to be in place to
reduce the effect of worm or DOS/DDOS attacks.
• As far as possible, automated alarms and warnings need to be in place to reduce the time to respond.
• Actions must be determined by the threat/vulnerabilities. Beware of knee-jerk reactions.
Some conclusions…
• The Slammer worm was very simple and
effective, spreading virulently and covering the
globe in approximately 10 minutes.
• Expect more of this type of worm in the future –
possibly with destructive payloads.
• Expect that the base of compromised machines
will be wider.
• With IPv6 rollout, while scanning may be an unprofitable way to compromise machines, it will hugely affect NetFlow collection: there are 18446744073709551616 (2^64) possible hosts per /64.
• Only 4294967296 (2^32) hosts in IPv4.
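The difference in scale can be made concrete with a short calculation. The scan rate of 30,000 probes per second is borrowed from the Slammer figures earlier in the talk and is used here purely as an assumed rate for comparison.

```python
hosts_per_64 = 2 ** 64      # addresses in a single IPv6 /64
ipv4_addresses = 2 ** 32    # the entire IPv4 address space
SCAN_RATE = 30_000          # assumed probes per second from one host

ipv4_hours = ipv4_addresses / SCAN_RATE / 3600
ipv6_years = hosts_per_64 / SCAN_RATE / (365 * 24 * 3600)

print(hosts_per_64, ipv4_addresses)
print(f"IPv4: about {ipv4_hours:.1f} hours to probe every address")
print(f"One /64: about {ipv6_years:.2e} years")
```

A scanner that never finishes still emits one flow per probe, which is why the concern for NetFlow collection remains even when the scan itself is futile.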
Responses…
• Analyse NetFlow data.
• Port monitoring and capture when required –
tcpdump and ethereal.
• Egress Filtering at the edges.
• Bogon Filtering.
• Back Scatter traffic monitoring.
• Darknets to measure scanning.
• ACLs.
• BGP community tagged black holing.
• Talk to your upstreams and downstreams
• Monitor and watch for unusual activity
• Be prepared!
• It’s your Network – protect it!
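The egress and bogon filtering responses both come down to a check on the source address of a packet. The sketch below shows that logic only; it is not router configuration. The site prefix is a hypothetical placeholder, and the bogon list is a small illustrative subset rather than a maintained list.

```python
import ipaddress

# Hypothetical prefix assigned to a customer site.
SITE_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]

# Illustrative subset of IPv4 ranges that should never appear as a source
# address on the public Internet.
BOGONS = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3",
)]

def egress_permits(src):
    """Egress filter: only let traffic out if its source belongs to the site."""
    ip = ipaddress.ip_address(src)
    return any(ip in net for net in SITE_PREFIXES)

def is_bogon(src):
    """Bogon filter: drop inbound traffic claiming an unroutable source."""
    ip = ipaddress.ip_address(src)
    return any(ip in net for net in BOGONS)

print(egress_permits("192.0.2.44"), egress_permits("10.1.2.3"))
print(is_bogon("10.1.2.3"), is_bogon("8.8.8.8"))
```

Egress filtering at the edge is what limits the spoofed-source attacks mentioned earlier, since a compromised host can then only spoof addresses within its own site.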
Questions?
Some useful URLs…
http://www.cs.berkeley.edu/~nweaver/sapphire/
ftp://ftp-eng.cisco.com/cons/isp/security/ISP-SecurityBootcamp/
http://www.dshield.org/
http://www.cymru.com
http://www.mynetwatchman.com/
http://www.cymru.com/Darknet/index.html
http://www.ren-isac.net/