20070716-bobyshev-demar


Flow Data Tools and Analysis
at Fermilab
Andrey Bobyshev / Phil DeMar
Internet2/ESCC Joint Techs Workshop
Fermilab, July 15-19, 2007
Outline of the talk:
● Flow data collection & analysis system at Fermilab
● Security tools
● Performance estimation tools
● Checking of traffic for PBR’d circuits
Netflow Collection and Analysis system


● Based on flow-tools (OSU)
● Collecting data from:
  ● Border routers: 1min flow timeouts
  ● Internal core routers and large experiment routers: 5min flow timeouts
● Specific collector for “near” real-time tools/applications
● Central storage system accumulating all flow data
● Multiple systems for primary processing
  ● Results stored in SQL tables

[Diagram labels: Border & StarLight; CMS; WorkGroup & Core; 1min samples; 5min samples; Flow collector; Local RAID6; Real-Time Appls; real-time replication; NetFlow Storage; BlueArc NAS; Fermilab Core Services; EnStore Long-Term Archiving; Processing and Analysis systems; mySQL Server (primary processing data, application's data); Web Presentation]
Data Collection details
● ~2.5GB to disk daily
● Older data are archived on EnStore, Fermilab’s tape storage facility
● Complete flow data collection, not sampled
● 10GE backbone & offsite links…
● Impact on routers is minimal

          Border   StarLight   CMS     CORE
Daily     600MB    200MB       1.2GB   500MB
Monthly   17GB     5GB         30GB    12GB
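As a quick consistency check, the per-router daily figures listed above add up to the quoted ~2.5GB per day. The sketch below assumes decimal units (1GB = 1000MB); the dictionary and function names are illustrative only.

```python
# Per-router volumes as listed on the slide (daily in MB, monthly in GB).
DAILY_MB = {"Border": 600, "StarLight": 200, "CMS": 1200, "CORE": 500}
MONTHLY_GB = {"Border": 17, "StarLight": 5, "CMS": 30, "CORE": 12}


def total_daily_gb(daily_mb):
    """Sum the daily volumes and convert MB to GB (decimal units assumed)."""
    return sum(daily_mb.values()) / 1000


def total_monthly_gb(monthly_gb):
    """Sum the monthly volumes over all routers."""
    return sum(monthly_gb.values())


print(total_daily_gb(DAILY_MB))      # 2.5
print(total_monthly_gb(MONTHLY_GB))  # 64
```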
Breakdown of traffic and tagging process
● Origin: onsite, offsite, local, transit
● Target: CMS, D0, CDF
● Filter: particular remote site or group of sites, e.g. Caltech, Tier2, US-Tier2, etc.
● Applications: topN, Network Weather Map, ...
● Raw data sets accumulated for 1min, 5min, 15min intervals
● Tagging: tableID = router, origin, target, filter, DNS level
● mySQL: SrcDstOctets, SrcDstFlows, SrcDstPackets and more
● Sources and destinations are identified by DNS name (host, top level, second level and so on) or by statically assigned labels
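The tagging step can be sketched roughly as follows. This is an illustration, not the Fermilab implementation: the prefixes are documentation address ranges, the origin semantics (decided by which endpoints are onsite) are an assumption, and `TableID`, `classify_origin`, `first_label` and `tag_flow` are hypothetical names.

```python
import ipaddress
from typing import NamedTuple

# Hypothetical prefixes (documentation address ranges), for illustration only.
ONSITE = [ipaddress.ip_network("192.0.2.0/24")]
TARGETS = {
    "CMS": [ipaddress.ip_network("192.0.2.0/26")],
    "D0": [ipaddress.ip_network("192.0.2.64/26")],
    "CDF": [ipaddress.ip_network("192.0.2.128/26")],
}
FILTERS = {"Caltech": [ipaddress.ip_network("198.51.100.0/24")]}


class TableID(NamedTuple):
    router: str
    origin: str
    target: str
    filter: str
    dns_level: str


def in_any(addr, nets):
    return any(addr in net for net in nets)


def classify_origin(src, dst):
    """Assumed semantics: origin depends on which endpoints are onsite."""
    src_on, dst_on = in_any(src, ONSITE), in_any(dst, ONSITE)
    if src_on and dst_on:
        return "local"
    if src_on:
        return "onsite"
    if dst_on:
        return "offsite"
    return "transit"


def first_label(src, dst, table, default="other"):
    """Return the first label whose prefixes contain either endpoint."""
    for label, nets in table.items():
        if in_any(src, nets) or in_any(dst, nets):
            return label
    return default


def tag_flow(router, src, dst, dns_level="host"):
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return TableID(router, classify_origin(s, d),
                   first_label(s, d, TARGETS), first_label(s, d, FILTERS),
                   dns_level)


print(tag_flow("r-cms-fcc2", "192.0.2.10", "198.51.100.7"))
```

The resulting tuple could then select the table into which octet, flow and packet counters are accumulated.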
Security Tools
● AutoBlocker – quasi real-time detection and automatic blocking/unblocking of onsite and offsite scanners
  ● Automated offsite blocking based on “greedy” data flow pattern
  ● Automated unblocking ‘x’ minutes after behavior stops
● Top Scanners GUI
● Slow Scanning detection
● Raw Flow reader – packets exchange

AutoBlocker – automatic detection and blocking/unblocking of offsite and onsite scanners
The main idea of AB3 is to calculate multiple quantified metrics from netflow data and use them to make automated decisions on blocking and unblocking offsite and onsite scanners. In October of this year it will be 5 years since AutoBlocker was deployed.
Calculate metrics, then evaluate triggers to return a threat level:
● RED: BLOCK
● ORANGE: WATCH
● YELLOW: NOTICE
● BLUE: NONE
● GREEN: NO scanning – NO actions
Metrics/Triggers/Threats/Actions
Metrics:
● ipDestinationAddressCount
● ipDestinationPortCount
● ipSourcePortCount
● blockCount
● activeBlockCount
● detectionCount
● consecutiveDetection
● consecutiveWatch
● watchRate
● flowsIn
● flowsOut
● HitByRemotes
● excessivePrcTime
● tcpSourcePortOut
● tcpSourcePortIn
● tcpDestPortOut
● tcpDestPortIn
● udpSourcePortOut
● udpSourcePortIn
● udpDestPortOut
● udpDestPortIn
Triggers:
● excessiveHostCount
● excessiveDestinationPort
● flowsResponseInconsistency
● portScanFlowsResponse
● excessiveProcessingRate
● DetectionRate
● consecutiveDetection
● watchRate
● consecutiveWatch
Actions:
● BLOCK/unBLOCK
● watch/resetWatch
● NONE/flushNONE
● NOTICE
Triggers return the threat identified by a color. Threats are mapped into actions:
● RED: BLOCK
● ORANGE: WATCH
● YELLOW: NOTICE
● BLUE: NONE
● GREEN: NO scanning – NO actions
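The metrics, triggers, threats and actions chain might look roughly like the sketch below. The threat-to-action mapping follows the slide; everything else is an assumption made for illustration: the numeric thresholds, the rule that the most severe threat wins, and the names `graded`, `evaluate` and `TRIGGERS` are all hypothetical.

```python
from enum import IntEnum


class Threat(IntEnum):
    GREEN = 0
    BLUE = 1
    YELLOW = 2
    ORANGE = 3
    RED = 4


# Threat-to-action mapping as listed on the slide.
ACTIONS = {
    Threat.RED: "BLOCK",
    Threat.ORANGE: "WATCH",
    Threat.YELLOW: "NOTICE",
    Threat.BLUE: "NONE",
    Threat.GREEN: "NO ACTION",
}


def graded(value, levels):
    """Return the threat of the first (hypothetical) threshold reached."""
    for threshold, threat in levels:  # levels ordered from highest to lowest
        if value >= threshold:
            return threat
    return Threat.GREEN


def excessive_host_count(metrics):
    return graded(metrics.get("ipDestinationAddressCount", 0),
                  [(500, Threat.RED), (100, Threat.ORANGE), (20, Threat.YELLOW)])


def excessive_destination_port(metrics):
    return graded(metrics.get("ipDestinationPortCount", 0),
                  [(1000, Threat.RED), (100, Threat.ORANGE), (10, Threat.BLUE)])


TRIGGERS = [excessive_host_count, excessive_destination_port]


def evaluate(metrics):
    """Assumption: the most severe threat returned by any trigger wins."""
    threat = max(trigger(metrics) for trigger in TRIGGERS)
    return threat, ACTIONS[threat]


print(evaluate({"ipDestinationAddressCount": 600}))
```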
AutoBlocker Exceptions System
Events with originally assigned actions are passed to the Exceptions System, which evaluates the actions triggered by events against the defined exceptions:
● NO exception found: the original action stands.
● An exception found: the original action is converted into a harmless action such as NONE or NOTICE.
Reversed Exception: a “harmless” action can be converted into BLOCK. AB has triggered an event that did not meet the BLOCK criterion, but the AB-Exception system identifies a potentially dangerous application that needs to be BLOCKed.
Multiple Classes of Exceptions
● Network (e.g. core servers), based on CIDR IP blocks
● Applications, defined by a combination of an event's AB metrics and specified IP blocks
● Groups of applications, defined in the same terms (AB metrics + IP blocks)
Definitions of applications can be created statically or dynamically. Dynamic definitions are derived from traffic, by determining usual traffic behavior.
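A minimal sketch of how an event's action could be checked against exceptions, including a reversed exception. All definitions here are hypothetical examples (documentation prefixes, made-up metric conditions), and the choice to downgrade to NOTICE rather than NONE is an assumption.

```python
import ipaddress

HARMLESS = {"NONE", "NOTICE"}

# All definitions below are hypothetical examples.
NETWORK_EXCEPTIONS = [ipaddress.ip_network("192.0.2.0/28")]
APPLICATION_EXCEPTIONS = [{
    "name": "example-transfer-app",
    "blocks": [ipaddress.ip_network("198.51.100.0/24")],
    "matches": lambda m: m.get("ipDestinationPortCount", 0) <= 2,
}]
REVERSED_EXCEPTIONS = [{
    "name": "example-dangerous-app",
    "blocks": [ipaddress.ip_network("203.0.113.0/24")],
    "matches": lambda m: m.get("ipDestinationAddressCount", 0) >= 50,
}]


def in_blocks(addr, blocks):
    return any(addr in block for block in blocks)


def app_matches(addr, metrics, definitions):
    """True if any definition matches both the IP blocks and the metrics."""
    return any(in_blocks(addr, d["blocks"]) and d["matches"](metrics)
               for d in definitions)


def apply_exceptions(source, action, metrics):
    addr = ipaddress.ip_address(source)
    if action not in HARMLESS:
        # Ordinary exception: downgrade to a harmless action.
        if (in_blocks(addr, NETWORK_EXCEPTIONS)
                or app_matches(addr, metrics, APPLICATION_EXCEPTIONS)):
            return "NOTICE"
    elif app_matches(addr, metrics, REVERSED_EXCEPTIONS):
        # Reversed exception: escalate a harmless action to BLOCK.
        return "BLOCK"
    return action


print(apply_exceptions("192.0.2.5", "BLOCK", {}))
```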
External AutoBlocker detectors
Several external AutoBlocker detectors:
● DarkNets – analyzes traffic to unallocated Fermilab networks and generates alerts to AB3 via SOAP
● SlowScan – detects slow scanning by analyzing flows over longer periods (1 hour, 1 day) and generates alerts to AB3
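The darknet idea can be illustrated with a short sketch: any flow destined to an address inside the site prefix but outside every allocated subnet counts as a hit. The prefixes, the `min_hits` threshold and the function names are assumptions; delivery of alerts to AB3 over SOAP is not shown.

```python
import ipaddress
from collections import Counter

# Hypothetical site prefix and allocated subnets (documentation ranges).
SITE = ipaddress.ip_network("192.0.2.0/24")
ALLOCATED = [ipaddress.ip_network("192.0.2.0/25")]


def is_dark(dst):
    """True if dst is inside the site prefix but in no allocated subnet."""
    addr = ipaddress.ip_address(dst)
    return addr in SITE and not any(addr in net for net in ALLOCATED)


def darknet_alerts(flows, min_hits=3):
    """flows: iterable of (src, dst). Return sources with enough dark hits."""
    hits = Counter(src for src, dst in flows if is_dark(dst))
    return sorted(src for src, count in hits.items() if count >= min_hits)
```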

Raw Flows Reader
● WEB interface to generate raw flow data based on specified criteria, such as time range, port, source/destination addresses
● Typical use is forensic analysis of computer security incidents
● Access to the tool (and the raw flow data itself) is restricted
Sample of RawFlow Output
TopScan: Generate tables of topN Scanners
TopScan generates tables of top scanners on a per-origin basis (onsite, offsite, local, transit) for specified time intervals: 5min, 1hour, 1day.
Information is available via an interactive GUI and by e-mail notifications.
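A per-origin topN table could be built along the following lines. Ranking sources by the number of distinct destination addresses is an assumption made for this sketch; the slides do not state TopScan's actual ranking criterion, and `top_scanners` is a hypothetical name.

```python
from collections import defaultdict


def top_scanners(flows, n=10):
    """flows: iterable of (origin, src, dst) for one time interval.

    Returns {origin: [(src, distinct_destination_count), ...]} with at most
    n rows per origin, ordered by descending count, then by source.
    """
    destinations = defaultdict(set)
    for origin, src, dst in flows:
        destinations[(origin, src)].add(dst)

    per_origin = defaultdict(list)
    for (origin, src), dsts in destinations.items():
        per_origin[origin].append((src, len(dsts)))

    return {origin: sorted(rows, key=lambda r: (-r[1], r[0]))[:n]
            for origin, rows in per_origin.items()}
```

Running the same function over 5min, 1hour and 1day windows of flow records would give the three tables mentioned above.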
Performance Monitoring & Estimation tools
● WEB USCMS Network Weather Map
● topN
● Traffic Summary (ftsumTraffic)
● Traffic asymmetry (bfpsum)
● Multistream flow analysis
USCMS Network Weather Map
Shows estimated rates to various sites: Tier0, other Tier1, USCMS Tier2.
Features:
● popup graphs
● clickable icons that link to other information sources
USCMS WM: Popup graphs
Place cursor over UNL icon - utilization graph appears
USCMS WM: Popup graphs, 16Gbps
Place cursor over central USCMS icon - aggregate Tier-1 center traffic graph appears
USCMS WM: Group's rates
USCMS WM: Clickable icons
Click on BlueArc icon: hourly summary tables for TopN pairs, senders and receivers
USCMS WM: TopN conversations
Tables of hourly topN senders, receivers & conversations
bfpsum: ByteFlowPacket Summary
bfpsum builds graphs and tables for the traffic of specified targets, such as USCMS, to the various remote sites. Single or multiple routers can be selected, as well as multiple targets and filters. Traffic can be viewed in terms of bytes, flows and packets, as either rates or total amounts.
bfpsum: Verifying symmetry of PBR-ed traffic
This tool is used for interactive inspection of USCMS PBR-ed traffic to detect potential asymmetry. When traffic is symmetric, the flow rates of inbound and outbound traffic are practically the same (see graph on the previous slide).
An example of traffic asymmetry is the graph on this slide (caused by Caltech when LS was shut down and outbound traffic was going through the core network).
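The symmetry check described here is done by visual inspection, but the same comparison can be sketched in code. The relative-difference measure and the 0.2 threshold below are assumptions chosen for illustration, not values from the talk.

```python
def asymmetry(inbound_rate, outbound_rate):
    """Relative difference between inbound and outbound flow rates, in [0, 1]."""
    total = inbound_rate + outbound_rate
    if total == 0:
        return 0.0
    return abs(inbound_rate - outbound_rate) / total


def asymmetric_intervals(samples, threshold=0.2):
    """samples: iterable of (timestamp, inbound_rate, outbound_rate).

    Returns the timestamps whose asymmetry exceeds the (assumed) threshold.
    """
    return [t for t, inbound, outbound in samples
            if asymmetry(inbound, outbound) > threshold]


print(asymmetric_intervals([("t1", 100, 98), ("t2", 100, 20)]))  # ['t2']
```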
Test: detection of traffic asymmetry
[Diagram labels: WAN; r-s-starlight-fnal (E2E circuits…); r-s-bdr (routed IP via ESnet); r-cms-fcc2; USCMS Tier1. Captions: normal (E2E) traffic flow; LambdaStation is turned off, no PBR]
Breakdown of multistream GridFTP sessions
ftGftp detects multistream GridFTP sessions and estimates their transfer rates.
- Flows can first be filtered by remote site before being passed to the detector.
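One plausible way to detect multistream sessions from flow records is to group flows between the same host pair that overlap in time and sum their bytes. This is only a sketch under that assumption; the slides do not describe how ftGftp actually works, and the record fields and function names are hypothetical.

```python
from collections import defaultdict


def summarize(src, dst, group):
    """Collapse a group of overlapping flows into one session record."""
    start = min(f["start"] for f in group)
    end = max(f["end"] for f in group)
    octets = sum(f["octets"] for f in group)
    duration = max(end - start, 1)  # seconds; avoid division by zero
    return {"src": src, "dst": dst, "streams": len(group),
            "rate_bps": octets * 8 / duration}


def multistream_sessions(flows, min_streams=2):
    """flows: iterable of dicts with src, dst, start, end (seconds), octets."""
    by_pair = defaultdict(list)
    for f in flows:
        by_pair[(f["src"], f["dst"])].append(f)

    sessions = []
    for (src, dst), pair_flows in by_pair.items():
        pair_flows.sort(key=lambda f: f["start"])
        group, group_end = [pair_flows[0]], pair_flows[0]["end"]
        for f in pair_flows[1:]:
            if f["start"] <= group_end:   # overlaps the current group
                group.append(f)
                group_end = max(group_end, f["end"])
            else:                         # gap in time: close the group
                sessions.append(summarize(src, dst, group))
                group, group_end = [f], f["end"]
        sessions.append(summarize(src, dst, group))

    return [s for s in sessions if s["streams"] >= min_streams]
```

Filtering by remote site would simply restrict the `flows` input before calling `multistream_sessions`.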
Commercial Products
● Always looking for commercial or public domain packages of comparable functionality:
  ● Most commercial packages have similar capabilities & some useful features, but are not flexible enough for our purposes
  ● Evaluated AdventNet Netflow Analyzer & NetFlow Tracker from Crannog-Software
● Purchased AdventNet, ~$1K for 20 interfaces; allows defining IP groups based on a list of IP blocks
Future flow data developments




● Maintaining the existing scope of monitoring
● Automate asymmetric path analysis
● Integrate flow data analysis into our network performance troubleshooting methodology
● High impact data movement detection
  ● Lesson learned from Lambda Station: application awareness is hard
  ● Wouldn’t it be nice to have the network detect recognizable flow patterns and modify path/service/whatever, if appropriate?
  ● But it almost certainly would require real time flow data
● Would be happy to collaborate with others developing flow data tools:

Contact us at [email protected]