jobTalk_zhichun - Northwestern University


Towards High Speed Network Defense
Zhichun Li
EECS Department
Northwestern University
Agenda
• Briefly introduce my thesis work
• Dive into high-performance vulnerability
signature matching
• Future research directions
2
Motivation
Attackers
Botnets
Professional attackers exploit
the enterprise networks for profit
$$$
Worms
3
Network Level Defense
• Network gateways/routers are the vantage
points for detecting large scale attacks
• Host-based detection/prevention alone is not
enough for modern enterprise networks
– Some users do not apply host-based schemes
because of reliability, overhead, and conflicts.
– Many users do not update or patch their systems on
time.
– Enterprises cannot rely only on their end users for
security protection
4
Challenges
• Scalable to high speed networks with a
large number of users
• Need to be highly accurate
• Adapt fast to the emerging threats
• Have good attack coverage.
5
Network-based Intrusion Detection,
Prevention, and Forensics System
• Framework
[Figure elements: packet streams, honeynets/honeyfarms, and four components]
– (I) Sketch-based monitoring & detection: scalability
– (II) Polymorphic worm signature generation: accuracy & adapt fast
– (III) Signature matching engines: accuracy & scalability & coverage
– (IV) Network situational awareness: accuracy & adapt fast
6
Network-based Intrusion Detection,
Prevention, and Forensics System (I)
• Online traffic monitoring and recording
[INFOCOM 2006, ToN 2007] (cited by 30+)
– Reversible sketch for data streaming computation
– Record millions of flows (GB traffic) in a few hundred KB
– Small # of memory accesses per packet
– Scalable to large key space size (2^32 or 2^64)
• Online sketch-based flow-level anomaly detection
[IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 2006]
– Detect TCP SYN flooding, horizontal and vertical scans even
when mixed
[Figure: sketch of H hash tables with K buckets each (indexed 0 … K-1); a key k is hashed by h1(k), …, hj(k), …, hH(k)]
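To make the sketch structure in the figure concrete, here is a minimal Python illustration of a plain k-ary sketch (H hash tables, K buckets each). It is a sketch under stated assumptions, not the reversible sketch from the cited papers: the class name `KarySketch`, the SHA-256 based hash functions, and the example flow keys are all hypothetical, and the reversibility machinery (recovering keys from buckets) is omitted.

```python
import hashlib
import statistics


class KarySketch:
    """Plain k-ary sketch: H hash tables with K buckets each (not reversible)."""

    def __init__(self, num_tables=5, num_buckets=1024):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * self.K for _ in range(self.H)]
        self.total = 0  # sum of all updates seen so far

    def _bucket(self, j, key):
        # h_j(key): a standard hash seeded with the table index j (an assumption).
        digest = hashlib.sha256(f"{j}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.K

    def update(self, key, value=1):
        # One bucket access per table for each update.
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += value
        self.total += value

    def estimate(self, key):
        # Per-table estimate corrected for the average bucket load,
        # then the median across the H tables.
        per_table = [
            (self.tables[j][self._bucket(j, key)] - self.total / self.K)
            / (1 - 1 / self.K)
            for j in range(self.H)
        ]
        return statistics.median(per_table)


sketch = KarySketch()
sketch.update("10.0.0.1", 1000)       # one heavy flow (hypothetical key)
for i in range(50):                   # light background flows
    sketch.update(f"192.168.0.{i}", 1)
heavy_estimate = sketch.estimate("10.0.0.1")
```

The memory footprint is fixed at H × K counters regardless of how many distinct keys are seen, which is the property the bullets above rely on.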
7
Network-based Intrusion Detection,
Prevention, and Forensics System (II)
• Polymorphic worm signature generation
– Token-based signature [IEEE Symposium on Security and
Privacy 2006] (cited by 40+, code requested by Columbia U., UT
Austin, Purdue, Georgia Tech, UC Davis, etc.)
– Network based Vulnerability Signature [IEEE ICNP 2007] [ NSF
Cyber Trust Award]
[Figure: Internet, network gateway, and our network, with example payloads 1010101, 10111101, 11111100, 00010111]
8
Network-based Intrusion Detection,
Prevention, and Forensics System (III)
• NetShield: vulnerability-signature-based
NIDS/NIPS [under submission] [NSF Cyber Trust
Award] (interest from Cisco and Juniper)
Focus of this talk; details come later
9
Network-based Intrusion Detection,
Prevention, and Forensics System (IV)
• Situation-aware forensics for large-scale botnet and P2P
misconfiguration events
– Botnet attack target/strategy inference [ASIACCS09]
– Root cause analysis of P2P misconfiguration/poisoning
traffic [under submission]
[Figure: DDoS attack scenario; labels: peers, file request flooding, misconfigured traffic, innocent victim]
10
NetShield: Matching a Large
Vulnerability Signature Ruleset
for High Performance Network
Defense
11
NetShield Overview
NIDS/NIPS (Network Intrusion
Detection/Prevention System) operation
[Figure: packets and a signature DB feed the NIDS/NIPS, which emits security alerts]
• Accuracy
• Speed
• Attack Coverage
12
State of the art
Regular expression (regex) based approaches
Example: .*Abc.*\x90+de[^\r\n]{30}
Pros
• Can efficiently match multiple sigs simultaneously, through DFA
• Can describe the syntactic context
Cons
• Limited expressive power
• Cannot describe the semantic context
• Inaccurate
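As an illustration of what the example regex signature checks, the following Python snippet applies it to raw payload bytes. The payloads and the helper name `regex_alert` are hypothetical, and Python's backtracking `re` engine stands in for the DFA-based matchers that production regex IDSes use.

```python
import re

# The slide's example signature as a bytes pattern.
# DOTALL lets ".*" span newlines inside a payload.
SIGNATURE = re.compile(rb".*Abc.*\x90+de[^\r\n]{30}", re.DOTALL)


def regex_alert(payload: bytes) -> bool:
    """Return True when the payload matches the signature."""
    return SIGNATURE.search(payload) is not None


# Hypothetical payloads: same prefix, different tail length after "de".
attack = b"GET /Abc?x=" + b"\x90" * 8 + b"de" + b"A" * 30
benign = b"GET /Abc?x=" + b"\x90" * 8 + b"de" + b"A" * 10

attack_flagged = regex_alert(attack)
benign_flagged = regex_alert(benign)
```

The pattern only sees byte order and counts. It has no notion of which protocol field the bytes belong to, which is the "cannot describe the semantic context" limitation listed above.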
13
State of the art
Vulnerability Signature [Wang et al. 04]
Example:
BIND:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& context[0].abstract_syntax.uuid==UUID_RemoteActivation
BIND-ACK:
rpc_vers==5 && rpc_vers_minor==1
CALL:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& stub.RemoteActivationBody.actual_length>=40 && matchRE(
stub.buffer, /^\x5c\x00\x5c\x00/)
Pros
• Directly describe semantic context
• Very expressive, can express the vulnerability condition exactly
• Accurate
Cons
• Slow!
• Existing approaches all use sequential matching
• Require protocol parsing
14
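Motivation of NetShield
[Figure: speed (low to high) vs. accuracy (low to high). State-of-the-art regex sig IDSes: high speed, bounded by the theoretical accuracy limitation of regex. Existing vulnerability sig IDS: high accuracy, low speed. NetShield: high speed and high accuracy.]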
15
Motivation
• Desired Features for Signature-based
NIDS/NIPS
– Accuracy (especially for IPS)
– Speed
– Coverage: Large ruleset

            Regular Expression    Vulnerability
Accuracy    Relatively poor       Much better
Speed       Good                  ??
Memory      OK                    ??
Coverage    Good                  ??

Callouts: "Cannot capture vulnerability condition well!" (regular expression);
Shield [SIGCOMM'04] (vulnerability); "Focus of this work" (the ?? cells)
16
Research Challenges
• Background
– Use protocol semantics to express the vulnerability
– Defined on a sequence of PDUs & one predicate for each
PDU
– Example: ver==1 && method==“put” && len(buf)>300
• Challenges
– Matching thousands of vulnerability signatures
simultaneously
• Sequential matching match multiple sigs simultaneously
– High speed parsing
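The sequential-matching baseline that this slide argues against can be sketched in a few lines of Python. The PDU is assumed to be already parsed into a dict; the field names follow the slide's example predicate, while `sequential_match`, the rule names, and the second rule are hypothetical.

```python
def sig_put_overflow(pdu):
    # The slide's example: ver==1 && method=="put" && len(buf)>300
    return (
        pdu.get("ver") == 1
        and pdu.get("method") == "put"
        and len(pdu.get("buf", b"")) > 300
    )


def sequential_match(pdu, signatures):
    """Baseline: evaluate every signature predicate in turn, O(n) per PDU."""
    return [name for name, predicate in signatures.items() if predicate(pdu)]


SIGNATURES = {
    "put_overflow": sig_put_overflow,
    # Hypothetical second rule, only to show the per-rule loop.
    "get_long_uri": lambda p: p.get("method") == "get" and len(p.get("uri", "")) > 1024,
}

pdu = {"ver": 1, "method": "put", "buf": b"A" * 450}
matched = sequential_match(pdu, SIGNATURES)
short_pdu_matched = sequential_match({"ver": 1, "method": "put", "buf": b"A" * 10}, SIGNATURES)
```

With thousands of signatures the loop re-evaluates the same fields again and again, which is the cost that matching multiple signatures simultaneously is meant to avoid.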
17
Outline
• Motivation
• High Speed Matching for Large Rulesets
• High Speed Parsing
• Evaluation
• Research Contributions
18
A Vulnerability Signature Example
• Data representations
– For all the vulnerability signatures we studied, we
only need numbers and strings
– number operators: ==, >, <, >=, <=
– String operators: ==, match_re(.,.), len(.).
• Example signature for Blaster worm
Example:
BIND:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& context[0].abstract_syntax.uuid==UUID_RemoteActivation
BIND-ACK:
rpc_vers==5 && rpc_vers_minor==1
CALL:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& stub.RemoteActivationBody.actual_length>=40 && matchRE(
stub.buffer, /^\x5c\x00\x5c\x00/)
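The Blaster signature above is defined over a sequence of PDUs with one predicate per PDU. A minimal Python sketch of that idea follows; it assumes each PDU is already parsed into a flat dict, flattens the dotted field paths into hypothetical keys (`context_uuid`, `actual_length`, `stub_buffer`), and uses a symbolic string in place of the real UUID bytes. The simple "advance on match" policy is also an assumption.

```python
import re

DREP = b"\x10\x00\x00\x00"
UUID_REMOTE_ACTIVATION = "UUID_RemoteActivation"  # symbolic placeholder, not real UUID bytes


def version_ok(p):
    return p["rpc_vers"] == 5 and p["rpc_vers_minor"] == 1


# One (PDU type, predicate) pair per step of the signature.
BLASTER = [
    ("BIND", lambda p: version_ok(p) and p["packed_drep"] == DREP
        and p["context_uuid"] == UUID_REMOTE_ACTIVATION),
    ("BIND-ACK", version_ok),
    ("CALL", lambda p: version_ok(p) and p["packed_drep"] == DREP
        and p["actual_length"] >= 40
        and re.match(rb"\x5c\x00\x5c\x00", p["stub_buffer"]) is not None),
]


def match_connection(pdus, signature=BLASTER):
    """Advance one step each time the expected PDU type satisfies its predicate."""
    step = 0
    for pdu in pdus:
        ptype, predicate = signature[step]
        if pdu["type"] == ptype and predicate(pdu):
            step += 1
            if step == len(signature):
                return True
    return False


def make_connection(actual_length):
    # Hypothetical parsed PDUs for one connection.
    base = {"rpc_vers": 5, "rpc_vers_minor": 1, "packed_drep": DREP}
    return [
        {**base, "type": "BIND", "context_uuid": UUID_REMOTE_ACTIVATION},
        {**base, "type": "BIND-ACK"},
        {**base, "type": "CALL", "actual_length": actual_length,
         "stub_buffer": b"\\\x00\\\x00" + b"A" * 60},
    ]


exploit_detected = match_connection(make_connection(64))
short_call_detected = match_connection(make_connection(8))
```

Only numbers and strings are compared, using the operators listed on this slide (`==`, `>=`, and a regex match).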
19
Matching Problem Formulation
• Consider single PDU matching first
• Suppose we have n signatures, defined on
k matching dimensions (matchers)
– A matcher is a two-tuple (field, operation) or a
four-tuple for the associative array elements.
– Translate the n signatures to an n-by-k table.
Rule 6: URI.Filename=“fp40reg.dll” && len(Headers[“host”])>300
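The translation into an n-by-k table can be sketched as follows. This is an illustrative Python version under assumptions: the layout of the four-tuple for associative array elements (array, key, function, operation) is a guess, `build_table` is a hypothetical name, rule 1 is invented, and `None` stands for a "don't care" cell.

```python
def build_table(signatures):
    """Return (matchers, table): one column per matcher, one row per rule."""
    matchers = []
    for conditions in signatures.values():
        for matcher, _operand in conditions:
            if matcher not in matchers:
                matchers.append(matcher)
    table = {}
    for rule, conditions in signatures.items():
        row = [None] * len(matchers)          # None marks a "don't care"
        for matcher, operand in conditions:
            row[matchers.index(matcher)] = operand
        table[rule] = row
    return matchers, table


SIGNATURES = {
    # Hypothetical rule, added only so the table has a don't-care column for rule 6.
    1: [(("Method", "=="), "POST")],
    # Rule 6 from the slide: a two-tuple matcher and a four-tuple matcher.
    6: [(("URI.Filename", "=="), "fp40reg.dll"),
        (("Headers", "host", "len", ">"), 300)],
}

matchers, table = build_table(SIGNATURES)
```

Each rule usually constrains only a few of the k matchers, so most cells in a real table would be `None`.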
20
Matching Problem Formulation
• Challenges for Single PDU matching
problem (SPM)
– Large number of signatures n
– Large number of matchers k
– Large number of “don’t cares”
– Cannot reorder matchers arbitrarily -- buffering
constraint
– Field dependency
• Arrays, associative arrays
• Mutually exclusive fields.
21
Matching Algorithms
Candidate Selection Algorithm
1. Pre-computation decides the rule order and
matcher order
2. Divide-and-conquer comparison with
matchers, iteratively combining the results
efficiently
22
Step 1: Pre-Computation
• Matcher reorder: put the non-selective matchers
later, subject to the buffering constraint & field arrival
order
• Rule reorder: group the rules into rule blocks (RB)
– RB1: rules that require Matcher 1
– RB2: rules that don't care about Matcher 1 and require Matcher 2
– RB3, RB4, ...: extended the same way (don't care about both Matcher 1 & 2, ...)
– Last: rules that don't care about all of Matcher 1 to n
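One way to read the rule reorder step is as grouping rules by the first matcher they require. The Python sketch below follows that reading, which is an interpretation of the figure rather than a confirmed detail; `rule_blocks` and the required-matcher sets are hypothetical, chosen so the blocks come out as RB1 = {1, 2, 3}, RB2 = {4, 5, 6}, RB3 = {7}, RB4 = {8}, RB5 = {9}.

```python
from collections import defaultdict


def rule_blocks(rules):
    """Group rule ids by the index of the first matcher each rule requires."""
    blocks = defaultdict(list)
    for rule, required in sorted(rules.items()):
        blocks[min(required)].append(rule)
    return dict(sorted(blocks.items()))


# rule id -> set of required matcher indices (hypothetical values)
RULES = {
    1: {1, 2}, 2: {1}, 3: {1, 2},
    4: {2, 4}, 5: {2, 3}, 6: {2, 5},
    7: {3}, 8: {4}, 9: {5},
}

blocks = rule_blocks(RULES)
```

Because this depends only on the ruleset, it can be computed once before any traffic is seen.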
23
Step 2: Iterative Matching
PDU={Method=POST, Filename=fp40reg.dll, VARs: name="file"; value~".*\.\./.*",
Headers: name="host"; len(value)=450}
Rule blocks: RB1 = {1, 2, 3}, RB2 = {4, 5, 6}, RB3 = {7}, RB4 = {8}, RB5 = {9}
S1 = {3}
S2 = (S1 merge A2) + B2 = ({3} merge {}) + {6} = {} + {6} = {6}
S3 = (S2 merge A3) + B3 = ({6} merge {}) + {} = {6} + {} = {6}
S4 = (S3 merge A4) + B4 = ({6} merge {4}) + {} = {6} + {} = {6}
S5 = (S4 merge A5) + B5 = ({6} merge {6}) + {} = {6} + {} = {6}
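The iteration above can be sketched in Python. This is a simplified reading of the worked example, not the authors' implementation: A_i is taken to be the rules that require matcher i and are satisfied by the PDU, B_i the rules in A_i whose first required matcher is i, and the merge keeps a candidate that either does not care about matcher i or is in A_i. The four matchers, the flattened field names, and rules 3 and 4 are hypothetical; rule 6 follows the slide. A_i is computed here by brute force, whereas a real engine would use per-matcher lookup structures.

```python
import re

# Matchers in evaluation order: (field, operation). Indices start at 1.
MATCHERS = [("method", "=="), ("filename", "=="), ("var_file", "match_re"), ("host_len", ">")]

# rule id -> {matcher index: required operand}; a missing index is a "don't care".
RULES = {
    3: {1: "POST", 2: "shtml.dll"},      # hypothetical
    4: {2: "cmd.exe", 3: r"\.\./"},      # hypothetical
    6: {2: "fp40reg.dll", 4: 300},       # rule 6 from the slide
}


def holds(op, value, operand):
    if value is None:
        return False
    if op == "==":
        return value == operand
    if op == ">":
        return value > operand
    if op == "match_re":
        return re.search(operand, value) is not None
    raise ValueError(f"unknown operation: {op}")


def candidate_selection(pdu, rules=RULES, matchers=MATCHERS):
    candidates, trace = set(), []
    for i, (field, op) in enumerate(matchers, start=1):
        # A_i: rules that require matcher i and are satisfied by this PDU.
        a_i = {r for r, req in rules.items()
               if i in req and holds(op, pdu.get(field), req[i])}
        # B_i: rules in A_i for which matcher i is the first required matcher.
        b_i = {r for r in a_i if min(rules[r]) == i}
        # Merge: keep candidates that don't care about matcher i or are in A_i.
        merged = {r for r in candidates if i not in rules[r] or r in a_i}
        candidates = merged | b_i
        trace.append(sorted(candidates))
    return candidates, trace


pdu = {"method": "POST", "filename": "fp40reg.dll", "var_file": "a/../b", "host_len": 450}
matched, trace = candidate_selection(pdu)
```

The candidate set stays small at every step in this example, which is what keeps the per-PDU cost low compared with evaluating every rule in full.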
24
Candidate merge operation
Si merge Ai+1 keeps a rule from Si when:
– the rule doesn't care about matcher i+1, or
– the rule requires matcher i+1 and is in Ai+1
25
Refinement and Extension
• SPM improvement
– Allow negative conditions
– Handle the array case
– Handle the associative array case
– Handle the mutually exclusive case
– Report the matched rules as early as possible
• Extend to Multiple PDU Matching (MPM)
– Allow checkpoints.
26
Outline
• Motivation
• High Speed Matching for Large Rulesets
• High Speed Parsing
• Evaluation
• Research Contributions
27
Observations
• PDU → parse tree
• Leaf nodes are integers or strings
• Vulnerability signatures are mostly based on leaf nodes
[Figure: parse tree rooted at the PDU, including an array node]
• Observation 1: Only need to parse the
fields related to signatures.
• Observation 2: Traditional recursive
descent parsers, which need one function
call per node, are too expensive.
28
Efficient Parsing with State Machines
• Studied eight protocols: HTTP, FTP, SMTP,
eMule, BitTorrent, WINRPC, SNMP and DNS
as well as their vulnerability signatures.
• Pre-construct parsing state machines based
on parse trees and vulnerability signatures.
• Common relationships among leaf nodes:
(a) Sequential, (b) Branch, (c) Loop, (d) Derive
29
Example for WINRPC
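• Rectangles are states
• Parsing variables: R0 .. R4
• 0.61 instruction/byte for BIND PDU
[Figure: WINRPC parsing state machine; each state is labeled with a byte length and a field name]
– Header: rpc_vers (1), rpc_ver_minor (1), ptype (1, stored in R0), pfc_flags (1), packed_drep (4), frag_length (2, stored in R1), merge1 (6)
– Branch on R0: Bind or Bind-ACK
– Bind: merge2 (8), ncontext (1), padding (3); R2 ← 0, R3 ← ncontext
– Per-context loop: ID (2), n_tran_syn (1), padding (1), UUID (16), UUID_ver (4), tran_syn (20*R4); R2++, loop while R2 ≤ R3
– Other labels in the figure: merge3, R1-16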
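The byte lengths in the figure suggest a parser that reads only the fields signatures need and skips the rest by offset arithmetic, with no per-node function calls. The Python sketch below illustrates that idea for a Bind PDU. It is written under assumptions: little-endian integers, ptype 11 for Bind (the DCE/RPC value), a loop that runs exactly ncontext times, and hypothetical function names `parse_bind` and `build_bind`. It performs no bounds checking and is not the generated state machine from the talk.

```python
import struct

PTYPE_BIND = 11  # DCE/RPC bind PDU type (assumed here)


def parse_bind(pdu: bytes) -> dict:
    """Extract only signature-relevant fields; skip everything else by offset."""
    rpc_vers, rpc_vers_minor, ptype, _pfc_flags = struct.unpack_from("<BBBB", pdu, 0)
    fields = {
        "rpc_vers": rpc_vers,
        "rpc_vers_minor": rpc_vers_minor,
        "ptype": ptype,
        "packed_drep": pdu[4:8],
        "frag_length": struct.unpack_from("<H", pdu, 8)[0],
        "uuids": [],
    }
    pos = 10 + 6                      # merge1: 6 header bytes no signature uses
    if ptype != PTYPE_BIND:
        return fields
    pos += 8                          # merge2: 8 bytes skipped in one step
    ncontext = pdu[pos]
    pos += 1 + 3                      # ncontext + padding
    for _ in range(ncontext):
        pos += 2                      # ID
        n_tran_syn = pdu[pos]
        pos += 1 + 1                  # n_tran_syn + padding
        fields["uuids"].append(pdu[pos:pos + 16])
        pos += 16 + 4                 # UUID + UUID_ver
        pos += 20 * n_tran_syn        # skip the transfer syntaxes (20 bytes each)
    return fields


def build_bind(contexts):
    """Build a test Bind PDU from (uuid_bytes, n_tran_syn) pairs (illustration only)."""
    body = b"\x00" * 8 + bytes([len(contexts)]) + b"\x00" * 3
    for ctx_id, (uuid, n_tran_syn) in enumerate(contexts):
        body += struct.pack("<H", ctx_id) + bytes([n_tran_syn, 0]) + uuid + b"\x00" * 4
        body += b"\x00" * (20 * n_tran_syn)
    header = struct.pack("<BBBB", 5, 1, PTYPE_BIND, 3) + b"\x10\x00\x00\x00"
    header += struct.pack("<H", 16 + len(body)) + b"\x00" * 6
    return header + body


pdu = build_bind([(b"A" * 16, 1), (b"B" * 16, 2)])
parsed = parse_bind(pdu)
```

Merging adjacent unused fields into a single skip (merge1, merge2) is what keeps the per-byte work low.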
30
Outline
• Motivation
• High Speed Matching for Large Rulesets
• High Speed Parsing
• Evaluation
• Research Contributions
31
Evaluation Methodology
Fully implemented prototype
• 11,704 lines of C++ and
2,706 lines of Python
• Can run on both Linux and
Windows
Deployed at a university DC
with up to 106Mbps
• 26GB+ traces from Tsinghua Univ. (TH), Northwestern (NU)
and DARPA
• Run on a P4 3.8GHz single-core PC w/ 4GB memory.
• Measured after TCP reassembly, with the PDUs preloaded in memory
• For HTTP we have 794 vulnerability signatures, which cover
973 Snort rules.
• For WINRPC we have 45 vulnerability signatures, which
cover 3,519 Snort rules
32
Parsing Results
Trace                               TH DNS  TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Binpac throughput (Gbps)            0.31    1.41       1.11       2.10     14.2     1.69
Our parser throughput (Gbps)        3.43    16.2       12.9       7.46     44.4     6.67
Speed up ratio                      11.2    11.5       11.6       3.6      3.1      3.9
Max. memory per connection (bytes)  15      15         15         14       14       14
33
Matching Results
Trace                                  TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Sequential throughput (Gbps)           10.68      9.23       0.34     2.37     0.28
CS Matching throughput (Gbps)          14.37      10.61      2.63     17.63    1.85
Matching only time speed up ratio      4          1.8        11.3     11.7     8.8
Avg # of Candidates                    1.16       1.48       0.033    0.038    0.0023
Max. memory per connection (bytes)     27         27         20       20       20
34
Other Results
Rule scaling results
[Figure: throughput (Gbps, 0 to 4) vs. # of rules used (0 to 800); performance decreases gracefully]
Compare with Regex
• Memory for 973 Snort rules: DFA 5.29GB (XFA: 863 rules, 1.08MB),
NetShield 2.3MB
• Per flow memory: XFA 36 bytes, NetShield 20 bytes.
• Throughput: XFA 756Mbps, NetShield 1.9+Gbps
*XFA [SIGCOMM08][Oakland08]
35
Research Contributions
• Demonstrate that vulnerability signatures can
be applied to NIDS/NIPS, which can
significantly improve the accuracy of
current NIDS/NIPS
• Propose the candidate selection algorithm
for matching a large number of
vulnerability signatures efficiently
• Propose the parsing state machine for fast
protocol parsing
• Implement NetShield
36
Future work
• Work in progress
– In collaboration with MSR: apply semantically
rich analysis to cloud Web service profiling,
to understand why services are slow and how to improve them.
• Future work
– Web security (browser security, web server
security)
– Data Center security
– High Speed Network Intrusion Prevention
System with Hardware Support
37
Long Term Research Challenges
• Combat the professional profit-driven
attackers.
• Online applications (including Web 2.0
applications) become more complex and
vulnerable.
• Network speed keeps increasing, which
demands highly scalable approaches.
38
Q&A
Thanks!
39
Backup Slides
40
Measure Snort Rules
• Semi-manually classify the rules.
1. Group by CVE-ID
2. Manually look at each vulnerability
• Results
– 86.7% of rules can be improved by protocol semantic
vulnerability signatures.
– Most of remaining rules (9.9%) are web DHTML and
scripts related which are not suitable for signature
based approach.
– On average 4.5 Snort rules are reduced to one
vulnerability signature.
– For binary protocol the reduction ratio is much higher
than that of text based ones.
• For netbios.rules the ratio is 67.6.
41
Motivation
• Network security has been recognized as
the single most important attribute of their
networks, according to a survey of 395
senior executives conducted by AT&T
• Many new emerging threats make the
situation even worse
42
System Framework
[Figure: system framework. Labeled elements: streaming packet data; reversible k-ary sketch monitoring; local sketch records (sent out for aggregation); remote aggregated sketch records; sketch based statistical anomaly detection (SSAD); Token Based Signature Generation (TOSG); Length Based Signature Generation (LESG); content-based signature matching; protocol semantic signature matching; honeynets/honeyfarms (traffic to unused IP blocks); Network Situational Awareness. Legend: data path, control path, modules on the critical path, modules on the non-critical path.]
• Part I: Sketch-based monitoring & detection (scalability)
• Part II: Polymorphic worm signature generation (accuracy & adapt fast)
• Part III: Signature matching engines (accuracy & scalability & coverage)
• Part IV: Network Situational Awareness (accuracy & adapt fast)
43
Example of Vulnerability Signatures
• At least 75% of vulnerabilities are due to
buffer overflow
• Sample vulnerability signature
– Field length corresponding to the
vulnerable buffer > certain threshold
– Intrinsic to the buffer overflow
vulnerability and hard to evade
[Figure: a protocol message overflowing the vulnerable buffer]
44
Old Slides
45
Conclusions
• A novel network-based vulnerability
signature matching engine
– Through a measurement study of the Snort ruleset,
show that vulnerability signatures can improve
most of the signatures in NIDS/NIPS.
– Propose the parsing state machine for fast
parsing
– Propose a candidate selection algorithm for
matching a large number of vulnerability
signatures simultaneously
46
Outline
• Motivation
• Feasibility Study: a measurement
approach
• Problem Statement
• High Speed Parsing
• High Speed Matching for massive
vulnerability Signatures.
• Evaluation
• Conclusions
48
Limitations of Regular Expression
Signatures
Signature: 10.*01
[Figure: traffic filtering between the Internet and our network. Payloads 1010101 and 10111101 are blocked (X); 11111100 and 00010111 reach our network.]
Polymorphism!
Polymorphic attacks (worm/botnet)
might not have an exact regular
expression based signature
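The four payloads in the figure can be checked against the signature directly. The short Python snippet below does so with the standard `re` module; the variable names are illustrative.

```python
import re

SIGNATURE = re.compile(r"10.*01")
payloads = ["1010101", "10111101", "11111100", "00010111"]

# A payload is blocked when the signature matches anywhere in it.
blocked = [p for p in payloads if SIGNATURE.search(p)]
passed = [p for p in payloads if not SIGNATURE.search(p)]
```

Two of the four variants slip past the filter, which is the polymorphism problem the slide describes: a signature derived from some exploit instances need not match every variant.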
52
What do we do?
• Build a NIDS/NIPS with much better accuracy
and similar speed compared with regular
expression based approaches
– Feasibility: 86.7% of the Snort ruleset (6,735 signatures)
can be improved by vulnerability signatures.
– High speed parsing: 2.7~12 Gbps
– High speed matching:
• Efficient algorithm for matching massive vulnerability rules
• HTTP, 791 vulnerability signatures at ~1Gbps
53
Problem Formulation
• Parsing problem formulation
– Given a PDU and the protocol specification as
input, output the set of fields that are required
by matching.
54
Publications
• Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and
Attack-resilient Length Signature Generation for Zero-day Polymorphic
Worms, in Proc. of IEEE ICNP 2007.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot
Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik,
Reversible sketches: Enabling monitoring and analysis over high speed
data streams, in IEEE/ACM Transactions on Networking, Volume 15,
Issue 5, Oct. 2007.
• Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao,
Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with
Provable Attack Resilience, in Proc. of IEEE Symposium on Security and
Privacy, 2006.
• Zhichun Li, Yan Chen and Aaron Beach, Towards Scalable and Robust
Distributed Intrusion Alert Fusion with Good Load Balancing, in Proc. of ACM
SIGCOMM LSAD 2006.
• Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion
Detection Approach for High-speed Networks, in Proc. of IEEE ICDCS
2006.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot
Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik,
Reverse Hashing for High-speed Network Monitoring: Algorithms,
Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006.
55
Current Status
• Part I: Sketch based monitoring & detection
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin
Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches:
Enabling monitoring and analysis over high speed data streams, in IEEE/ACM
Transactions on Networking, Volume 15, Issue 5, Oct. 2007
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin
Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for
High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in
Proc. of IEEE INFOCOM 2006 (252/1400=18%)
– Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection
Approach for High-speed Networks, in Proc. of IEEE International Conference on
Distributed Computing Systems (ICDCS) 2006 (75/536=14%)
(Alphabetical order)
• Part II: Polymorphic worm signature generation
– TOSG: Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao,
Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable
Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006
(23/251=9%)
– LESG: Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and
Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in
Proc. of IEEE International Conference on Network Protocols (ICNP) 2007
(32/220=14%)
56
Current Status
• Part III: Signature matching engines
– Work in progress, will be the focus of this talk
– Zhichun Li, Gao Xia, Yi Tang, Jian Chen, Ying He, Yan Chen
and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching, in submission
• Part IV: Network Situational Awareness
– Work in progress
– Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards
Situational Awareness of Large-Scale Botnet Events using
Honeynets, in preparation
– Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic,
P2P Doctor: Measurement and Diagnosis of Misconfigured
Peer-to-Peer Traffic, in submission
57
Current Status
• Part I: Sketch based monitoring & detection
– Result in [Infocom06,ToN,ICDCS06]
• Part II: Polymorphic worm signature generation
– Result in [Oakland06,ICNP07]
• Part III: Signature matching engines
– Work in progress, will be the focus of this talk
• Part IV: Network Situational Awareness
– Work in progress
58
Limitations of Exploit Based Signature
Signature: 10.*01
[Figure: traffic filtering between the Internet and our network. Payloads 1010101 and 10111101 are blocked (X); 11111100 and 00010111 reach our network.]
Polymorphism!
Polymorphic worms might not have an
exact exploit based signature
59
Vulnerability Signature
[Figure: vulnerability signature traffic filtering between the Internet and our network; traffic that targets the vulnerability is blocked (X)]
• Works for polymorphic worms
• Works for all the worms that target the
same vulnerability
60