Clustering Abnormal behavior of DNS servers

Bonnie Kirkpatrick, Simon Lacoste-Julien, Wei Xu ([email protected])

Introduction
Our goal was to detect misconfigurations of DNS servers by data mining
the request log of the DNS A-root server. We represented the request
log in a feature space. After projecting the data onto the principal
components, we applied the k-means algorithm to obtain clusters. Our
clusters revealed four classes of DNS misconfigurations, which were
verified by DNS operators.

About DNS
• Scalable, distributed name-to-IP mappings
• Hierarchically-organized name servers
• Different types of resource records
• DNS query
  • Local cache absorbs a large part of the DNS traffic (90%, according to previous studies)
  • Otherwise, requests traverse the DNS hierarchy

Problems with DNS infrastructure
• Local misconfigurations bring extra traffic to DNS infrastructure
• Up to 34% of traffic in our dataset is caused by sources that behave abnormally
• Scalability and high redundancy hide local misconfigurations
• Attack attempts on DNS roots every day
• Attackers use resources of the DNS root to attack others (IP spoofing etc.)

Algorithm
• LOG: DNS requests log
• PREPROCESSING: feature engineering and preprocessing
  1 datapoint = 3 min statistics for a source IP
  Features:
  - time of slice
  - source IP
  - total requests
  - unique queries
  - max # of repeated queries
  - min / max / avg interarrival time
  - min / max / avg / std TTL
• PCA: project data into principal component basis (12d)
• HIERARCHICAL CLUSTERING
• K-MEANS CLUSTERING
• LINEAR DISCRIMINANT ANALYSIS: yields discriminating directions
• INTERPRETATION
• VISUALIZATION: visualize original data (now labelled) in Weka to interpret results
• CLASSIFICATION: can classify new points
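
The preprocessing stage can be illustrated with a small sketch. It assumes each log record is a whitespace-separated line in the tcpdump-derived format used for this dataset (timestamp, source IP, TTL, EDNS0 flag, Qname/Qclass/Qtype) and groups records into 3-minute slices per source IP. The function names, the dictionary keys, the slice alignment (slices start at multiples of 180 seconds) and the reading of "repeated queries" as identical Qname/Qclass/Qtype strings are assumptions made for illustration, not details taken from the poster. Only the first sample line comes from the dataset; the other two are synthetic.

```python
import statistics
from collections import Counter, defaultdict

SLICE_SECONDS = 180  # 3-minute slices; alignment to multiples of 180 s is an assumption


def parse_record(line):
    """Split one log line into (timestamp, source_ip, ttl, query)."""
    ts, src, ttl, _edns0, query = line.split()
    return float(ts), src, int(ttl), query


def extract_features(lines):
    """Return {(slice_start, source_ip): feature dict} for the given log lines."""
    groups = defaultdict(list)
    for line in lines:
        ts, src, ttl, query = parse_record(line)
        slice_start = int(ts // SLICE_SECONDS) * SLICE_SECONDS
        groups[(slice_start, src)].append((ts, ttl, query))

    features = {}
    for key, records in groups.items():
        records.sort(key=lambda r: r[0])  # order by timestamp
        times = [r[0] for r in records]
        ttls = [r[1] for r in records]
        query_counts = Counter(r[2] for r in records)
        # Interarrival times; a single-record slice gets one zero gap.
        gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
        features[key] = {
            "total_requests": len(records),
            "unique_queries": len(query_counts),
            "max_repeated": max(query_counts.values()),
            "min_gap": min(gaps),
            "max_gap": max(gaps),
            "avg_gap": statistics.fmean(gaps),
            "min_ttl": min(ttls),
            "max_ttl": max(ttls),
            "avg_ttl": statistics.fmean(ttls),
            "std_ttl": statistics.pstdev(ttls),
        }
    return features


sample = [
    "1094616016.955030 64.4.25.22 114 n www.lelplastic.com/IN/A",  # from the dataset
    "1094616018.955030 64.4.25.22 114 n www.lelplastic.com/IN/A",  # synthetic
    "1094616020.955030 64.4.25.22 116 n example.org/IN/MX",        # synthetic
]
features = extract_features(sample)
for key, row in features.items():
    print(key, row)
```

Each resulting dictionary corresponds to one datapoint; turning the values into a numeric vector (and scaling them) would be the natural next step before PCA.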

Description of dataset
One-day log from tcpdump on the subnet of the DNS A-root server. Record fields:
Time_stamp  source_IP  TTL  EDNS0  Qname / Qclass / Qtype
Example record:
1094616016.955030 64.4.25.22 114 n www.lelplastic.com/IN/A

[Figure: Number of Requests Per hour. x-axis: Time (hour of day); y-axes: total number of requests, Number of Distinct QNames]
[Figure: Number of distinct sources. x-axis: Time (hour of the day); y-axis: Number of sources]
[Figure: Percentage of QTypes. Legend: A, MX, AAAA, A6, ANY, SOA, CNAME, SRV, NS, TXT, PTR, IN, X25, HINFO]

Results
Green
Small TTL variations may be caused by slightly
varied paths through the Internet, or the use
of multiple proxies.
Large TTL variations may imply other
problems.
Blue
Sources sending mostly unique requests may not be
caching the results or following the levels of
indirections correctly.
Red
Many repeated queries may indicate more serious
problems such as exponential back-off
misconfigurations, inability to receive DNS
responses, buggy software, etc.
Black
Black points have low numbers of both
total and unique queries.
These points indicate that some sources
are querying the root with repeated
queries at fairly regular intervals, which
may be caused by monitoring traffic.
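
As a rough illustration of the clustering step behind these colour groups, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. It is not the authors' implementation: it skips the PCA projection, uses only two features (total requests and unique queries), and runs on six synthetic points chosen to mimic "mostly unique", "many repeated" and "low volume" sources. The function name, the toy data and the choice of initial centroids are all assumptions.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Synthetic (total_requests, unique_queries) pairs, for illustration only.
points = [
    (1000, 990), (1200, 1150),  # mostly unique queries
    (1100, 20), (900, 10),      # many repeated queries
    (15, 3), (20, 4),           # low total and unique counts
]
initial = [points[0], points[2], points[4]]  # hypothetical starting centroids
labels, centroids = kmeans(points, initial)
print(labels)
print(centroids)
```

On real feature vectors the columns have very different scales, so standardising them (or clustering in the principal component basis, as the poster does) matters; otherwise the largest-valued feature dominates the distance.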