Domain Reputation
Hussien Othman
Domains for malicious purposes:
Command and Control (C&C)
Malware distribution
Phishing
And more.
Malicious domains are essential to the success of nearly all popular attack vectors.
Many of these domains are short-lived.
Static blacklist-based technologies cannot keep up with the volume of new domain names.
There is a need for dynamic domain reputation systems!
• Detect the evidence of malicious content/activities.
• DNS Reputation.
• Predict malicious domains: the life cycle of malicious
domains and the re-use of valuable resources.
• Malware infection in enterprises is a big problem.
• The majority of infections happen via malicious domains.
• Build a graph:
• Nodes are the domains and hosts.
• Add an edge between host H and domain D if there is a transaction from H to D.
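The graph construction above can be sketched as follows. This is a minimal illustration, assuming the transactions are available as (host, domain) pairs, for example extracted from web-proxy logs; the function name and input format are hypothetical, and the inference step that is run on the graph is not shown.

```python
from collections import defaultdict


def build_host_domain_graph(transactions):
    """Build the bipartite host-domain graph described above.

    transactions: iterable of (host, domain) pairs, one per observed
    transaction from a host to a domain (hypothetical input format).
    Returns two adjacency maps: host -> set of domains and
    domain -> set of hosts. Repeated transactions between the same
    pair collapse into a single edge.
    """
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in transactions:
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return dict(host_to_domains), dict(domain_to_hosts)


# Toy transactions, e.g. as they might be extracted from proxy logs.
logs = [
    ("10.0.0.5", "example.com"),
    ("10.0.0.5", "bad-domain.test"),
    ("10.0.0.7", "example.com"),
    ("10.0.0.5", "example.com"),  # repeat: no new edge
]
hosts, domains = build_host_domain_graph(logs)
print(sorted(domains["example.com"]))  # ['10.0.0.5', '10.0.0.7']
```

Using sets keeps the graph simple: the edge records only that H contacted D, not how often.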
Low degree of false positives
Detection of unknown malicious domains
Detection happens only AFTER the malicious domains are already in
use.
A dynamic reputation system.
Goal: assign a low reputation score to a domain involved in malicious activities and,
on the other hand, a high reputation score to a domain associated
with legitimate Internet services.
Capture the characteristics of domains using network-based and zone-based statistical
features.
Resource Record (RR): a domain name and the IP address it maps to, e.g.
www.google.com, 72.14.192.0
2LD and 3LD domains:
for www.example.com, the 2LD is example.com and the 3LD is www.example.com.
Related Historic IPs (RHIPs):
All “routable” IPs that have historically been mapped to the domain name in the RR, or to
any domain name under its 2LD and 3LD.
Related Historic Domains (RHDNs):
All fully qualified domain names (FQDNs) that have historically been linked with the IP in
the RR, its corresponding CIDR and its AS.
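These definitions can be illustrated over a toy passive-DNS history of (domain, IP) pairs. This is a minimal sketch with several stated assumptions: the 2LD/3LD split simply takes the last two or three labels (which is wrong for suffixes such as co.uk; a real system would consult the Public Suffix List), "routable" is approximated by Python's `ipaddress` `is_global` check, the "corresponding CIDR" is taken to be the /24 network, and the AS match is omitted because it needs an external IP-to-AS table. All function names and data are hypothetical.

```python
import ipaddress


def two_ld(fqdn):
    # Naive 2LD: the last two labels. Wrong for multi-label public
    # suffixes such as co.uk; a real system would use the Public Suffix List.
    return ".".join(fqdn.rstrip(".").split(".")[-2:])


def three_ld(fqdn):
    # Naive 3LD: the last three labels (same caveat as two_ld).
    return ".".join(fqdn.rstrip(".").split(".")[-3:])


def related_historic_ips(domain, history):
    """RHIPs: routable IPs ever mapped to `domain` or to any name under its
    2LD or 3LD. Every name under the 3LD is also under the 2LD, so matching
    on the 2LD covers all cases. `history` holds (fqdn, ip) pairs."""
    return {
        ip
        for name, ip in history
        if two_ld(name) == two_ld(domain) and ipaddress.ip_address(ip).is_global
    }


def related_historic_domains(ip, history, prefix_len=24):
    """RHDNs: FQDNs ever linked with `ip` or with any IP in the same
    /prefix_len network (a stand-in for "its corresponding CIDR").
    The AS match is omitted; it needs an external IP-to-AS table."""
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return {name for name, other in history if ipaddress.ip_address(other) in net}


# Toy passive-DNS history of (fqdn, ip) pairs.
history = [
    ("www.example.com", "93.184.216.34"),
    ("mail.example.com", "93.184.216.99"),
    ("example.com", "10.1.2.3"),  # private address, not routable
    ("cdn.other.test", "93.184.216.7"),
    ("shop.unrelated.test", "151.101.1.69"),
]
print(two_ld("www.example.com"), three_ld("www.example.com"))
print(sorted(related_historic_ips("www.example.com", history)))
print(sorted(related_historic_domains("93.184.216.34", history)))
```

Here the RHIPs of www.example.com are the two public 93.184.216.x addresses (the private 10.1.2.3 is dropped), and the RHDNs of 93.184.216.34 include cdn.other.test because it shares the same /24.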
Network-based features: describe how the operators
who own d, and the IPs that domain d points to, allocate
their network resources. Here A(d) is the set of IPs that d has pointed to,
and A3LD(d) and A2LD(d) are the IPs pointed to by any name under d's 3LD and 2LD.
• BGP features (9 features): distinct number of BGP prefixes,
countries and organizations related to BGP(A(d)); distinct number of
IP addresses, BGP prefixes and countries related to BGP(A3LD(d)) and
BGP(A2LD(d)).
• AS features (3 features): distinct number of ASes related to AS(A(d)),
AS(A3LD(d)) and AS(A2LD(d)).
• Registration features (6 features): distinct number of registrars in
A(d), A3LD(d) and A2LD(d), and the diversity of registration dates in
A(d), A3LD(d) and A2LD(d).
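A few of these counts can be sketched as below, assuming a hypothetical lookup table that maps each IP to its BGP prefix, origin AS and country (in practice this would come from BGP routing data and an IP geolocation database). The sketch computes the IP, prefix, AS and country counts for one IP set; applying it to A(d), A3LD(d) and A2LD(d) gives the corresponding features. Organization counts would follow the same pattern, and the registration features are not shown.

```python
from typing import NamedTuple


class IPInfo(NamedTuple):
    prefix: str   # announced BGP prefix containing the IP
    asn: int      # origin autonomous system number
    country: str  # country code associated with the prefix


def network_counts(ips, ip_info):
    """Distinct IPs, BGP prefixes, ASes and countries for one IP set,
    e.g. A(d), A3LD(d) or A2LD(d). IPs missing from ip_info are counted
    as IPs but contribute nothing to the other counts."""
    ips = set(ips)
    known = [ip_info[ip] for ip in ips if ip in ip_info]
    return {
        "ips": len(ips),
        "bgp_prefixes": len({info.prefix for info in known}),
        "ases": len({info.asn for info in known}),
        "countries": len({info.country for info in known}),
    }


# Hypothetical lookup table; ASNs are from the documentation range.
ip_info = {
    "93.184.216.34": IPInfo("93.184.216.0/24", 64500, "US"),
    "93.184.216.99": IPInfo("93.184.216.0/24", 64500, "US"),
    "151.101.1.69": IPInfo("151.101.0.0/16", 64501, "DE"),
}
a_d = {"93.184.216.34", "93.184.216.99", "151.101.1.69"}
print(network_counts(a_d, ip_info))
```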
Zone-based features: measure the characteristics of domain names
historically associated with d. The intuition is that malicious
domain names related to the same spam campaign, for example,
often look randomly generated and share few common
characteristics. Legitimate domain names, on the other hand,
usually have strong similarities, for example google.com and
googlewave.com.
• String features (12 features): distinct number of domain names, average
and standard deviation of their lengths, and the mean, median and
standard deviation of the occurrence frequency of single characters, 2-grams and
3-grams.
• TLD features (5 features): distinct number of TLDs, ratio between .com
and the rest of the TLDs, and the mean, median and standard deviation of the
occurrence frequency of TLD strings.
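The 12 string features and 5 TLD features can be sketched over a set of related domain names (for example, the RHDNs). This is one plausible reading of the feature list, not Notos's implementation: n-grams are taken over full names with dots included, "occurrence frequency" is read as the count of each distinct n-gram, and the .com ratio divides .com names by all other names with a guard against division by zero. All names are hypothetical.

```python
from collections import Counter
from statistics import fmean, median, pstdev


def _summary(values):
    # Mean, median and population standard deviation of a non-empty list.
    return {"mean": fmean(values), "median": median(values), "std": pstdev(values)}


def zone_features(names):
    """12 string features + 5 TLD features over a non-empty set of related
    domain names, each at least 3 characters long. n-grams are taken over
    full names, dots included (an assumption)."""
    names = sorted(set(names))
    lengths = [len(name) for name in names]
    features = {
        "distinct_names": len(names),
        "len_mean": fmean(lengths),
        "len_std": pstdev(lengths),
    }
    for n in (1, 2, 3):
        # Occurrence count of every distinct n-gram across all names.
        grams = Counter(
            name[i:i + n] for name in names for i in range(len(name) - n + 1)
        )
        for stat, value in _summary(list(grams.values())).items():
            features[f"{n}gram_{stat}"] = value
    tlds = Counter(name.rsplit(".", 1)[-1] for name in names)
    com = tlds.get("com", 0)
    features["distinct_tlds"] = len(tlds)
    # .com names vs. all other names; max(..., 1) avoids division by zero.
    features["com_ratio"] = com / max(len(names) - com, 1)
    for stat, value in _summary(list(tlds.values())).items():
        features[f"tld_{stat}"] = value
    return features


print(zone_features(["google.com", "googlewave.com"])["len_mean"])  # 12.0
```

Similar legitimate names such as google.com and googlewave.com share many n-grams, so their n-gram counts are skewed toward repeated values; randomly generated names tend to spread counts thinly across many distinct n-grams.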
Evidence-based features: measure to what extent a given
domain is associated with other known malicious
domain names or IP addresses.
• Honeypot features (3 features): distinct number of malware
samples that, when executed, tried to connect to an IP
address in A(d), BGP(A(d)) or AS(A(d)).
• Blacklist features (3 features): number of IPs listed on public
blacklists in A(d), BGP(A(d)) and AS(A(d)).
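The blacklist features can be sketched as counts of blacklisted IPs at three granularities: the IPs themselves, their BGP prefixes and their ASes. This minimal sketch assumes the prefixes and AS numbers of A(d) are already known and that a hypothetical IP-to-AS table is available for the blacklisted IPs; the honeypot features would follow the same shape, with the IPs contacted by malware samples in place of the blacklist.

```python
import ipaddress


def blacklist_features(a_d, bgp_prefixes, asns, blacklist, ip_to_asn):
    """Number of blacklisted IPs that fall in A(d), in BGP(A(d)) and in
    AS(A(d)).

    a_d: IPs the domain has resolved to
    bgp_prefixes: CIDR strings of the BGP prefixes covering A(d)
    asns: AS numbers of A(d)
    blacklist: IPs listed on public blacklists
    ip_to_asn: hypothetical IP -> AS number table for blacklisted IPs
    """
    listed = {ipaddress.ip_address(ip) for ip in blacklist}
    a_d_set = {ipaddress.ip_address(ip) for ip in a_d}
    nets = [ipaddress.ip_network(p) for p in bgp_prefixes]
    in_a = sum(ip in a_d_set for ip in listed)
    in_bgp = sum(any(ip in net for net in nets) for ip in listed)
    in_as = sum(ip_to_asn.get(str(ip)) in asns for ip in listed)
    return in_a, in_bgp, in_as


print(blacklist_features(
    a_d=["93.184.216.34"],
    bgp_prefixes=["93.184.216.0/24"],
    asns={64500},
    blacklist=["93.184.216.34", "93.184.216.99", "93.184.217.5", "151.101.1.69"],
    ip_to_asn={"93.184.216.34": 64500, "93.184.216.99": 64500,
               "93.184.217.5": 64500, "151.101.1.69": 64501},
))  # (1, 2, 3)
```

The coarser levels catch neighbourhoods: a domain whose own IP is clean can still score high if its prefix or AS hosts many blacklisted addresses.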
This module first collects statistical data on major categories of domains as follows:
1. Popular domains (google, yahoo, facebook, ...).
2. Common domains from the Alexa top 100.
3. Akamai domains.
4. Content Delivery Network (CDN) domains, except Akamai.
5. Dynamic DNS domains.
Notos trains 5 classifiers. Each classifier distinguishes one class from all
the others based on network-based features.
A new domain d is given a score by each one of these classifiers. The output is
NM(d) = (c1, c2, ..., c5).
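The one-vs-rest scheme can be sketched as five binary scorers, each comparing a domain's network feature vector with its own class and with all other classes. The learner here is a deliberately simple stand-in (a nearest-centroid score), not the classifier used in Notos; the feature vectors, class labels and function names are hypothetical.

```python
import math
from statistics import fmean

CLASSES = ["popular", "alexa_top100", "akamai", "cdn", "dynamic_dns"]


def centroid(vectors):
    return [fmean(column) for column in zip(*vectors)]


def train_one_vs_rest(training):
    """training: class name -> list of network feature vectors.
    Each binary model is (centroid of the class, centroid of all others)."""
    models = {}
    for cls in CLASSES:
        rest = [v for other, vecs in training.items() if other != cls for v in vecs]
        models[cls] = (centroid(training[cls]), centroid(rest))
    return models


def network_profile(models, x):
    """NM(d) = (c1, ..., c5): per class, a score in [0, 1] that is 1 at the
    class centroid and 0 at the centroid of the other classes."""
    scores = []
    for cls in CLASSES:
        own, rest = models[cls]
        d_own, d_rest = math.dist(x, own), math.dist(x, rest)
        scores.append(0.5 if d_own + d_rest == 0 else d_rest / (d_own + d_rest))
    return scores


# Hypothetical vectors: (distinct ASes, distinct BGP prefixes, distinct countries)
training = {
    "popular": [[20, 60, 15], [22, 64, 16]],
    "alexa_top100": [[5, 12, 3], [6, 14, 4]],
    "akamai": [[40, 200, 30], [42, 210, 32]],
    "cdn": [[10, 40, 8], [12, 44, 9]],
    "dynamic_dns": [[1, 1, 1], [1, 2, 1]],
}
models = train_one_vs_rest(training)
nm = network_profile(models, [21, 62, 15.5])
print(CLASSES[nm.index(max(nm))])  # popular
```

Keeping all five scores, rather than only the winning class, lets later stages use NM(d) as a feature vector in its own right.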
First-level clustering:
Uses network feature vectors.
Goal: identify similarities in zones based on their network profile.
Second-level clustering:
Uses zone feature vectors.
Partitions domain names within the same cluster (from the first level) based on their zone
properties.
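The two-level procedure can be sketched as: cluster all domains on their network vectors, then re-cluster each first-level cluster on its zone vectors. Plain k-means with a fixed k and deterministic farthest-point initialisation is used here as a stand-in; it is not necessarily the clustering algorithm Notos uses, and the vectors and function names are hypothetical.

```python
import math
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation.
    Returns one cluster label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all existing centers.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [fmean(col) for col in zip(*members)]
    return labels


def two_level_clustering(domains, network_vecs, zone_vecs, k1, k2):
    """First level: cluster on network vectors. Second level: split each
    first-level cluster on zone vectors. Returns (c1, c2) -> domains."""
    first = kmeans(network_vecs, k1)
    clusters = {}
    for c1 in sorted(set(first)):
        idx = [i for i, label in enumerate(first) if label == c1]
        second = kmeans([zone_vecs[i] for i in idx], min(k2, len(idx)))
        for i, c2 in zip(idx, second):
            clusters.setdefault((c1, c2), []).append(domains[i])
    return clusters


domains = ["a.test", "b.test", "c.test", "d.test"]
network_vecs = [(0, 0), (1, 0), (10, 10), (11, 10)]  # hypothetical
zone_vecs = [(1.0, 0.1), (8.0, 3.0), (1.2, 0.2), (7.5, 2.8)]  # hypothetical
print(two_level_clustering(domains, network_vecs, zone_vecs, k1=2, k2=2))
```

In this toy input a.test and b.test share a network profile but differ in zone properties, so they land in the same first-level cluster and different second-level clusters.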
Notos detected new malicious domains!
Uses only DNS resource records (RRs).
Needs a lot of records to succeed in detecting a malicious
domain.
Cannot predict future malicious domains.
• Many malicious domains are used for a very short period of time.
• Evading detection
• Low cost
• Proposal: a system that predicts which domain names are most likely to be (or are about
to be) used for malicious purposes.
• In this way, the predicted malicious domains can be blocked before, or at the
beginning of, their use for malicious purposes.
• The selection of domain names happens before registering the domain names.
• The registration of a domain name happens before the creation of DNS records for
that domain name.
• Obtaining an IP address and using that address to set up a server also happens
before DNS records are created, since the IP address is part of the DNS records.
• In a successful attack, the aforementioned actions all happen before the malicious
domain is activated.
The sequence of actions involved in activating a malicious
domain, as well as the time interval between these actions,
makes the prediction of malicious domains possible.
• The success of most popular attacks depends on many resources.
• These resources are often purchased.
• Examples:
• domain names are registered or transferred for a price
• bullet-proof servers are available for rent
• large numbers of infected hosts are also available for rent.
• Some of the purchases are made through legitimate processes; others are made via illegal
channels such as black markets, underground forums, etc.
• Many types of resources are made to be re-usable so that they can be resold multiple times
to maximize financial gain.
• The re-use of resources across different attacks also presents opportunities to find
connections between malicious domains.
Re-use of domain names:
Time interval between the previous and the current use > 1 year.
Score based on: TLD, changes in IP, price for domain transfer.
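A re-use check of this kind can be sketched as a filter plus a score. Only the one-year gap comes from the slide; the equal weights, the direction of each signal (a flagged TLD, a changed IP and a cheap transfer all raise suspicion), the `flagged_tlds` set and the price threshold are hypothetical placeholders, not values from the source.

```python
from datetime import date


def reuse_score(last_used, now, tld, ip_changed, transfer_price_usd,
                flagged_tlds, cheap_price_usd=10.0):
    """Hypothetical re-use score for a domain.

    Returns None when the gap since the previous use is not over one year
    (approximated as 365 days; the only rule taken from the slide).
    Otherwise adds one point per signal; the signals mirror the slide
    (TLD, IP change, transfer price) but their direction and the equal
    weights are assumptions.
    """
    if (now - last_used).days <= 365:
        return None
    score = 0
    score += tld in flagged_tlds                     # TLD on a watch list
    score += bool(ip_changed)                        # IP differs from previous use
    score += transfer_price_usd <= cheap_price_usd   # cheap to acquire
    return score


print(reuse_score(date(2013, 1, 1), date(2014, 6, 1), "test", True, 5.0,
                  flagged_tlds={"test"}))  # 3
print(reuse_score(date(2014, 1, 1), date(2014, 6, 1), "test", True, 5.0,
                  flagged_tlds={"test"}))  # None
```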
Patterns of DNS queries for malicious domains can be used in prediction.
• Discovered several patterns in the DNS queries of a domain before the domain was
used and/or detected as malicious.
• The patterns indicate different activities related to malicious domains, including
preparing/testing the domain for malicious purposes.
• Same name server
• Same IP address
• Same registrant information
• Reasons: a pseudo-identity of the attacker, registration
using the same stolen credit card, etc.
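Shared resources of this kind can be turned into links between domains. The minimal sketch below indexes each (attribute, value) pair, such as a name server, an IP or a registrant email, and flags domains that share any value with a known malicious domain. The record schema, field names and data are hypothetical.

```python
from collections import defaultdict


def connected_to_known_bad(records, known_bad):
    """records: domain -> {"ns": ..., "ip": ..., "registrant": ...}
    (hypothetical schema). Returns each domain not yet known to be
    malicious that shares at least one attribute value with a known
    malicious domain, mapped to the names of the shared attributes."""
    index = defaultdict(set)  # (attribute, value) -> domains having it
    for domain, attrs in records.items():
        for key, value in attrs.items():
            if value is not None:
                index[(key, value)].add(domain)
    suspects = defaultdict(set)
    for (key, _value), group in index.items():
        if group & known_bad:
            for domain in group - known_bad:
                suspects[domain].add(key)
    return dict(suspects)


records = {
    "bad1.test": {"ns": "ns1.shady.test", "ip": "93.184.216.34", "registrant": "j.doe@mail.test"},
    "new1.test": {"ns": "ns1.shady.test", "ip": "151.101.1.69", "registrant": "a@mail.test"},
    "new2.test": {"ns": "ns9.other.test", "ip": "93.184.216.34", "registrant": "j.doe@mail.test"},
    "clean.test": {"ns": "ns9.other.test", "ip": "151.101.2.2", "registrant": "b@mail.test"},
}
print(connected_to_known_bad(records, known_bad={"bad1.test"}))
```

Here new1.test is linked through its name server and new2.test through its IP and registrant, while clean.test shares a name server only with new2.test and is not flagged.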
Detected and collected a list of name servers that provide DNS records for a number
of domains, most if not all of which are malicious.
WHOIS information for malicious domains sometimes includes
the same or similar fake registrant name, the same email, the same registrant
address, etc.
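Exact matches miss "similar" fake names, so a fuzzy comparison is one option. This minimal sketch uses the standard library's `difflib.SequenceMatcher` ratio on case-folded names; the choice of this similarity measure and the 0.8 threshold are assumptions for illustration, not the method from the source.

```python
from difflib import SequenceMatcher


def similar_registrants(candidate, known_bad_names, threshold=0.8):
    """Known-bad registrant names whose case-insensitive difflib similarity
    ratio to `candidate` is at least `threshold` (a hypothetical cut-off)."""
    cand = candidate.casefold()
    return [
        name for name in known_bad_names
        if SequenceMatcher(None, cand, name.casefold()).ratio() >= threshold
    ]


print(similar_registrants("Jon Smith", ["John Smith", "Alice Wong"]))  # ['John Smith']
```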
• Collected data over one month.
• Data includes: passive DNS records, WHOIS records and an IP location database.
• Predicted 2172 domain names, of which 1793 are malicious (83%).