Domain Reputation


Hussien Othman
• Domains are used for malicious purposes:
  • Command and Control (C&C)
  • Malware distribution
  • Phishing
  • More...
• Malicious domains are essential to the success of nearly all popular attack vectors.
• Malicious domains are often short-lived.
• Static blacklist-based technologies cannot keep up with the volume of new domain names.
• There is a need for dynamic domain reputation systems!
• Detect evidence of malicious content or activity.
• DNS reputation.
• Predict malicious domains: the life cycle of malicious domains and the re-use of valuable resources.
• Malware infection in enterprises is a big problem.
• The majority of infections happen via malicious domains.
• Build a graph:
• Nodes are the domains and hosts.
• Add an edge between host H and domain D if there is a transaction from H to D.
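The graph construction above can be sketched as follows. This is a minimal illustration only: it assumes transactions are available as (host, domain) pairs, and the sample log and the helper name build_graph are hypothetical. The inference step that would run over the graph is not shown.

```python
from collections import defaultdict

def build_graph(transactions):
    """Build a bipartite host-domain graph from (host, domain) pairs."""
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in transactions:
        # One undirected edge per distinct (host, domain) pair.
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return host_to_domains, domain_to_hosts

# Hypothetical transaction log (for example, taken from proxy or DNS logs).
log = [
    ("10.0.0.1", "example.com"),
    ("10.0.0.1", "bad.example.net"),
    ("10.0.0.2", "bad.example.net"),
    ("10.0.0.2", "example.com"),
    ("10.0.0.1", "example.com"),  # repeated transaction, same edge
]

host_to_domains, domain_to_hosts = build_graph(log)
print(sorted(domain_to_hosts["bad.example.net"]))
```

Storing both directions makes it cheap to ask which hosts contacted a domain and which domains a host contacted.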
• Low degree of false positives.
• Detection of unknown domains.
• Detection happens AFTER the malicious domains are already in use.
• A dynamic reputation system.
• Goal: assign a low reputation score to a domain that is involved in malicious activities, and a high reputation score to a domain that is associated with legitimate Internet services.
• Captures the characteristics of domains through network-based and zone-based statistical features.
• Resource Record (RR): a domain name and an IP address.
  • Example: www.google.com, 72.14.192.0
• 2LD and 3LD domains:
  • For www.example.com, the 2LD is example.com and the 3LD is www.example.com.
• Related Historic IPs (RHIPs):
  • All “routable” IPs that have historically been mapped to the domain name in the RR, or to any domain name under its 2LD and 3LD.
• Related Historic Domains (RHDNs):
  • All fully qualified domain names (FQDNs) that have historically been linked to the IP in the RR, or to its corresponding CIDR and AS.
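The definitions above can be illustrated with a small sketch. It rests on several assumptions: the passive DNS history is a list of (name, ip) pairs, the 2LD and 3LD are taken naively as the last two and three labels (which is wrong for suffixes such as co.uk), a fixed /24 stands in for the CIDR, and the AS lookup is left out. All function names and sample records are hypothetical.

```python
import ipaddress

def two_ld(fqdn):
    """Naive 2LD: the last two labels of the name."""
    return ".".join(fqdn.split(".")[-2:])

def three_ld(fqdn):
    """Naive 3LD: the last three labels of the name."""
    return ".".join(fqdn.split(".")[-3:])

def is_under(name, zone):
    """True if name equals zone or is a subdomain of it."""
    return name == zone or name.endswith("." + zone)

def related_historic_ips(domain, pdns):
    """IPs historically mapped to the domain or to names under its 3LD or 2LD."""
    zones = (three_ld(domain), two_ld(domain))
    return {ip for name, ip in pdns if any(is_under(name, z) for z in zones)}

def related_historic_domains(ip, pdns, prefix_len=24):
    """Names historically linked to the IP or to its enclosing prefix.

    The fixed prefix length is an assumption standing in for the real CIDR.
    """
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return {name for name, seen in pdns if ipaddress.ip_address(seen) in net}

# Hypothetical passive DNS history.
pdns = [
    ("www.example.com", "192.0.2.10"),
    ("mail.example.com", "192.0.2.20"),
    ("www.example.org", "192.0.2.30"),
    ("cdn.example.net", "198.51.100.5"),
]

rhips = related_historic_ips("www.example.com", pdns)
rhdns = related_historic_domains("192.0.2.10", pdns)
```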
Network-based features: describe how the operators who own d, and the IPs that d points to, allocate their network resources.
• BGP features (9): the number of distinct BGP prefixes, countries, and organizations related to BGP(A(d)), and the number of distinct IP addresses, BGP prefixes, and countries related to BGP(A3LD(d)) and BGP(A2LD(d)).
• AS features (3): the number of distinct ASes related to AS(A(d)), AS(A3LD(d)), and AS(A2LD(d)).
• Registration features (6): the number of distinct registrars in A(d), A3LD(d), and A2LD(d), and the diversity of registration dates in A(d), A3LD(d), and A2LD(d).
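As a rough sketch of how counts like these could be computed for one set of IPs, assume a lookup table that maps each IP to its BGP prefix, AS number, country, and organization. The table IP_INFO, its contents, and the feature names are hypothetical; the same function would be applied to A(d), A3LD(d), and A2LD(d) in turn.

```python
# Hypothetical IP metadata: ip -> (BGP prefix, AS number, country, organization).
IP_INFO = {
    "192.0.2.10": ("192.0.2.0/24", 64500, "US", "ExampleNet"),
    "192.0.2.20": ("192.0.2.0/24", 64500, "US", "ExampleNet"),
    "198.51.100.5": ("198.51.100.0/24", 64501, "DE", "OtherNet"),
}

def network_features(ips):
    """Count distinct network resources behind a set of IP addresses."""
    # IPs without metadata are skipped for the prefix, AS, country, org counts.
    info = [IP_INFO[ip] for ip in ips if ip in IP_INFO]
    return {
        "distinct_ips": len(set(ips)),
        "distinct_prefixes": len({prefix for prefix, _, _, _ in info}),
        "distinct_ases": len({asn for _, asn, _, _ in info}),
        "distinct_countries": len({country for _, _, country, _ in info}),
        "distinct_orgs": len({org for _, _, _, org in info}),
    }

features = network_features(["192.0.2.10", "192.0.2.20", "198.51.100.5"])
print(features)
```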
Zone-based features: measure the characteristics of the domain names historically associated with d. The intuition is that malicious domain names related to the same spam campaign, for example, often look randomly generated and share few common characteristics. Legitimate domain names, on the other hand, usually have strong similarities, for example google.com, googlewave.com, etc.
• String features (12): the number of distinct domain names, the average and standard deviation of their lengths, and the mean, median, and standard deviation of the occurrence frequency of single characters, 2-grams, and 3-grams.
• TLD features (5): the number of distinct TLDs, the ratio of .com to all other TLDs, and the mean, median, and standard deviation of the occurrence frequency of TLD strings.
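The two feature groups above could be computed along the following lines. This is a sketch under assumptions: "occurrence frequency" is read as the raw count of each n-gram or TLD across the name set, n-grams are taken over the whole name including dots, the population standard deviation is used, and the name set is assumed non-empty with names of at least three characters. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median, pstdev

def ngram_frequency_stats(names, n):
    """Mean, median and std. dev. of n-gram counts over all names."""
    counts = Counter(
        name[i:i + n] for name in names for i in range(len(name) - n + 1)
    )
    freqs = list(counts.values())
    return mean(freqs), median(freqs), pstdev(freqs)

def string_features(names):
    """12 values: 3 about the names themselves, then 3 per n-gram size."""
    names = sorted(set(names))
    lengths = [len(name) for name in names]
    feats = [len(names), mean(lengths), pstdev(lengths)]
    for n in (1, 2, 3):
        feats.extend(ngram_frequency_stats(names, n))
    return feats

def tld_features(names):
    """5 values: distinct TLDs, .com ratio, and 3 statistics of TLD counts."""
    tlds = Counter(name.rsplit(".", 1)[-1] for name in set(names))
    com = tlds.get("com", 0)
    others = sum(tlds.values()) - com
    ratio = com / max(others, 1)  # avoid division by zero when all are .com
    freqs = list(tlds.values())
    return [len(tlds), ratio, mean(freqs), median(freqs), pstdev(freqs)]

sample = ["google.com", "googlewave.com", "example.net"]
s_feats = string_features(sample)
t_feats = tld_features(sample)
```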
Evidence-based features: to what extent a given domain is associated with other known malicious domain names or IP addresses.
• Honeypot features (3): the number of distinct malware samples that, when executed, tried to connect to an IP address in A(d), BGP(A(d)), or AS(A(d)).
• Blacklist features (3): the number of IPs listed on public blacklists in A(d), BGP(A(d)), and AS(A(d)).
• This module first collects statistical data on the following major categories of domains:
1. Popular domains (google, yahoo, facebook, ...).
2. Common domains from the Alexa top 100.
3. Akamai domains.
4. Content Delivery Network (CDN) domains, except Akamai.
5. Dynamic DNS domains.
• Notos trains 5 classifiers. Each classifier distinguishes one class from all the others based on network-based features.
• A new domain d is given a score by each of these classifiers. The output is NM(d) = (c1, c2, ..., c5).
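To make the shape of NM(d) concrete, here is a toy sketch that produces one score per class. The slides do not say which classifier Notos uses, so a simple nearest-centroid closeness score stands in for each of the five classifiers; the class labels, the two-dimensional feature vectors, and the function names are all hypothetical.

```python
import math

CLASSES = ["popular", "common", "akamai", "cdn", "dyndns"]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(training_set):
    """training_set maps a class name to its network feature vectors."""
    return {cls: centroid(vectors) for cls, vectors in training_set.items()}

def nm(vector, centroids):
    """One score per class in (0, 1]; higher means closer to that class."""
    return [1.0 / (1.0 + math.dist(vector, centroids[cls])) for cls in CLASSES]

# Hypothetical training data with two network features per domain.
training = {
    "popular": [[1.0, 1.0], [1.0, 3.0]],
    "common": [[5.0, 5.0]],
    "akamai": [[9.0, 1.0]],
    "cdn": [[9.0, 9.0]],
    "dyndns": [[1.0, 9.0]],
}

model = train(training)
scores = nm([1.0, 2.0], model)  # five values, one per class
```

A real one-vs-all setup would train a separate binary classifier per class; only the output format, five scores per domain, carries over from this sketch.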
• First-level clustering:
  • Uses network feature vectors.
  • Goal: identify similarities between zones based on their network profiles.
• Second-level clustering:
  • Uses zone feature vectors.
  • Partitions the domain names within the same first-level cluster based on their zone properties.
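The two-level idea can be sketched with any clustering routine. The version below uses a simple greedy "leader" clustering as a stand-in, since the slides do not name the algorithm; the radii, the feature vectors, and the domain names are hypothetical.

```python
import math

def leader_cluster(items, key, radius):
    """Greedy clustering: join the first cluster whose leader is within radius."""
    clusters = []  # list of (leader vector, members)
    for item in items:
        vector = key(item)
        for leader, members in clusters:
            if math.dist(vector, leader) <= radius:
                members.append(item)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append((vector, [item]))
    return [members for _, members in clusters]

def two_level_cluster(domains, net_radius, zone_radius):
    """Cluster by network vectors, then split each cluster by zone vectors."""
    result = []
    for group in leader_cluster(domains, lambda d: d["network"], net_radius):
        result.append(leader_cluster(group, lambda d: d["zone"], zone_radius))
    return result

# Hypothetical domains with toy network and zone feature vectors.
domains = [
    {"name": "a.example", "network": [1.0, 1.0], "zone": [0.1]},
    {"name": "b.example", "network": [1.2, 0.9], "zone": [0.2]},
    {"name": "c.example", "network": [1.1, 1.0], "zone": [5.0]},
    {"name": "d.example", "network": [9.0, 9.0], "zone": [0.1]},
]

clusters = two_level_cluster(domains, net_radius=1.0, zone_radius=1.0)
names = [[[d["name"] for d in sub] for sub in group] for group in clusters]
```

Here a, b, and c share a network profile, but c is split off at the second level because its zone features differ.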
Notos detected new malicious domains!
• Uses only resource records (RRs).
• Needs many records to succeed in detecting a malicious domain.
• Cannot predict future malicious domains.
• Many malicious domains are used for a very short period of time:
  • Evading detection
  • Low cost
• Proposal: a system that predicts the domain names that are most likely to be, or are about to be, used for malicious purposes.
• In this way, the predicted malicious domains can be blocked before, or at the very beginning of, their malicious use.
• The selection of domain names happens before registering the domain names.
• The registration of a domain name happens before the creation of DNS records for
that domain name.
• Obtaining an IP address and using that address to set up a server also happens
before DNS records are created, since the IP address is part of the DNS records.
• In a successful attack, the aforementioned actions all happen before the malicious
domain is activated.
• The sequence of actions involved in activating a malicious domain, as well as the time intervals between these actions, makes the prediction of malicious domains possible.
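A small sketch of the timing argument: if the timestamps of the preparation steps can be observed, the gaps between them show how much warning is available before activation. The event names and dates below are hypothetical and only illustrate the ordering.

```python
from datetime import datetime

# Hypothetical timestamps for one domain.
events = {
    "registered": datetime(2015, 3, 1),
    "dns_record_created": datetime(2015, 3, 4),
    "activated": datetime(2015, 3, 10),
}
order = ["registered", "dns_record_created", "activated"]

def intervals(events, order):
    """Days between consecutive lifecycle events; raises if the order is violated."""
    gaps = {}
    for earlier, later in zip(order, order[1:]):
        delta = events[later] - events[earlier]
        if delta.total_seconds() < 0:
            raise ValueError(f"{later} precedes {earlier}")
        gaps[f"{earlier}->{later}"] = delta.days
    return gaps

gaps = intervals(events, order)
print(gaps)
```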
• The success of most popular attacks depends on many resources.
• These resources are often purchased.
• Examples:
• domain names are registered or transferred for a price
• bullet-proof servers are available for rent
• large numbers of infected hosts are also available for rent.
• Some of the purchases are made through legitimate processes; others are made via illegal
channels such as black markets, underground forums, etc.
• Many types of resources are made to be re-usable so that they can be resold multiple times
to maximize financial gain.
• The re-use of resources across different attacks also presents opportunities to find
connections between malicious domains.
• Re-use of domain names:
  • The time interval between the previous and the current use is more than 1 year.
  • Score based on: TLD, changes in IP, price of the domain transfer.
• Patterns of DNS queries for malicious domains can be used in prediction.
• Discovered several patterns in the DNS queries of a domain before the domain was
used and/or detected as malicious.
• The patterns indicate different activities related to malicious domains, including
preparing/testing the domain for malicious purposes.
• Same name server
• Same IP address
• Same registrant information
• Reasons: the pseudo-identity of an attacker, registration using the same stolen credit card, etc.
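One way to use these connections is to index domains by each shared resource and flag unlabeled domains that share one with a known malicious domain. The sketch below assumes each domain has a single name server, IP, and registrant value; the records and the function name are hypothetical.

```python
from collections import defaultdict

def domains_sharing_resources(records, known_malicious):
    """Map each unlabeled domain to the resource types it shares
    with at least one known malicious domain."""
    index = defaultdict(set)  # (resource type, value) -> domains
    for domain, attrs in records.items():
        for kind in ("name_server", "ip", "registrant"):
            index[(kind, attrs[kind])].add(domain)

    suspects = defaultdict(set)
    for (kind, _value), domains in index.items():
        if domains & known_malicious:
            for domain in domains - known_malicious:
                suspects[domain].add(kind)
    return dict(suspects)

# Hypothetical records.
records = {
    "bad1.example": {"name_server": "ns1.shady.example",
                     "ip": "192.0.2.10",
                     "registrant": "john@mail.example"},
    "new1.example": {"name_server": "ns1.shady.example",
                     "ip": "198.51.100.7",
                     "registrant": "john@mail.example"},
    "ok.example": {"name_server": "ns.good.example",
                   "ip": "203.0.113.9",
                   "registrant": "it@corp.example"},
}

suspects = domains_sharing_resources(records, {"bad1.example"})
print(suspects)
```

A shared resource is only a hint: large hosting providers and registrars serve many unrelated customers, so a real system would weigh how exclusive each resource is.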
• Detected and collected a list of name servers that provide DNS records for a number of domains, most if not all of which are malicious.
WHOIS information for malicious domains sometimes includes the same or a similar fake registrant name, the same email, the same registrant address, etc.
• Collected data over one month.
• Data includes: passive DNS records, WHOIS records, and an IP location database.
• Predicted 2172 domain names, of which 1793 (83%) are malicious.
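The 83% figure is the share of predicted domains that turned out to be malicious. A quick check of the arithmetic, with variable names chosen here for illustration:

```python
predicted = 2172  # domain names predicted as malicious
malicious = 1793  # predictions that turned out to be malicious

share = malicious / predicted
print(f"{share:.4f}")  # fraction of predictions that were correct
```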