Botnet
Dr. 許 富 皓
1
Botnet [Trend Micro]
2
Historical List of Botnets (1) [wiki]
3
Historical List of Botnets (2) [wiki]
4
Definition of a Botnet

A botnet (zombie army or drone army)
refers to a pool of compromised
computers that are under the command of
a single hacker, or a small group of
hackers, known as a botmaster.
5
Definition of a Bot

A bot refers to a compromised end-host,
or a computer, which is a member of a
botnet.
6
The First Bot Generation
Malware – PrettyPark [F-Secure]



The first bot generation malware, PrettyPark
worm, appeared in 1999.
A critical difference between PrettyPark and
previous worms is that it makes use of IRC as a
means to allow a botmaster to remotely control a
large pool of compromised hosts.
Its revolutionary idea of using IRC as a discreet
and extensible method for Command and
Control (C&C) was soon adopted by the black
hat community.
7
How Fast Could Your Computer Be
Compromised?

The following is based on observation of an unpatched
Windows 2000 or Windows XP honeypot located within the
dial-in network of a German ISP.


Normally it takes only a couple of minutes before it is
successfully compromised.
On average, the expected lifespan of the honeypot is less than
ten minutes.


After this small amount of time, the honeypot is often successfully
exploited by automated malware.
The shortest compromise time was only a few seconds:

Once we plugged the network cable in, an SDBot compromised the
machine via an exploit against TCP port 135 and installed itself on
the machine.
8
Sizes of Botnets [Wikipedia]



Some botnets consist of only a few hundred bots.
In contrast to this, several large botnets with up
to 50,000 hosts were also observed.
Botnets with several hundred thousand hosts or
more have been reported in the past.
 Kraken botnet: on April 13, 2008, there were 495,000
computers in the Kraken botnet [Damballa].
 Storm botnet [Enright]
 Conficker: 10,000,000 [F-Secure]
9
A Host May Be Infected by
Several Botnets Simultaneously

A home computer infected by 16
different bots has been found.
10
Taxonomy of Botnets
Attacking behavior
 C&C models
 Rally mechanisms
 Communication protocols
 Observable botnet activities
 Evasion Techniques

11
Attacking Behavior [Paul Bächer et al.]









Distributed Denial-of-Service Attacks
Spamming
Sniffing Traffic
Keylogging
Spreading new malware
Installing Advertisement Addons
Google AdSense abuse
Manipulating online polls/games
Mass identity theft
12
Distributed Denial-of-Service Attacks (1)


Botnets are often used for Distributed Denial-of-Service (DDoS) attacks.
A DDoS attack is an attack on a computer
system or network that causes a loss of service
to users, typically the loss of network
connectivity and services, by
 consuming the bandwidth of the victim network
or
 overloading the computational resources of the
victim system.
13
Distributed Denial-of-Service Attacks (2)

Further research showed that botnets are even
used to run commercial DDoS attacks against
competing corporations:
 Operation Cyberslam documents the story of Jay R.
Echouafni and Joshua Schichtel alias EMP.
 Echouafni was indicted on August 25, 2004 on
multiple charges of conspiracy and causing damage
to protected computers.

He worked closely together with EMP, who ran a botnet to
send bulk mail and also carried out DDoS attacks against the
spam blacklist servers.
 In addition, they took Speedera - a global on-demand
computing platform - offline when they ran a paid
DDoS attack to take a competitor's website down.
14
Proxy

Some bots offer the possibility to open a
SOCKS v4/v5 proxy on a compromised
machine.
 SOCKS v4/v5 proxy: a generic proxy protocol
for TCP/IP-based networking applications
(SOCKS v5 is specified in RFC 1928).
15
Spamming

After having enabled the SOCKS proxy, this
machine can then be used for nefarious tasks
such as spamming.
 With the help of a botnet and thousands of bots, an
attacker is able to send massive amounts of spam
mail.
 Often the spam you receive was sent from, or
proxied through, an old Windows computer at home.
 In addition, this can of course also be used to send
phishing mails, since phishing is a special case of
spam.

Some bots also implement a special function to
harvest email-addresses.
16
Botnets Responsible for 87% of 2009
Global Spam Mail [Yahan Wu]

According to a report released by
Symantec, botnets send out more than 87
percent of all unsolicited mail, equating to
around 151 billion emails a day.
17
Spam Capacity of Some Notorious
Botnets
Name         Est. Bot #                       Spam Capacity
Conficker    10,000,000+                      10 billion/day
Kraken       495,000                          9 billion/day
Srizbi       450,000                          60 billion/day
Bobax        185,000                          9 billion/day
Rustock      150,000                          30 billion/day
Cutwail      125,000                          16 billion/day
Storm        85,000 (only 35,000 send email)  3 billion/day
Donbot       80,000                           500 million/day
Grum         50,000                           2 billion/day
Onewordsub   40,000                           ?
Mega-D       35,000                           10 billion/day
Nucrypt      20,000                           5 billion/day
Wopla        20,000                           600 million/day
Spamthru     12,000                           350 million/day
Crime Art    10,000                           250 million/day
SilverNet    Unknown                          Unknown
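
The figures in this table can be compared more easily once they are normalized per bot. The following is a minimal sketch, using a handful of rows copied from the table; the function name `per_bot_rate` and the choice of rows are illustrative assumptions, not part of the original slides.

```python
# Figures copied from the table above: (estimated bots, spam messages per day).
BOTNETS = {
    "Conficker": (10_000_000, 10e9),
    "Kraken": (495_000, 9e9),
    "Srizbi": (450_000, 60e9),
    "Rustock": (150_000, 30e9),
    "Mega-D": (35_000, 10e9),
}

def per_bot_rate(bots, spam_per_day):
    """Average number of messages one bot would have to send per day."""
    return spam_per_day / bots

rates = {name: per_bot_rate(*row) for name, row in BOTNETS.items()}

# Print the botnets from the highest to the lowest per-bot rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {rate:>12,.0f} messages/bot/day")
```

Read this way, the table suggests that total size and spam capacity are only loosely related: a small botnet of dedicated spam bots can out-send a much larger general-purpose one.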
18
Sniffing Traffic



Bots can also use a packet sniffer to watch for
interesting clear-text data passing by a
compromised machine.
The sniffers are mostly used to retrieve sensitive
information like usernames and passwords.
If a machine is compromised more than once
and is also a member of more than one botnet,
packet sniffing makes it possible to gather key
information about the other botnet. Thus it is
possible to "steal" another botnet.
19
Keylogging


If the compromised machine uses encrypted
communication channels (e.g. HTTPS or POP3S), then
just sniffing the network packets on the victim's computer
is useless since the appropriate key to decrypt the
packets is missing.
With the help of a keylogger it is very easy for an
attacker to retrieve sensitive information.

An implemented filtering mechanism further helps in stealing
secret data.


e.g. "I am only interested in key sequences near the keyword
'paypal.com'"
And if you imagine that this keylogger runs on thousands of
compromised machines in parallel you can imagine how quickly
PayPal accounts are harvested.
20
Spreading New Malware
In most cases, botnets are used to spread
new bots.
 This is very easy since all bots implement
mechanisms to download and execute a
file via HTTP or FTP.
 Spreading an email virus using a botnet
is a very nice idea, too.

A botnet with 10,000 hosts which acts as the
start base for the mail virus allows very fast
spreading and thus causes more harm.
21
Installing Advertisement Addons


Botnets can also be used to gain financial
advantages.
This works by setting up a fake website with
some advertisements:
 The operator of this website negotiates a deal with
some hosting companies that pay for clicks on ads.
 With the help of a botnet, these clicks can be
"automated" so that instantly a few thousand bots
click on the pop-ups.

This process can be further enhanced if the bot
hijacks the start-page of a compromised
machine so that the "clicks" are executed each
time the victim uses the browser.
22
Google AdSense Abuse

A similar abuse is also possible with Google's
AdSense program:
 AdSense offers companies the possibility to display
Google advertisements on their own website and earn
money this way.
 The company earns money due to clicks on these ads,
for example per 10,000 clicks in one month.
 An attacker can abuse this program by leveraging his
botnet to click on these advertisements in an
automated fashion and thus artificially increment the
click counter.
 This kind of usage for botnets is relatively uncommon,
but not a bad idea from an attacker's perspective.
23
Loss Caused by Click Fraud
[Catherine Holahan]

On average, consultants estimate that
between 14% and 15% of clicks are
fraudulent.
24
Retrieve a URL from Old Version of Google
Search Results
25
Google Search Page
26
Google Search Result Page
27
Source HTML File of the Google
Search Result Page
28
Ampersands (&'s) in URLs [Liam
Quinn]
Always use &amp; in place of & when
writing URLs in HTML:
 E.g.:

<a href="foo.cgi?chapter=1&amp;section=2&amp;copy=3&amp;lang=en">...</a>
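
The escaping rule above can be applied mechanically rather than by hand. The following is a small sketch using Python's standard `html.escape`; the helper name `href_attribute` is a hypothetical name chosen for this illustration.

```python
from html import escape

def href_attribute(url):
    """Return an href attribute with the URL escaped for use inside HTML."""
    # quote=True also escapes double quotes, which matters inside an attribute.
    return 'href="' + escape(url, quote=True) + '"'

raw_url = "foo.cgi?chapter=1&section=2&copy=3&lang=en"
print(href_attribute(raw_url))
```

Going the other way, a URL copied out of HTML source has to have each &amp; turned back into & before it is requested.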
29
Click Fraud (1) - Use the Browser’s
URL Field
30
Retrieve a URL from Latest Version of
Google Search Results – using Chrome
31

Move the cursor over the hyperlink
32

Right-click the hyperlink
33

Choose Inspect element from the pop-up menu
34
Click Fraud (2) – Connect to the
Google Server Directly

Attackers could launch the same attack by
 opening an HTTP connection to a Google
server
and
 sending the URL in the previous slide to that
server directly.
35
Click Fraud (3) - Use Fake Page (1)
36
Click Fraud (3) - Use Fake Page (2) [Mr. 東]
37
Click Fraud (3) - Use Fake Page (3)
38
Manipulating online Polls/Games
Since every bot has a distinct IP address,
every vote will have the same credibility as
a vote cast by a real person.
 Online games can be manipulated in a
similar way.

 Currently we are aware of bots being used
that way, and there is a chance that this will
become more important in the future.
39
Mass Identity Theft


Often the combination of different functionality described
above can be used for large scale identity theft, one of the
fastest growing crimes on the Internet.
Bogus emails ("phishing mails") that pretend to be
legitimate (such as fake PayPal or banking emails) ask
their intended victims to go online and submit their private
information.




These fake emails are generated and sent by bots via their
spamming mechanism.
These same bots can also host multiple fake websites pretending
to be eBay, PayPal, or a bank, and harvest personal information.
Just as quickly as one of these fake sites is shut down, another one
can pop up.
In addition, keylogging and sniffing of traffic can also be
used for identity theft.
40
What Is IRC, and How Does It Work? [David
Caraballo et al.]



IRC (Internet Relay Chat) provides a way of communicating
in real time with people from all over the world.
It consists of various separate networks (or "nets") of IRC
servers, machines that allow users to connect to IRC.
The largest nets are





EFnet (the original IRC net, often having more than 32,000 people at
once),
Undernet,
IRCnet,
DALnet,
and NewNet.
41
IRC Client



Generally, the user (such as you) runs a
program (called a "client") to connect to a
server on one of the IRC nets.
The server relays information to and from other
servers on the same net.
Recommended clients:
 UNIX/shell: ircII
 Windows: mIRC
 Macintosh clients
42
IRC Bot [Wikipedia]
An IRC bot is a set of scripts or an
independent program that connects to
Internet Relay Chat as a client, and so
appears to other IRC users as another user.
 It differs from a regular client in that instead
of providing interactive access to IRC for a
human user, it performs automated functions.

43
IRC Channels





Once connected to an IRC server on an IRC network,
you will usually join one or more "channels" and
converse with others there.
On IRC, channels are where people meet and chat.
You may know them as "chat rooms".
Channel names usually begin with a #, as in #irchelp.
Conversations may be


public (where everyone in a channel can see what you type)
or
private (messages between only two people, who may or may
not be on the same channel).
44
Scheme of an IRC-Network [wikipedia]
normal clients
bots
bouncers
45
Command and Control (C&C) System

C&C works as follows.
 A botmaster sets up a C&C server, typically
an IRC server.
 After a bot virus infects a host, it will connect
back to the C&C server and wait for the
botmaster’s command.
 In a typical IRC botnet, the bot will join a
certain IRC channel to listen to messages
from its master.
46
Categories of C&C

C&C systems can be roughly categorized
into three different models:
 the centralized model,
 the peer-to-peer (P2P) model,
 the random model.

P.S.:
 Given the quickly evolving nature of botnets, it is
possible that future botnets may use new command
and control systems that are completely different
from any of them.
47
Centralized C&C Model

In the centralized model, a botmaster selects a single
high bandwidth host to be the contacting point (C&C
server) of all the bots.




The C&C server, usually a compromised computer as well,
would run certain network services such as IRC or HTTP.
When a new computer is infected by a bot, it will join the botnet
by initiating a connection to the C&C server.
Once joined to the appropriate C&C server channel, the bot
would then wait on the C&C server for commands from the
botmaster.
Botnets may have mechanisms to protect their communications.

For example, IRC channels may be protected by passwords only
known to bots and their masters to prevent eavesdropping.
48
Popularity of the Centralized C&C
Model
The centralized model is the predominant
C&C model used by early botnets.
 Many well known bots, such as AgoBot,
SDBot and RBot, fall into the category of
the centralized C&C model.

49
Why the Centralized C&C Model (1) ?



Due to the rich variety of software tools (e.g.,
IRC bot scripts on IRC servers and IRC bots),
the centralized C&C model is rather simple to
implement and customize.
Notice that a botmaster can easily control
thousands of bots using the centralized model.
Botmasters are profit driven; hence, they are
more interested in the centralized C&C model
which allows them to control as many bots as
possible and maximize their profit.
50
Why the Centralized C&C Model (2) ?
Messaging latency in the centralized
model is small.
 Therefore, it is easy for botmasters to
coordinate botnets and launch attacks.

51
Drawback of the Centralized C&C
Model
The C&C server is the crucial place where
most of the conversation happens.
Therefore, the C&C server is the weakest
link in a botnet.
 If we can manage to discover and destroy
the C&C server, the entire botnet will be
gone.

52
Motivation for a P2P-Based C&C Model


Some botnet authors have started to build alternative botnet
communication systems, which are more resilient to
failures in the network.
An interesting C&C paradigm exploits the idea of P2P
communication.


For instance, certain variants of Phatbot have used P2P
communication as a means to control botnets.
References on P2P:







[Kazaa]
[Mac_P2P]
[P2P network]
[CS_NCTU]
[DHT_ACT]
[DHT_Duke]
[DHT_wiki]
53
P2P Applications [ACT]
1999
2000
Napster
Gnutella
2001
2002
…
FastTrack
LimeWire
iMesh&Grokster
Morpheus
Kazaa
eDonkey
OverNet
eDonkey2000
BitTorrent
eXeem
DC++
54
Features of the P2P-Based C&C
Model



Compared with the centralized C&C model, the
P2P based C&C model is much harder to
discover and destroy.
Since the communication system doesn’t heavily
depend on a few selected servers, destroying a
single, or even a number of bots, won’t
necessarily lead to the destruction of an entire
botnet.
Because of this, the P2P based C&C model has
been used increasingly in botnets.
55
Constraints of the P2P C&C Model (1)
Existing P2P systems only support
conversations of small user groups,
usually in the range of 10-50 users.
 The group size supported by P2P systems
is too small compared to the size of
centralized C&C botnets, in which a botnet
of 1000 compromised hosts is still on the
small side.

56
Constraints of the P2P C&C Model (2)
Existing P2P systems don’t guarantee
message delivery or bound propagation
latency.
 Therefore, a botnet using P2P communication
would be harder to coordinate than
those which use centralized C&C models.

57
Trend of the P2P C&C Model



The above two constraints have limited the wider
adoption of P2P based communication in botnets.
As the knowledge on implementing P2P based
botnets accumulates, new P2P-based botnets,
which overcome the above limitations, may appear.
As such, more and more botnets will move to use
P2P based communication since it is more robust
than centralized C&C communication.
58
Timeline of Peer-to-Peer Protocols
and Bots [Grizzard et al.]
Date     Name            Type               Distinguishing Description
12/1993  EggDrop         Non-Malicious Bot  Recognized as early popular non-malicious IRC bot
04/1998  GTbot Variants  Malicious Bot      IRC bot based on mIRC executables and scripts
05/1999  Napster         Peer-to-Peer       First widely used hybrid central and peer-to-peer service
11/1999  Direct Connect  Peer-to-Peer       Variation of Napster hybrid model
03/2000  Gnutella        Peer-to-Peer       First decentralized peer-to-peer protocol
09/2000  eDonkey         Peer-to-Peer       Used checksum directory lookup for file resources
03/2001  Fast Track      Peer-to-Peer       Use of supernodes within the peer-to-peer architecture
05/2001  WinMX           Peer-to-Peer       Proprietary protocol similar to FastTrack
06/2001  Ares            Peer-to-Peer       Has ability to penetrate NATs with UDP punching
59
Timeline of Peer-to-Peer Protocols
and Bots
Date     Name             Type           Distinguishing Description
07/2001  BitTorrent       Peer-to-Peer   Uses bandwidth currency to foster quick downloads
04/2002  SDbot Variants   Malicious Bot  Provided own IRC client for better efficiency
10/2002  Agobot Variants  Malicious Bot  Incredibly robust, flexible, and modular design
04/2003  Spybot Variants  Malicious Bot  Extensive feature set based on Agobot
05/2003  WASTE            Peer-to-Peer   Small VPN-style network with RSA public keys
09/2003  Sinit            Malicious Bot  Peer-to-peer bot using random scanning to find peers
11/2003  Kademlia         Peer-to-Peer   Uses distributed hash tables for decentralized architecture
03/2004  Phatbot          Malicious Bot  Peer-to-peer bot based on WASTE
03/2006  SpamThru         Malicious Bot  Peer-to-peer bot using custom protocol for backup
04/2006  Nugache          Malicious Bot  Peer-to-peer bot connecting to predefined peers
01/2007  Peacomm          Malicious Bot  Peer-to-peer bot based on Kademlia
60
Random C&C Model





In the proposed random C&C model, a bot will not
actively contact other bots or the botmaster.
Rather, a bot would listen to incoming connections
from its botmaster.
To launch attacks, a botmaster would scan the
Internet to discover its bots.
When a bot is found, the botmaster will issue
commands to the bot.
Although this C&C model has not been used in real
world botnets, it is potentially interesting to certain
future types of botnets that want high survivability.
61
Constraints of Random C&C Model

While such a C&C model is easy to
implement and highly resilient to discovery
and destruction, the model intrinsically has
a scalability problem and is difficult to
use for large-scale, coordinated attacks.
62
Rallying Mechanisms
63
Rallying Mechanisms

Rallying mechanisms are critical for
botnets to
 discover new bots
and
 rally them under their botmasters.
64
Hard-coded IP Address

A common method used to rally new bots
works like this:
 A bot includes hard-coded C&C server IP
addresses in its binary.
 When the bot initially infects a computer, the
computer will connect back to the C&C server
using the hard-coded server IP address that is
contained in the binary code.
65
Drawbacks of Hard-coded IP Address

The problem with using hard-coded IP
addresses is that
 the C&C server can be easily detected
and
 the communication channel easily blocked.

If a C&C server is "disconnected" in this fashion,
a botnet may be completely deactivated.
Because of this, hard-coded server IP addresses
are used less often by recent variants of
bots.
66
Dynamic DNS Domain Name

Bots today often include hard-coded
domain names, assigned by dynamic
DNS providers.
67
Benefit of Dynamic DNS Domain
Name (1)

The benefit of using dynamic DNS is that, if a
C&C server is shut down by authorities, the
botmaster can easily resume his/her control by
creating a new C&C server somewhere else and
updating the IP address in the corresponding
dynamic DNS entry.
 When connections to the old C&C server fail, the bots
will perform DNS queries and be redirected to the
new C&C server.
 This DNS redirection behavior is often known as
herding.
68
Benefit of Dynamic DNS Domain
Name (2)
Using dynamic DNS names, a botmaster
can retain control of its botnet when an
existing C&C server fails to function.
 Sometimes, a botmaster will also update
the dynamic DNS entry periodically to shift
the location of the command and control
server, making detection harder.

69
Distributed DNS Service




Some of the newer botnet breeds run their own
distributed DNS service at locations that are out of the
reach of law enforcement or other authorities.
Bots include the addresses of these DNS servers and
contact these servers to resolve the IP addresses of
C&C servers.
Many times, these DNS services are chosen to run at
high port numbers in order to evade the detection by
security devices at gateways.
The botnets using distributed DNS service to rally their
bots are the hardest to detect and destroy, compared
with other types of botnets discussed.
70
Communication Protocols
71
Communication Protocols


Bots communicate with each other and their
botmasters following certain well-defined
network protocols.
In most cases, botnets don’t create new network
protocols for their communication. Instead, they
use existing communication protocols that are
implemented by publicly available software tools,
 e.g., the IRC protocol itself and publicly
available software implementations of IRC servers
and clients.
72
The Importance of Understanding
the Botnet Comm. Protocols

First, their communication characteristics
provide an understanding of
 the botnets’ origins
and
 the possible software tools being used.

Secondly, understanding the communication
protocols helps security researchers to decode
the conversations which happen among bots
and their masters.
73
Common Botnet Communication
Protocols
IRC Protocol
 HTTP Protocol
 P2P Protocol
 … and so on.

74
Evasion Techniques
75
Evasion Techniques – for AV and IDS

A variety of techniques are used by
botnets to evade AV and signature-based
IDS systems, e.g.,
 sophisticated executable packers,
 rootkits,
etc.
These evasion techniques improve
 the survivability of botnets
and
 the success rate of compromising new hosts.
76
Evasion Techniques –
Communication (1)



Additionally, botnets have also added (and continue to
add) new mechanisms to hide traces of their
communication, e.g. fast-flux.
Some botnets are moving away from IRC, since
monitoring of IRC traffic is increasingly done in an effort
to detect botnets.
Instead, botnets are starting to use


modified IRC protocols
or
other protocols altogether (e.g., HTTP, VoIP)
for their communication channels.
77
Evasion Techniques –
Communication (2)
Encryption schemes are also being used
to prevent the content from being revealed.
 Certain state-of-the-art botnets even use
covert channel communications such as
TCP and ICMP tunneling, and even IPv6
tunneling.
 There have also been techniques that use Skype
and IM to support communication.

78
Observable Activities
79
Other Observable Activities
In order to detect the presence of botnets,
we need to discover abnormal behaviors
exhibited by botnets.
 The observable behaviors of botnets can be
categorized into three types:
 network-based behavior
 host-based behavior
 global correlated behavior.
80
Network-based Behaviors
1. Observable Communication
2. Observable Attacking Traffic

Botmasters need to
 communicate with their bots
and
 launch attacks.

When performing these functions, botnets will
generate certain observable network traffic
patterns that we can use to detect
 individual bots
and
 their C&C servers.
81
Observable Communication (1)

Since botnets often use IRC and HTTP to
communicate with their bots, observable IRC &
HTTP traffic with abnormal patterns can be
used to indicate the presence of bots and the
C&C servers.
 For example,
 inbound/outbound IRC traffic to an interior enterprise network
where IRC service is not allowed
and
 IRC conversations that follow certain syntax conventions that
humans don’t readily understand.
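
The first example above (IRC traffic on a network where IRC is not allowed) can be sketched as a filter over flow records. This is only an illustration under stated assumptions: the flow record layout `(src, dst, dst_port)`, the internal range `10.0.0.0/8`, and the port list are hypothetical, and a bot that runs IRC on a non-standard port would not be caught by a port check alone.

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")  # hypothetical enterprise range
IRC_PORTS = set(range(6660, 6670)) | {6697}        # commonly used IRC ports (assumption)

def irc_flows(flows):
    """Return flows that touch the internal network on a typical IRC port."""
    hits = []
    for src, dst, dst_port in flows:
        internal = (ipaddress.ip_address(src) in INTERNAL_NET
                    or ipaddress.ip_address(dst) in INTERNAL_NET)
        if internal and dst_port in IRC_PORTS:
            hits.append((src, dst, dst_port))
    return hits

# Made-up flow records using documentation address ranges.
sample = [
    ("10.1.2.3", "203.0.113.9", 6667),    # outbound IRC from an internal host
    ("10.1.2.4", "198.51.100.7", 443),    # ordinary HTTPS, ignored
    ("192.0.2.50", "10.9.9.9", 6697),     # inbound IRC to an internal host
]
for flow in irc_flows(sample):
    print("IRC traffic on a network where IRC is not allowed:", flow)
```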
82
Observable Communication (2)


Many botnets use dynamic DNS domain names to
locate their C&C servers. Thus, abnormal DNS queries
may also be used to detect botnets.
In some instances, hosts are found to query for improper
domain names (e.g., cheese.dns4biz.org,
butter.dns4biz.org) which can indicate a high
probability that these hosts are compromised.


The next logical step in this methodology would be to attempt to
glean the IP addresses of their C&C servers in observable traffic
streams.
If further detective work reveals that the IP address associated with
a particular domain name keeps changing periodically, it can
provide an even stronger indication of the presence of a botnet.
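
The "IP address keeps changing" check can be sketched over a log of observed DNS answers. The input format (pairs of domain and resolved address), the function name `changing_domains`, and the threshold of three distinct addresses are assumptions made for this illustration; legitimate services also rotate addresses, so a hit is a reason to look closer, not proof of a botnet.

```python
from collections import defaultdict

def changing_domains(dns_answers, min_distinct_ips=3):
    """Return domains that resolved to at least min_distinct_ips addresses."""
    seen = defaultdict(set)
    for domain, ip in dns_answers:
        seen[domain.lower()].add(ip)
    return sorted(d for d, ips in seen.items() if len(ips) >= min_distinct_ips)

# Made-up observations; the addresses are from documentation ranges.
answers = [
    ("cheese.dns4biz.org", "198.51.100.1"),
    ("cheese.dns4biz.org", "198.51.100.2"),
    ("cheese.dns4biz.org", "203.0.113.9"),
    ("www.example.com", "192.0.2.10"),
    ("www.example.com", "192.0.2.10"),
]
print(changing_domains(answers))
```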
83
Observable Communication (3)

Moreover, botnets may exhibit additional network
abnormalities that allow us to discover them.



One example would be a case in which bots are idle
most of the time in a connection, yet respond faster than
a human being at the keyboard surfing the web.
Another example would be a case in which
communication traffic originated by botnets is more "bursty" than
normal traffic.
So, botnets can potentially be discovered by monitoring
network traffic flow.
84
Observable Attacking Traffic

The traffic generated by botnets allows us to
discover their presence.
 For example,
 When launching DDoS TCP SYN flood attacks, botnets can
send out a large number of invalid TCP SYN packets with
fake source IP addresses.
 Therefore, if a network monitoring device finds a large
number of outbound TCP SYN packets that have invalid
source IP address (i.e., IP addresses that should not come
from the internal network), it would indicate that some
internal hosts may be compromised, and actively
participating in a DDoS attack.
 Similarly, if an internal host is found to send out phishing emails, this is an indication that the host is infected by a bot
as well.
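
The SYN-flood check described above amounts to comparing each outbound SYN packet's source address with the internal address range. A minimal sketch follows; the packet representation (a dict with `src`, `dst`, `syn`, `ack`) and the range `192.168.1.0/24` are hypothetical and stand in for whatever the monitoring device actually records.

```python
import ipaddress

def spoofed_syn_packets(packets, internal_cidr):
    """Return outbound SYN packets whose source address is outside internal_cidr."""
    internal = ipaddress.ip_network(internal_cidr)
    suspicious = []
    for pkt in packets:
        is_initial_syn = pkt["syn"] and not pkt["ack"]
        if is_initial_syn and ipaddress.ip_address(pkt["src"]) not in internal:
            suspicious.append(pkt)
    return suspicious

# Made-up packets seen leaving the internal network.
outbound = [
    {"src": "192.168.1.20", "dst": "203.0.113.5", "syn": True, "ack": False},
    {"src": "198.51.100.77", "dst": "203.0.113.5", "syn": True, "ack": False},
    {"src": "192.168.1.21", "dst": "203.0.113.5", "syn": False, "ack": True},
]
hits = spoofed_syn_packets(outbound, "192.168.1.0/24")
print(len(hits), "packet(s) with a source address outside the internal range")
```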
85
Host Based Behavior


Bots compromise computers and hide their
presence just like many older computer viruses.
Therefore, they exhibit certain observable
behaviors as viruses do at compromised hosts.
 When executing, bots will make sequences of
system/library calls, e.g.
 modifying system registries and system files
 creating network connections
 disabling antivirus programs
 The sequences of system/library calls made by bots
are often different from those of legitimate programs and
applications.
86
Global Correlated Behaviors




Perhaps botnet behavior observed in a global
snapshot is the most interesting one from the
viewpoint of detection efficiency.
Those global behavioral characteristics are often
tied to the fundamental structures and
mechanisms of botnets.
Consequently, they are unlikely to change from
botnet to botnet unless the structures and
mechanisms of botnets themselves are
redesigned and re-implemented.
As a result, these globally observable behaviors
are the most valuable to detect families of
botnets.
87
Global Correlated Behaviors – DNS
Traffic (1)



Many botnets use dynamic DNS entries to track
their C&C servers.
When a new C&C server is built, the related DNS
entry will be updated to the IP address of the
new C&C server. Therefore, bots will find the
location of the new C&C server.
Botmasters may herd their botnets to different
C&C server locations periodically to prevent
detection.
88
Global Correlated Behaviors – DNS
Traffic (2)

When a botmaster updates its dynamic DNS
entry for a C&C server,
 there would be an observable global behavior on the
Internet;
 specifically,
 bots are disconnected from the old C&C server
 bots will query their DNS server for the new IP address of the
domain name, resulting in an increase of DNS queries to this
DNS entry globally.
89
Global Correlated Behaviors – DNS
Traffic (3)


Therefore, if a network monitor discovers that an
update to a dynamic DNS entry is followed by a
significant number of DNS queries to this entry,
then there is a high probability that this dynamic
DNS domain name is being used by botnet C&C
servers.
Such a feature is unlikely to change whether a
botnet is using IRC for communication or using
HTTP for communication, unless the
communication structure is changed.
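
The update-then-burst pattern can be sketched as a comparison of query counts before and after the moment a dynamic DNS entry changes. Everything specific here is an assumption for illustration: the per-interval query counts, the window of three intervals, and the factor of 5 are hypothetical values that a real monitor would have to tune.

```python
def query_spike_after_update(counts, update_index, window=3, factor=5.0):
    """Return True if queries after the update far exceed the earlier baseline.

    counts       -- DNS queries for one entry, one number per time interval
    update_index -- index of the interval in which the entry was updated
    """
    before = counts[max(0, update_index - window):update_index]
    after = counts[update_index:update_index + window]
    if not before or not after:
        return False  # not enough data on one side of the update
    baseline = sum(before) / len(before)
    burst = sum(after) / len(after)
    # Guard against a zero baseline by requiring at least `factor` queries.
    return burst >= factor * max(baseline, 1.0)

herded = [4, 5, 3, 4, 60, 80, 40, 6]   # made-up counts, entry updated at index 4
quiet = [4, 5, 3, 4, 5, 4, 6]          # made-up counts, no burst after the update
print(query_spike_after_update(herded, 4), query_spike_after_update(quiet, 4))
```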
90
Domain Name System [Wikipedia]
91
Domain Name System




A lookup mechanism for translating hostnames into IP
addresses and vice-versa.
DNS provides the naming standard for IP-based
networks.
A globally distributed, loosely coherent, scalable, reliable,
dynamic database.
Comprised of three components:



A “name space” (domain)
Servers (name servers) making that name space available.
Resolvers (clients) which query the servers about the name
space
92
Domain



Domains are “namespaces”
Everything below .com is in the com domain.
Everything below ripe.net is in the ripe.net
domain and in the net domain.
93
Domain Name Space

The domain name space consists of a
tree of domain names.
94
Zone
The tree sub-divides into zones beginning
at the root zone.
 A DNS zone is a subset of the hierarchical
domain name structure of the DNS.
 Every DNS zone must be assigned a set
of authoritative name servers that are
installed in NS records in the parent zone.

 A single name server can host several zones.
95
96
Delegated Subzone



Administrative responsibility over any zone may
be divided, thereby creating additional zones.
Authority for a portion of the old space is said to
be delegated, usually in the form of sub-domains,
to another name server and administrative entity.
The old zone ceases to be authoritative for the
new zone.
97
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (1)
Domain name servers store information
about part of the domain name space
called a zone.
 The name servers are authoritative for a
particular zone.
 A single name server can be authoritative
for many zones.

98
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (2)
The difference between a
zone and a domain is sometimes
confusing.
 A zone is simply a portion of a domain.

99
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (3)

For example,
 the domain Microsoft.com may contain
all of the data for
 Microsoft.com,
 Marketing.microsoft.com,
and
 Development.microsoft.com.
100
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (4)
 However, the zone Microsoft.com contains only
 information for Microsoft.com
and
 references to the authoritative name servers for the
subdomains.
 The zone Microsoft.com can contain
the data for subdomains of Microsoft.com if they have not
been delegated to another server.
For example,
 Marketing.microsoft.com may manage its own delegated
zone.
 Development.microsoft.com may be managed by the
parent, Microsoft.com.
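
The zone/domain distinction can be made concrete with a small lookup: given a set of zones, the zone responsible for a name is the one with the longest matching suffix. This is a simplified sketch, not how a real name server is implemented; the function name `authoritative_zone` and the zone set are illustrative and mirror the Microsoft.com example above.

```python
def authoritative_zone(name, zones):
    """Return the most specific zone in `zones` that contains `name`, or None."""
    labels = name.lower().rstrip(".").split(".")
    # Try the full name first, then drop labels from the left one at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None

# Marketing has its own delegated zone; Development stays in the parent zone.
ZONES = {"microsoft.com", "marketing.microsoft.com"}

for host in ("www.marketing.microsoft.com", "Development.microsoft.com"):
    print(host, "->", authoritative_zone(host, ZONES))
```

Both names are in the Microsoft.com domain, but only the second is served from the Microsoft.com zone.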
101
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (5)
[Figure labels: Microsoft.com, Marketing.Microsoft.com, Development.Microsoft.com; regions marked "Microsoft.com domain", "Microsoft.com zone", and "Marketing.Microsoft.com domain and zone"]
102
Comparison of a DNS Zone and
DNS Domain [Microsoft] – (6)
If there are no subdomains, then the zone
and domain are essentially the same.
 In this case the zone contains all data for
the domain.

103
Domain Name Formulation (1)

A domain name consists of one or more
parts, technically called labels, that are
conventionally concatenated, and
delimited by dots, such as example.com.
104
Domain Name Formulation (2)
The right-most label conveys the top-level
domain.
 For example, the domain name
www.example.com belongs to the top-level domain com.

105
Domain Name Formulation (3)



The hierarchy of domains descends from right to
left; each label to the left specifies a
subdivision, or subdomain of the domain to
the right.
For example: the label example specifies a
subdomain of the com domain, and www is a
subdomain of example.com.
This tree of subdivisions may consist of up to 127
levels.
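
The right-to-left hierarchy can be seen by splitting a name into its labels. The following is a minimal sketch; the function name `describe` and the keys of the returned dictionary are chosen for this illustration only.

```python
def describe(name):
    """Split a domain name into labels and list the domains that contain it."""
    labels = name.rstrip(".").split(".")
    return {
        "labels": labels,
        "tld": labels[-1],  # the right-most label is the top-level domain
        # Each parent is obtained by dropping one more label from the left.
        "parents": [".".join(labels[i:]) for i in range(1, len(labels))],
    }

info = describe("www.example.com")
print(info["labels"])   # ['www', 'example', 'com']
print(info["tld"])      # com
print(info["parents"])  # ['example.com', 'com']
```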
106
Domain Name Formulation (4)
A hostname is a domain name that has
at least one IP address associated.
 For example, the domain names
www.example.com and example.com
are also hostnames, whereas the com
domain is not.

107
Structure of the Domain Space –
Top Level Domains

Immediately below the root are the Top Level Domains (TLDs).
These consist of country-code Top Level Domains (ccTLDs) and generic Top Level Domains (gTLDs).
The ccNSO and the GNSO decide the contents of ccTLDs and gTLDs respectively.
108
Structure of the Domain Space –
Second Level Domains

Below these domains, you have the second level domain names.
These domain names are usually "delegated" by the administrators of the relevant TLD, which means that someone else is responsible for administering that part of the name space.
e.g. the administrators of .ie delegated the domain linux.ie to the Irish Linux Users Group, which means that ILUG are now responsible for administering the domain in any way they see fit without reference to the administrators of .ie.
Once a domain is delegated, the administrators of the domain are responsible for making changes within that domain.
109
Top Level Domain (TLD) Types
110
General TLDs (1)
111
General TLDs (2)
112
DNS Servers and Their Layout


The DNS consists of a hierarchical set of DNS
servers.
Each zone (domain) or subzone (subdomain) has one or more authoritative DNS servers that publish information about that zone (domain) and the name servers of any zones (domains) "beneath" it.
The hierarchy of authoritative DNS servers
matches the hierarchy of zones (domains).
At the top of the hierarchy stand the root servers:
the servers to query when looking up (resolving) a
top-level domain name.
113
DNS Name Server

A DNS name server is a server that stores the DNS records for a zone (domain name), such as address (A) records, name server (NS) records, and mail exchanger (MX) records, and responds with answers to queries against its database.
114
DNS Server Categories
Server Type: Definition
Root: Any server that acts as a central lookup for other servers to depend on, and does not rely on other servers for Name Server zone information
Authoritative: Any server that hosts zones (domains) and returns zone information publicly
Resolver: A server that performs domain queries for end users but does not host zones (domains) or zone information
115
Root Name Servers

The top of the hierarchy is served by the
root name servers, the servers to query
when looking up (resolving) a top-level
domain name (TLD).
116
Anycast

Anycast is a network addressing and
routing methodology in which datagrams
from a single sender are routed to the
topologically nearest node in a group of
potential receivers all identified by the
same destination address.
117
Names of Root Name Servers





While only 13 names are used for the root nameservers,
there are many more physical servers.
The 13 names are in the form letter.root-servers.net,
where letter ranges from A to M.
Each operator uses redundant computer equipment to provide reliable service even if hardware or software fails.
C, F, I, J, K, L and M servers now exist in multiple locations on different continents, using anycast address announcements to provide decentralized service.
As a result, most of the physical root servers are now outside the United States.
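As a small illustration of the naming scheme, the 13 root server names can be generated from the letters A to M. The variable name `ROOT_SERVER_NAMES` is an illustrative choice.

```python
import string

# The 13 names have the form letter.root-servers.net, with letter from a to m.
ROOT_SERVER_NAMES = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]

print(len(ROOT_SERVER_NAMES))                        # 13
print(ROOT_SERVER_NAMES[0], ROOT_SERVER_NAMES[-1])   # a.root-servers.net m.root-servers.net
```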
118
Root Server Addresses [wikipedia][root-servers]
119
Map of all 123 DNS root server instances (including
local Anycast instances) at the end of 2006.
120
Authoritative Name Server


The Domain Name System distributes the responsibility
of assigning domain names and mapping those names
to IP addresses by designating authoritative name
servers for each zone (domain).
Authoritative name servers are assigned to be responsible
for their particular zones (domains), and in turn can
assign other authoritative name servers for their subzones (sub-domains).

This mechanism has made the DNS distributed and fault tolerant
and has helped avoid the need for a single central register to be
continually consulted and updated.
121
Responses of Authoritative Name
Servers

An authoritative name server only returns
answers to queries about domain names
that have been specifically configured by
the administrator of the server.
122
Master and Slave Server



An authoritative name server can either be a
master server or a slave server.
A master server is a server that stores the
original (master) copies of all zone records.
A slave server uses an automatic updating
mechanism of the DNS protocol in
communication with its master to maintain an
identical copy of the master records.
123
Name Server Delegation
Name servers in delegations are identified
by name, rather than by IP address.
 This means that a resolving name server
must issue another DNS request to find
out the IP address of the server to which it
has been referred.

124
Circular Dependencies and Glue
Records




If the name given in the delegation is a subdomain of the
domain for which the delegation is being provided, there
is a circular dependency.
In this case the nameserver providing the delegation
must also provide one or more IP addresses for the
authoritative nameserver mentioned in the delegation.
This information is called glue.
The delegating name server provides this glue in the
form of records in the additional section of the DNS
response, and provides the delegation in the authority
section of the response.
125
Example





Consider the domain example.org. Assume that the authoritative
name server for example.org is ns1.example.org.
A computer trying to resolve www.example.org will first have to
resolve ns1.example.org.
Since ns1 is also under example.org, resolving
ns1.example.org requires resolving example.org—a circular
dependency.
To break the dependency, the nameserver for the org top level
domain includes glue along with the delegation for example.org.
The glue records are A and/or AAAA records that provide IP
addresses for ns1.example.org. The resolver uses one or more of
these IP addresses to satisfy the circular dependency, which allows it
to communicate with ns1.example.org and finish resolving the
DNS query.
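The example can be sketched in Python as follows. This is a simplified model, not a real DNS message: the dictionary `REFERRAL`, its keys, and the functions `needs_glue` and `nameserver_addresses` are hypothetical, and 192.0.2.53 is a made-up address from the documentation range.

```python
# Hypothetical referral handed out by the org name server for example.org.
REFERRAL = {
    "ns_records": [("example.org", "NS", "ns1.example.org")],
    "glue_records": [("ns1.example.org", "A", "192.0.2.53")],  # made-up address
}

def needs_glue(zone: str, ns_name: str) -> bool:
    """A name server located inside the zone it serves creates a circular dependency."""
    return ns_name == zone or ns_name.endswith("." + zone)

def nameserver_addresses(referral):
    """Map each delegated name server to the glue addresses sent with the referral."""
    glue = {}
    for owner, rtype, value in referral["glue_records"]:
        if rtype in ("A", "AAAA"):
            glue.setdefault(owner, []).append(value)
    return {
        ns: glue.get(ns, [])
        for _, rtype, ns in referral["ns_records"]
        if rtype == "NS"
    }

print(needs_glue("example.org", "ns1.example.org"))  # True
print(nameserver_addresses(REFERRAL))                # {'ns1.example.org': ['192.0.2.53']}
```

With the glue address in hand, the resolver can contact ns1.example.org directly instead of trying to resolve its name first.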
126
DNS Resolution Sequence (1)
127
DNS Resolution Sequence (2)
[Figure: DNS resolution sequence; one node is labeled "root domain server"]
128
Record Caching



Because of the large volume of requests
generated in the DNS for the public Internet, the
designers wished to provide a mechanism to
reduce the load on individual DNS servers.
To this end, the DNS resolution process allows
for caching of records for a period of time after
an answer.
This entails the local recording and subsequent
consultation of the copy instead of initiating a
new request upstream.
129
TTL



The time for which a resolver caches a DNS
response is determined by a value called the
time to live (TTL) associated with every record.
The TTL is set by the administrator of the DNS
server handing out the authoritative response.
The period of validity may vary from just
seconds to days or even weeks.
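A minimal sketch of TTL-based caching is given below, assuming a cache keyed by name and record type. The class `TtlCache` and its methods are hypothetical names, and real resolver caches handle many more cases (negative answers, multiple records per name, and so on).

```python
import time

class TtlCache:
    """Keep DNS answers until their TTL expires (a sketch, not a full resolver cache)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable clock, so the cache can be tested without waiting
        self._entries = {}     # (name, rtype) -> (expiry time, value)

    def put(self, name, rtype, value, ttl):
        """Store an answer together with the time at which it stops being valid."""
        self._entries[(name, rtype)] = (self._clock() + ttl, value)

    def get(self, name, rtype):
        """Return the cached value, or None if it is missing or has expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]  # expired: a new upstream query is needed
            return None
        return value

cache = TtlCache()
cache.put("www.example.com", "A", "192.0.2.1", ttl=180)
print(cache.get("www.example.com", "A"))  # 192.0.2.1
```

A short TTL, such as 180 seconds, forces resolvers to ask the authoritative server again after only a few minutes.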
130
Resource Record


A Resource Record (RR) is the basic data
element in the domain name system.
Each record has a type (A, MX, etc.), an expiration time limit, a class, and some type-specific data.

Resource records with the same name, class, and type form a
resource record set (RRset).
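The fields of a resource record can be modeled with a small Python data class. This is a sketch: the field names are illustrative, the owner name is added because a record must belong to some domain name, and the grouping key (name, class, type) follows the usual definition of a resource record set.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceRecord:
    name: str    # owner name (added for the sketch)
    rtype: str   # type, e.g. "A" or "MX"
    ttl: int     # expiration time limit, in seconds
    rclass: str  # class, almost always "IN"
    rdata: str   # type-specific data

def rrsets(records):
    """Group records into resource record sets keyed by (name, class, type)."""
    groups = defaultdict(list)
    for rr in records:
        groups[(rr.name, rr.rclass, rr.rtype)].append(rr)
    return dict(groups)

records = [
    ResourceRecord("example.com.", "A", 3600, "IN", "192.0.2.1"),
    ResourceRecord("example.com.", "A", 3600, "IN", "192.0.2.2"),
    ResourceRecord("example.com.", "MX", 3600, "IN", "10 mail.example.com."),
]
print(len(rrsets(records)))  # 2
```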
131
RR (Resource record) Fields
132
TYPE Field



TYPE is the record type.
It indicates the format of the data and it gives a hint of its
intended use.
For example:
the A record is used to translate from a domain name to an IPv4 address,
the NS record lists which name servers can answer lookups on a DNS zone,
the MX record specifies the mail server used to handle mail for a domain specified in an e-mail address.
133
Zone File [wikipedia]
A Domain Name System (DNS) zone file
is a text file that describes a DNS zone.
 A zone file is a sequence of entries for
resource records.
 Each line is a text description that defines
a single resource record (RR).
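As an illustration, one zone file line can be parsed into its fields as follows. This sketch assumes that every line spells out all five fields in the order name, TTL, class, type, data; real zone files also allow omitted fields, directives such as $TTL, and records that span several lines, none of which is handled here. The function name `parse_record_line` is hypothetical.

```python
def parse_record_line(line: str):
    """Parse 'name ttl class type rdata' into a dict; return None for blank or comment lines."""
    text = line.split(";", 1)[0].strip()  # ';' starts a comment (quoted ';' is not handled)
    if not text:
        return None
    # Split on whitespace at most four times so that the rdata keeps its internal spaces.
    name, ttl, rclass, rtype, rdata = text.split(None, 4)
    return {"name": name, "ttl": int(ttl), "class": rclass, "type": rtype, "rdata": rdata}

print(parse_record_line("www.example.com. 3600 IN A 192.0.2.1"))
```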

134
A Zone File Example
135
Fast Flux [Riden][SSAC]
136
Fast-flux Service Networks


A fast-flux service network is a network of
compromised computer systems with public
DNS records that are constantly changing, in
some cases every few minutes.
These constantly changing architectures make it
much more difficult to track down criminal
activities and shut down their operations.
137
Goal of Fast-Flux


The goal of fast-flux is for a fully qualified domain name (such as www.example.com) to have multiple (hundreds or even thousands of) IP addresses assigned to it.
These IP addresses are swapped in and out of flux with extreme frequency, using a combination of round-robin IP addresses and a very short Time-To-Live (TTL) for any given DNS Resource Record (RR).
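The combination of round-robin answers and short TTLs can be simulated with a few lines of Python. This is only a model of the behavior described above: the pool of addresses is made up (taken from a documentation range), and the names `POOL`, `flux_answer`, and `rotate` are hypothetical.

```python
import random

# Hypothetical pool of compromised hosts (made-up documentation addresses).
POOL = [f"198.51.100.{i}" for i in range(1, 201)]
TTL_SECONDS = 180  # very short TTL, so resolvers must ask again after a few minutes

def flux_answer(pool, size, rng):
    """Pick the A records served during one TTL window: a random subset of the pool."""
    return rng.sample(pool, size)

def rotate(answer):
    """Round-robin: the same addresses, rotated by one position for the next response."""
    return answer[1:] + answer[:1]

rng = random.Random()
for window in range(3):
    answer = flux_answer(POOL, 4, rng)
    print(f"t={window * TTL_SECONDS:4d}s  {answer}")
    print(f"         next response in this window: {rotate(answer)}")
```

Every TTL window exposes a different handful of addresses, which is why blocking individual IP addresses has little effect on such a network.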
138
Web Request – Normal Network
139
Web Request – Fast Flux
140
DNS Resolution – Single Flux
141
DNS Resolution – Double Flux
142
DNS Resolution – Double Flux
143
Build a Fast-Flux Service Network (1)


Fast flux users often register domain names for
their illegal activities at an accredited registrar or
reseller.
In one form of attack, the fast flux customer registers a domain name (for a flux service network) to host illegal web sites (boguswebsitesexample.tld) and a second domain name, or several, for a flux service network to provide name resolution service (nameserverservicenetwork.tld).
144
Build a Fast-Flux Service Network (2)


The fast flux service network operator uses automated
techniques to rapidly change name server information in
the registration records maintained by the registrar for
these domains.
In particular, the fast flux service network operator changes the IP addresses of the domain's name servers to point to different hosts in the domain nameserverservicenetwork.tld and sets the times to live (TTLs) in the address records for these name servers to a very small value (1-3 minutes is common).
These name servers are in charge of providing IP information for hosts in the domain boguswebsitesexample.tld.
145
Build a Fast-Flux Service Network (3)

Resource records associated with a name
server domain used in fast flux hosting
might appear in a TLD zone file as:
$TTL 180
boguswebsitesexample.tld. NS NS1.nameserverservicenetwork.tld.
boguswebsitesexample.tld. NS NS2.nameserverservicenetwork.tld.
…
NS1.nameserverservicenetwork.tld. A 10.0.0.1
NS2.nameserverservicenetwork.tld. A 10.0.0.2
146
Build a Fast-Flux Service Network (4)

Note that the time-to-live (TTL) for the resource records
is set very low (in the example, 180 seconds). When the
TTL expires, the fast flux service network operator's
automation ensures that a new set of A records for the
name servers replaces the existing set:
$TTL 180
boguswebsitesexample.tld. NS NS1.nameserverservicenetwork.tld.
boguswebsitesexample.tld. NS NS2.nameserverservicenetwork.tld.
…
NS1.nameserverservicenetwork.tld. A 192.168.0.123
NS2.nameserverservicenetwork.tld. A 10.10.10.233
147
Build a Fast-Flux Service Network (5)

Records associated with the illegal web site
might appear in a zone file hosted on a DNS bot
in the nameserverservicenetwork.tld
network as:
boguswebsitesexample.tld. 180 IN A 192.168.0.1
boguswebsitesexample.tld. 180 IN A 172.16.0.99
boguswebsitesexample.tld. 180 IN A 10.0.10.200
boguswebsitesexample.tld. 180 IN A 192.168.140.11
148
Build a Fast-Flux Service Network (6)


Note again that the time-to-live (TTL) for each A
resource record is set very low (in the example, 180
seconds).
When the TTL expires, the resource records would be
automatically modified to point to other bots that host this
illegal web site. Only minutes later, the zone file might
read:
boguswebsitesexample.tld. 180 IN A 192.168.168.14
boguswebsitesexample.tld. 180 IN A 172.17.0.199
boguswebsitesexample.tld. 180 IN A 10.10.10.2
boguswebsitesexample.tld. 180 IN A 192.168.0.111
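The turnover between the two snapshots can be checked with a short Python sketch. The record lines are copied from the two examples; the names `BEFORE`, `AFTER`, and `addresses` are illustrative, and the parsing assumes the five-field layout name, TTL, class, type, address.

```python
BEFORE = """\
boguswebsitesexample.tld. 180 IN A 192.168.0.1
boguswebsitesexample.tld. 180 IN A 172.16.0.99
boguswebsitesexample.tld. 180 IN A 10.0.10.200
boguswebsitesexample.tld. 180 IN A 192.168.140.11
"""

AFTER = """\
boguswebsitesexample.tld. 180 IN A 192.168.168.14
boguswebsitesexample.tld. 180 IN A 172.17.0.199
boguswebsitesexample.tld. 180 IN A 10.10.10.2
boguswebsitesexample.tld. 180 IN A 192.168.0.111
"""

def addresses(zone_text):
    """Collect the addresses of all A records in a zone file snippet."""
    found = set()
    for line in zone_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[3] == "A":
            found.add(fields[4])
    return found

# Addresses present in both snapshots: none, the whole set was replaced.
print(sorted(addresses(BEFORE) & addresses(AFTER)))  # []
```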
149