Transcript a,b

Botnets
Botnets
• A botnet is a network of compromised
machines (bots) remotely controlled by an
attacker.
B
B
Commands
Compromise
U
Botmaster
B
Commands
Compromise
U
B
Key
B
U
ot
ncompromised Host
Botnets (Contd.)
•
Three unavoidable factors that are spurring botnet
growth:
–
Infection can occur even through legitimate Web sites
–
Bot exploits/malware delivery mechanisms are gaining
sophistication and better obfuscation techniques
–
Users do not have to do anything to become infected;
simply rendering a Web page can launch a botnet exploit
in 2Q 2008, 10 million bot computers were used to distribute spam and malware across
the Internet each day
[http://www.darkreading.com/document.asp?doc_id=161524]
Botnets- A Significant Threat
4
•
Most significant threats to network operators
•
Source: Worldwide Infrastructure Security Report, Arbor Networks, Sep.
2008
5
6
[T. Holz. A short visit to the bot zoo. IEEE Security & Privacy, 3(3):76–79, 2005]
Botnets- An Overview
7
•
Bots are used for various forms of illegal activity
•
There are many types of bots available in the wild, with a
lot of variants for each type
–
•
Agobot and SDbot are among the most popular
Bots share similar characteristics in general
–
They take advantage of many of the software vulnerabilities
such as software bugs, including those that enable
–
buffer overflow attacks, hacker installed backdoors, and various
memory management problems that allow malicious code to
infect a system.
Botnets- An Overview (Contd.)
8
•
Publicizing bot code is one of the main reasons for the appearance of many bot
variants within short period of time.
•
Making bot’s source code available for hackers enables them to modify it to
obtain customized versions that serve their bad intents.
•
Bots usually start their operation by estimating the infected system’s bandwidth
–
This is typically done by accessing several servers and sending data to them
–
This measurement is of particular importance for the attacker especially when
performing DDoS attack
•
Overall, there are a lot of differences between bots which are due to the variation
in the level of sophistication and features presented in the bot code
•
The common thing about bots is that attackers are eager to integrate new
software
vulnerabilities in their bot code very quickly. This means that bots will continue to
evolve in an unpredictable manner
Botnets- An Overview (Contd.)
9
Source: [T. Holz. A short visit to the bot zoo. IEEE Security & Privacy, 3(3):76–79, 2005]
Botnet Liftime
• Stage one: recruiting members, a botmaster needs
to compromise many computers in the Internet, so
that he/she can control them remotely
• Stage two: forming the botnet, bots need to find a
way to connect to each other and form a botnet
•
The C&C plane where bots receive commands from
the botmaster
Botnet Lifetime (contd)
•
•
Stage three: standing by for instructions, after the botnet is built up,
all bots are ready to communicate with their botmaster for further
instructions, such as launching an attack or performing an update
•
Push style: Bots passively wait for commands to come and will forward
received commands to others
•
Pull style: refers to the manner that bots retrieve commands actively
from a place where botmasters publish commands
The activity plane where bots execute these commands to launch
different types of attacks that include DDoS, spam, click fraud, etc
Botnets C&C
13
•
The structure of a botnet is basically determined by its
C&C plane topology which in turn specifies the way
botmaster delivers commands to botnet members.
•
C&C is usually implemented using one of the following
protocols:
–
IRC (Centralized)
–
HTTP (Centralized)
–
Email (Centralized)
–
P2P (Distributed)
13
Selective well known Botnets
[Source: Goufie Gu, PhD Thesis, 2008]
[G. Gu. et. al., NDSS 2008]
Botnet C&C: Spatial-Temporal
Correlation and Similarity
•
Bots of a botnet demonstrate spatial-temporal correlation and
similarities due to the nature of their pre-programmed response
activities to control commands
•
Bots need to connect to C&C servers in order to obtain commands
•
•
They may either keep a long connection or frequently connect back
Second, bots need to perform certain tasks and respond to the
received commands
Botnet C&C (cntd)
[G. Gu. et. al., NDSS 2008]
Challenges of Detecting IRC &
HTTP Botnets
• Botnet C&C traffic is difficult to detect because:
•
It follows normal protocol usage and is similar to
normal traffic
•
The traffic volume is low
•
There may be very few bots in the monitored network
•
It may contain encrypted communication
[M. A. Rajab, et. al , IMC 2006]
IRC-Based Botnets
• The majority of botnets today use the
Internet Relay Chat (IRC) protocol
• The IRC protocol was specifically designed
to allow for several forms of
communication (point-to-point, point to
multi-point, etc.) and data dissemination
among large number of end-hosts.
IRC-Based Botnets (contd)
•
What features make IRC the protocol of choice for
botmasters?
•
The inherent flexibility of this protocol
•
The availability of several open-source implementations,
enables third parties to extend it in ways that suit their
needs
•
It simplifies the botnet implementation and provides a high
degree of control over the bots
IRC-based Botnet Life Cycle
[Source: M. A. Rajab, et. Al , In IMC '06: Proceedings of the 6th ACM SIGCOMM on Internet
measurement. pp. 41-52. 2006]
Step 1: Exploit
• Exploit software vulnerability of victim host
• Same infection strategies as other malware
– Worms
– Malicious email code
B
B
Commands
Botmaster
B
Commands
B
Exploit
Vulnerable host
Step 2: Download Bot Binary
• Infected host executes shellcode to fetch bot binary from
specified location
– Usually the same machine that infected it
• After the download, the bot binary installs itself so it can auto
start on reboot B
B
Commands
Botmaster
B
Commands
B
Bot download
Vulnerable host
Step 3. DNS lookup
• Bot needs IP address of IRC server
• Perform DNS Lookup
• Better than hard-coding the server IP in case the
IP gets blacklisted B
DNS Server
B
Commands
Botmaster
B
Commands
Vulnerable host
B
Step 4: Join IRC Server
• Join server and channel specified in bot binary
• May use authentication:
1) Bot authenticates to join server using password from bot binary
2) Bot authenticates to join channel using password from bot binary
3) Botmaster authenticates to bot population to send command
B
B
Commands
DNS Server
IRC Server
Botmaster
B
Commands
B
bot
Step 5: Execute Commands
• Bot parses and executes channel topic
• Topic contains default command for all bots
to execute
B
IRC Server
Commands
DNS Server
B
Botmaster
B
Commands
B
bot
IRC-Based Communication
Example
IRC Botnets (Contd.)
•
Botherders are migrating away from IRC botnets because researchers know
how to track them.
•
Drawbacks:

Centralized server

IRC is not that secure by default

Security researchers understand IRC too.
B
IRC Server
Commands
DNS Server
B
Botmaster
B
Commands
B
bot
P2P Botnets
• Distributed Control
• Hard to disable
P2P Systems
Use the vast resources of
machines at the edge of the
Internet to build a network
that allows resource sharing
without any central authority.
More than a system for
sharing pirated music/movies
Characteristics of P2P Systems
•
Exploit edge resources.
•
•
Significant autonomy from any centralized
authority.
•
•
Storage, content, CPU, Human presence.
Each node can act as a Client as well as a Server.
Resources at edge have intermittent connectivity,
constantly being added & removed.
•
Infrastructure is untrusted and the components are
unreliable.
Overlay Network
A P2P network is an overlay network. Each link
between peers consists of one or more IP links.
Overlays : All in the application
layer
• Tremendous design
flexibility
•
•
•
•
Topology, maintenance
Message types
Protocol
Messaging over TCP or
UDP
• Underlying physical
network is transparent to
developer
•
But some overlays exploit
proximity
Overlay Graph
• Virtual edge
•
•
TCP connection
or simply a pointer to an IP address
• Overlay maintenance
•
•
•
•
•
Periodically ping to make sure neighbor is still
alive
Or verify aliveness while messaging
If neighbor goes down, may want to establish new
edge
New incoming node needs to bootstrap
Could be a challenge under high rate of churn
• Churn : dynamic topology and intermittent access
due to node arrival and failure
Overlay Graph
• Unstructured overlays
• e.g., new node randomly chooses existing
nodes as neighbors
• Structured overlays
• e.g., edges arranged in restrictive structure
P2P Applications
• P2P File Sharing
• Napster, Gnutella, Kazaa, eDonkey,
BitTorrent
• Chord, CAN, Pastry/Tapestry, Kademlia
• P2P Communications
• MSN, Skype, Social Networking Apps
• P2P Distributed Computing
• Seti@home
P2P File Sharing
Alice runs P2P client
application on her
notebook computer
Intermittently
connects to Internet
Gets new
IP address
for each
connection
Asks for
“movie
xyz”
Alice chooses one
of the peers, Bob.
Application displays
other peers that have
copy of xyz.
File is copied from
Bob’s PC to Alice’s
notebook
P2P
While Alice downloads,
other users upload
from Alice.
P2P
P2P Communication
• Instant Messaging
• Skype is a VoIP P2P system
Alice runs IM client
application on her
notebook computer
Intermittently
connects to Internet
Gets new
IP address
for each
connection
Alice initiates direct
TCP connection
with Bob, then
chats
P2P
Register
herself with
“system”
Learns from
“system” that Bob
in her buddy list is
active
Promising properties of P2P
• Massive scalability
• Autonomy : non single point of failure
• Resilience to Denial of Service
• Load distribution
• Resistance to censorship
Key Issues
• Management
•
•
How to maintain the P2P system under high rate of
churn efficiently
Application reliability is difficult to guarantee
• Lookup
•
How to find out the appropriate content/resource that
a user wants
• Throughput
•
•
Content distribution/dissemination applications
How to copy content fast, efficiently, reliably
Management Issue
•
A P2P network must be self-organizing.
•
•
•
•
Join and leave operations must be self-managed.
The infrastructure is untrusted and the components are
unreliable.
The number of faulty nodes grows linearly with system size.
Tolerance to failures and churn
•
•
•
•
Content replication, multiple paths
Leverage knowledge of executing application
Load balancing
Dealing with freeriders
•
Freerider : rational or selfish users who consume more than
their fair share of a public resource, or shoulder less than a
fair share of the costs of its production.
Lookup Issue
• How do you locate data/files/objects in a large
P2P system built around a dynamic set of nodes
in a scalable manner without any centralized
server or hierarchy?
• Efficient routing even if the structure of the
network is unpredictable.
•
Unstructured P2P : Napster, Gnutella, Kazaa
•
Structured P2P : Chord, CAN, Pastry/Tapestry,
Kademlia
Kademlia
• Nodes, files and key words, deploy SHA-1 hash
into a 160 bits space.
• Every node maintains information about files,
key words close to itself.
• The closeness between two objects measured
as their bitwise XOR interpreted as an integer.
distance(a, b) = a XOR b
Claims
• Only a small number of configuration
messages sent by the nodes.
• Uses parallel asynchronous queries to
avoid timeout delays of the failed nodes.
Routes are selected based on latency
• Kademlia is symmetric i.e. dist (a,b) = dist
(b,a)
Kademlia Binary Tree
• Treat nodes as leaves of a binary tree.
• Start from root, for any given node,
dividing the binary tree into a series of
successively lower subtrees that don’t
contain the node.
Kademlia Binary Tree
Subtrees of interest for a node 0011……
Kademlia Binary Tree
• Every node keeps touch with at least one
node from each of its subtrees. (if there is
a node in that subtree.) Corresponding to
each subtree, there is a k-bucket.
• Every node keeps a list of (IP-address, Port,
Node id) triples, and (key, value) tuples for
further exchanging information with others.
Kademlia Search
An example of lookup: Binary tree representing the complete 4 bit address
space. The node with id ”0000” performs a node lookup for id ”1111”.
The XOR Metric
• d (x,x) = 0
• d (x,y) > 0 if x ≠ y
• d (x,y) = d (y,x)
• d (x,y) + d (y,z) ≥ d (x, z)
• For each x and t, there is exactly one node
y for which d (x,y) = t
Node state
For each i (0 ≤ i <160) every node
keeps a list of nodes of distance
between 2i and 2(i+1) from itself.. Call
each list a k-bucket. The list is sorted
by time last seen. The value of k is
chosen so that any give set of k
nodes is unlikely to fail within an
hour. The list is updated whenever a
node receives a message.
k = system-wide replication parameter
head
Least recenly seen
Most recenly seen
tail
Node state
The nodes in the k-buckets are
the stepping stones of routing.
By relying on the oldest nodes,
k-buckets promise the probability
that they will remain online.
DoS attack is prevented since
the new nodes find it difficult
to get into the k-bucket
Least recenly seen
Most recenly seen
How is the bucket updated?
Kademlia RPC
•
PING: to test whether a node is online
•
STORE: instruct a node to store a key
•
FIND_NODE: takes an ID as an argument, a recipient returns (IP
address, UDP port, node id) of the k nodes that it knows from
the set of nodes closest to ID (node lookup)
•
FIND_VALUE: behaves like FIND_NODE, unless the recipient
received a STORE for that key, it just returns the stored value.
Kademlia Lookup
• The most important task is to locate the k closest nodes
to some given node ID.
• Kademlia employs a recursive algorithm for node
lookups. The lookup initiator starts by picking α nodes
from its closest non-empty k-bucket.
• The initiator then sends parallel, asynchronous
FIND_NODE to the α nodes it has chosen.
• α is a system-wide concurrency parameter, such as 3.
Kademlia Lookup
•
The initiator resends the FIND_NODE to nodes it has learned
about from previous RPCs.
•
If a round of FIND_NODES fails to return a node any closer
than the closest already seen, the initiator resends the
FIND_NODE to all of the k closest nodes it has not already
queried.
•
The lookup terminates when the initiator has queried and
gotten responses from the k closest nodes it has seen.
Kademlia Keys Store
•
To store a (key,value) pair, a participant locates the k
closest nodes to the key and sends them STORE RPCs.
•
Additionally, each node re-publishes (key,value) pairs as
necessary to keep them alive.
•
For Kademlia’s file sharing application, the original
publisher of a (key,value) pair is required to republish it
every 24 hours. Otherwise, (key,value) pairs expire 24
hours after publication.
New node join
• Each node bootstraps by looking for its own ID
Search recursively until no closer nodes can be found
• The nodes passed on the way are stored in the routing
table
[P. Wang et.al., IEEE ICCCN 2009]
P2P Botnets
56
•
P2P Botnets are classified into:
•
Parasite: All the bots are selected from hosts within an
existing P2P network  use this network for C&C
•
Leeching: All the bots join an existing P2P network  it
uses this available P2P network for C&C
•
Bot-only: All the members are bots (e.g., Stormnet,
Nugache) A P2P network has to be formed
Forming a P2P Network
57
•
•
Current P2P networks provide the following ways for
new peers to join the network (bootstrapping)
•
An initial peer list is hard-coded in each P2P client.
•
There is a shared web cache stored somewhere on the
Internet and the location of the cache is put in the client
code
These methods can be adopted for P2P botnet
construction (eg., Trojan.Peacomm, Stormnet)
P2P-botnets- Standing by for
instructions
58
•
•
Leveraging existing P2P protocols
•
Usually use pull mechanism
•
Eg., Storm botnet utilizes Overnet
Designing new P2P protocols
•
Can use push/pull mechanisms
•
Eg., Avanced Hybrid P2P botnet [C. C. Zou. et. al., DSN
2006], Super botnet [R. Vogt. et. al., NDSS 2007].
Case Study: Storm Botnet
• P2P network architecture
• Content-based publish/subscribe- style
communication
• Unauthenticated communication
[T. Holz, et.al., LEET 2008]
Storm Botnet- Propagation
Mechanism
•
Propagates using email
•
The attackers behind storm change the social engineering quite
often
•
Storm exploits web browsers with specific User-Agent
•
The actual exploit code in the malicious websites is polymorphic
•
The binary itself shows signs of polymorphism
Storm Botnet- System Level Behavior
•
Storm is sophisticated
–
–
–
Uses an advance binary packer
Uses a rootkit to hide its presence
Uses kernel level components to remain undetected
•
During the installation process, the malware also stores a
configuration file on the infected system
•
Storm synchronizes the system time of the infected machine
with the help of the Network Time Protocol (NTP)
Storm Botnet- Network Level Behavior
• The first version of Storm Worm uses
OVERNET
• Kademila-based P2P DHT routing protocol
• Stormnet- New version in October 2007
• Identical to Overnet except
• Each message is XOR encrypted with a 40byte long key
• Each node has 128-bit ID
Storm Botnet- Network Level
Behavior- Routing Lookup
•
A node a forwards a query destined to a node d to the node in its
routing table that has the smallest XOR-distance with d
•
The XOR-distance d(a, b) between nodes a and b is d(a, b) = a b
•
Prefix matching, looks for smallest XOR distance between
destination and contacts it has
•
Contacts: ID, IP, UDP port
•
Iterative lookups. Queries closest node for ID and repeats until
returned ID is further away than ID queried
Storm Botnet- Network Level BehaviorPublishing and Searching
64
•
•
Publishing and Searching
• A “key” is an identifier used to retrieve information
• Keys are stored by 20 nodes close to the key
• Publisher periodically republishes keys
• Botmaster publishes to a list of well known
“mailboxes”
• Each new bot looks for those mailboxes and retrieves
the intended information
Message types:
• Hello
• Kid (KeyID)Route request/response
• Publish request/response
• Key Search request/response
Storm Network
• Storm worm Communication
• Looks for peer by searching for keys
• Key = f(day, rand), rand is a 5 bit number
• Keys can by identified through:
Keys are different
• Reverse Engineering
each day
• Black box testing
Each day has a limited
number of keys
Storm network
66
Publish
Key/ip/port
Botmaster
Commands
Key(s) of the day
Ip/port
Bot
Storm network
67
• Weak authentication
•
•
hard coded key, k = 0x3ED9F146
Resp = challenge XOR k
• Ensuing communication compressed with zlib
• Campaigns:
•
•
Email
• Template + list of emails addresses
• Propagation and spam
DDoS
• SYN or ICMP flood