spamalytics - UCSD CSE - Systems and Networking


MURI Kickoff Welcome!
• First, introductions… all around
• Some context and expectations
    We’re going to give some informal presentations about our plans
        » Project just started, so few results yet
    Each will have a leader, but this is a collaborative effort so expect everyone to chime in
    Please ask questions and give feedback anytime
    We’ll try to keep to schedule, but we can go where you want
Rough schedule
• 10:00-10:30 UCB/UCSD Project overview/programmatics
• 10:30-11:00 Botfarm and dynamic containment
• 11:00-11:30 Automated binary analysis
• 11:30-12:00 NLP of underground communications
• 12:00-1:30 Lunch
• 1:30-3:30 Wenke et al. (GATech/Umich/UCSB/Stanford)
• 3:30-4:00 Break
• 4:00-5:00 Feedback/brainstorming
Infiltration of Botnet Command & Control and Support Ecosystems
MURI Kickoff 2009
PIs: Stefan Savage, Geoff Voelker (UCSD)
Vern Paxson, Dawn Song and Dan Klein (UC Berkeley)
Key threat transformations of the 21st century
• Efficient large-scale compromises
    Internet communications model
    Software homogeneity
    User naïveté/fatigue
• Control networks
    Cheap scalability for criminal applications (e.g. spam, info theft, DDoS, etc.)
    Platform economy
• Profit-driven applications
    Commodity resources (IP, bandwidth, storage, CPU)
    Unique resources (PII/credentials, data exfiltration)
Philosophy
• Need to understand and impact botnets from “the inside” instead of simply via their external actions
• Address real adversary – real bots and real botmasters
• Botnet infiltration (SIGINT)
    Intelligence collection (what is the botnet doing?)
    Command injection (tell botnet to do this)
    Botnet disruption (shutdown and/or takeover botnet)
• Ecosystem intelligence (HUMINT)
    Data mining/NLP on underground Web/chat to infer social relationships in the botnet ecosystem
    Who is supplying which resources, what are stress points, points of attribution, etc
Botnet infiltration
• Key idea: distributed C&C is a vulnerability
    Botnet authors like de-centralized communications for scalability and resilience, but…
    … to do so, they trust their bots to be good actors
    If you can modify the right bots you can observe and influence actions of the botnet via their communications
• We have done this once
    Infiltrated Storm P2P botnet
    Able to track everything botnet did and influence their actions
    But… one off, and hard to scale
Kanich, Kreibich, Levchenko, Enright, Paxson, Voelker and Savage,
Spamalytics: An Empirical Analysis of Spam Marketing Conversion,
ACM CCS 2008
Botnet infiltration challenges
• Obtaining and grooming bots (tricky in practice)
• Safe execution environment
    Must run bots, but contain their negative side-effects
    Fine-grained containment control via network, VMs, etc (informed by past work on Potemkin/GQ honeyfarms)
    Especially must control scope of our “attacks”
• C&C extraction from botnet binaries
    Extract C&C protocol w/o extensive manual reverse-engineering
    Use to feed containment, attacks and C&C proxy
• Attack development and testing
    Passive, cooperatively active, adversarially active
• Legal/Policy issues
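To make "fine-grained containment control" concrete, here is a minimal sketch of a default-deny decision for outbound flows from a contained bot. Everything in it is an assumption for illustration: the `Flow` type, the `decide` function, the verdict strings, and the example addresses (reserved documentation ranges) are hypothetical and do not describe the Potemkin or GQ designs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One outbound connection attempt from a contained bot (hypothetical type)."""
    dst_ip: str
    dst_port: int


# Hypothetical policy inputs. In practice these would come from C&C analysis.
CNC_ENDPOINTS = {("198.51.100.7", 8080)}  # known C&C endpoint (documentation address)
SINK_PORTS = {25}                         # e.g. SMTP, sent to a local sink instead of out


def decide(flow: Flow) -> str:
    """Return a containment verdict; anything not explicitly allowed is dropped."""
    if (flow.dst_ip, flow.dst_port) in CNC_ENDPOINTS:
        return "forward"           # let the bot stay a member of the live botnet
    if flow.dst_port in SINK_PORTS:
        return "redirect_to_sink"  # bot believes it is sending spam; nothing leaves
    return "drop"                  # default deny limits harm to third parties
```

For example, `decide(Flow("203.0.113.5", 25))` yields `"redirect_to_sink"`, while a flow to any unlisted port is dropped.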
Ecosystem intelligence
• Key idea: the social graph of botmasters and the bot support ecosystem (clients, authors, cashiers, etc.) is implicit in underground communications
    Underground forums, chat, etc.
    Marketing, sales, requests, complaints, side-deals, etc.
    By extracting this graph we can relate actors to actions
• We have done something similar once
    Analyzed 9 months of #ccpower underground IRC data
    Extracted buyer/seller and pricing relationships
    Manual, error prone, no notion of specific actor
Franklin, Perrig, Paxson and Savage,
An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants,
ACM CCS 2007
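As a rough illustration of pulling buyer/seller relationships out of chat text, the sketch below pairs nicknames that offer and request the same good. It is a toy under stated assumptions, not the method of the cited study: the `<nick> message` log format, the keyword lists, the function names, and the sample lines are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; real underground slang is far messier.
SELL_WORDS = {"selling", "sell", "have"}
BUY_WORDS = {"buying", "buy", "need", "want"}
GOODS = {"drops", "logins", "bots"}

# Assumed log format: "<nick> free-form message"
LINE_RE = re.compile(r"^<(?P<nick>[^>]+)>\s+(?P<text>.*)$")


def extract_offers(lines):
    """Map each good to the sets of nicks that appear to sell or buy it."""
    offers = defaultdict(lambda: {"sell": set(), "buy": set()})
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        words = set(re.findall(r"[a-z0-9]+", m.group("text").lower()))
        for good in GOODS & words:
            if words & SELL_WORDS:
                offers[good]["sell"].add(m.group("nick"))
            if words & BUY_WORDS:
                offers[good]["buy"].add(m.group("nick"))
    return offers


def candidate_edges(offers):
    """Pair sellers with buyers of the same good; these are candidates, not confirmed deals."""
    edges = set()
    for good, sides in offers.items():
        for seller in sides["sell"]:
            for buyer in sides["buy"]:
                if seller != buyer:
                    edges.add((seller, buyer, good))
    return edges


# Made-up sample lines in the assumed format.
SAMPLE = [
    "<alice> selling boa logins, msg me",
    "<bob> need logins asap",
    "<carol> whos a good reg for fluxing?",
]
EDGES = candidate_edges(extract_offers(SAMPLE))
```

On the sample, `EDGES` contains the single candidate edge `("alice", "bob", "logins")`. Keyword matching of this kind has no notion of aliases or of which actor is behind a nick, which is exactly the limitation noted above.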
Ecosystem intelligence challenges
• Pidgin/slang content (Eblish/ )
• Extracting structure from short free-form agrammatical elements
• Identity aliasing and multiple identities/pseudonyms
• Matching across multiple sources
• Limited ground truth knowledge
• Access to data
Example underground chat snippets (from the slide graphic):
confirmer
WU
can confirm males and females have drops in usa
AM
VERIFIED
MSG ME
i am boa cashout
have wells and boa logins and i need to good drop man
.......ripper f#@! off
Have dropper for bots, all ie sploits
whos a good reg for fluxing?
Goal/evaluation
• Botnet infiltration
    Can safely execute new botnet software, while still becoming members of live botnets
    Can efficiently extract botnet C&C
    Can decode and interpret all commands, inject new commands (acted upon) and exploit bot vulnerabilities successfully
    Validate that attacks only impact bots inside containment
• Ecosystem intelligence
    Identify actor identities/attributes, inter-actor relationships, supply chain relationships, transactions, and roles
    Validate automated mapping against human domain expert assessment
    Correlate with external ground-truth data from other studies
Milestones for this year
• Design work and prototype infrastructure for botfarm containment/grooming, demonstrate safe hosting of many bot families
• Prototype binary C&C extractor on one or more bots, output to feed containment network proxy (interpret C&C)
• Design work on ecosystem intelligence effort, dataset gathering and gathering of some “ground-truth” data (via botnet output, domain registration, spam campaigns, etc)
Other sponsors/supporters
• Funding and in-kind (data, equipment, access)
• Several more who decline to be identified (industry)
Education elements
• Student training in research
    Already have ~10 students involved in different aspects of project (including 3 undergrads)
• Class integration
    Network security courses at Berkeley and UCSD
    Internet Crime course at UCSD
• Workforce development
    Talks/tutorials to industry
    Input to defense contractors in this space
Project management
• Tightly integrated group (many have 3-5yrs of experience working w/each other)
• Communication via weekly teleconference, students on IM, physical student exchanges
• Lead on each campus (Stefan, Vern) responsible for local organization issues, but we cross lines routinely
• We attempt to centralize sensitive 3rd-party data and protect it there (tricky issues wrt dual NDA negotiation)
• Advance legal review on any issues of risk
• Educational issues delegated to each PI, excepting distributed courses
Questions?
