Use of AI algorithms in design of Web Application Security Testing

Use of AI algorithms in design of Web Application Security Testing Framework
HITCON
Taipei 2006
Or a “non-monkey” approach to hacking web applications
By fyodor and meder
“No, we are not writing another web scanner!!”
Agenda
Why hack web applications?
What scanners do. Why they are useless (or not)
What else could be done, but isn’t (yet)
Introduction to YAWATT
- User-session based approach
- Distributed
- Intelligent (or not?)
- Modular
- More than “application security scanner” coverage
Background of this work
STIF, STIF2 automation – agent-based cooperative automated hacking environment
http://o0o.nu/sec/STIF

So, why go for the web?
- They learnt to configure their firewalls
- They learnt to disable services they don’t want
- They finally know how to use nmap (and even nessus!!)
- …
- But they still want web
- And they can’t learn to code

So why web applications?
- Applications get complex
- Multilayered frameworks make it even more fun
- The number of web application based services grows
- The number of web application programmers increases (home-brewed web applications)
but …

Web applications remain a large hole into one’s network
- Web application programmers’ skills aren’t usually the best
- Firewalls are there – just to let you in
- Application firewalls can stop a limited number of web application attacks, but are useless when it comes to detecting logical vulnerabilities
- IDS systems aren’t smart enough to pick up on application attacks
Scanners... use of...
checking for enumeration ... YES
checking for execution ... YES
checking if we can drop table ... YES
checking if we can drop database ... YES
CANNOT CONNECT TO APPLICATION
Scanners – summary
- Nessus et al. – don’t see web applications beyond the underlying software configuration
- Libwhisker/Nikto – signature based. Relatively primitive. Efficient for default bugs
- Wikto/e-Or – session aware, coding flaws scanner
- Kavado/WebInspect/NStalker/Watchfire AppScan – intelligent scanners. Session aware. Closed “blackbox” (some allow scripted plugins)
Why scanners ain’t enough
- Single-host based
- Commercial scanners are black-box (not extendable, not correctable)
- Little or no control over the “hacking” process
- Not easily extendable on the fly with new “automation” modules
- Often primitive, signature-based logic
What we would like to have
- Maximum automation of the web hacking process
- Minimum of code writing
- Autonomous functionality
- Knowledge transfer
- Ability to add “hacks” on the fly
- Deal with uncertainty in an “intelligent” way
- Learn from valid user session data
Other good things to have
- Be able to test new classes of bugs (e.g. session hijacking)
- Be able to attack a web application from multiple locations (bypass IP restrictions, improve the brute-forcing process)
- Be able to automate testing of application logic bugs
- Be able to make intelligent guesses
Introducing the YAWATT method
User sessions
- User sessions – collections of user request/response pairs (URL, name/value pairs, session information and selected HTTP protocol data)
- Classified user session data includes semantic classification of URLs, parameters, responses and HTTP protocol data (server type, backend system(s) if visible, “unusual” HTTP header content)
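To make the definition concrete, here is a minimal Python sketch of a user session as a list of request/response pairs carrying semantic labels. The class and field names (`Exchange`, `UserSession`, `labels`) and the small set of “expected” headers used to flag unusual ones are assumptions for illustration, not YAWATT’s actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Headers treated as "expected"; anything else is flagged as unusual (an assumption).
EXPECTED_HEADERS = frozenset({"server", "date", "content-type", "content-length",
                              "set-cookie", "connection", "location"})


@dataclass
class Exchange:
    """One request/response pair observed in a user session (hypothetical model)."""
    method: str
    url: str
    params: Dict[str, str]
    status: int
    response_headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""
    # Semantic labels attached later by classificators, e.g. {"login", "executable"}.
    labels: Set[str] = field(default_factory=set)


@dataclass
class UserSession:
    """A collection of request/response pairs for one user."""
    session_id: str
    exchanges: List[Exchange] = field(default_factory=list)

    def add(self, exchange: Exchange) -> None:
        self.exchanges.append(exchange)

    def server_type(self) -> Optional[str]:
        # The first "Server" response header seen is taken as the server type.
        for ex in self.exchanges:
            for name, value in ex.response_headers.items():
                if name.lower() == "server":
                    return value
        return None

    def unusual_headers(self) -> List[str]:
        # Header names outside EXPECTED_HEADERS, in sorted order.
        seen = set()
        for ex in self.exchanges:
            seen.update(h for h in ex.response_headers if h.lower() not in EXPECTED_HEADERS)
        return sorted(seen)


session = UserSession("alice-1")
session.add(Exchange("POST", "/login.php", {"user": "alice", "pass": "secret"}, 302,
                     {"Server": "Apache/1.3.33", "X-Powered-By": "PHP/4.4.2",
                      "Location": "/home.php"}))
print(session.server_type())      # Apache/1.3.33
print(session.unusual_headers())  # ['X-Powered-By']
```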
User sessions
User session data can be obtained from:
- Proxy servers (Burp, Paros, ...)
- Web server logs
- Browser automation scripts (e.g. the WATIR framework)
- Spiders (Burp)
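Web server logs are the easiest of these feeders to automate. Below is a minimal sketch that assumes the NCSA/Apache “combined” log format. It groups entries into pseudo-sessions by client IP and User-Agent, which is only a heuristic because access logs carry no session identifiers. Access logs also expose only query-string parameters, so POST bodies still have to come from a proxy. `sessions_from_log` is a hypothetical name.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

# NCSA/Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def sessions_from_log(lines):
    """Group log entries into pseudo-sessions keyed by (client IP, user agent)."""
    sessions = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        parts = urlsplit(m.group("target"))
        sessions[(m.group("host"), m.group("agent") or "")].append({
            "method": m.group("method"),
            "path": parts.path,
            # Only query-string parameters are visible in an access log.
            "params": dict(parse_qsl(parts.query, keep_blank_values=True)),
            "status": int(m.group("status")),
        })
    return dict(sessions)
```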
Less code, more automation
- Application content is learnt from user sessions (data feeders)
- Additional application information can be gathered by agents’ plugins (e.g. directory splitting tests)
- User session data is classified by:
  - Semantic and functional classification of URLs
  - HTTP protocol classificators (server type, cookies, ...)
- Session classificators:
  - Input data classification – type, semantics
  - Output classification (application error detection, redirects, “bogus” responses, etc.)
- Test cases are organised in suites and executed in groups:
  - Stateless tests
  - Stateful tests
  - Mixed
Classification process as new data arrives in the system
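Here is a minimal sketch of rule-based classificators for URLs and responses. The labels, the regular expressions and the “soft 404” rule for bogus responses are illustrative assumptions. The slides do not describe YAWATT’s actual rule set.

```python
import re

# Hypothetical URL classification rules: label -> regex searched in the URL path.
URL_RULES = {
    "login": re.compile(r"(login|logon|signin|auth)", re.I),
    "executable": re.compile(r"\.(php|asp|aspx|jsp|cgi|pl)$", re.I),
    "static": re.compile(r"\.(html?|css|js|gif|jpe?g|png)$", re.I),
    "admin": re.compile(r"/(admin|manage|console)", re.I),
}

# Hypothetical body signatures of application errors.
ERROR_SIGNS = re.compile(
    r"(SQL syntax|ODBC|ORA-\d{5}|Microsoft OLE DB|Warning: .* on line \d+|stack trace)", re.I)


def classify_url(path):
    return {label for label, rx in URL_RULES.items() if rx.search(path)}


def classify_response(status, headers, body):
    labels = set()
    if 300 <= status < 400 and "location" in {h.lower() for h in headers}:
        labels.add("redirect")
    if status >= 500 or ERROR_SIGNS.search(body):
        labels.add("app_error")
    # "Bogus" response: a 200 page whose text says the resource is missing.
    if status == 200 and re.search(r"not found|does not exist", body, re.I):
        labels.add("soft_404")
    return labels
```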
Go Intelligent
Main components:
- Web application component (URL) classification
- Semantic classification of web application input data
- LSI (latent semantic indexing) based mapping and comparison of web content in response analysers
- Use of external search engines
- Limited “binary analysis” of downloaded files (decoding PDF, DOC, RTF; other formats later)
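LSI builds a term-document matrix, truncates its singular value decomposition to k dimensions, and compares documents in that reduced space. The sketch below shows how a response analyser could decide whether a new response looks more like a known error page or a known normal page. It assumes raw term counts, NumPy, and a hypothetical `lsi_similarity` helper, and it is not the authors’ analyser.

```python
import re

import numpy as np


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def lsi_similarity(docs, k=2):
    """Cosine similarities between documents in a rank-k LSI space."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-document matrix of raw counts (rows = terms, columns = documents).
    a = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in tokenize(doc):
            a[index[t], j] += 1
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    # Each row is one document expressed in the k strongest latent "concepts".
    doc_vecs = (np.diag(s[:k]) @ vt[:k, :]).T
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    unit = doc_vecs / norms
    return unit @ unit.T


baseline_ok = "welcome back alice your account balance is shown below"
baseline_err = "sql syntax error near unexpected token in query"
candidate = "error in your sql syntax near token"
sim = lsi_similarity([baseline_ok, baseline_err, candidate])
looks_like_error = sim[2, 1] > sim[2, 0]
```

Truncating to k dimensions is what separates LSI from plain cosine similarity on word counts. Responses that share vocabulary with the same group of baseline pages end up close together even when their exact wording differs.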
Knowledge Transfer to machine
- Possibility to create new classification rules on the fly (and let the system re-learn from them)
- Possibility to “reclassify” application responses
- Possibility to add new “testing” plugins on the fly
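Knowledge transfer can be pictured as a rule store that accepts new rules at run time and immediately re-labels everything already collected. Manual reclassifications by the analyst survive this re-learning. The `Classifier` class, its dictionary records and the `force`/`deny` override sets are hypothetical.

```python
from typing import Callable, Dict, List, Set

Record = dict
Rule = Callable[[Record], bool]


class Classifier:
    """Hypothetical rule store: rules are added at run time and stored data is re-labelled."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.records: List[Record] = []

    def labels_for(self, record: Record) -> Set[str]:
        auto = {label for label, rule in self.rules.items() if rule(record)}
        # Analyst overrides win over automatic rules.
        return (auto | record.get("force", set())) - record.get("deny", set())

    def ingest(self, record: Record) -> None:
        record["labels"] = self.labels_for(record)
        self.records.append(record)

    def add_rule(self, label: str, rule: Rule) -> None:
        self.rules[label] = rule
        self.reclassify()  # "re-learn": apply the new rule to everything seen so far

    def reclassify(self) -> None:
        for record in self.records:
            record["labels"] = self.labels_for(record)

    def relabel(self, record: Record, label: str, present: bool = True) -> None:
        """Manually reclassify one response, e.g. mark it as a "negative" answer."""
        add_to, remove_from = ("force", "deny") if present else ("deny", "force")
        record.setdefault(add_to, set()).add(label)
        record.setdefault(remove_from, set()).discard(label)
        record["labels"] = self.labels_for(record)
```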
How is URL classification used?
Vulnerability scenario testing uses the “classificators” subscription mechanism.
For example: the login page tester needs the “login”, “executable” and “session” classifications.
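The subscription idea can be sketched as a dispatcher that runs a plugin only when a URL carries every label the plugin subscribed to. `Dispatcher` and `login_page_tester` are hypothetical names, and the tester body is a placeholder.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Plugin = Callable[[str], str]


class Dispatcher:
    """Hypothetical subscription mechanism: plugins subscribe to classification labels."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[FrozenSet[str], Plugin]] = []

    def subscribe(self, required: Iterable[str], plugin: Plugin) -> None:
        self.subscriptions.append((frozenset(required), plugin))

    def dispatch(self, url: str, labels: Set[str]) -> List[str]:
        # A plugin fires only if every label it subscribed to is present on the URL.
        return [plugin(url) for required, plugin in self.subscriptions if required <= labels]


def login_page_tester(url: str) -> str:
    # Placeholder: a real tester would try default credentials, session fixation, etc.
    return f"login tests scheduled for {url}"


dispatcher = Dispatcher()
dispatcher.subscribe({"login", "executable", "session"}, login_page_tester)
```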
How does input data semantics identification happen?
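The slides do not say how input semantics are identified. One plausible approach combines pattern matching on observed parameter values with hints taken from the parameter name. This is offered as an assumption, not as the authors’ method, and the type names and patterns are illustrative.

```python
import re

# Hypothetical ordered value rules: the first match wins, so specific types come first.
VALUE_TYPES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("ip", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("integer", re.compile(r"^-?\d+$")),
    ("hex_token", re.compile(r"^[0-9a-fA-F]{16,}$")),
    ("url", re.compile(r"^https?://", re.I)),
    ("path", re.compile(r"^(/|\.\./|[\w-]+/)[\w./-]*$")),
]

# Hypothetical hints derived from the parameter name itself.
NAME_HINTS = {
    "password": re.compile(r"pass|pwd", re.I),
    "username": re.compile(r"user|login|uid", re.I),
    "session": re.compile(r"sess|sid|token", re.I),
    "file": re.compile(r"file|page|include|template", re.I),
}


def value_type(value):
    for name, rx in VALUE_TYPES:
        if rx.search(value):
            return name
    return "text"


def param_semantics(name, values):
    """Combine the types of all observed values with hints from the parameter name."""
    types = {value_type(v) for v in values}
    hints = {hint for hint, rx in NAME_HINTS.items() if rx.search(name)}
    return {"types": types, "hints": hints}
```

A parameter named `page` whose values look like paths is a natural candidate for file-inclusion tests, while a hex value under `sessionid` points to session handling tests.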
How the classified user session data is used
Additional research directions
Other ideas to work on:
- Detection of “hidden” parameters
- Identification of “hidden” URLs
- Identification of “negative” and “positive” responses
- Detection of application failures and redirects
- Evaluation and priority-based execution of plugins
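Priority-based plugin execution could work as follows. Each plugin first evaluates a target and returns an estimated value, or None to opt out. Plugins then run highest score first. The `(name, evaluate, run)` plugin shape and the scores are assumptions.

```python
import heapq
from typing import Callable, List, Tuple


def run_by_priority(plugins, target):
    """Run plugins highest score first; each plugin is a (name, evaluate, run) tuple."""
    queue: List[Tuple[float, int, str, Callable]] = []
    for order, (name, evaluate, run) in enumerate(plugins):
        score = evaluate(target)
        if score is not None:
            # heapq is a min-heap, so the score is negated; 'order' breaks ties stably.
            heapq.heappush(queue, (-score, order, name, run))
    executed = []
    while queue:
        _, _, name, run = heapq.heappop(queue)
        run(target)
        executed.append(name)
    return executed
```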
A note on distributed architecture

Cooperative Agents Infrastructure
- Cooperative agent system design
- Multi-platform
- Portable
- Distributed architecture
Distributed architecture (another look)
What the distributed approach gives us:
- DDoS – EASY!!!
- Distributed brute-forcing: bypassing IP-based restrictions and bandwidth limitations
- IDS – more tricks
- Bypassing packet filtering restrictions: an agent behind the firewall!
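Distributed brute-forcing mostly comes down to giving each agent a disjoint share of the candidate list, so that no single source IP crosses a lockout or rate-limit threshold. The sketch below assumes the per-IP attempt limit is known, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def agents_needed(total_attempts: int, per_ip_limit: int) -> int:
    """Smallest number of agents that keeps every source IP at or under per_ip_limit."""
    return -(-total_attempts // per_ip_limit)  # ceiling division


def split_wordlist(words: Sequence[str], agents: Sequence[str]) -> Dict[str, List[str]]:
    """Deal candidates round-robin so each agent gets a disjoint, near-equal share."""
    shares: Dict[str, List[str]] = {agent: [] for agent in agents}
    for i, word in enumerate(words):
        shares[agents[i % len(agents)]].append(word)
    return shares
```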

Communication framework
Modified version of Spread
- Robust
- Reliable message delivery
- Portable (Windows/Unix)
- Available in C/C++ and Java flavours. Bindings exist for Python and Ruby!
In progress
- Agents communicate with messages
- Task distribution algorithms – in progress
More on intelligence
Aside from application vulnerabilities, other things of interest are:
- Email addresses and user IDs that can be seen within web content
- Domain names (within web pages, comments, binary files, etc.)
- Building “target-oriented” dictionary files (used by brute-force cracking modules)
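Below is a minimal sketch of harvesting email addresses, user IDs and domain names from page content and turning them into a ranked, target-oriented wordlist. The regular expressions are simplified assumptions: they miss obfuscated addresses and can over-match. `harvest` and `target_dictionary` are hypothetical names.

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"[\w.+-]+@(?:[\w-]+\.)+[A-Za-z]{2,}")
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9]+")


def harvest(text):
    """Return (emails, domains, user ids) found in a page, comment or decoded file."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    users = sorted({e.split("@", 1)[0].lower() for e in emails})
    domains = {e.split("@", 1)[1].lower() for e in emails}
    domains |= {host.lower() for host in URL_HOST_RE.findall(text)}
    return emails, sorted(domains), users


def target_dictionary(texts, min_len=4, top=50):
    """Rank words, user ids and domain labels taken from the target's own content."""
    counts = Counter()
    for text in texts:
        counts.update(w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len)
        _, domains, users = harvest(text)
        counts.update(users)  # user ids are kept regardless of length
        for domain in domains:
            counts.update(label for label in domain.split(".") if len(label) >= min_len)
    return [word for word, _ in counts.most_common(top)]
```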
Other good things
Add your plugin code on the fly (attack automation plugins via the subscription mechanism, classification plugins, etc.):
Can’t be simpler:
Look mah, no hands!
No reload is needed; plugins are executed the next time new data is processed
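“No reload needed” can be achieved by re-reading the plugin directory for every batch of new data. This sketch assumes each plugin is a Python file that defines a `handle(record)` function. Both that convention and the function names are hypothetical. Re-importing on every batch is simple but costly, and caching modules by file modification time would be the obvious refinement.

```python
import importlib.util
import os


def load_plugins(directory):
    """Import every *.py file in `directory` afresh; keep those defining handle(record)."""
    plugins = {}
    for fname in sorted(os.listdir(directory)):
        if not fname.endswith(".py"):
            continue
        name = fname[:-3]
        spec = importlib.util.spec_from_file_location(name, os.path.join(directory, fname))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the plugin file
        if callable(getattr(module, "handle", None)):
            plugins[name] = module.handle
    return plugins


def process(record, plugin_dir):
    """Run every plugin currently in plugin_dir against one new record."""
    return {name: handle(record) for name, handle in load_plugins(plugin_dir).items()}
```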
Beyond the norms of the average application scanner
- Integration and use of other tools to collect and analyse data (search engine queries, ...)
- Integration with other tools (script in Python or Ruby, or hack a “plugin” in Java or C)
- If you have a favourite application hax0r tool, you can still use it (and feed the data to us!)
Other reminders:
Direct interaction with the analyst (not fully implemented yet)
Other reminders:
Data lookup and data mining services for plugins (via DataMiner, which wraps a MySQL database)
Other “nice to have” things in progress
Propagation module: manual or automated agent installation on a vulnerable server (controlled worm spreading capability!)
Demo


Code is spaghetti (sorry about that)
Will demonstrate functional bits
Questions and Answers
Sample questions, pick one: ;---------)
- Why another web hacking tool?
- Can you do X too...?
Thanks
Thanks for your patience
- The code, slides and docs will be available in a while:
http://o0o.nu/sec

XCon plug
- XCon2006, the Fifth Information Security Conference, will be held in Beijing, China, during August 22–24, 2006.
- Speaking: a bit late, but you can try: [email protected]
- Attending: should be possible and interesting
- No politics! ;-)
Thanks!
