Taint Tracking Through UTF Extension

by
Bože Zekan
supervised by
Dr. Mark Shtern, Dr. Vassilios Tzerpos
Computer Science and Engineering Faculty
York University
funded by
NSERC USRA Grant
Topics To Be Covered
• Some threats from user input
• Taint tracking
• Previous work
• Our work
Topics To Be Covered
Our work
• Unicode
• Implementations
• Results
The Problem We Are Addressing
• It is estimated that > 80% of web services contain
security vulnerabilities [1]
• Many of these (50 to 82%) are user command
injection vulnerabilities [1]
[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java.
In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA.
ACM 978-1-60558-789-9/09/11
Our Goal
Reduce security vulnerabilities that may occur
when dealing with user input
User input:
- input from an actual physical person
- input from another program, file, database, etc
OR
- any data that is not a literal constant in our
program or has not been generated by the
manipulation of literal constants in our program
Some User Command Injection
Threats:
• SQL injection
• Cross-site scripting (XSS)
• Path traversal
• Shell injection attacks, HTTP response
splitting, ...
SQL Injection
query = "SELECT * FROM students
WHERE name = '" + studentName + "'";
SELECT * FROM students WHERE name = 'bobby'
SQL Injection
From: Exploits of a Mom webcomic at http://xkcd.com/327/
SQL Injection
query = "SELECT * FROM students
WHERE name = '" + studentName + "'";
SELECT * FROM students WHERE name = 'bobby';
DROP TABLE students; --'
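The two queries above can be reproduced with a minimal sketch. The class and method names (SqlConcatDemo, buildQuery) are hypothetical and only illustrate how pasting input into the SQL text lets the input change the statement.

```java
final class SqlConcatDemo {
    // Builds the query the way the slide does: the input is pasted into the SQL text.
    static String buildQuery(String studentName) {
        return "SELECT * FROM students WHERE name = '" + studentName + "'";
    }

    public static void main(String[] args) {
        // Benign input stays inside the quoted literal.
        System.out.println(buildQuery("bobby"));
        // Malicious input closes the literal and appends a second statement.
        System.out.println(buildQuery("bobby'; DROP TABLE students; --"));
    }
}
```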
Cross-Site Scripting (XSS)
html="" + name + " " + when + " " +
comment + "";
Anonymous 0 Hours Ago Have you noticed that
Soros spelled backwards is still Soros? Coincidence, I
think not!
Cross-Site Scripting (XSS)
html="" + name + " " + when + " " +
comment + "";
Anonymous 0 Hours Ago <script>
window.location="http://www.mybadsite.com/"</script>
Path Traversal
filename = "/srv/www/users/bobby/" + filename;
filename:
/srv/www/users/bobby/myhomework1.doc
Path Traversal
filename = "/srv/www/users/bobby/" + filename;
filename:
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
which resolves to:
/srv/www/users/cse3000/tentativetestquestions.doc
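A small sketch of the same effect, using the standard java.nio.file API to normalize the concatenated path. The class and method names are hypothetical; the containment test is one possible check, not something the slides prescribe.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathTraversalDemo {
    static final Path USER_DIR = Paths.get("/srv/www/users/bobby");

    // Mirrors the slide: the user-supplied name is appended to a fixed directory,
    // then "." and ".." segments are collapsed.
    static Path resolve(String userSuppliedName) {
        return Paths.get("/srv/www/users/bobby/" + userSuppliedName).normalize();
    }

    // True when the normalized path is still inside the user's own directory.
    static boolean staysInside(String userSuppliedName) {
        return resolve(userSuppliedName).startsWith(USER_DIR);
    }

    public static void main(String[] args) {
        System.out.println(staysInside("myhomework1.doc"));
        System.out.println(staysInside("../cse3000/tentativetestquestions.doc"));
    }
}
```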
To Prevent the Propagation of
Malicious Data
Possible solution #1: Carefully parse/sanitize/analyze all data being
sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
Anonymous 0 Hours Ago Have you noticed that Soros
spelled backwards is still Soros? Coincidence, I think not!
Anonymous 0 Hours Ago <script>window.location =
"http://www.mybadsite.com/"</script>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible
combinations, and don't discard any valid requests
To Prevent the Propagation of
Malicious Data
Possible solution #2: Carefully parse/sanitize/analyze all user
supplied data being sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
Anonymous 0 Hours Ago Have you noticed that Soros
spelled backwards is still Soros? Coincidence, I think not!
Anonymous 0 Hours Ago <script>window.location =
"http://www.mybadsite.com/"</script>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible
combinations, and don't discard any valid requests
Taint Tracking Makes Possible
Solution 2
Taint tracking consists of three main steps:
1. Identifying untrusted input at the point that it enters the program and
marking that it is untrusted (i.e., tainted).
2. Propagating the taint information
At each subsequent computation, mark as tainted all data that is
derived from an untrusted source.
3. Checking all data going into sensitive data sinks (e.g., a database,
or output response, or file)
Use the taint information to identify potential attacks.
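The three steps can be sketched in their coarsest form, with one flag for a whole string. Everything here (StringLevelTaint, Tracked, fromUser, literal, needsInspection) is a hypothetical illustration, not code from the presentation.

```java
final class StringLevelTaint {
    // A value plus a single taint flag covering the whole string.
    static final class Tracked {
        final String value;
        final boolean tainted;

        Tracked(String value, boolean tainted) {
            this.value = value;
            this.tainted = tainted;
        }

        // Step 2: anything derived from tainted data is itself tainted.
        Tracked concat(Tracked other) {
            return new Tracked(value + other.value, tainted || other.tainted);
        }
    }

    // Step 1: mark data as untrusted at the point where it enters the program.
    static Tracked fromUser(String raw) {
        return new Tracked(raw, true);
    }

    // Literal constants written in the program are trusted.
    static Tracked literal(String text) {
        return new Tracked(text, false);
    }

    // Step 3: the sink consults the taint flag before using the data.
    static boolean needsInspection(Tracked dataForSink) {
        return dataForSink.tainted;
    }

    public static void main(String[] args) {
        Tracked query = literal("SELECT * FROM students WHERE name = '")
                .concat(fromUser("bobby"))
                .concat(literal("'"));
        System.out.println(query.value + " -> inspect: " + needsInspection(query));
    }
}
```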
Taint Tracking
Taint tracking comes in two possible flavours:
1. String level
– mark the entire string as tainted
2. Character level
- mark individual characters as tainted
- allows for finer granularity
How Can Character Level
Tainting Be Achieved?
One method, by Chin and Wagner of UC Berkeley [1]:
Expand the structure of the Java String class to
include a boolean array which stores the taint status
for each character in the string.
[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java.
In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA.
ACM 978-1-60558-789-9/09/11
ACM 978-1-60558-789-9/09/11
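The idea of a per-character boolean array can be sketched as follows. This is not Chin and Wagner's implementation (they modify the String class itself); FlaggedString is a hypothetical stand-alone class that only shows the data layout and how concatenation carries the flags along.

```java
import java.util.Arrays;

final class FlaggedString {
    final String text;
    final boolean[] tainted; // tainted[i] is the status of text.charAt(i)

    // Creates a string whose characters all share one taint status.
    FlaggedString(String text, boolean allTainted) {
        this.text = text;
        this.tainted = new boolean[text.length()];
        Arrays.fill(this.tainted, allTainted);
    }

    private FlaggedString(String text, boolean[] tainted) {
        this.text = text;
        this.tainted = tainted;
    }

    // Concatenation keeps each character's flag at its new position.
    FlaggedString concat(FlaggedString other) {
        boolean[] flags = Arrays.copyOf(tainted, tainted.length + other.tainted.length);
        System.arraycopy(other.tainted, 0, flags, tainted.length, other.tainted.length);
        return new FlaggedString(text + other.text, flags);
    }

    public static void main(String[] args) {
        FlaggedString q = new FlaggedString("name='", false)
                .concat(new FlaggedString("bob", true))
                .concat(new FlaggedString("'", false));
        System.out.println(q.text + " " + Arrays.toString(q.tainted));
    }
}
```

Note that the flag array is extra storage kept alongside every string, which is the memory cost listed among the shortcomings.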
The Chin and Wagner method
Their achievement:
Implementing a solution which minimizes the need to rewrite
existing application code while transparently decreasing the
vulnerability of applications to threats
Their shortcomings:
• Specific to Java
• Increases the memory required to store a string in Java
• The taint status of the Java char primitive cannot be determined
• Not readily adapted to other programming languages
• Their taint information cannot propagate onwards to a database, or an
application, script, or procedure running in another programming language.
How can character level tainting
be achieved?
Our method:
Expand Unicode to include tainted characters
Our achievements:
· Implement a solution which minimizes the need to
rewrite existing application source code while
transparently decreasing the vulnerability of
applications to threats.
· Is not specific to Java
· Does not increase the memory required to store a
string in Java
· The taint status of the java char primitive can be
determined
· Is readily adapted to other programming languages
· The taint information can propagate onwards to a
database, or an application, script, or procedure
running in another programming language
What is Unicode?
• A scheme that assigns a codepoint to
each character in current use throughout
the world
• Has been implemented in XML, Java,
Microsoft.NET, web browsers, databases,
and modern operating systems.
Unicode
• Can accommodate 1,114,112 codepoints in
17 “planes” of 65,536 characters each
• Most of the codespace is still unassigned
• Mechanisms (e.g., UTF-8, UTF-16, ...) exist
that already allow software to manipulate
and store all these codepoints even if no
characters have been assigned to them
Our Design, Part 1
Tainting & Propagating Taint
• We create a “tainted” character for every
character and assign it an unused codepoint
Ex.
Untainted                             Tainted
A (ASCII: 41 hex, Unicode: U+0041)    A (Unicode: U+E041)
z (ASCII: 7A hex, Unicode: U+007A)    z (Unicode: U+E07A)
• Now wherever a character’s codepoint goes,
its tainted or untainted status goes with it
Tainting Algorithms
• To taint a user input character x:
    codepoint(tainted x) = codepoint(x) + OFFSET
• To check if character x is tainted or not:
    if (codepoint(x) is in tainted codepoint range)
        character x is tainted    // is user supplied
    else
        character x is untainted
• To remove taint from tainted character x:
    codepoint(x) = codepoint(tainted x) - OFFSET
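A minimal Java sketch of these three operations. OFFSET = 0xE000 is inferred from the U+0041 to U+E041 example; restricting the tainted range to ASCII (U+E000 to U+E07F) is an assumption made only to keep the sketch short, and the class name OffsetTaint is hypothetical.

```java
final class OffsetTaint {
    static final int OFFSET = 0xE000; // inferred from the U+0041 -> U+E041 example
    static final int RANGE = 0x80;    // assumption: only ASCII characters are tainted here

    // A character is tainted when its codepoint lies in the tainted range.
    static boolean isTainted(char c) {
        return c >= OFFSET && c < OFFSET + RANGE;
    }

    // codepoint(tainted x) = codepoint(x) + OFFSET
    static char taint(char c) {
        return c < RANGE ? (char) (c + OFFSET) : c;
    }

    // codepoint(x) = codepoint(tainted x) - OFFSET
    static char untaint(char c) {
        return isTainted(c) ? (char) (c - OFFSET) : c;
    }

    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) out.append(taint(s.charAt(i)));
        return out.toString();
    }

    static String untaint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) out.append(untaint(s.charAt(i)));
        return out.toString();
    }

    public static void main(String[] args) {
        String tainted = taint("bobby");
        System.out.println(isTainted(tainted.charAt(0)) + " " + untaint(tainted));
    }
}
```

Because each tainted character is still a single Java char, the tainted string has the same length as the original.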
Our Design, Part 2
The Transparent Protection Framework
Consider a typical vulnerable web application:
Designing The Added Transparent
Protection Framework
Consider a less vulnerable web application:
• User’s OS has fonts which incorporate tainted characters
• Request Intercept Wrapper uses custom taint aware
classes/functions and is generic for a given technology
• Application is on a server with taint awareness built into its
library functions
• Database Driver Intercept Wrapper uses custom taint aware
classes/functions specific to the database to check for SQL
injection, and drop malicious queries
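The slides do not say how the database wrapper decides that a query is malicious. The sketch below assumes one plausible policy: user-supplied (tainted) characters may appear only inside a quoted literal opened by trusted text, and must not themselves be a quote. SqlSinkCheck and looksInjected are hypothetical names, and the offset and range are the same assumptions as in the U+E041 example.

```java
final class SqlSinkCheck {
    static final int OFFSET = 0xE000; // inferred from the U+0041 -> U+E041 example
    static final int RANGE = 0x80;    // assumption: only ASCII characters are tainted

    static boolean isTainted(char c) {
        return c >= OFFSET && c < OFFSET + RANGE;
    }

    // Helper for the demo: taints every ASCII character of the input.
    static String taint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) out.append(c < RANGE ? (char) (c + OFFSET) : c);
        return out.toString();
    }

    // Assumed policy: flag the query if tainted data sits outside a literal
    // or if a tainted character is a single quote.
    static boolean looksInjected(String sql) {
        boolean inLiteral = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (isTainted(c)) {
                char plain = (char) (c - OFFSET);
                if (!inLiteral || plain == '\'') return true;
            } else if (c == '\'') {
                inLiteral = !inLiteral; // only trusted quotes open or close a literal
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String prefix = "SELECT * FROM students WHERE name = '";
        System.out.println(looksInjected(prefix + taint("bobby") + "'"));
        System.out.println(looksInjected(prefix + taint("bobby'; DROP TABLE students; --") + "'"));
    }
}
```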
Implementation Details: The Font
For a final, universally adopted application:
• System fonts would be expanded to include tainted
characters, which would look identical to their
untainted counterparts
Ex. untainted ABCDE ... vs tainted ABCDE ...
For our proof of concept:
• Tainted and untainted characters appear different
– to easily distinguish them on computer screens and
in documents
Ex. untainted ABCDE ... vs tainted
...
Implementation Details: The Font
• We used Type-Light freeware to modify
Windows' Courier New font
- installed it by dragging the original ttf file out of
the Fonts directory, and dragging in our new ttf file
Implementation Details:
The Application
• Has no knowledge of taint
• Counts the number of visits of this user
• 1st query to db checks if the user’s name is in the db.
If no, then insert the name into the db and set the visits count to 1
If yes, then increment the visits count by 1 in the db
• 2nd query to db outputs the number of visits for the user’s
name from the db’s record
Implementation Details:
The Transparent Protection Framework
We implemented our framework on our typical web
application in four different technologies:
1. PHP/MySQL on Apache (under Windows XP)
2. PHP/DB2 on Apache (under Linux)
3. Java Servlet/DB2 on Tomcat7 (under Linux)
4. PHP on Apache (under Linux) calling Java Servlet/DB2 on Tomcat7 (under Linux)
To do this we set the UTF-8 or Unicode encoding option
everywhere it was available, and Courier New as the
selected font wherever possible.
Implementation Details:
The Form Page
Implementation Details:
The Request Intercept Wrapper
• Two versions were used:
1. PHP version which uses cURL to interact with the
application
2. Java Servlet version which uses a connection to interact
with the application
• Both versions handle both POST and GET
requests.
• Browser only sees wrapper's url, never the
application page's url
• Both will work with any form, no matter the
combinations of controls
Implementation Details:
PHP Application & Db Driver Intercept
• Four applications exist
- essentially the same code with minor variations
• Two Database Driver Intercept Wrappers
exist
- essentially the same code with minor variations
- they are PHP include files
- each file has taint aware functions that wrap the
query and fetch array functions of their respective
databases
Implementation Results:
PHP Application & Db Driver Intercept
• Was not totally transparent
- application needed modification to specify the
include files, and rename two functions
• But we did successfully:
- propagate taint from user input all the way back
to the user output
- transparently detect and stop SQL injection
- show that our method works on different databases and
different operating systems
- produce an easy-to-implement solution to increase
the security of legacy programs
Implementation Details:
Java Application
• One application, reachable in two ways
• Has modified String & Character classes that will
not break the application at ("A").equals(tainted "A") or
('A').equals(tainted 'A')
Implementation Details:
Java DB2 Database Intercept Wrapper
• Is a collection of custom taint aware classes
• The original ibm.db2.jdbc.app.DB2Driver class is
wrapped with our taint aware Db2DriverIntercept
class
• We then drill down and also wrap the Connection,
PreparedStatement, and ResultSet interfaces and
augment their existing methods to provide
transparent SQL injection protection
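The wrapping idea can be sketched with a dynamic proxy from java.lang.reflect instead of hand-written wrapper classes. This is a swapped-in technique chosen for brevity, not the presentation's Db2DriverIntercept code. ConnectionIntercept, wrap, and the looksInjected predicate are hypothetical, and only prepareStatement is vetted here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Predicate;

final class ConnectionIntercept {
    // Returns a Connection that vets SQL text before delegating to the real one.
    static Connection wrap(Connection real, Predicate<String> looksInjected) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean carriesSql = method.getName().equals("prepareStatement")
                    && args != null && args.length > 0 && args[0] instanceof String;
            if (carriesSql && looksInjected.test((String) args[0])) {
                throw new SQLException("Query dropped: tainted input alters the SQL");
            }
            try {
                return method.invoke(real, args); // everything else passes through
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the driver's own exception
            }
        };
        return (Connection) Proxy.newProxyInstance(
                ConnectionIntercept.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                handler);
    }

    public static void main(String[] args) throws SQLException {
        // Stand-in for a real driver connection: every method returns null.
        Connection stub = (Connection) Proxy.newProxyInstance(
                ConnectionIntercept.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (p, m, a) -> null);
        // Toy predicate: flags any query containing tainted A (U+E041).
        Connection guarded = wrap(stub, sql -> sql.indexOf('\uE041') >= 0);
        guarded.prepareStatement("SELECT 1");
        try {
            guarded.prepareStatement("SELECT \uE041");
        } catch (SQLException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The application would then obtain its Connection through wrap rather than directly from the vendor driver, which matches the one change the results slide says the application needs.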
Implementation Results:
Java Application & Db Driver Intercept
• Was not totally transparent
- application needs to call our driver instead of
IBM’s database driver
• But we additionally showed that our
character level taint method could:
- work on different programming languages (PHP
and Java) and paradigms (procedural and OOP)
- propagate between different languages and
different servers
- be handled transparently by modifying Java’s
String and Character class operations
Application Breaks & Work Arounds
• Java: the char is a primitive
if ('A' == tainted 'A') … is as far as we can keep taint
information accurate. Thereafter, taint
information is lost, so there is no further propagation
- if allowed to alter source code, then replace
('A' == tainted 'A') with a taint aware custom method
('A'.equals(tainted 'A')) to allow taint to propagate
even further within an application.
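Since a Java char has no methods, a taint aware comparison has to be a static helper in real code. The sketch below uses hypothetical names (TaintAwareCompare, sameChar, sameText) and assumes the same offset and ASCII-only range as the U+E041 example.

```java
final class TaintAwareCompare {
    static final int OFFSET = 0xE000; // inferred from the U+0041 -> U+E041 example
    static final int RANGE = 0x80;    // assumption: only ASCII characters are tainted

    // Returns the untainted counterpart of c, or c itself if it is not tainted.
    static char plain(char c) {
        return (c >= OFFSET && c < OFFSET + RANGE) ? (char) (c - OFFSET) : c;
    }

    // Replacement for (a == b): compares the characters with taint ignored.
    static boolean sameChar(char a, char b) {
        return plain(a) == plain(b);
    }

    // String counterpart: equal when every character matches with taint ignored.
    static boolean sameText(String a, String b) {
        if (a.length() != b.length()) return false;
        for (int i = 0; i < a.length(); i++) {
            if (!sameChar(a.charAt(i), b.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char taintedA = '\uE041';
        System.out.println(('A' == taintedA) + " " + sameChar('A', taintedA));
    }
}
```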
Application Breaks & Work Arounds
• PHP: strings are considered primitive
if ("AB" == tainted "AB") … is as far as we can keep taint
information accurate. Thereafter, taint
information is lost, so there is no further propagation
- if allowed to alter source code, then replace
("AB" == tainted "AB") with a taint aware custom method
("AB".equals(tainted "AB")) to allow taint to
propagate even further within an application.
NB! If our method were to be adopted universally, the
above could be overcome by modifying the JVM or PHP
engine
Other Possible Uses of Our
Character Level Tainting Method
• Tainting and tracking of multiple input sources
– there are a lot of unassigned codepoints
– many tainted character sets could be created to
indicate different data sources (e.g., keyboard, file,
database, remote login, ...)
• Storing tainted characters in log files to make
user input immediately recognizable
• Tainted characters can be stored in a
database & retrieved by using taint in queries
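The multiple-source idea can be sketched by giving each source its own offset. The three base values below are assumptions chosen so that every source gets its own 128-character block inside the Basic Multilingual Plane private use area (U+E000 to U+F8FF); MultiSourceTaint and Source are hypothetical names.

```java
final class MultiSourceTaint {
    // Assumed layout: one 128-character block per input source.
    enum Source {
        KEYBOARD(0xE000), FILE(0xE080), DATABASE(0xE100);

        final int base;

        Source(int base) {
            this.base = base;
        }
    }

    static final int WIDTH = 0x80; // assumption: only ASCII characters are tainted

    // Moves every ASCII character into the block reserved for the given source.
    static String taint(String s, Source source) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            out.append(c < WIDTH ? (char) (c + source.base) : c);
        }
        return out.toString();
    }

    // Returns the source a character came from, or null if it is untainted.
    static Source sourceOf(char c) {
        for (Source s : Source.values()) {
            if (c >= s.base && c < s.base + WIDTH) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        String fromFile = taint("A", Source.FILE);
        System.out.println(sourceOf(fromFile.charAt(0)));
    }
}
```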